udayvimal/README.md
Uday Vimal

Data Analyst & AI Engineer


LinkedIn GitHub Gmail


👨‍💻 What I Do

I work across the full analytics stack, translating raw, messy data into decisions that drive business outcomes.

| Area | What I Deliver |
| --- | --- |
| 📊 Product & Business Analytics | Funnel analysis, A/B experimentation, user segmentation, churn modeling |
| 📣 Marketing & Campaign Analytics | Incrementality testing, ROI measurement, A/B significance pipelines, influencer ROI |
| 🗄 SQL & Data Engineering | Complex queries (CTEs, window functions, JOINs), ETL pipelines, PySpark, Hive |
| 📈 BI & Dashboards | Self-service Tableau / Power BI dashboards built for business stakeholders |
| 🤖 AI & LLM Systems | LLM-powered reporting automation, agentic workflows, n8n pipelines |

Currently open to Product Analytics / Business Analytics / Marketing Analytics roles


🏆 Impact at a Glance

| 📊 15,000+ | 💰 18.2% | 🛒 40–63% | 🤖 6+ |
| --- | --- | --- | --- |
| UPI transactions analyzed & fraud-patterned | Gender pay gap quantified across 8K Indian tech records | Delivery fee surge exposed on rain days | LLM agents built end-to-end |

| ⚡ 13M+ | 📉 51pp | 🎯 137.8% | 🕐 30s |
| --- | --- | --- | --- |
| Incrementality test records processed in PySpark | Retention gap exposed after a single failed fintech transaction | Portfolio ROI on influencer campaign tracker | Executive reporting cycle cut from 2–4 hrs to 30 sec with LLM |

🛠 Skills

Languages & Query

Python SQL PySpark JavaScript

Analytics & BI

Tableau Power BI Google Sheets Looker

Data Engineering

Apache Airflow Apache Hive PostgreSQL Docker FastAPI

ML & AI

scikit-learn TensorFlow PyTorch OpenAI Groq CrewAI

AI Dev Tools

Cursor Claude Code n8n



📣 Marketing & Campaign Analytics

Business Problem: How do you measure what a campaign actually caused, not just what happened during it?

End-to-end marketing analytics pipelines: incrementality testing, A/B significance at scale, influencer ROI, and LLM-powered reporting, built with PySpark, Hive, pandas, and Groq.

| Project | Problem Solved | Tools | Key Result |
| --- | --- | --- | --- |
| 📈 Acquisition Campaign Incrementality & ROI | Separate true campaign lift from baseline conversion noise across 13M+ test records | PySpark, Hive, chi-square, t-test | Ranked targeting segments by verified incremental ROI; isolated statistically significant lift vs organic baseline |
| 🧪 A/B Test Statistical Significance Pipeline | Run concurrent marketing experiments and auto-rank results: scale, stop, or inconclusive | PySpark, chi-square, t-test, logistic regression | 9/10 experiments reached significance; EXP_005 showed a negative effect (−0.4pp) flagged for immediate stop |
| 🤖 AI Business Reporting & Sentiment Analysis | Cut 2–4 hour manual reporting cycles down to 30 seconds using LLM automation | Python, pandas, Groq (Llama 3.3 70B), n8n | Executive summary + VoC sentiment report auto-generated from raw sales CSV; 40% positive, 30% negative sentiment surfaced |
| 📸 Influencer Campaign Performance Tracker | Measure reach, engagement, and ROI across 40 influencer partnerships and flag who to scale or drop | Python, pandas, Groq LLM, matplotlib | 137.8% portfolio ROI, 1.65x ROMI; Groq LLM auto-flagged 11 SCALE and 12 DROP partnerships |
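
For readers unfamiliar with the chi-square check mentioned in the table above, here is a minimal sketch of how a single two-arm experiment could be labelled scale, stop, or inconclusive. It is not the pipeline's actual code: the function names, the 0.05 significance level, and the plain-Python implementation (rather than PySpark) are assumptions made for illustration.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    comparing conversion counts between a control arm (a) and a test arm (b)."""
    total = n_a + n_b
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    if 0 in col_totals or 0 in (n_a, n_b):
        # Degenerate table: no variation to test, so report no evidence.
        return 0.0, 1.0
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def decide(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Label an experiment; alpha is an assumed threshold, not a documented one."""
    _, p_value = chi_square_2x2(conv_a, n_a, conv_b, n_b)
    if p_value >= alpha:
        return "INCONCLUSIVE"
    lift = conv_b / n_b - conv_a / n_a
    return "SCALE" if lift > 0 else "STOP"
```

With made-up counts, `decide(100, 1000, 150, 1000)` would label the test arm for scaling, while swapping the arms would label it for stopping. Running many experiments concurrently usually also calls for a multiple-comparison correction, which this sketch leaves out.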

🎯 Product Analytics

Business Problem: Where in the user journey are you losing people, and what does the data say you should fix?

Quantitative funnel analysis combined with structured qualitative synthesis, demonstrating how product analysts translate user pain points into prioritised, data-backed recommendations.

| Project | Problem Solved | Tools | Key Finding |
| --- | --- | --- | --- |
| 💳 Fintech User Funnel Analytics | Identify where a youth payments app (ages 11–19) loses users and which transactions fail most | Python, pandas, matplotlib, seaborn | 51pp retention gap after a failed first transaction (92% vs 41%); first-time UPI failure rate 30.7% vs 9.1% repeat; 3 prioritised product recommendations with success metrics |

⚙️ Data Engineering

Business Problem: How do you reliably move, transform, and serve data at scale without manual intervention?

Automated pipelines built with Airflow, PostgreSQL, Docker, and FastAPI, containerised and production-ready.

| Project | Problem Solved | Tools | Key Result |
| --- | --- | --- | --- |
| 🌦 Weather Data ETL Pipeline | Manual weather data collection was slow and error-prone | Python, Airflow, PostgreSQL, Docker, Open-Meteo API | Fully automated daily ingestion pipeline: zero manual effort, containerised for one-command deployment |
| 🧥 Apparel Analytics Hub | Apparel retail data sat siloed with no aggregated reporting | Python, Airflow, Flask | Modular DAG-driven ETL serving aggregated retail metrics through a Flask web app |
| 🧠 AI Context Engine | AI conversation context was lost between browser sessions | FastAPI, Chrome Extension, WebSockets | Full-stack extension capturing and restoring full AI session context across sessions |

🗄 SQL

Business Problem: How do you extract decision-relevant signals from complex, multi-table datasets?

Advanced SQL across fraud detection, pay equity, consumer pricing, and job market analysis, using CTEs, window functions, and complex JOINs.

| Project | Problem Solved | Approach | Key Finding |
| --- | --- | --- | --- |
| 💡 8-Week SQL Challenge | Demonstrate SQL depth across 8 real-world business scenarios | CTEs, window functions, cohort analysis, KPI reporting | Solved all 8 case studies from #8WeekSQLChallenge |
| 💼 Job Market Analytics | Which skills, geographies, and salary bands have the highest demand? | YoY/MoM trend comparisons, window functions | Tracked skill demand shifts with automated Tableau refresh |
| 💳 UPI Fraud Analysis | What transaction patterns predict fraud in UPI payments? | 5 anomaly detection rules across 15,000 transactions | Late-night transactions: 19.2% fraud rate vs 0.8% mornings; amounts near ₹4,999/₹9,999 show 28–31% fraud rates |
| 💰 Indian Salary Gap Analysis | How large is the gender pay gap in Indian tech, and what drives it? | Segmentation by role, city, company type across 8,000 records | 18.2% gender pay gap; Bangalore commands 32–35% city premium; MNCs pay 40.3% more than Indian corporates |
| 🍔 Swiggy Dark Patterns | Are food delivery platforms using pricing dark patterns against consumers? | Controlled comparisons isolating weather, time, and demand as pricing levers | 40–63% delivery fee surge during rain; weekend fees 24.9% higher, costing users ₹864/year |
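
As a rough illustration of what rule-based fraud signals like those in the UPI Fraud Analysis row can look like, the sketch below flags a transaction on four patterns: late-night timing, amounts just under a round limit, a first-time recipient, and repeated retries. It is written in Python rather than SQL for brevity, and every field name and threshold here is hypothetical; the project's actual rules are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    hour: int            # 0-23, local hour of the transaction (hypothetical field)
    amount: float        # amount in rupees
    new_recipient: bool  # True if the payer has never paid this recipient before
    retries: int         # failed attempts immediately preceding this one


# Hypothetical thresholds, chosen only to make the example concrete.
LATE_NIGHT_HOURS = range(0, 5)
ROUND_LIMITS = (5_000, 10_000)
NEAR_LIMIT_WINDOW = 10
MAX_RETRIES = 2


def fraud_signals(txn):
    """Return the names of the illustrative rules that this transaction trips."""
    signals = []
    if txn.hour in LATE_NIGHT_HOURS:
        signals.append("late_night")
    # Amounts sitting just below a round limit, e.g. 4,999 or 9,999.
    if any(0 < limit - txn.amount <= NEAR_LIMIT_WINDOW for limit in ROUND_LIMITS):
        signals.append("near_round_limit")
    if txn.new_recipient:
        signals.append("new_recipient")
    if txn.retries > MAX_RETRIES:
        signals.append("repeated_retries")
    return signals
```

A transaction such as `Txn(hour=2, amount=4999, new_recipient=True, retries=3)` trips all four rules, while a mid-morning payment of ₹250 to a known recipient trips none. In a SQL setting, each rule would typically become a `CASE WHEN` expression over the transactions table.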

🐍 Python & EDA

Business Problem: What patterns, anomalies, and segments hide in raw data that descriptive stats alone won't surface?

Exploratory analysis tackling user behavior, competitive pricing, workforce analytics, and environmental prediction.

| Project | Problem Solved | Approach | Key Insight |
| --- | --- | --- | --- |
| 💳 Fintech User Funnel Analytics | Where does a youth fintech app lose users, and which segments churn hardest? | Funnel drop-off analysis, cohort split (11–14 vs 15–19), failure-rate segmentation | 51pp retention cliff after failed first transaction; age 15–19 transacts at 2.3× the value of 11–14 cohort |
| 📸 Influencer Campaign Tracker | Which of 40 influencer partnerships are worth scaling, and which should be cut? | ROI, ROMI, CPA, engagement rate computed per influencer; LLM-generated recommendations | Identified 11 SCALE and 12 DROP creators; TikTok macro tier at 8.3x ROMI, Instagram macro at sub-1x |
| 🌱 HR Analytics | What drives employee attrition, and where are the pay inequities? | Segmented 8,950 employees by education, department, and performance | Education strongly predicts earnings: high school ₹62K avg → PhD ₹92K; attrition clustered in specific department-seniority bands |
| 🛒 Zepto vs Blinkit | Which quick-commerce platform offers better value, and where does each dominate? | Compared pricing, discounts, delivery times across 5,000+ products | Zepto 11.8 min vs Blinkit 16.5 min delivery; ₹34.94 vs ₹17.52 avg discount; picked up 6,000+ LinkedIn views |
| 💳 UPI Fraud Analysis | Which user behavioural patterns signal fraudulent UPI activity? | Built 5 rule-based fraud signals from 15K transactions | Time-of-day, recipient novelty, retry behaviour, and amount clustering are strongest fraud predictors |
| 💰 Indian Salary Gap Analysis | How do role, seniority, city, and company type compound the pay gap? | Statistical breakdown across 8,000 Indian tech compensation records | MNCs pay 40.3% more; seniority and city amplify the gender gap significantly |
| 🍔 Swiggy Dark Patterns | How much extra are consumers paying due to platform-driven pricing manipulation? | Isolated weather, time-of-day, and demand as independent pricing levers | r = −0.71 correlation between delivery time and satisfaction; users overcharged ₹864/year via opaque fees |
| 🌫 AQI Prediction | Can air quality be predicted spatially from vegetation and geography data? | Spatial interpolation across geographic regions | Interactive HTML map visualising AQI vs vegetation, surfacing high-risk zones |

🤖 Machine Learning

Business Problem: Can we build predictive systems that act on patterns too complex for manual rules?

End-to-end ML projects covering classification, regression, NLP, computer vision, and deployed systems.

| Project | Problem Solved | Model / Approach | Result |
| --- | --- | --- | --- |
| 🤖 ML Portfolio | Demonstrate end-to-end ML across 10 diverse business domains | Logistic regression, CNN, YOLO, TF-IDF, collaborative filtering | 10 complete projects: churn, vision, NLP, forecasting, recommendation |
| 🔐 User Authentication ML System | How can we detect anomalous login behaviour in real time? | Decision tree + JWT auth, deployed on Render + Netlify | Production-deployed full-stack app with real-time ML inference |
| 🏨 Hotel Booking Prediction | Which bookings are likely to cancel, and when? | Logistic regression + tree-based models on booking records | Enabled proactive demand management by predicting cancellation probability |
| 🎬 Movie Recommendation System | How do you personalise content recommendations at scale? | TF-IDF vectorisation + cosine similarity on plot/genre metadata | Content-based recommendations without user history dependency |
| 📰 Fake News Predictor | Can NLP distinguish misinformation from legitimate news? | TF-IDF + ML classifier on labelled article datasets | Classifier trained to detect misinformation across news topics |
| 📺 YouTube Ads View Prediction | Which engagement signals best predict ad view volume? | Regression on likes, comments, shares | Quantified the relationship between engagement metrics and ad performance |
| 📷 Real-Time Object Detection | Can we detect multiple object classes in a live video stream? | YOLOv8 with custom YAML-defined classes | Real-time detection pipeline with configurable object categories |
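
The TF-IDF plus cosine similarity approach listed for the Movie Recommendation System can be sketched in a few lines of standard-library Python. This is an illustrative toy, not the project's code: the whitespace tokeniser, the smoothed IDF formula, and the function names are all assumptions, and a real project would more likely rely on a library vectoriser.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(titles, descriptions, liked_title, top_n=2):
    """Rank the other titles by description similarity to the liked one."""
    vectors = tfidf_vectors(descriptions)
    liked = vectors[titles.index(liked_title)]
    scored = [
        (cosine(liked, vec), title)
        for title, vec in zip(titles, vectors)
        if title != liked_title
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_n]]
```

Because the ranking depends only on item descriptions, it needs no viewing history, which matches the "without user history dependency" result in the table. Given three made-up descriptions where two share space-themed words, `recommend` returns the space-themed neighbour first.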

🧠 AI & LLM Agents

Business Problem: Which repetitive analytical and reasoning workflows can be handed off to autonomous AI agents?

Agentic systems built with CrewAI, OpenAI, Groq, and FastAPI, developed using Cursor and Claude Code.

| Project | Problem Solved | Tools | Key Outcome |
| --- | --- | --- | --- |
| 📊 AI Business Reporting & Sentiment | Auto-generate executive summaries and VoC sentiment reports from raw sales + review CSVs | Python, pandas, Groq Llama 3.3 70B, n8n | 2–4 hour manual reporting cycle → under 30 seconds; real Groq output committed as sample evidence |
| 📸 Influencer Campaign Tracker | Auto-flag 40 influencer partnerships as SCALE / MONITOR / OPTIMIZE / DROP using live campaign data | Python, pandas, Groq LLM, matplotlib | Portfolio ROI 137.8%, 1.65x ROMI; LLM-generated strategic recommendations with data grounding |
| 🧠 LLM Portfolio | Build a diverse suite of production-grade agentic AI systems | CrewAI, OpenAI, Gemini, HuggingFace, FastAPI, Gradio | 6 agents: medical vision+voice, finance analyst, image recognition, symptom analysis |
| 🩺 Blood Test Analyser | Automate interpretation of blood test PDFs into plain-English summaries | CrewAI, OpenAI, Python | Multi-agent system reading PDF biomarkers and generating patient-friendly medical summaries |
| 📰 Financial News Dashboard | Surface the most relevant financial news from a high-volume feed | Python, Streamlit, FAISS, Embeddings | Semantic similarity search over embedded news articles |
| 📝 Text Summarizer | Automate extractive and abstractive summarisation with CI/CD | Python, Docker, GitHub Actions | Containerised pipeline with automated deployment via GitHub Actions |
| 🧠 AI Context Engine | Prevent AI conversation context from being lost between sessions | FastAPI, Chrome Extension, WebSockets | Full-stack browser tool that captures and restores AI session history across tabs |

📈 Tableau & Dashboards

Business Problem: How do you turn processed data into self-service tools that non-technical stakeholders can act on?

Interactive dashboards built for HR, job markets, and quick-commerce intelligence.

| Project | Problem Solved | Key Metrics Tracked | Links |
| --- | --- | --- | --- |
| 👥 HR Analytics Dashboard | Give HR teams a single view of workforce health across 8,950 employees | Headcount, attrition rate, gender distribution, department performance, education-pay correlation | Dashboard |
| 🛒 Zepto vs Blinkit | Help consumers and analysts compare platform pricing and delivery performance | Category dominance, discount patterns, pricing elasticity, delivery speed | Dashboard · BI Presentation |
| 💼 Job Market Analytics | Track which skills, roles, and geographies are growing or declining in demand | Skill demand trends, salary comparisons, geographic breakdown, YoY growth | Coming Soon |

🤖 How I Build

I use Cursor and Claude Code as core development tools for faster prototyping, automated code review, and AI-assisted debugging. I also use n8n for workflow automation and Google Sheets for quick ad-hoc analytics and stakeholder-ready reporting.


📬 Let's Connect

Open to Product Analytics, Marketing Analytics, Business Analytics, and Data Analyst roles.

LinkedIn Gmail

Feel free to explore any project; each one has a real business problem behind it.

Pinned

  1. shl (Public)

    Python

  2. ab-test-significance-pipeline (Public)

    Automated PySpark + Hive pipeline for hypothesis testing (chi-square, t-test, regression) across concurrent marketing experiments at scale.

    Python 1

  3. acquisition-campaign-incrementality-roi (Public)

    PySpark + Hive pipeline measuring campaign incrementality, statistical significance, and ROI on 13M+ user incrementality-test records (Criteo dataset).

    Python 1