udayvimal

Uday Vimal

Data Analyst & AI Engineer

👨‍💻 What I Do

I work across the full analytics stack — translating raw, messy data into decisions that drive business outcomes.

Area	What I Deliver
📊 Product & Business Analytics	Funnel analysis, A/B experimentation, user segmentation, churn modeling
📣 Marketing & Campaign Analytics	Incrementality testing, ROI measurement, A/B significance pipelines, influencer ROI
🗄 SQL & Data Engineering	Complex queries (CTEs, window functions, JOINs), ETL pipelines, PySpark, Hive
📈 BI & Dashboards	Self-service Tableau / Power BI dashboards built for business stakeholders
🤖 AI & LLM Systems	LLM-powered reporting automation, agentic workflows, n8n pipelines

Currently open to Product Analytics / Business Analytics / Marketing Analytics roles

🏆 Impact at a Glance

📊 15,000+	💰 18.2%	🛒 40–63%	🤖 6+
UPI transactions analyzed for fraud patterns	Gender pay gap quantified across 8K Indian tech records	Delivery fee surge exposed on rain days	LLM agents built end-to-end

⚡ 13M+	📉 51pp	🎯 137.8%	🕐 30s
Incrementality test records processed in PySpark	Retention gap exposed after a single failed fintech transaction	Portfolio ROI on influencer campaign tracker	Executive reporting cycle cut from 2–4 hrs to 30 sec with LLM

🛠 Skills

Languages & Query

Analytics & BI

Data Engineering

ML & AI

AI Dev Tools

📣 Marketing & Campaign Analytics

Business Problem: How do you measure what a campaign actually caused — not just what happened during it?

End-to-end marketing analytics pipelines: incrementality testing, A/B significance at scale, influencer ROI, and LLM-powered reporting — built with PySpark, Hive, pandas, and Groq.

Project	Problem Solved	Tools	Key Result
📈 Acquisition Campaign Incrementality & ROI	Separate true campaign lift from baseline conversion noise across 13M+ test records	PySpark, Hive, chi-square, t-test	Ranked targeting segments by verified incremental ROI; isolated statistically significant lift vs organic baseline
🧪 A/B Test Statistical Significance Pipeline	Run concurrent marketing experiments and auto-rank results: scale, stop, or inconclusive	PySpark, chi-square, t-test, logistic regression	9/10 experiments reached significance; EXP_005 showed a negative effect (−0.4pp) flagged for immediate stop
🤖 AI Business Reporting & Sentiment Analysis	Cut 2–4 hour manual reporting cycles down to 30 seconds using LLM automation	Python, pandas, Groq (Llama 3.3 70B), n8n	Executive summary + VoC sentiment report auto-generated from raw sales CSV; 40% positive, 30% negative sentiment surfaced
📸 Influencer Campaign Performance Tracker	Measure reach, engagement, and ROI across 40 influencer partnerships and flag who to scale or drop	Python, pandas, Groq LLM, matplotlib	137.8% portfolio ROI, 1.65x ROMI; Groq LLM auto-flagged 11 SCALE and 12 DROP partnerships
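
A minimal sketch of how a scale / stop / inconclusive call like the one in the A/B pipeline row might be made from conversion counts with a chi-square test. The function names, the 0.05 threshold, and the sample counts are illustrative assumptions and are not taken from the project, which runs on PySpark. The p-value uses the closed form for one degree of freedom, so only the standard library is needed.

```python
import math


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Assumes both groups are non-empty and that at least one user converted
    and at least one did not, so no expected count is zero.
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def decide(conv_ctrl, n_ctrl, conv_test, n_test, alpha=0.05):
    """Label an experiment as 'scale', 'stop', or 'inconclusive'."""
    _, p_value = chi_square_2x2(conv_ctrl, n_ctrl, conv_test, n_test)
    lift_pp = (conv_test / n_test - conv_ctrl / n_ctrl) * 100
    if p_value >= alpha:
        return "inconclusive"
    return "scale" if lift_pp > 0 else "stop"


# Hypothetical counts: 10% control conversion vs 15% test conversion.
print(decide(100, 1000, 150, 1000))
```

A significant negative lift maps to "stop", which is the kind of outcome the table reports for EXP_005.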

🎯 Product Analytics

Business Problem: Where in the user journey are you losing people, and what does the data say you should fix?

Quantitative funnel analysis combined with structured qualitative synthesis — demonstrating how product analysts translate user pain points into prioritised, data-backed recommendations.

Project	Problem Solved	Tools	Key Finding
💳 Fintech User Funnel Analytics	Identify where a youth payments app (ages 11–19) loses users and which transactions fail most	Python, pandas, matplotlib, seaborn	51pp retention gap after a failed first transaction (92% vs 41%); first-time UPI failure rate 30.7% vs 9.1% repeat; 3 prioritised product recommendations with success metrics
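
The 51pp figure above is a percentage-point difference between two retention rates (92% minus 41%). A rough sketch of that calculation follows, assuming a hypothetical list of user records with `first_txn_ok` and `retained` flags. The field names and the toy data are assumptions for illustration and are not the project's dataset.

```python
def retention_rate(users, first_txn_ok):
    """Share of users retained within one first-transaction outcome group."""
    group = [u for u in users if u["first_txn_ok"] == first_txn_ok]
    if not group:
        return None
    return sum(1 for u in group if u["retained"]) / len(group)


def retention_gap_pp(users):
    """Retention gap in percentage points: successful minus failed first transaction."""
    ok_rate = retention_rate(users, True)
    failed_rate = retention_rate(users, False)
    if ok_rate is None or failed_rate is None:
        return None
    return round((ok_rate - failed_rate) * 100, 1)


# Toy records only: 3 of 4 retained after a successful first transaction,
# 1 of 4 retained after a failed one.
toy_users = (
    [{"first_txn_ok": True, "retained": True}] * 3
    + [{"first_txn_ok": True, "retained": False}]
    + [{"first_txn_ok": False, "retained": True}]
    + [{"first_txn_ok": False, "retained": False}] * 3
)
print(retention_gap_pp(toy_users))
```

Reporting the gap in percentage points rather than as a relative change keeps it directly comparable to the two underlying rates.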

⚙️ Data Engineering

Business Problem: How do you reliably move, transform, and serve data at scale without manual intervention?

Automated pipelines built with Airflow, PostgreSQL, Docker, and FastAPI — containerised and production-ready.

Project	Problem Solved	Tools	Key Result
🌦 Weather Data ETL Pipeline	Manual weather data collection was slow and error-prone	Python, Airflow, PostgreSQL, Docker, Open-Meteo API	Fully automated daily ingestion pipeline — zero manual effort, containerised for one-command deployment
🧥 Apparel Analytics Hub	Apparel retail data sat siloed with no aggregated reporting	Python, Airflow, Flask	Modular DAG-driven ETL serving aggregated retail metrics through a Flask web app
🧠 AI Context Engine	AI conversation context was lost between browser sessions	FastAPI, Chrome Extension, WebSockets	Full-stack extension capturing and restoring full AI session context across sessions

🗄 SQL

Business Problem: How do you extract decision-relevant signals from complex, multi-table datasets?

Advanced SQL across fraud detection, pay equity, consumer pricing, and job market analysis — using CTEs, window functions, and complex JOINs.

Project	Problem Solved	Approach	Key Finding
💡 8-Week SQL Challenge	Demonstrate SQL depth across 8 real-world business scenarios	CTEs, window functions, cohort analysis, KPI reporting	Solved all 8 case studies from #8WeekSQLChallenge
💼 Job Market Analytics	Which skills, geographies, and salary bands have the highest demand?	YoY/MoM trend comparisons, window functions	Tracked skill demand shifts with automated Tableau refresh
💳 UPI Fraud Analysis	What transaction patterns predict fraud in UPI payments?	5 anomaly detection rules across 15,000 transactions	Late-night transactions: 19.2% fraud rate vs 0.8% mornings; amounts near ₹4,999/₹9,999 show 28–31% fraud rates
💰 Indian Salary Gap Analysis	How large is the gender pay gap in Indian tech, and what drives it?	Segmentation by role, city, company type across 8,000 records	18.2% gender pay gap — Bangalore commands 32–35% city premium; MNCs pay 40.3% more than Indian corporates
🍔 Swiggy Dark Patterns	Are food delivery platforms using pricing dark patterns against consumers?	Controlled comparisons isolating weather, time, and demand as pricing levers	40–63% delivery fee surge during rain; weekend fees 24.9% higher — costing users ₹864/year
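
The UPI fraud rules in the table are written in SQL in the project. The sketch below restates the same idea in Python to show what rule-based signals of this kind can look like. The thresholds (hours counted as late night, how close to a round limit counts as "near", the retry cutoff) and all field names are assumptions chosen for illustration. The source does not state the actual rule definitions.

```python
from datetime import datetime

# Illustrative thresholds, not the project's actual values.
LATE_NIGHT_HOURS = range(0, 5)   # 00:00 to 04:59
ROUND_LIMITS = (5_000, 10_000)   # amounts such as 4,999 and 9,999 sit just below these
NEAR_LIMIT_WINDOW = 50           # rupees below a limit that still count as "near"
MAX_RETRIES = 3


def fraud_signals(txn, known_recipients):
    """Return the names of the rule-based signals a transaction triggers."""
    signals = []
    if txn["timestamp"].hour in LATE_NIGHT_HOURS:
        signals.append("late_night")
    if any(0 < limit - txn["amount"] <= NEAR_LIMIT_WINDOW for limit in ROUND_LIMITS):
        signals.append("near_limit_amount")
    if txn["recipient"] not in known_recipients:
        signals.append("new_recipient")
    if txn.get("retry_count", 0) >= MAX_RETRIES:
        signals.append("repeated_retries")
    return signals


# Hypothetical transaction and recipient history.
txn = {
    "timestamp": datetime(2024, 1, 1, 2, 30),
    "amount": 4_999,
    "recipient": "new-payee@upi",
    "retry_count": 0,
}
print(fraud_signals(txn, known_recipients={"friend@upi"}))
```

Keeping each rule as a separate named signal makes it easy to compute a fraud rate per signal, which is how comparisons like late-night vs morning can be reported.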

🐍 Python & EDA

Business Problem: What patterns, anomalies, and segments hide in raw data that descriptive stats alone won't surface?

Exploratory analysis tackling user behavior, competitive pricing, workforce analytics, and environmental prediction.

Project	Problem Solved	Approach	Key Insight
💳 Fintech User Funnel Analytics	Where does a youth fintech app lose users, and which segments churn hardest?	Funnel drop-off analysis, cohort split (11–14 vs 15–19), failure-rate segmentation	51pp retention cliff after failed first transaction; age 15–19 transacts at 2.3× the value of 11–14 cohort
📸 Influencer Campaign Tracker	Which of 40 influencer partnerships are worth scaling, and which should be cut?	ROI, ROMI, CPA, engagement rate computed per influencer; LLM-generated recommendations	Identified 11 SCALE and 12 DROP creators; TikTok macro tier at 8.3x ROMI, Instagram macro at sub-1x
🌱 HR Analytics	What drives employee attrition, and where are the pay inequities?	Segmented 8,950 employees by education, department, and performance	Education strongly predicts earnings: high school ₹62K avg → PhD ₹92K; attrition clustered in specific department-seniority bands
🛒 Zepto vs Blinkit	Which quick-commerce platform offers better value, and where does each dominate?	Compared pricing, discounts, delivery times across 5,000+ products	Zepto 11.8 min vs Blinkit 16.5 min delivery; ₹34.94 vs ₹17.52 avg discount — picked up 6,000+ LinkedIn views
💳 UPI Fraud Analysis	Which user behavioural patterns signal fraudulent UPI activity?	Built 5 rule-based fraud signals from 15K transactions	Time-of-day, recipient novelty, retry behaviour, and amount clustering are strongest fraud predictors
💰 Indian Salary Gap Analysis	How do role, seniority, city, and company type compound the pay gap?	Statistical breakdown across 8,000 Indian tech compensation records	MNCs pay 40.3% more; seniority and city amplify the gender gap significantly
🍔 Swiggy Dark Patterns	How much extra are consumers paying due to platform-driven pricing manipulation?	Isolated weather, time-of-day, and demand as independent pricing levers	r = −0.71 correlation between delivery time and satisfaction; users overcharged ₹864/year via opaque fees
🌫 AQI Prediction	Can air quality be predicted spatially from vegetation and geography data?	Spatial interpolation across geographic regions	Interactive HTML map visualising AQI vs vegetation — surfaced high-risk zones
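
For the per-influencer metrics named in the Influencer Campaign Tracker row, a minimal sketch using common textbook definitions is shown below. These definitions, the function name, and the sample numbers are assumptions. The source does not say which formulas were used, and ROMI is left out because its definition varies between teams.

```python
def influencer_metrics(spend, revenue, conversions, engagements, reach):
    """Compute ROI, CPA, and engagement rate for one partnership.

    Assumed definitions:
      ROI (%)             = (revenue - spend) / spend * 100
      CPA                 = spend / conversions
      engagement rate (%) = engagements / reach * 100
    """
    return {
        "roi_pct": round((revenue - spend) / spend * 100, 1) if spend else None,
        "cpa": round(spend / conversions, 2) if conversions else None,
        "engagement_rate_pct": round(engagements / reach * 100, 2) if reach else None,
    }


# Hypothetical partnership figures.
print(influencer_metrics(spend=1_000, revenue=2_500, conversions=50,
                         engagements=400, reach=10_000))
```

Returning `None` when a denominator is zero avoids silently ranking partnerships that have no conversions or no recorded reach.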

🤖 Machine Learning

Business Problem: Can we build predictive systems that act on patterns too complex for manual rules?

End-to-end ML projects covering classification, regression, NLP, computer vision, and deployed systems.

Project	Problem Solved	Model / Approach	Result
🤖 ML Portfolio	Demonstrate end-to-end ML across 10 diverse business domains	Logistic regression, CNN, YOLO, TF-IDF, collaborative filtering	10 complete projects: churn, vision, NLP, forecasting, recommendation
🔐 User Authentication ML System	How can we detect anomalous login behaviour in real time?	Decision tree + JWT auth, deployed on Render + Netlify	Production-deployed full-stack app with real-time ML inference
🏨 Hotel Booking Prediction	Which bookings are likely to cancel, and when?	Logistic regression + tree-based models on booking records	Enabled proactive demand management by predicting cancellation probability
🎬 Movie Recommendation System	How do you personalise content recommendations at scale?	TF-IDF vectorisation + cosine similarity on plot/genre metadata	Content-based recommendations without user history dependency
📰 Fake News Predictor	Can NLP distinguish misinformation from legitimate news?	TF-IDF + ML classifier on labelled article datasets	Classifier trained to detect misinformation across news topics
📺 YouTube Ads View Prediction	Which engagement signals best predict ad view volume?	Regression on likes, comments, shares	Quantified the relationship between engagement metrics and ad performance
📷 Real-Time Object Detection	Can we detect multiple object classes in a live video stream?	YOLOv8 with custom YAML-defined classes	Real-time detection pipeline with configurable object categories
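
The TF-IDF plus cosine similarity approach listed for the Movie Recommendation System can be sketched with the standard library alone. This is a simplified illustration, not the project's code: tokenisation is a plain whitespace split, the IDF is unsmoothed, and the three plot strings are made up.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of non-empty text documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index, docs):
    """Index of the document most similar to docs[index], excluding itself."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[index], vec), i)
              for i, vec in enumerate(vectors) if i != index]
    return max(scores)[1]


# Made-up plot summaries.
plots = [
    "space crew explores distant planet",
    "astronaut crew stranded on distant planet",
    "detective solves murder in paris",
]
print(most_similar(0, plots))
```

Because the vectors come only from item metadata, a recommendation can be made for a title with no viewing history, which matches the content-based result described in the table.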

🧠 AI & LLM Agents

Business Problem: Which repetitive analytical and reasoning workflows can be handed off to autonomous AI agents?

Agentic systems built with CrewAI, OpenAI, Groq, and FastAPI — developed using Cursor and Claude Code.

Project	Problem Solved	Tools	Key Outcome
📊 AI Business Reporting & Sentiment	Auto-generate executive summaries and VoC sentiment reports from raw sales + review CSVs	Python, pandas, Groq Llama 3.3 70B, n8n	2–4 hour manual reporting cycle → under 30 seconds; real Groq output committed as sample evidence
📸 Influencer Campaign Tracker	Auto-flag 40 influencer partnerships as SCALE / MONITOR / OPTIMIZE / DROP using live campaign data	Python, pandas, Groq LLM, matplotlib	Portfolio ROI 137.8%, 1.65x ROMI; LLM-generated strategic recommendations with data grounding
🧠 LLM Portfolio	Build a diverse suite of production-grade agentic AI systems	CrewAI, OpenAI, Gemini, HuggingFace, FastAPI, Gradio	6 agents: medical vision+voice, finance analyst, image recognition, symptom analysis
🩺 Blood Test Analyser	Automate interpretation of blood test PDFs into plain-English summaries	CrewAI, OpenAI, Python	Multi-agent system reading PDF biomarkers and generating patient-friendly medical summaries
📰 Financial News Dashboard	Surface the most relevant financial news from a high-volume feed	Python, Streamlit, FAISS, Embeddings	Semantic similarity search over embedded news articles
📝 Text Summarizer	Automate extractive and abstractive summarisation with CI/CD	Python, Docker, GitHub Actions	Containerised pipeline with automated deployment via GitHub Actions
🧠 AI Context Engine	Prevent AI conversation context from being lost between sessions	FastAPI, Chrome Extension, WebSockets	Full-stack browser tool that captures and restores AI session history across tabs

📈 Tableau & Dashboards

Business Problem: How do you turn processed data into self-service tools that non-technical stakeholders can act on?

Interactive dashboards built for HR, job markets, and quick-commerce intelligence.

Project	Problem Solved	Key Metrics Tracked	Links
👥 HR Analytics Dashboard	Give HR teams a single view of workforce health across 8,950 employees	Headcount, attrition rate, gender distribution, department performance, education-pay correlation	Dashboard
🛒 Zepto vs Blinkit	Help consumers and analysts compare platform pricing and delivery performance	Category dominance, discount patterns, pricing elasticity, delivery speed	Dashboard · BI Presentation
💼 Job Market Analytics	Track which skills, roles, and geographies are growing or declining in demand	Skill demand trends, salary comparisons, geographic breakdown, YoY growth	Coming Soon

🤖 How I Build

I use Cursor and Claude Code as core development tools — enabling faster prototyping, automated code review, and AI-assisted debugging. I also use n8n for workflow automation and Google Sheets for quick ad-hoc analytics and stakeholder-ready reporting.

📬 Let's Connect

Open to Product Analytics, Marketing Analytics, Business Analytics, and Data Analyst roles.

Feel free to explore any project — each one has a real business problem behind it.
