- Data Science @ UC Berkeley (expected May 2027)
- Interested in machine learning, data analysis, and building data-driven projects
- Currently studying SQL, statistics, and machine learning
- Open to Data Science / Data Analyst internships

Spam Classifier: Text classification on the Enron email dataset (~33K emails)
- Compared a baseline LR, TF-IDF + LR, and a fine-tuned DistilBERT
- Best F1: 0.9927 (DistilBERT), with a latency vs. accuracy trade-off analysis
- Investigated duplicate emails (9.6% of the data) as a potential leakage source and re-validated results
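
The duplicate check in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (`normalize`, `fingerprint`, `duplicate_rate`, `leaked_indices`) and the toy strings are hypothetical, and it assumes duplicates are exact matches after lowercasing and collapsing whitespace.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def fingerprint(text: str) -> str:
    """Stable hash of the normalized text (hypothetical helper)."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def duplicate_rate(texts: list[str]) -> float:
    """Fraction of texts that repeat another text in the same collection."""
    if not texts:
        return 0.0
    unique = {fingerprint(t) for t in texts}
    return 1 - len(unique) / len(texts)


def leaked_indices(train: list[str], test: list[str]) -> list[int]:
    """Indices of test texts whose fingerprint also appears in the train split."""
    seen = {fingerprint(t) for t in train}
    return [i for i, t in enumerate(test) if fingerprint(t) in seen]


# Toy data for illustration only, not the Enron emails.
train = ["Win a FREE prize now", "Meeting moved to 3pm", "win a free   prize now"]
test = ["Meeting moved to 3pm", "Quarterly report attached"]

rate = duplicate_rate(train + test)   # share of redundant texts overall
leaks = leaked_indices(train, test)   # test rows that also occur in train
```

Dropping the leaked test rows (or deduplicating before splitting) and then re-scoring each model is one way to confirm that a reported F1 does not depend on memorized copies.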