Software Engineer | Data Scientist | ML Engineer | AI Engineer
Building production ML systems, scalable APIs, and cloud-native AI applications. I specialize in designing end-to-end machine learning systems from data pipelines and feature engineering to model deployment, monitoring, and real-time analytics.
- Complete ML system predicting SaaS customer churn with 91-94% ROC-AUC. Features: 10K customers, 60+ engineered features, XGBoost model, multi-horizon predictions (30/60/90d), ROI-driven retention strategies, REST API (FastAPI), interactive dashboard (Streamlit), Docker deployment. $1.1M revenue at risk identified.
Repo: https://github.com/Sakshi3027/saas-churn-prediction
- Engineered a distributed big data analytics platform processing 100,000+ advertising impressions using Hadoop MapReduce and Apache Kafka, simulating PubMatic's real-world SSP infrastructure for publisher revenue optimization and fraud detection
- Developed 4 custom MapReduce jobs in Java analyzing publisher revenue ($2,456 total), device performance (Mobile 46%, Desktop 46%, Tablet 8%), campaign ROI (650 profitable campaigns), and fraud patterns (47 suspicious users identified with 5% fraud rate)
- Implemented real-time event streaming pipeline using Apache Kafka processing 10 events/second with live analytics dashboard, deployed on 13-container Docker cluster with HDFS distributed storage and YARN resource management
- Created interactive Tableau Public dashboard with 7 visualizations including revenue analysis, CTR correlation matrix, device breakdown, and fraud detection patterns, featuring dynamic filtering and drill-down capabilities across 50 publishers
- Designed fraud detection algorithm flagging suspicious patterns (click spam, impression flooding, multi-publisher bots); the analysis found that 0.5% of users accounted for 8% of clicks, saving an estimated $85/month in ad fraud. Repo: https://github.com/Sakshi3027/PubMatic-Ad-Analytics
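The click-concentration figure in the fraud bullet above (a small share of users producing a disproportionate share of clicks) can be illustrated with a short Python sketch. This is not the project's Java MapReduce job; the function name `click_concentration`, its parameters, and the sample data are hypothetical and exist only to show the calculation.

```python
from collections import Counter
import math


def click_concentration(click_user_ids, top_fraction):
    """Share of all clicks produced by the most active `top_fraction` of users."""
    if not click_user_ids:
        return 0.0
    counts = Counter(click_user_ids)
    # Always inspect at least one user, even for very small fractions.
    top_n = max(1, math.ceil(len(counts) * top_fraction))
    top_clicks = sum(n for _, n in counts.most_common(top_n))
    return top_clicks / len(click_user_ids)


# Illustrative data only: 199 users click once, one user clicks 17 times.
clicks = [f"user_{i}" for i in range(199)] + ["bot_1"] * 17
share = click_concentration(clicks, top_fraction=0.005)
print(f"Top 0.5% of users produce {share:.1%} of clicks")
```

A high share from a tiny fraction of users is only a signal worth investigating; on its own it does not prove fraud.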
- Developed a full-stack A/B testing platform analyzing 40K+ users, leveraging Bayesian/Frequentist methods and 100K+ Monte Carlo simulations to quantify revenue impact ($125K+).
- Automated detection of statistical pitfalls (Simpson's paradox, peeking bias, multiple testing) with interactive dashboards in Streamlit and Plotly. Repo: https://github.com/Sakshi3027/ecommerce-ab-testing-platform
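As a rough illustration of the Bayesian side of the A/B testing work above, the sketch below estimates the probability that variant B converts better than variant A by Monte Carlo sampling from Beta posteriors. It assumes uniform Beta(1, 1) priors and uses only the standard library; the function name `prob_b_beats_a` and the conversion counts are hypothetical and not taken from the project.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 5.0% vs 5.6% conversion on 20,000 users per arm.
p = prob_b_beats_a(conv_a=1_000, n_a=20_000, conv_b=1_120, n_b=20_000)
print(f"P(B beats A) is approximately {p:.3f}")
```

The same posterior draws could feed a revenue-impact estimate by multiplying each sampled rate difference by traffic and order value, though how the project does this is not stated here.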
- Developed and deployed full-stack web application enabling users to query CSV data using natural language, serving 100+ demo users
- Engineered pattern-matching algorithm converting English questions to SQL, achieving 95%+ query accuracy
- Implemented RESTful API with Express.js and SQLite for real-time data processing and analysis
- Integrated Chart.js for dynamic data visualizations (bar/pie charts) with responsive frontend design
- Deployed on Railway with CI/CD pipeline, automatic builds on Git push. Repo: https://github.com/Sakshi3027/natural-language-analyst
- Built an end-to-end semantic video search system enabling natural language search across video content, processing 14-min videos in ~95s using faster-Whisper transcription, sentence-transformers embeddings (384-dim), and Pinecone vector database with cosine similarity search returning results in under 600ms.
- Engineered a production REST API with FastAPI featuring 8 endpoints, async video processing via Celery + Redis job queue, PostgreSQL metadata storage, and free cross-encoder re-ranking pipeline (cross-encoder/ms-marco-MiniLM-L-6-v2) with automatic query expansion achieving zero inference cost.
- Deployed full stack on GCP Compute Engine using Docker Compose orchestrating 5 containers; built Next.js frontend deployed on Vercel and Streamlit analytics dashboard with auto-chapter generation and YouTube timestamp deep links. Repo: https://github.com/Sakshi3027/semantic-video-search
- Built end-to-end microservices e-commerce platform with 4 services processing real-time transactions for 100+ users, implementing event-driven architecture via Apache Kafka achieving <1s order-to-analytics latency and 85% Redis cache hit ratio.
- Engineered ML recommendation engine using scikit-learn collaborative filtering on PostgreSQL order history, computing product similarity matrices with cosine similarity to deliver personalized suggestions in <50ms; integrated with ClickHouse OLAP for 10x faster analytics queries.
- Deployed production infrastructure with Docker Compose managing 11 containers (databases, Kafka, Redis); developed React dashboard with real-time sales visualization, established GitHub Actions CI/CD with automated testing and security scanning (CodeQL, Trivy). Repo: https://github.com/Sakshi3027/cloudcart-analytics-platform
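The item-to-item recommendation idea mentioned for this platform (cosine similarity over order history) can be sketched in plain Python. The project itself is described as using scikit-learn on PostgreSQL data; the version below is a dependency-free stand-in, and the function names `item_similarities` and `recommend` as well as the sample orders are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(orders):
    """Cosine similarity between products from (user, product) purchase pairs."""
    buyers = defaultdict(set)
    for user, product in orders:
        buyers[product].add(user)
    sims = defaultdict(dict)
    products = sorted(buyers)
    for i, a in enumerate(products):
        for b in products[i + 1:]:
            shared = len(buyers[a] & buyers[b])
            if shared:
                # Binary purchase vectors: dot product = shared buyers,
                # vector norm = sqrt(number of buyers).
                score = shared / sqrt(len(buyers[a]) * len(buyers[b]))
                sims[a][b] = sims[b][a] = score
    return sims


def recommend(sims, product, k=3):
    """Top-k most similar products, ties broken alphabetically."""
    ranked = sorted(sims.get(product, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Illustrative order history only.
orders = [
    ("u1", "laptop"), ("u1", "mouse"),
    ("u2", "laptop"), ("u2", "mouse"), ("u2", "desk"),
    ("u3", "desk"), ("u3", "lamp"),
]
sims = item_similarities(orders)
top = recommend(sims, "laptop")
print(top)
```

Precomputing the similarity table offline and serving lookups from a cache is one plausible way to reach low-latency suggestions, though the actual serving path is not detailed in the bullet above.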
- Research Assistant: I worked with a faculty member to make ML experiments run faster. I cleaned up the data pipelines and automated repetitive steps, which cut experimentation time by about 30%.
- Teaching Assistant: I ran the lab sessions for an ML course of around 60 students. I helped them work through the hands-on parts and graded their assignments.
- Data Analyst Intern: I worked with healthcare data, specifically Medicare and Medicaid claims. I wrote SQL queries to clean and validate the data, then built models and dashboards in Python to help the business make better decisions.
- Data Science Intern: I built a model that predicted which customers were most likely to buy, and segmented them so the marketing team could target them better. That increased conversion rates by 20%.
"The goal is to turn data into information, and information into insight." - Carly Fiorina
