SAKSHI CHAVAN Sakshi3027

Hi, I'm Sakshi Chavan 👋

Software Engineer | Data Scientist | ML Engineer | AI Engineer

Building production ML systems, scalable APIs, and cloud-native AI applications. I specialize in designing end-to-end machine learning systems from data pipelines and feature engineering to model deployment, monitoring, and real-time analytics.

Tech Stack

Languages

AI · ML · Data

Full-Stack Engineering

Cloud · Databases · DevOps

Data Engineering · Big Data

📊 GitHub Activity

Featured Projects

SaaS Churn Prediction System

End-to-end ML system that predicts SaaS customer churn with 91-94% ROC-AUC. Built on 10K customers and 60+ engineered features, it includes an XGBoost model, multi-horizon predictions (30/60/90 days), ROI-driven retention strategies, a REST API (FastAPI), an interactive dashboard (Streamlit), and Docker deployment. Identified $1.1M in revenue at risk.
🔗 https://github.com/Sakshi3027/saas-churn-prediction
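
For readers unfamiliar with the ROC-AUC metric quoted above, here is a minimal sketch of what it measures: the probability that a randomly chosen churned customer receives a higher score than a randomly chosen retained one. The function name and the toy data are illustrative assumptions, not code from the repository.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for churned, 0 for retained (toy convention for this sketch).
    scores: model scores, higher meaning more likely to churn.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 3 of the 4 positive/negative pairs are ordered correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.75
```

The quadratic loop is fine for a small illustration; a rank-based formula is the usual choice for large datasets.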

Real-Time Advertising Analytics Platform

Engineered a distributed big data analytics platform processing 100,000+ advertising impressions using Hadoop MapReduce and Apache Kafka, simulating PubMatic's real-world SSP infrastructure for publisher revenue optimization and fraud detection
Developed 4 custom MapReduce jobs in Java analyzing publisher revenue ($2,456 total), device performance (Mobile 46%, Desktop 46%, Tablet 8%), campaign ROI (650 profitable campaigns), and fraud patterns (47 suspicious users identified with 5% fraud rate)
Implemented real-time event streaming pipeline using Apache Kafka processing 10 events/second with live analytics dashboard, deployed on 13-container Docker cluster with HDFS distributed storage and YARN resource management
Created interactive Tableau Public dashboard with 7 visualizations including revenue analysis, CTR correlation matrix, device breakdown, and fraud detection patterns, featuring dynamic filtering and drill-down capabilities across 50 publishers
Designed fraud detection algorithm identifying suspicious patterns (click spam, impression flooding, multi-publisher bots), flagging the 0.5% of users who accounted for 8% of clicks and saving an estimated $85/month in ad fraud
🔗 https://github.com/Sakshi3027/PubMatic-Ad-Analytics
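
The click-concentration signal mentioned above can be sketched in a few lines. This is an illustrative Python version under stated assumptions (the project itself uses Java MapReduce jobs): `click_concentration` and the toy click log are hypothetical, and "users" here means users with at least one click.

```python
import math
from collections import Counter


def click_concentration(click_user_ids, top_fraction):
    """Share of all clicks produced by the most active `top_fraction` of users."""
    counts = Counter(click_user_ids)          # clicks per user
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Always inspect at least one user, even for very small fractions.
    k = max(1, math.ceil(len(counts) * top_fraction))
    heaviest = sorted(counts.values(), reverse=True)[:k]
    return sum(heaviest) / total


# Hypothetical log: user "a" clicks 7 times, three other users click once each.
clicks = ["a"] * 7 + ["b", "c", "d"]
print(click_concentration(clicks, 0.25))  # 0.7
```

A high share from a tiny fraction of users is only a signal for review, not proof of fraud; thresholds would need tuning against labeled traffic.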

E-commerce A/B Testing Platform

Developed a full-stack A/B testing platform analyzing 40K+ users, leveraging Bayesian/Frequentist methods and 100K+ Monte Carlo simulations to quantify revenue impact ($125K+).
Automated detection of statistical pitfalls (Simpson's paradox, peeking bias, multiple testing) with interactive dashboards in Streamlit and Plotly.
🔗 https://github.com/Sakshi3027/ecommerce-ab-testing-platform
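
As a rough illustration of the Bayesian Monte Carlo approach described above, the sketch below estimates the probability that variant B converts better than variant A by sampling from Beta posteriors. The uniform Beta(1, 1) prior, the function name, and the conversion counts are assumptions made for this example, not details taken from the repository.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical experiment: 100/1000 conversions for A, 150/1000 for B.
print(prob_b_beats_a(100, 1000, 150, 1000))
```

Unlike a p-value, the result reads directly as "the chance B is better", which is often easier to act on when sizing revenue impact.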

Natural Language Analyst (NL → SQL App)

Developed and deployed full-stack web application enabling users to query CSV data using natural language, serving 100+ demo users
Engineered pattern-matching algorithm converting English questions to SQL, achieving 95%+ query accuracy
Implemented RESTful API with Express.js and SQLite for real-time data processing and analysis
Integrated Chart.js for dynamic data visualizations (bar/pie charts) with responsive frontend design
Deployed on Railway with a CI/CD pipeline that builds automatically on Git push
🔗 https://github.com/Sakshi3027/natural-language-analyst
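
The pattern-matching idea behind the NL → SQL conversion can be sketched as follows. This is a simplified Python illustration (the project uses Express.js); the two patterns, the `question_to_sql` name, and the default table name `data` are hypothetical and far narrower than a real question grammar.

```python
import re

# Maps question keywords to SQL aggregate functions (illustrative subset).
AGGREGATES = {"average": "AVG", "total": "SUM", "maximum": "MAX", "minimum": "MIN"}


def question_to_sql(question, table="data"):
    """Translate a small set of English question shapes into SQL, else None."""
    q = question.strip().rstrip("?").lower()

    if re.fullmatch(r"how many rows(?: are there)?", q):
        return f"SELECT COUNT(*) FROM {table}"

    # Column names are restricted to \w+ so free text never reaches the query.
    m = re.fullmatch(
        r"(?:what is the )?(average|total|maximum|minimum) (\w+)(?: by (\w+))?", q
    )
    if m:
        func = AGGREGATES[m.group(1)]
        column, group = m.group(2), m.group(3)
        if group:
            return f"SELECT {group}, {func}({column}) FROM {table} GROUP BY {group}"
        return f"SELECT {func}({column}) FROM {table}"

    return None  # no pattern matched; the caller decides how to respond


print(question_to_sql("What is the average price by category?"))
# SELECT category, AVG(price) FROM data GROUP BY category
```

Returning `None` for unmatched questions keeps the failure mode explicit, so the application can ask the user to rephrase rather than run a guessed query.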

Semantic Video Search Engine

Built an end-to-end semantic video search system enabling natural language search across video content, processing 14-min videos in ~95s using faster-Whisper transcription, sentence-transformers embeddings (384-dim), and Pinecone vector database with cosine similarity search returning results in under 600ms.
Engineered a production REST API with FastAPI featuring 8 endpoints, async video processing via a Celery + Redis job queue, PostgreSQL metadata storage, and a free cross-encoder re-ranking pipeline (cross-encoder/ms-marco-MiniLM-L-6-v2) with automatic query expansion, at zero inference cost.
Deployed full stack on GCP Compute Engine using Docker Compose orchestrating 5 containers; built Next.js frontend deployed on Vercel and Streamlit analytics dashboard with auto-chapter generation and YouTube timestamp deep links.
🔗 https://github.com/Sakshi3027/semantic-video-search
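
The cosine-similarity retrieval step can be illustrated without any external service. The sketch below ranks transcript segments against a query vector in plain Python; the 3-dimensional toy vectors, the segment tuples, and the `search` helper are assumptions for illustration (the project uses 384-dimensional sentence-transformers embeddings stored in Pinecone).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, segments, top_k=3):
    """Return the top_k (score, start_seconds, text) tuples, best first."""
    scored = [(cosine(query_vec, vec), start, text) for start, text, vec in segments]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]


# Hypothetical transcript segments: (start time in seconds, text, embedding).
segments = [
    (0.0, "intro", [1.0, 0.0, 0.0]),
    (42.5, "kafka setup", [0.0, 1.0, 0.0]),
    (90.0, "kafka tuning", [0.0, 0.9, 0.1]),
]
for score, start, text in search([0.0, 1.0, 0.0], segments, top_k=2):
    print(f"{start:>6.1f}s  {text}  ({score:.3f})")
```

Keeping the start time alongside each hit is what makes timestamp deep links possible once a segment is matched.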

CloudCart Analytics Platform

Built end-to-end microservices e-commerce platform with 4 services processing real-time transactions for 100+ users, implementing event-driven architecture via Apache Kafka achieving <1s order-to-analytics latency and 85% Redis cache hit ratio.
Engineered ML recommendation engine using scikit-learn collaborative filtering on PostgreSQL order history, computing product similarity matrices with cosine similarity to deliver personalized suggestions in <50ms; integrated with ClickHouse OLAP for 10x faster analytics queries.
Deployed production infrastructure with Docker Compose managing 11 containers (databases, Kafka, Redis); developed React dashboard with real-time sales visualization, established GitHub Actions CI/CD with automated testing and security scanning (CodeQL, Trivy).
🔗 https://github.com/Sakshi3027/cloudcart-analytics-platform
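
To show how item-to-item collaborative filtering with cosine similarity can work on order history, here is a small standard-library sketch. It treats each product as a binary vector over buyers, which is an assumption of this example; the function names and the toy orders are hypothetical, and the project itself uses scikit-learn on PostgreSQL data.

```python
import math
from collections import defaultdict


def item_similarity(orders):
    """Cosine similarity between products, from (user_id, product_id) pairs."""
    buyers = defaultdict(set)
    for user, product in orders:
        buyers[product].add(user)
    sims = defaultdict(dict)
    products = sorted(buyers)
    for i, p in enumerate(products):
        for q in products[i + 1:]:
            shared = len(buyers[p] & buyers[q])
            if shared:
                # Cosine of two binary vectors: |A ∩ B| / sqrt(|A| * |B|).
                score = shared / math.sqrt(len(buyers[p]) * len(buyers[q]))
                sims[p][q] = score
                sims[q][p] = score
    return sims


def recommend(user, orders, sims, top_n=3):
    """Rank products the user has not bought by summed similarity to their purchases."""
    owned = {p for u, p in orders if u == user}
    scores = defaultdict(float)
    for p in owned:
        for q, score in sims.get(p, {}).items():
            if q not in owned:
                scores[q] += score
    return sorted(scores, key=lambda q: (-scores[q], q))[:top_n]


# Hypothetical order history.
orders = [
    ("u1", "laptop"), ("u1", "mouse"),
    ("u2", "laptop"), ("u2", "mouse"), ("u2", "dock"),
    ("u3", "laptop"),
]
sims = item_similarity(orders)
print(recommend("u3", orders, sims))  # ['mouse', 'dock']
```

Precomputing the similarity matrix offline and looking it up at request time is one common way to keep recommendation latency low.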

Experience

Research Assistant – I worked with a faculty member to make ML experiments run faster. I cleaned up the data pipelines and automated repetitive steps, which cut down experimentation time by about 30%.
Teaching Assistant – I ran the lab sessions for an ML course of around 60 students. I helped them work through the hands-on parts and graded their assignments.
Data Analyst Intern – I worked with healthcare data (Medicare and Medicaid claims). I wrote SQL queries to clean and validate the data, then built models and dashboards in Python to help the business make better decisions.
Data Science Intern – I built a model that predicted which customers were most likely to buy, and segmented them so the marketing team could target them better. That ended up increasing conversion rates by 20%.

Currently Exploring

Let's Connect

📧 sakshchavan30@gmail.com

"The goal is to turn data into information, and information into insight." — Carly Fiorina
