Sakshi3027/README.md

Boston, MA | STEM OPT Active | No Sponsorship Required | Immediately Available


M.S. Data Science | B.Tech CS | UMass Dartmouth | Software Engineering | Data Scientist | AI Engineer | ML Engineering

Hi, I'm Sakshi Chavan 👋

Software Engineer | Data Scientist | ML Engineer | AI Engineer

Building production ML systems, scalable APIs, and cloud-native AI applications. I specialize in designing end-to-end machine learning systems from data pipelines and feature engineering to model deployment, monitoring, and real-time analytics.


Tech Stack

Languages

Python JavaScript TypeScript R SQL Bash

AI · ML · Data

PyTorch TensorFlow scikit-learn XGBoost LangChain HuggingFace MLflow CrewAI Pandas NumPy PySpark

Full-Stack Engineering

React Next.js Node.js FastAPI Flask Streamlit

Cloud · Databases · DevOps

AWS Azure GCP Docker Kubernetes PostgreSQL MongoDB Redis

Data Engineering · Big Data

Spark Hadoop Kafka Airflow Databricks


📊 GitHub Activity

Sakshi's GitHub Activity Graph

GitHub Streak

Β Β 

Featured Projects

SaaS Churn Prediction System

FastAPI XGBoost Docker Streamlit

  • End-to-end ML system that predicts SaaS customer churn with 91-94% ROC-AUC. Covers 10K customers and 60+ engineered features, with an XGBoost model, multi-horizon predictions (30/60/90 days), and ROI-driven retention strategies. Served through a FastAPI REST API and an interactive Streamlit dashboard, deployed with Docker. Identified $1.1M in revenue at risk.
    🔗 https://github.com/Sakshi3027/saas-churn-prediction
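For readers unfamiliar with the headline metric, ROC-AUC is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it by direct pairwise comparison using only the standard library. The function name `roc_auc` and the toy labels and scores are assumptions made up for illustration; they are not taken from the repository, which may well compute the metric differently.

```python
def roc_auc(labels, scores):
    """Probability that a random positive (churner) outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"ROC-AUC: {auc:.3f}")
```

The pairwise loop is quadratic, which is fine for a demonstration; a rank-based formulation would be preferable on a dataset of 10K customers.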

Real-Time Advertising Analytics Platform

Hadoop Kafka AWS EMR Docker

  • Engineered a distributed big data analytics platform processing 100,000+ advertising impressions using Hadoop MapReduce and Apache Kafka, simulating PubMatic's real-world SSP infrastructure for publisher revenue optimization and fraud detection
  • Developed 4 custom MapReduce jobs in Java analyzing publisher revenue ($2,456 total), device performance (Mobile 46%, Desktop 46%, Tablet 8%), campaign ROI (650 profitable campaigns), and fraud patterns (47 suspicious users identified with 5% fraud rate)
  • Implemented real-time event streaming pipeline using Apache Kafka processing 10 events/second with live analytics dashboard, deployed on 13-container Docker cluster with HDFS distributed storage and YARN resource management
  • Created interactive Tableau Public dashboard with 7 visualizations including revenue analysis, CTR correlation matrix, device breakdown, and fraud detection patterns, featuring dynamic filtering and drill-down capabilities across 50 publishers
  • Designed a fraud detection algorithm that flags suspicious patterns (click spam, impression flooding, multi-publisher bots); it found 8% of clicks concentrated in 0.5% of users and saves an estimated $85/month in ad fraud.
    🔗 https://github.com/Sakshi3027/PubMatic-Ad-Analytics
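The click-concentration figure in the last bullet can be illustrated with a small single-machine sketch. This is not the project's Java MapReduce job; it is a plain Python approximation of the same idea, and the function name `click_concentration`, the user IDs, and the threshold are all hypothetical.

```python
from collections import Counter


def click_concentration(click_user_ids, top_fraction):
    """Share of all clicks produced by the most active `top_fraction` of users."""
    counts = Counter(click_user_ids)
    if not counts:
        return 0.0
    # Always inspect at least one user, even for very small fractions.
    n_top = max(1, int(len(counts) * top_fraction))
    top_clicks = sum(count for _, count in counts.most_common(n_top))
    return top_clicks / sum(counts.values())


# Made-up click log: one user is responsible for most of the clicks.
clicks = ["u1"] * 8 + ["u2", "u3", "u4", "u5"]
share = click_concentration(clicks, top_fraction=0.25)
print(f"Top 25% of users produced {share:.0%} of clicks")
```

A high share from a very small fraction of users is one signal worth investigating for click spam, though it is not proof of fraud on its own.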

E-commerce A/B Testing Platform

Python Bayesian Analysis Streamlit

  • Developed a full-stack A/B testing platform analyzing 40K+ users, leveraging Bayesian/Frequentist methods and 100K+ Monte Carlo simulations to quantify revenue impact ($125K+).
  • Automated detection of statistical pitfalls (Simpson's paradox, peeking bias, multiple testing) with interactive dashboards in Streamlit and Plotly.
    🔗 https://github.com/Sakshi3027/ecommerce-ab-testing-platform
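As a rough illustration of the Bayesian Monte Carlo approach mentioned above, the sketch below estimates the probability that variant B converts better than variant A by sampling from Beta posteriors. The uniform Beta(1, 1) prior, the function name `prob_b_beats_a`, and the conversion counts are assumptions for this example; the platform's actual priors and metrics are not stated in this README.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical experiment: 10% vs 15% observed conversion on 1,000 users each.
p = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=150, n_b=1000, draws=20_000)
print(f"P(B beats A) is approximately {p:.3f}")
```

Unlike a p-value, this quantity can be read directly as the chance that B is the better variant, given the prior and the data.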

Natural Language Analyst (NL → SQL App)

Node.js SQLite

  • Developed and deployed full-stack web application enabling users to query CSV data using natural language, serving 100+ demo users
  • Engineered pattern-matching algorithm converting English questions to SQL, achieving 95%+ query accuracy
  • Implemented RESTful API with Express.js and SQLite for real-time data processing and analysis
  • Integrated Chart.js for dynamic data visualizations (bar/pie charts) with responsive frontend design
  • Deployed on Railway with a CI/CD pipeline that builds automatically on Git push.
    🔗 https://github.com/Sakshi3027/natural-language-analyst
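The pattern-matching approach to NL → SQL can be sketched as a list of regex rules, each paired with a SQL template. The project itself is written in Node.js; this Python version only illustrates the idea. The rules, the `AGGREGATES` mapping, the table name `data`, and the function `question_to_sql` are all hypothetical and do not reflect the project's actual rule set.

```python
import re

# Hypothetical mapping from English aggregate words to SQL functions.
AGGREGATES = {"average": "AVG", "total": "SUM", "maximum": "MAX", "minimum": "MIN"}

# Each rule pairs a regex with a function that builds SQL from the match.
# Column names are captured with \w+ so only identifier characters get through.
RULES = [
    (re.compile(r"how many rows", re.I),
     lambda m: "SELECT COUNT(*) FROM data"),
    (re.compile(r"(average|total|maximum|minimum) (?:of )?(\w+)", re.I),
     lambda m: f"SELECT {AGGREGATES[m.group(1).lower()]}({m.group(2)}) FROM data"),
]


def question_to_sql(question):
    """Return SQL for the first matching rule, or None if nothing matches."""
    for pattern, build in RULES:
        match = pattern.search(question)
        if match:
            return build(match)
    return None


print(question_to_sql("What is the average price?"))
print(question_to_sql("How many rows are there?"))
```

Rule order matters here: the first match wins, so more specific patterns should come before general ones.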

Semantic Video Search Engine

Python Whisper FastAPI Pinecone Next.js Docker GCP

  • Built an end-to-end semantic video search system enabling natural language search across video content, processing 14-min videos in ~95s using faster-Whisper transcription, sentence-transformers embeddings (384-dim), and Pinecone vector database with cosine similarity search returning results in under 600ms.
  • Engineered a production REST API with FastAPI featuring 8 endpoints, async video processing via a Celery + Redis job queue, PostgreSQL metadata storage, and a re-ranking pipeline built on the free cross-encoder/ms-marco-MiniLM-L-6-v2 model with automatic query expansion, at zero inference cost.
  • Deployed the full stack on GCP Compute Engine using Docker Compose orchestrating 5 containers; built a Next.js frontend deployed on Vercel and a Streamlit analytics dashboard with auto-chapter generation and YouTube timestamp deep links.
    🔗 https://github.com/Sakshi3027/semantic-video-search
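The core retrieval step described above, ranking transcript segments by cosine similarity to a query embedding, can be sketched without any vector database. The project uses 384-dimensional sentence-transformers embeddings stored in Pinecone; the 3-dimensional vectors, segment names, and the `search` function below are toy stand-ins assumed for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, segments, top_k=2):
    """Return the top_k (score, text) pairs, best match first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy embeddings, invented for illustration only.
segments = [
    ("intro", [1.0, 0.0, 0.0]),
    ("demo", [0.0, 1.0, 0.0]),
    ("recap", [0.9, 0.1, 0.0]),
]
for score, text in search([1.0, 0.0, 0.0], segments):
    print(f"{text}: {score:.3f}")
```

A brute-force scan like this is linear in the number of segments; a vector database replaces it with an approximate index so that latency stays low as the corpus grows.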

CloudCart Analytics Platform

Node.js Python React Kafka ClickHouse scikit-learn Docker AWS

  • Built end-to-end microservices e-commerce platform with 4 services processing real-time transactions for 100+ users, implementing event-driven architecture via Apache Kafka achieving <1s order-to-analytics latency and 85% Redis cache hit ratio.
  • Engineered ML recommendation engine using scikit-learn collaborative filtering on PostgreSQL order history, computing product similarity matrices with cosine similarity to deliver personalized suggestions in <50ms; integrated with ClickHouse OLAP for 10x faster analytics queries.
  • Deployed production infrastructure with Docker Compose managing 11 containers (databases, Kafka, Redis); developed a React dashboard with real-time sales visualization and established GitHub Actions CI/CD with automated testing and security scanning (CodeQL, Trivy).
    🔗 https://github.com/Sakshi3027/cloudcart-analytics-platform
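Item-based collaborative filtering on order history, as described in the second bullet, can be sketched as follows. The project uses scikit-learn on PostgreSQL data; this standard-library version only illustrates the similarity computation. For binary "bought / did not buy" vectors, cosine similarity reduces to shared buyers divided by the geometric mean of the two buyer counts. The function names and the sample orders are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(orders):
    """orders: iterable of (user_id, product_id). Returns {(p, q): cosine}."""
    buyers = defaultdict(set)
    for user, product in orders:
        buyers[product].add(user)
    sims = {}
    products = sorted(buyers)
    for i, p in enumerate(products):
        for q in products[i + 1:]:
            shared = len(buyers[p] & buyers[q])
            if shared:  # skip pairs with no common buyer
                sims[(p, q)] = shared / math.sqrt(len(buyers[p]) * len(buyers[q]))
    return sims


def recommend(product, sims, top_k=3):
    """Products most similar to `product`, best first."""
    related = []
    for (p, q), score in sims.items():
        if p == product:
            related.append((score, q))
        elif q == product:
            related.append((score, p))
    related.sort(reverse=True)
    return [item for _, item in related[:top_k]]


# Made-up order history for illustration.
orders = [("u1", "mug"), ("u1", "tea"), ("u2", "mug"),
          ("u2", "tea"), ("u3", "mug"), ("u3", "pen")]
print(recommend("mug", item_similarities(orders)))
```

Precomputing the similarity table offline and serving lookups from memory or a cache is one common way to keep recommendation latency low.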

Experience

  • Research Assistant – I worked with a faculty member to make ML experiments run faster. I cleaned up the data pipelines and automated repetitive steps, which cut down experimentation time by about 30%.
  • Teaching Assistant – I ran the lab sessions for an ML course of around 60 students. I helped them work through the hands-on parts and graded their assignments.
  • Data Analyst Intern – I worked with healthcare data (Medicare and Medicaid claims). I wrote SQL queries to clean and validate the data, then built models and dashboards in Python to help the business make better decisions.
  • Data Science Intern – I built a model that predicted which customers were most likely to buy, and segmented them so the marketing team could target them better. That ended up increasing conversion rates by 20%.

Currently Exploring

Explainable AI LLMs & Agents Real-time ML Data Engineering


Let's Connect

📧 sakshchavan30@gmail.com


"The goal is to turn data into information, and information into insight." - Carly Fiorina

Pinned

  1. PubMatic-Ad-Analytics (Public)

    Real-Time Programmatic Advertising Analytics using Hadoop MapReduce, Kafka, and Python.

    Python 1

  2. cloudcart-analytics-platform (Public)

    Production-grade microservices e-commerce platform with real-time analytics | Node.js | Python | Kafka | Kubernetes

    JavaScript 1

  3. ecommerce-ab-testing-platform (Public)

    Comprehensive A/B testing framework with statistical rigor and business impact analysis

    Python 1

  4. saas-churn-prediction (Public)

    End-to-end SaaS churn prediction and retention analytics pipeline. Interpretable ML models, feature engineering, and Streamlit dashboard.

    Python 1

  5. fleet-reliability-pipeline (Public)

    End-to-end EV fleet reliability ETL pipeline: PostgreSQL, Airflow, Prophet forecasting, Streamlit dashboard

    Python 1

  6. pit-wall (Public)

    🏎️ AI-powered F1 intelligence platform live 2026 standings, XGBoost race predictions, Monte Carlo championship simulator & AI Race Engineer chatbot. Built with Python, Streamlit & Gemini.

    Python 2