
Ayushhgit/predictive-maintenance


🔧 Predictive Maintenance MLOps Project

CI/CD Pipeline · Python 3.10+ · License: MIT · Code style: black

An end-to-end MLOps project for predicting the Remaining Useful Life (RUL) of industrial equipment using the NASA C-MAPSS Turbofan Engine Degradation Dataset. This project demonstrates production-grade ML engineering practices including CI/CD, experiment tracking, model serving, containerization, and monitoring.

🎯 Project Overview

Predictive maintenance uses machine learning to predict when equipment will fail, enabling proactive maintenance scheduling. This project:

  • Predicts RUL (Remaining Useful Life) of turbofan engines
  • Trains multiple models (Random Forest, Gradient Boosting, LSTM, etc.)
  • Tracks experiments with MLflow
  • Serves predictions via a REST API
  • Monitors performance through a Streamlit dashboard
  • Automates CI/CD with GitHub Actions
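In the C-MAPSS training files every engine runs until failure, so the RUL target for each row is typically derived as the number of cycles left before that engine's last recorded cycle. Below is a minimal pandas sketch of that labeling step. The column names `unit` and `cycle`, the function name `add_rul_labels`, and the optional `cap` (a piecewise-linear clip often used with C-MAPSS) are assumptions for illustration, not taken from this repository.

```python
from typing import Optional

import pandas as pd


def add_rul_labels(df: pd.DataFrame, cap: Optional[int] = None) -> pd.DataFrame:
    """Return a copy of df with an 'RUL' column.

    Assumes hypothetical columns 'unit' (engine id) and 'cycle' (time step),
    and that every unit in df is observed until failure, as in the C-MAPSS
    training files.
    """
    out = df.copy()
    # Last observed cycle per engine, broadcast back to every row of that engine
    max_cycle = out.groupby("unit")["cycle"].transform("max")
    out["RUL"] = max_cycle - out["cycle"]
    if cap is not None:
        # Optional piecewise-linear target: clip early-life RUL at `cap`
        out["RUL"] = out["RUL"].clip(upper=cap)
    return out


if __name__ == "__main__":
    demo = pd.DataFrame({"unit": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
    print(add_rul_labels(demo)["RUL"].tolist())  # [2, 1, 0, 1, 0]
```

The same function is not applicable as-is to the C-MAPSS test files, where engines are truncated before failure and the true RUL comes from a separate file.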

Business Value

  • ⬇️ Reduce unplanned downtime by 30-50%
  • 💰 Lower maintenance costs through optimized scheduling
  • 📈 Extend equipment lifespan with timely interventions

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────────────────┐
│                        PREDICTIVE MAINTENANCE SYSTEM                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐   │
│  │     Data     │───▶│     Data     │───▶│     Data     │───▶│   Model   │   │
│  │  Ingestion   │    │  Validation  │    │Transformation│    │ Training  │   │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘    └─────┬─────┘   │
│         │                   │                   │                  │         │
│         │                   │                   │                  ▼         │
│         │                   │                   │            ┌───────────┐   │
│         │                   │                   │            │   Model   │   │
│         │                   │                   │            │Evaluation │   │
│         │                   │                   │            └─────┬─────┘   │
│         │                   │                   │                  │         │
│  ┌──────▼───────────────────▼───────────────────▼──────────────────▼──────┐  │
│  │                         MLflow Tracking Server                         │  │
│  │             (Experiments, Parameters, Metrics, Artifacts)              │  │
│  └───────────────────────────────────┬────────────────────────────────────┘  │
│                                      │                                       │
│  ┌───────────────────────────────────▼────────────────────────────────────┐  │
│  │                             Model Registry                             │  │
│  │                   (Versioning, Staging, Production)                    │  │
│  └───────────────────────────────────┬────────────────────────────────────┘  │
│                                      │                                       │
│         ┌────────────────────────────┼────────────────────────────┐          │
│         ▼                            ▼                            ▼          │
│  ┌─────────────┐              ┌─────────────┐              ┌─────────────┐   │
│  │  FastAPI    │              │  Streamlit  │              │   Batch     │   │
│  │  REST API   │              │  Dashboard  │              │ Prediction  │   │
│  └──────┬──────┘              └──────┬──────┘              └──────┬──────┘   │
│         │                            │                            │          │
│  ┌──────▼────────────────────────────▼────────────────────────────▼───────┐  │
│  │                          Prometheus + Grafana                          │  │
│  │                        (Monitoring & Alerting)                         │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│                                INFRASTRUCTURE                                │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│   │   Docker    │    │   GitHub    │    │    DVC      │    │   MongoDB   │   │
│   │  Compose    │    │   Actions   │    │   (Data)    │    │  (Storage)  │   │
│   └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘   │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

🛠 Tech Stack

Category       Technologies
ML/DL          scikit-learn, TensorFlow/Keras (LSTM)
MLOps          MLflow, DVC, Docker, GitHub Actions
API            FastAPI, Uvicorn, Pydantic
Data           Pandas, NumPy, MongoDB
Visualization  Streamlit, Plotly, Matplotlib
Testing        pytest, pytest-cov, hypothesis
Code Quality   Black, isort, flake8, mypy, pre-commit
Monitoring     Prometheus, Grafana

✨ Features

ML Pipeline

  • ✅ Automated data ingestion from multiple sources
  • ✅ Data validation with quality checks and anomaly detection
  • ✅ Feature engineering (lag features, rolling statistics)
  • ✅ Multiple model training (RF, GB, Linear, Ridge, Lasso, SVR, LSTM)
  • ✅ Hyperparameter tuning with GridSearchCV
  • ✅ Model evaluation with RMSE, MAE, R², and MAPE
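Lag features and rolling statistics have to be computed per engine; otherwise the first cycles of one unit would pick up values from the previous unit. A minimal pandas sketch, assuming hypothetical `unit`, `cycle`, and sensor column names and an illustrative function name; the repository's `data_transformation.py` may structure this differently.

```python
from typing import Sequence

import pandas as pd


def add_lag_and_rolling_features(
    df: pd.DataFrame,
    sensor_cols: Sequence[str],
    lags: Sequence[int] = (1, 2),
    window: int = 5,
) -> pd.DataFrame:
    """Add per-engine lag and trailing rolling-window features.

    Assumes hypothetical 'unit' and 'cycle' columns. Grouping by unit keeps
    one engine's readings from leaking into another engine's features.
    """
    out = df.sort_values(["unit", "cycle"]).copy()
    grouped = out.groupby("unit")
    for col in sensor_cols:
        for lag in lags:
            # Sensor value `lag` cycles earlier (NaN for each unit's first cycles)
            out[f"{col}_lag{lag}"] = grouped[col].shift(lag)
        # min_periods=1 lets the mean start at the first cycle; the std stays
        # NaN until at least two readings are in the window
        rolling = grouped[col].rolling(window, min_periods=1)
        # Drop the group level so the result aligns with out's original index
        out[f"{col}_roll_mean{window}"] = rolling.mean().reset_index(level=0, drop=True)
        out[f"{col}_roll_std{window}"] = rolling.std().reset_index(level=0, drop=True)
    return out
```

For example, `add_lag_and_rolling_features(df, ["s1"], lags=(1,), window=2)` adds `s1_lag1`, `s1_roll_mean2`, and `s1_roll_std2` columns.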

MLOps

  • ✅ Experiment tracking with MLflow
  • ✅ Model registry for versioning and staging
  • ✅ Data versioning with DVC
  • ✅ CI/CD pipeline with GitHub Actions
  • ✅ Containerization with Docker & Docker Compose
  • ✅ Pre-commit hooks for code quality

Production

  • ✅ REST API with FastAPI for real-time predictions
  • ✅ Batch prediction pipeline for large datasets
  • ✅ Monitoring dashboard with Streamlit
  • ✅ Health checks and API documentation (Swagger/OpenAPI)
  • ✅ Risk level classification (Critical, High, Medium, Low)
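The README does not state the RUL thresholds behind the four risk levels. A minimal sketch of such a mapping, where `RiskLevel`, `classify_risk`, and the cut-offs of 20, 50, and 100 cycles are purely hypothetical placeholders:

```python
from enum import Enum


class RiskLevel(str, Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# Hypothetical exclusive upper bounds in cycles, checked in order.
# The project's real thresholds are not given in the README.
RISK_THRESHOLDS = (
    (20.0, RiskLevel.CRITICAL),
    (50.0, RiskLevel.HIGH),
    (100.0, RiskLevel.MEDIUM),
)


def classify_risk(predicted_rul: float) -> RiskLevel:
    """Map a predicted RUL (in cycles) to a risk level."""
    # Regressors can return slightly negative RUL; treat that as 0 cycles left
    rul = max(float(predicted_rul), 0.0)
    for upper_bound, level in RISK_THRESHOLDS:
        if rul < upper_bound:
            return level
    return RiskLevel.LOW
```

With these placeholder values, `classify_risk(35.0)` returns `RiskLevel.HIGH`. Subclassing `str` keeps the enum JSON-serializable, which helps if the level is returned from the API.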

📁 Project Structure

predictive-maintenance/
├── .github/
│   └── workflows/
│       ├── main.yml              # CI/CD pipeline
│       └── model-training.yml    # Scheduled training
├── api/
│   ├── __init__.py
│   ├── main.py                   # FastAPI application
│   └── schemas.py                # Pydantic models
├── config/
│   ├── config.yaml               # Main configuration
│   └── schema.yaml               # Data schema
├── dashboard/
│   └── app.py                    # Streamlit dashboard
├── data/
│   ├── raw/                      # Raw data
│   ├── validated/                # Validated data
│   ├── transformed/              # Processed features
│   └── predictions/              # Batch predictions
├── monitoring/
│   ├── prometheus.yml            # Prometheus config
│   └── grafana/                  # Grafana dashboards
├── notebooks/
│   └── eda.ipynb                 # Exploratory analysis
├── src/
│   ├── components/
│   │   ├── data_ingestion.py
│   │   ├── data_validation.py
│   │   ├── data_transformation.py
│   │   ├── model_trainer.py
│   │   ├── model_evaluation.py
│   │   └── batch_prediction.py
│   ├── pipelines/
│   │   └── training_pipeline.py
│   ├── utils/
│   │   ├── logger.py
│   │   └── model_utils.py
│   ├── constants/
│   │   └── __init__.py
│   └── mlflow_tracking.py
├── tests/
│   ├── unit/
│   │   ├── test_data_validation.py
│   │   ├── test_model_evaluation.py
│   │   └── test_api.py
│   ├── integration/
│   │   └── test_pipeline.py
│   └── conftest.py               # Pytest fixtures
├── artifacts/
│   ├── models/                   # Trained models
│   ├── logs/                     # Application logs
│   └── reports/                  # Evaluation reports
├── .dvc/                         # DVC configuration
├── .pre-commit-config.yaml       # Pre-commit hooks
├── docker-compose.yml            # Docker services
├── Dockerfile                    # Multi-stage Dockerfile
├── requirements.txt              # Dependencies
├── setup.py                      # Package setup
├── pyproject.toml                # Build configuration
├── pytest.ini                    # Pytest configuration
└── README.md                     # This file
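The tree lists `config/schema.yaml` as the data schema. A minimal sketch of the kind of check the validation stage might run against such a schema, with a hypothetical in-code `SCHEMA` dict standing in for the YAML file; its column names, dtype kinds, and ranges, as well as the `validate_frame` name, are illustrative only.

```python
from typing import Dict, List

import pandas as pd

# Hypothetical stand-in for config/schema.yaml: expected dtype kind
# ('i' = integer, 'f' = float) and optional inclusive bounds per column.
SCHEMA: Dict[str, dict] = {
    "unit": {"kind": "i", "min": 1},
    "cycle": {"kind": "i", "min": 1},
    "sensor_2": {"kind": "f", "min": 0.0, "max": 1000.0},
}


def validate_frame(df: pd.DataFrame, schema: Dict[str, dict] = SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the frame passed."""
    problems: List[str] = []
    for col, rules in schema.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        series = df[col]
        if series.dtype.kind != rules["kind"]:
            problems.append(
                f"{col}: expected dtype kind {rules['kind']!r}, got {series.dtype.kind!r}"
            )
            continue  # range checks are meaningless on the wrong type
        n_null = int(series.isna().sum())
        if n_null:
            problems.append(f"{col}: {n_null} null values")
        if "min" in rules and (series < rules["min"]).any():
            problems.append(f"{col}: values below {rules['min']}")
        if "max" in rules and (series > rules["max"]).any():
            problems.append(f"{col}: values above {rules['max']}")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a single validation report show every issue in a batch of raw data.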

🔄 ML Pipeline

Training Pipeline Flow

1. Data Ingestion    → Load raw sensor data from source
2. Data Validation   → Validate schema, types, and ranges
3. Transformation    → Feature engineering & scaling
4. Model Training    → Train multiple models
5. Model Evaluation  → Compare and select best model
6. Model Registry    → Version and stage models
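The six steps above form a linear, fail-fast sequence. The sketch below shows one way to wire them together: each stage takes and returns a shared context dict, and an exception in any stage (for example a validation failure) stops the run before training. `run_pipeline` and all stage functions are hypothetical placeholders; the real components live in `src/components/`, and their interfaces are not shown in this README.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], Context]]


def run_pipeline(stages: List[Stage], context: Optional[Context] = None) -> Context:
    """Run stages in order, threading a shared context dict through them.

    Any exception propagates immediately, so a failed validation stops the
    run before training starts (fail fast).
    """
    ctx: Context = dict(context or {})
    ctx["completed"] = []
    for name, step in stages:
        ctx = step(ctx)
        ctx["completed"].append(name)
    return ctx


# Placeholder stages (hypothetical bodies, for illustration only)
def ingest(ctx: Context) -> Context:
    ctx.setdefault("rows", [{"unit": 1, "cycle": c} for c in range(1, 4)])
    return ctx


def validate(ctx: Context) -> Context:
    bad = [r for r in ctx["rows"] if r["cycle"] < 1]
    if bad:
        raise ValueError(f"{len(bad)} rows failed validation")
    return ctx


def transform(ctx: Context) -> Context:
    ctx["features"] = [[r["cycle"]] for r in ctx["rows"]]
    return ctx


def train(ctx: Context) -> Context:
    ctx["model"] = "trained-model-placeholder"
    return ctx


def evaluate(ctx: Context) -> Context:
    ctx["best_model"] = ctx["model"]
    return ctx


def register(ctx: Context) -> Context:
    ctx["model_version"] = ctx.get("model_version", 0) + 1
    return ctx


STAGES: List[Stage] = [
    ("data_ingestion", ingest),
    ("data_validation", validate),
    ("data_transformation", transform),
    ("model_training", train),
    ("model_evaluation", evaluate),
    ("model_registry", register),
]
```

Keeping stages as plain functions over a dict makes each one easy to unit-test in isolation, which matches the split between `tests/unit/` and `tests/integration/` in the tree.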

Models Implemented

Model              Type           Use Case
Random Forest      Ensemble       Baseline, robust
Gradient Boosting  Ensemble       High accuracy
Linear Regression  Linear         Interpretable
Ridge/Lasso        Linear         Regularized
SVR                Kernel         Non-linear
LSTM               Deep Learning  Sequence modeling
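A sketch of training the classical models from the table side by side and keeping the one with the lowest validation RMSE, plus a small `GridSearchCV` search for the random forest. It assumes scikit-learn; the function names, hyperparameters, and grid values are illustrative guesses rather than the project's settings, and LSTM is left out because it needs TensorFlow/Keras.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def candidate_models(seed: int = 0) -> dict:
    """Classical models from the table; hyperparameters are illustrative."""
    return {
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        "linear_regression": LinearRegression(),
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
        # SVR is sensitive to feature scale, so it is wrapped with a scaler
        "svr": make_pipeline(StandardScaler(), SVR(C=10.0)),
    }


def rmse(y_true, y_pred) -> float:
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def train_and_select(X_train, y_train, X_val, y_val, seed: int = 0):
    """Fit every candidate and return (best_name, best_model, rmse_by_name)."""
    scores, fitted = {}, {}
    for name, model in candidate_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = rmse(y_val, model.predict(X_val))
        fitted[name] = model
    best = min(scores, key=scores.get)
    return best, fitted[best], scores


def tune_random_forest(X, y, seed: int = 0):
    """Small GridSearchCV over a hypothetical grid; returns (estimator, CV RMSE)."""
    grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        grid,
        scoring="neg_root_mean_squared_error",
        cv=3,
    )
    search.fit(X, y)
    # scikit-learn maximizes scores, so RMSE is reported as its negative
    return search.best_estimator_, -search.best_score_


if __name__ == "__main__":
    # Synthetic data stands in for the engineered C-MAPSS features
    X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)
    best, _, scores = train_and_select(X_tr, y_tr, X_va, y_va)
    print(best, {k: round(v, 2) for k, v in scores.items()})
```

RMSE is computed as the square root of `mean_squared_error` rather than through a `squared=False` argument, which recent scikit-learn releases no longer accept.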

📑 API Documentation

Endpoints

Method  Endpoint          Description
GET     /health           Health check
GET     /models           List available models
POST    /predict          Single or batch prediction
POST    /predict/batch    File-based batch prediction
POST    /models/reload    Reload models

📊 Monitoring Dashboard

The Streamlit dashboard provides:

  • Overview: Key metrics, model comparison
  • Model Performance: Detailed metrics, visualizations
  • Predictions: Interactive prediction interface
  • Data Explorer: Feature distributions, correlations
  • System Health: API status, resource usage

📈 Results

Model Performance (Test Set)

Model              RMSE (cycles)  MAE (cycles)  R²
Random Forest      18.5           12.3          0.87
Gradient Boosting  17.2           11.8          0.89
LSTM               15.8           10.5          0.91
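The evaluation metrics listed under Features (RMSE, MAE, R², MAPE) can be computed directly with NumPy. A minimal sketch with a hypothetical function name; MAPE is undefined when the true RUL is 0 (an engine's final cycle), so this version skips those rows, a choice the README does not specify.

```python
import numpy as np


def regression_metrics(y_true, y_pred, eps: float = 1e-8) -> dict:
    """RMSE, MAE, R² and MAPE (in percent) for RUL predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # R² is undefined when every true value is identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    # Skip rows whose true RUL is ~0, where relative error divides by zero
    mask = np.abs(y_true) > eps
    mape = float(np.mean(np.abs(err[mask] / y_true[mask])) * 100) if mask.any() else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2, "mape": mape}
```

For `y_true = [10, 20, 30, 40]` and `y_pred = [12, 18, 33, 40]` this gives an RMSE of about 2.06, an MAE of 1.75, an R² of 0.966, and a MAPE of 10%.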
