Implementation of arXiv:2601.11647 — Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines.
A DQN agent learns to dynamically select test scope (full, partial, or skip) for each commit based on its metadata, optimizing the tradeoff between pipeline throughput and defect detection.
Verified on Modal cloud (2000 episodes, 100 commits/episode):
| Metric | Paper Target | Our Result |
|---|---|---|
| Throughput Improvement | ~30% | 32.1% |
| Test Time Savings | ~25% | 24.3% |
| Defect Miss Rate | <5% | 3.0% |
The agent learns a state-dependent policy: 68% full tests on risky commits, 25% partial on safe commits, 7% skip on trivially safe commits.
```bash
# Clone and setup
git clone https://github.com/Ataraxy-Labs/rlci.git
cd rlci
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# Run tests
pytest
# Quick training (reduced parameters, ~2 min)
python main.py --quick
# Full training (paper parameters, ~30 min)
python main.py
```

```
src/
  environment.py   # MDP simulation (10-dim state, 3 actions, reward function)
  agent.py         # DQN with replay buffer and target network
  baselines.py     # Static, Heuristic, and Supervised Classifier baselines
  train.py         # Training pipeline with convergence checking
  evaluate.py      # Evaluation metrics (TP, DMR, TTS, SI) and comparison
  api.py           # FastAPI server for policy deployment
  visualize.py     # Matplotlib visualization plots
main.py            # CLI entry point (full pipeline)
modal_train.py     # Cloud training on Modal
tests/             # 85 tests across 6 test files
```
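For orientation, the modules compose roughly as follows during training. This is an illustrative sketch only: the `env`/`agent` method names are assumptions, not the repo's actual API; the MDP components they rely on are defined in the table below.

```python
# Illustrative training loop. The real logic lives in src/train.py; the
# environment/agent method names here are assumptions about its API.
def run_episode(env, agent, commits_per_episode: int = 100) -> float:
    total_reward = 0.0
    state = env.reset()                    # 10-dim normalized commit vector
    for _ in range(commits_per_episode):
        action = agent.act(state)          # epsilon-greedy over the 3 actions
        next_state, reward, done = env.step(action)
        agent.remember(state, action, reward, next_state, done)
        agent.learn()                      # sample replay buffer, update online Q-net
        state = next_state
        total_reward += reward
        if done:
            break
    agent.sync_target()                    # periodic weight copy to the target net
    return total_reward
```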
| Component | Definition |
|---|---|
| State | 10-dimensional normalized vector: diff size, developer ID, file types, historical defect rate, prior pass rate, time gap, files changed, merge flag, branch depth, complexity |
| Actions | full_test (10 min, 100% detection), partial_test (3 min, 70%), skip_test (0 min, 0%) |
| Reward | R = -(t_exec / T_full) - β · I_bug_escaped |
| Discount | γ = 0.99 |
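To make the reward concrete, here is a minimal sketch of one step's reward computation. The execution times and detection rates come from the Actions row above; the helper structure is an assumption, not `src/environment.py` verbatim.

```python
import random

# (execution time in minutes, detection probability) per action, from the table above.
ACTIONS = {
    "full_test":    (10.0, 1.0),
    "partial_test": (3.0,  0.7),
    "skip_test":    (0.0,  0.0),
}
T_FULL = 10.0  # full-test duration; normalizes the time term into [0, 1]

def step_reward(action: str, commit_has_bug: bool, beta: float = 20.0) -> float:
    """R = -(t_exec / T_full) - beta * I[bug escaped]."""
    t_exec, detection = ACTIONS[action]
    # A bug escapes when the commit is defective and the chosen test scope
    # fails to catch it (detection is treated as a hit probability).
    escaped = commit_has_bug and random.random() >= detection
    return -(t_exec / T_FULL) - beta * float(escaped)

# E.g. a clean full test costs exactly -1.0; skipping a buggy commit costs -beta.
```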
```bash
# Default: 2000 episodes, beta=20, seed=42
python main.py
# Custom parameters
python main.py --episodes 2000 --beta 20 --seed 42 --commits 100
# Quick mode for testing
python main.py --quick
```

```bash
pip install modal
modal run modal_train.py
```

Runs full training + evaluation (baselines, beta sensitivity, adversarial, 5-run stats) on Modal's cloud infrastructure. Takes ~1 hour.
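For reference, `modal_train.py` plausibly has roughly this shape. The dependency list, the `run_training` import, and the function names are assumptions, not the file's actual contents.

```python
import modal

app = modal.App("rlci-train")

# The dependency list is a guess at the project's requirements.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("numpy", "torch", "matplotlib")
    .add_local_python_source("src")  # ship the local package into the container
)

@app.function(image=image, timeout=2 * 60 * 60)  # generous cap for the ~1 h run
def train_remote(episodes: int = 2000, beta: float = 20.0, seed: int = 42):
    from src.train import run_training  # hypothetical entrypoint name
    return run_training(episodes=episodes, beta=beta, seed=seed)

@app.local_entrypoint()
def main():
    print(train_remote.remote())
```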
The pipeline runs in five stages:
- DQN Training - 2000 episodes with epsilon decay (1.0 → 0.1); a schedule sketch follows this list
- Baseline Comparison - RL agent vs Static, Heuristic, and Supervised Classifier over 5 independent runs
- β Sensitivity Study - sweep β ∈ {1, 3, 5, 10, 15, 20, 30, 40} showing the safety-speed tradeoff
- Adversarial Robustness - tests against deceptive low-diff commits with hidden bugs
- Results & Plots - saves metrics to `results/` and plots to `plots/`
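The decay endpoints are given above but the schedule's shape is not; a linear anneal consistent with them is one plausible choice:

```python
def epsilon(episode: int, total_episodes: int = 2000,
            eps_start: float = 1.0, eps_end: float = 0.1) -> float:
    """Linear anneal of the exploration rate. Only the 1.0 -> 0.1 endpoints
    come from the pipeline description; the linear shape is an assumption."""
    frac = min(episode / total_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```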
```bash
# Start the API server
uvicorn src.api:app --reload
# Query the policy
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"diff_size": 0.3,
"developer_id": 0.5,
"file_types_modified": 0.4,
"historical_defect_rate": 0.1,
"prior_test_pass_rate": 0.9,
"time_since_last_commit": 0.2,
"num_files_changed": 0.15,
"is_merge_commit": 0.0,
"branch_depth": 0.1,
"code_complexity": 0.2
  }'
```
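The same query from Python, equivalent to the curl call above (`requests` is an assumed extra dependency); the server returns the response shown below either way:

```python
import requests  # assumed available; not listed among the project's dependencies

commit = {  # same example values as the curl call above
    "diff_size": 0.3, "developer_id": 0.5, "file_types_modified": 0.4,
    "historical_defect_rate": 0.1, "prior_test_pass_rate": 0.9,
    "time_since_last_commit": 0.2, "num_files_changed": 0.15,
    "is_merge_commit": 0.0, "branch_depth": 0.1, "code_complexity": 0.2,
}

resp = requests.post("http://localhost:8000/predict", json=commit, timeout=10)
resp.raise_for_status()
print(resp.json()["action"])  # e.g. "partial_test"
```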
Response:

```json
{
"action": "partial_test",
"action_id": 1,
"q_values": [-0.98, -0.31, -2.45],
"confidence": 0.87
}
```

- β controls the safety-speed tradeoff: Low β (<10) leads to aggressive skipping with high defect miss rates. High β (>40) leads to near-100% full testing with minimal time savings. The sweet spot is β=15-20.
- Bimodal commit distribution: Real CI/CD pipelines have ~35% trivially safe commits (config, docs) and ~65% substantive code changes. The agent learns to exploit this structure (see the sampler sketch after this list).
- Reward normalization is critical: normalizing execution time by T_full keeps the reward components on comparable scales, so a full test costs exactly 1 reward unit and β reads directly as the number of full test runs one escaped bug is worth.
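Referring back to the bimodal-distribution finding, here is a hypothetical commit sampler consistent with that structure. Only the 35/65 split comes from the finding above; the per-feature ranges are invented for illustration and are not the repo simulator's values.

```python
import numpy as np

def sample_commit(rng: np.random.Generator) -> dict:
    """Draw one synthetic commit: ~35% trivially safe (docs/config),
    ~65% substantive. Feature ranges below are illustrative assumptions."""
    if rng.random() < 0.35:   # trivially safe: tiny diff, low risk
        diff = rng.uniform(0.0, 0.1)
        complexity = rng.uniform(0.0, 0.05)
        defect_rate = rng.uniform(0.0, 0.02)
    else:                     # substantive change: larger and riskier
        diff = rng.uniform(0.1, 1.0)
        complexity = rng.uniform(0.1, 1.0)
        defect_rate = rng.uniform(0.05, 0.5)
    return {"diff_size": diff, "code_complexity": complexity,
            "historical_defect_rate": defect_rate}

# Example: sample_commit(np.random.default_rng(42))
```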
Full results of the β sweep:

| β | Throughput Improvement | Test Time Savings | Defect Miss Rate | Learned Policy |
|---|---|---|---|---|
| 1 | 75658% | 99.9% | 99.4% | 100% skip |
| 5 | 400% | 80.0% | 37.3% | 67% partial, 33% skip |
| 10 | 116% | 53.7% | 19.8% | 31% full, 51% partial |
| 15 | 33% | 24.6% | 2.9% | 66% full, 32% partial |
| 20 | 33% | 24.5% | 3.1% | 68% full, 25% partial |
| 30 | 29% | 22.5% | 2.2% | 68% full, 30% partial |
| 40 | 17% | 14.5% | 1.2% | 79% full, 20% partial |
```bibtex
@article{rlcicd2025,
title={Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines},
journal={arXiv preprint arXiv:2601.11647},
year={2025}
}
```

License: MIT