A Flat Data attempt at historically documenting GitHub statuses.
This project builds the "missing GitHub status page": a historical mirror that shows actual uptime percentages and incidents across the entire platform, plus per-service uptime based on the incident data. It reconstructs timelines from the Atom feed history and turns them into structured outputs and a static site.
Install uv per the installation docs.
The extractor depends on onnxruntime, which supports Python 3.11–3.13 (not 3.14+).
Check your version: `python --version` or `python3 --version`.
If you need a compatible version, uv can install it:
```bash
uv python install 3.13
```
Then create the virtual environment with that Python:
```bash
uv venv --python 3.13
uv sync
```
Or use `uv sync` with an existing venv. If `uv sync` fails with an onnxruntime error, your Python is likely 3.14+; switch to 3.11–3.13 as above.
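You can also check the interpreter from Python itself before running `uv sync`. The check below compares against the 3.11–3.13 range stated above. `is_supported` is a hypothetical helper and is not part of this repo.

```python
import sys

# Range stated in this README for onnxruntime: Python 3.11 through 3.13.
MIN_SUPPORTED = (3, 11)
MAX_SUPPORTED = (3, 13)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if a (major, minor) version falls inside the supported range."""
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_supported((major, minor)):
        print(f"Python {major}.{minor} is OK for the extractor.")
    else:
        print(f"Python {major}.{minor} is outside 3.11-3.13; "
              "try `uv python install 3.13` and `uv venv --python 3.13`.")
```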
Run `uv sync` before using the extractor scripts. To view only the static site, you need no dependencies: the `parsed/` directory in the repo contains pre-generated data, so just serve the repo root with any HTTP server.
```bash
uv venv
uv sync
```
Run the extractor across all history:
```bash
uv run python scripts/extract_incidents.py --out out
```
Run the extractor for the last year (UTC example):
```bash
uv run python scripts/extract_incidents.py --out out_last_year --since 2025-01-03 --until 2026-01-03
```
Write incidents as JSONL (the default) or as split per-incident files:
```bash
uv run python scripts/extract_incidents.py --out out --incidents-format jsonl
uv run python scripts/extract_incidents.py --out out --incidents-format split
```
Enrich incidents with impact level by scraping the incident pages (cached):
```bash
uv run python scripts/extract_incidents.py --out out --enrich-impact
```
Infer missing components with GLiNER2 (used only when the incident page lacks "affected components"):
```bash
uv run python scripts/extract_incidents.py --out out --infer-components gliner2
```
After each Flat Data update, a GitHub Action runs the parser and commits the outputs to `parsed/`.
Run tests:
```bash
uv run python -m unittest discover -s tests
```
The static site lives in `site/` and reads data from `parsed/`.
There is no build step. Serve the repo root with any static HTTP server:
```bash
python -m http.server 8000
# or: python3 -m http.server 8000
```
Then open http://localhost:8000/site/ in your browser.
Some incident pages do not list "affected components". In those cases we use GLiNER2 as a fallback:
- Input text: incident title + non-Resolved updates.
- Labels: the 10 GitHub services with short descriptions.
- Thresholded inference (default: 0.75 confidence).
- Final filter: the label is kept only if one of its explicit service aliases also appears in the text.
This keeps HTML tags as the source of truth and uses ML only to fill gaps.
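The fallback applies two gates in sequence: a confidence threshold first, then an alias check. This is a minimal sketch, not the extractor's code. The GLiNER2 call itself is omitted, and its scores are assumed to arrive as a plain label-to-confidence dict. `SERVICE_ALIASES`, `alias_present`, and `filter_predictions` are hypothetical names, and the alias table is illustrative and partial.

```python
import re

# Hypothetical, partial alias table; the extractor's real aliases are not shown here.
SERVICE_ALIASES: dict[str, list[str]] = {
    "Actions": ["github actions", "actions", "workflow runs"],
    "Codespaces": ["codespaces", "codespace"],
    "Pages": ["github pages", "pages"],
    "Webhooks": ["webhooks", "webhook"],
}

THRESHOLD = 0.75  # default confidence threshold described above


def alias_present(label: str, text: str) -> bool:
    """True if any alias of `label` occurs as a whole word or phrase in `text`."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(alias)}\b", lowered)
               for alias in SERVICE_ALIASES.get(label, []))


def filter_predictions(scores: dict[str, float], text: str,
                       threshold: float = THRESHOLD) -> list[str]:
    """Keep labels that clear the threshold AND are backed by an explicit alias.

    `text` stands for the incident title plus its non-Resolved updates.
    """
    return sorted(label for label, score in scores.items()
                  if score >= threshold and alias_present(label, text))
```

A label that clears the threshold can still be dropped if no alias appears in the text. For example, Codespaces scored at 0.80 on an Actions-only incident is filtered out this way, and that is how the alias filter trims generic matches.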
To validate the fallback, we run an experiment that produces:
- Audit: every GLiNER2-tagged incident with text evidence snippets.
- Evaluation: GLiNER2 predictions compared against incidents that do have HTML "affected components".
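The evaluation compares a predicted label set against a truth label set for each incident. The sketch below shows one way to compute such metrics. It assumes micro-averaging over labels, because the README does not say which averaging the experiment uses. `evaluate` and `EvalSummary` are hypothetical names and are not the experiment script's API.

```python
from dataclasses import dataclass


@dataclass
class EvalSummary:
    evaluated: int
    predicted_non_empty: int
    precision: float
    recall: float
    exact_match_rate: float


def evaluate(pairs: list[tuple[set[str], set[str]]]) -> EvalSummary:
    """Score (predicted, truth) label sets for incidents that have HTML components.

    Precision and recall are micro-averaged over labels (an assumption).
    """
    tp = fp = fn = exact = non_empty = 0
    for predicted, truth in pairs:
        tp += len(predicted & truth)   # labels correctly inferred
        fp += len(predicted - truth)   # extra labels
        fn += len(truth - predicted)   # missed labels
        exact += predicted == truth    # whole set matches
        non_empty += bool(predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    rate = exact / len(pairs) if pairs else 0.0
    return EvalSummary(len(pairs), non_empty, precision, recall, rate)
```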
Reproduce the experiment at a fixed time point (numbers will change as new data arrives):
```bash
uv run python scripts/run_gliner_experiment.py --as-of 2026-01-08 --output-dir tagging-experiment
```
Outputs are written to:
- `tagging-experiment/gliner2_audit.jsonl`: tagged incidents + evidence snippets
- `tagging-experiment/gliner2_eval.json`: metrics, per-label breakdown, sample mismatches
- `tagging-experiment/error_analysis.md`: diff-style table of sample errors
Latest results (as-of 2026-01-08, threshold 0.75, alias filter on, non-Resolved text only):
| Metric | Value |
|---|---|
| Evaluated incidents | 447 |
| Predicted non-empty | 419 |
| Precision | 0.950 |
| Recall | 0.883 |
| Exact match rate | 0.785 |
| Audit count (missing-tag incidents) | 51 |
Per-label precision/recall (top-level service components):
| Label | Precision | Recall | TP | FP | FN |
|---|---|---|---|---|---|
| Git Operations | 0.968 | 0.909 | 60 | 2 | 6 |
| Webhooks | 0.938 | 0.918 | 45 | 3 | 4 |
| API Requests | 0.915 | 0.915 | 54 | 5 | 5 |
| Issues | 1.000 | 0.286 | 22 | 0 | 55 |
| Pull Requests | 0.948 | 0.979 | 92 | 5 | 2 |
| Actions | 0.958 | 0.947 | 161 | 7 | 9 |
| Packages | 0.917 | 0.971 | 33 | 3 | 1 |
| Pages | 0.855 | 0.964 | 53 | 9 | 2 |
| Codespaces | 0.982 | 0.982 | 110 | 2 | 2 |
| Copilot | 1.000 | 0.967 | 58 | 0 | 2 |
Summary: the fallback is high-precision and mostly conservative. Most errors are false negatives (missing a true component), while false positives are typically "extra" components inferred from multi-service incident titles.
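The per-label figures follow the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). Recomputing them from the table's counts also shows where the errors concentrate. The totals below cover only the ten labels listed, so they are not expected to reproduce the overall metrics table exactly.

```python
# (TP, FP, FN) copied from the per-label table above.
COUNTS: dict[str, tuple[int, int, int]] = {
    "Git Operations": (60, 2, 6),
    "Webhooks": (45, 3, 4),
    "API Requests": (54, 5, 5),
    "Issues": (22, 0, 55),
    "Pull Requests": (92, 5, 2),
    "Actions": (161, 7, 9),
    "Packages": (33, 3, 1),
    "Pages": (53, 9, 2),
    "Codespaces": (110, 2, 2),
    "Copilot": (58, 0, 2),
}


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)  # share of predicted labels that were correct


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # share of true labels that were recovered


for label, (tp, fp, fn) in COUNTS.items():
    print(f"{label:<15} P={precision(tp, fp):.3f} R={recall(tp, fn):.3f}")

total_fp = sum(c[1] for c in COUNTS.values())
total_fn = sum(c[2] for c in COUNTS.values())
print(f"FP={total_fp} FN={total_fn}; Issues FN={COUNTS['Issues'][2]}")
```

Across these ten labels there are 88 false negatives against 36 false positives. Issues alone contributes 55 of those false negatives, which is why its recall (0.286) stands out.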
Sample errors (diffs highlighted with + for extra and - for missing):
| Type | Incident | Predicted | Truth |
|---|---|---|---|
| false_positive | Incident with GitHub Actions and Codespaces | Actions, +Codespaces | Actions |
| false_positive | Incident with GitHub Packages and GitHub Pages | Packages, +Pages | Packages |
| false_positive | Incident with Pull Requests and Webhooks | Pull Requests, +Webhooks | Pull Requests |
| false_negative | Incident on 2022-09-06 22:05 UTC | none | -Git Operations, -Visit www, -Webhooks |
| false_negative | Incident on 2022-09-06 22:56 UTC | none | -Git Operations, -Visit www, -Webhooks |
| false_negative | Incident with API Requests | none | -API Requests |
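The diff markers in this table follow a simple rule. A `+` marks a predicted label that is absent from the truth, and a `-` marks a truth label the model missed. A hypothetical `render_diff` helper that reproduces this notation might look like the following. Alphabetical label order is an assumption.

```python
def render_diff(predicted: set[str], truth: set[str]) -> tuple[str, str]:
    """Render (Predicted, Truth) cells in the style of the sample-errors table."""

    def cell(labels: set[str], other: set[str], marker: str) -> str:
        if not labels:
            return "none"
        # Prefix labels that are not shared with the other set.
        return ", ".join(label if label in other else f"{marker}{label}"
                         for label in sorted(labels))

    return cell(predicted, truth, "+"), cell(truth, predicted, "-")
```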
Outputs are written to the directory passed to --out (local examples use out/).
The GLiNER2 experiment writes to tagging-experiment/ by default.
The automation workflow writes to parsed/.
- `<out>/incidents.json`: merged incident timeline records
- `<out>/incidents.jsonl`: one JSON object per incident (default)
- `<out>/incidents/`: per-incident JSON files when using `--incidents-format split`
- `<out>/segments.csv`: per-status timeline segments for Gantt/phase views
- `<out>/downtime_windows.csv`: downtime windows for incident bar charts
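The uptime percentages mentioned at the top can be derived from downtime windows like those in `downtime_windows.csv`. The README does not document that file's columns or the site's exact formula, so this is a sketch under assumptions. `uptime_percent` is a hypothetical helper that takes plain `(start, end)` datetime pairs. It clips each window to the reporting period and merges overlaps so that concurrent incidents are not counted twice.

```python
from datetime import datetime


def uptime_percent(windows: list[tuple[datetime, datetime]],
                   period_start: datetime, period_end: datetime) -> float:
    """Percentage of [period_start, period_end) not covered by any downtime window."""
    # Keep windows that intersect the period, clipped to its bounds.
    clipped = sorted(
        (max(s, period_start), min(e, period_end))
        for s, e in windows
        if e > period_start and s < period_end
    )
    down = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged run: close it out.
            if cur_end is not None:
                down += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching: extend the merged run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        down += (cur_end - cur_start).total_seconds()
    total = (period_end - period_start).total_seconds()
    return 100.0 * (1 - down / total)
```

Use timezone-aware UTC datetimes throughout. Comparing naive and aware values raises a `TypeError`.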
Incident records include optional impact and components fields when enrichment is enabled.
Service components are sourced as follows:
- Primary: the incident page "affected components" section (if present).
- Fallback: GLiNER2 schema-driven extraction from the incident title + non-Resolved updates, filtered by explicit service aliases to avoid generic matches.