# Traceflow
AI observability + evals: ingest OTLP traces for LLM calls, run LLM-as-a-judge evaluations, and ship a regression workflow that answers “did we get better or worse?” across recent traces.
Traceflow is a local-first stack for LLM tracing and quality:
- Trace ingest (OTLP/HTTP): send OpenTelemetry traces for LLM calls (prompt, completion, latency, tokens, cost, status).
- Dashboard (Next.js): browse traces/spans, inspect inputs/outputs/context, track spend, and run evals from the UI.
- LLM-as-a-judge evals: run groundedness scoring against retrieved context (question + context + response → score/label/reasoning).
- Regression: pick the last N traces, re-run evals, compare against the previous snapshot, and roll up a batch summary (improved/regressed/unchanged).
- Catch quality drift before it hits users (regressions/improvements over time).
- Debug faster with full trace context (inputs, outputs, retrieval context, latency, cost).
- Production-shaped, demo-friendly: one docker compose brings up API, UI, Postgres, Redis, and an async worker.
- SDK: instrument with one line; captures prompt, completion, tokens, cost, latency, and caller name; exports OTLP.
- Trace explorer: table + filters + per-trace detail (spans, attributes, errors).
- Evals: groundedness judge returns structured JSON (score/label/reasoning + improvements); see the sketch after this list.
- Regression batches: last N traces → compare vs prior snapshot → per-trace deltas + a human-readable batch summary.
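As a rough picture of that judge output, a result might look like the dict below. The key names and value ranges mirror the fields listed above, but they are assumptions for illustration, not a documented schema.

```python
# Illustrative groundedness result; exact keys and values are assumed, not documented.
judge_result = {
    "score": 0.8,                 # how well the response is supported by the context
    "label": "grounded",          # coarse bucket derived from the score
    "reasoning": "Claims about pricing are supported by the retrieved context.",
    "improvements": ["Quote the pricing table directly instead of paraphrasing."],
}
# A regression batch re-runs the judge over the last N traces, compares each new
# score to the prior snapshot, and rolls deltas into improved/regressed/unchanged.
```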
Quick install:

```bash
pip install "traceflow-ai[openai]"
```

For full local setup (Docker + non-Docker), see SETUP.md.
## 1. Run the dashboard
Using the image from Docker Hub:
```bash
docker run -p 8000:8000 iamkalio/traceflow-dashboard
```

Open http://localhost:8000. The dashboard (API + UI) receives traces from the SDK and shows them in a simple UI (traces, spans, cost, latency).
## 2. In your app
Install the SDK:

```bash
pip install "traceflow-ai[openai]"
```

Then initialize it before creating your OpenAI client:

```python
import traceflow_ai
from openai import OpenAI

traceflow_ai.init(endpoint="http://localhost:8000")
client = OpenAI()

# Use client.chat.completions.create(...) as usual; traces appear in the dashboard.
```
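The SDK exports standard OTLP, so you can also instrument by hand with the vanilla OpenTelemetry packages (opentelemetry-sdk plus opentelemetry-exporter-otlp-proto-http). A minimal sketch follows; it assumes the conventional OTLP/HTTP path /v1/traces and uses illustrative attribute keys, neither of which is confirmed by this README.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Point the exporter at the dashboard; "/v1/traces" is the conventional
# OTLP/HTTP route, assumed here rather than taken from Traceflow's docs.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:8000/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-app")
with tracer.start_as_current_span("llm.chat") as span:
    # Attribute keys are illustrative, not Traceflow's documented span schema.
    span.set_attribute("llm.prompt", "What does the SDK capture?")
    span.set_attribute("llm.completion", "Prompt, completion, tokens, cost, latency.")
    span.set_attribute("llm.tokens.total", 128)
```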
## 3. Run from this repo (if you forked or cloned)

```bash
docker compose up --build
```

This starts Postgres, Redis, the API (:8000), the UI, and a worker that processes eval jobs. For evals and regression runs to finish, all four must be running: the API stores traces and enqueues work; the worker reads jobs from Redis, loads spans from Postgres, and writes eval results. Set OPENAI_API_KEY in .env.docker (next to docker-compose.yml), or export it before running docker compose up.
Or run the backend without Docker:

```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
```

Then start Redis and run the worker from backend/ with the same DATABASE_URL / REDIS_URL / OPENAI_API_KEY as the API:

```bash
python3 -m modules.jobs.worker
```
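Since the backend's job layer is RQ (see the table below), the enqueue/consume split looks roughly like the sketch here; the queue name and dotted job path are hypothetical placeholders, not Traceflow's actual identifiers.

```python
# Producer side: conceptually what the API does when an eval is triggered.
from redis import Redis
from rq import Queue

# "evals" and "modules.evals.run_groundedness" are hypothetical names.
queue = Queue("evals", connection=Redis.from_url("redis://localhost:6379"))
queue.enqueue("modules.evals.run_groundedness", "trace-123")

# Consumer side: the worker (python3 -m modules.jobs.worker) pulls jobs from
# Redis, loads the relevant spans from Postgres, and writes eval results back.
```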
## 4. Example

sdk/python/ has a minimal example; with the dashboard running and OPENAI_API_KEY set, run:

```bash
python3 app.py
```
| Part | Description |
|---|---|
| backend/ | Modular monolith: FastAPI, domain modules (modules/*), OTLP ingest, RQ worker |
| app/ | Dashboard (Next.js) |
| sdk/python/ | Example using traceflow-ai from PyPI against a local API |
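To make the "OTLP ingest" row concrete, a FastAPI route for OTLP/HTTP ingest could be shaped like the sketch below; the route, decoding, and storage steps are hypothetical, not lifted from backend/.

```python
# Hypothetical shape of an OTLP/HTTP ingest route; Traceflow's real handler differs.
from fastapi import FastAPI, Request, Response

app = FastAPI()

@app.post("/v1/traces")
async def ingest_traces(request: Request) -> Response:
    payload = await request.body()  # OTLP protobuf payload from the exporter
    # ... decode spans from `payload`, persist to Postgres, enqueue eval jobs ...
    return Response(status_code=200)
```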