feat(#51): AI-powered model recommendations for cost optimization #73
Conversation
…API endpoints
- modelClassifier.ts: session complexity classifier (simple/moderate/complex) based on message length, tool usage patterns, and feature analysis
- modelRecommendations.ts: maps complexity to cheaper model alternatives with cost savings estimates
- routes.ts: GET /api/recommendations/:sessionId and /api/recommendations/summary

Implements backend for AI-powered model recommendations (issue #51)
…ge cases
- 23 tests covering classifier and recommendation mapper
- Edge cases: zero-cost sessions, empty models, no messages, unknown models
- Savings rounding to 4 decimal places
- Bulk recommendations filter completed sessions, respect limits
- Added vitest as test framework
- All tests passing, build clean
Prevents Express from matching 'summary' as a sessionId param.
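The route-ordering point can be sketched framework-free (a minimal model of Express's first-match semantics; the patterns and route names below are illustrative, not code from the PR):

```typescript
// Minimal model of first-match routing: the first registered pattern that
// matches the path wins, which is why the literal /summary route must be
// registered before the /:sessionId parameter route.
type Route = { pattern: RegExp; name: string };

const routes: Route[] = [
  { pattern: /^\/recommendations\/summary$/, name: "summary" },   // literal first
  { pattern: /^\/recommendations\/[^/]+$/, name: "sessionById" }, // param second
];

function matchRoute(path: string): string | undefined {
  return routes.find((r) => r.pattern.test(path))?.name;
}
```

With the registration order reversed, a request to `/recommendations/summary` would bind `sessionId = "summary"`, which is exactly the bug this commit avoids.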
Pull request overview
Adds backend support for AI-assisted “cheaper model” recommendations by classifying session complexity and estimating potential cost savings, exposed via new recommendations API endpoints.
Changes:
- Introduces heuristic session complexity classification (simple|moderate|complex).
- Adds model recommendation + savings estimation logic for single sessions and recent-session summaries.
- Exposes new API routes for per-session and summary recommendations, and adds vitest unit tests.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| backend/src/routes.ts | Adds /api/recommendations/summary and /api/recommendations/:sessionId endpoints. |
| backend/src/modelRecommendations.ts | Implements complexity→model mapping and savings estimation (single + bulk). |
| backend/src/modelClassifier.ts | Implements heuristic session complexity classifier (detail + summary). |
| backend/src/tests/modelRecommendations.test.ts | Adds unit tests for recommendation mapping and savings edge cases. |
| backend/src/tests/modelClassifier.test.ts | Adds unit tests for complexity classification heuristics. |
| backend/package.json | Adds vitest scripts and dev dependency. |
| backend/package-lock.json | Locks vitest and transitive dependencies. |
Files not reviewed (1)
- backend/package-lock.json: Language not supported
```ts
// Calculate potential savings
const currentCost = session.costUsd;
const estimatedNewCost = estimateSessionCost(session.tokenCount, recommendation.model);
const savings = Math.max(0, currentCost - estimatedNewCost);
const savingsPercentage = currentCost > 0 ? (savings / currentCost) * 100 : 0;
```
Savings are computed even when the recommendation doesn’t actually change the model (or changes only the label, e.g. anthropic/claude-haiku-3 → claude-haiku). Because estimateSessionCost is only a rough estimate, this can report non-zero “savings” for a no-op recommendation. Consider short-circuiting: if the recommended tier/normalized model is the same as the current tier, set savings to 0 (and percentage to 0) instead of estimating.
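One possible shape for that short-circuit, as a hedged sketch — `normalizeModelId` and `computeSavings` are hypothetical helpers, not functions from this PR:

```typescript
// Hypothetical normalization: drop a provider prefix
// ("anthropic/claude-haiku-3" -> "claude-haiku-3"), then a trailing integer
// version ("claude-haiku-3" -> "claude-haiku"), so label-only changes compare equal.
function normalizeModelId(model: string): string {
  const base = model.includes("/") ? model.slice(model.lastIndexOf("/") + 1) : model;
  return base.replace(/-\d+$/, "").toLowerCase();
}

// Sketch of the guard: treat a same-tier recommendation as a no-op
// instead of trusting the rough cost estimate.
function computeSavings(
  currentModel: string,
  recommendedModel: string,
  currentCost: number,
  estimatedNewCost: number
): { savings: number; savingsPercentage: number } {
  if (normalizeModelId(currentModel) === normalizeModelId(recommendedModel)) {
    return { savings: 0, savingsPercentage: 0 };
  }
  const savings = Math.max(0, currentCost - estimatedNewCost);
  const savingsPercentage = currentCost > 0 ? (savings / currentCost) * 100 : 0;
  return { savings, savingsPercentage };
}
```

The normalization rule here is only a rough heuristic; a real implementation would want an explicit tier table rather than string munging.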
```ts
const complexity = classifySessionSummary(session);
const recommendation = mapComplexityToModel(complexity, session.model);

const currentCost = session.costUsd;
const estimatedNewCost = estimateSessionCost(session.tokenCount, recommendation.model);
const savings = Math.max(0, currentCost - estimatedNewCost);
const savingsPercentage = currentCost > 0 ? (savings / currentCost) * 100 : 0;
```
In bulk mode, savings are estimated even when mapComplexityToModel effectively recommends the same model tier (e.g. already on Haiku / GPT-3.5, or default “no change”). Since estimateSessionCost is approximate, this can produce misleading positive savings for a no-op. Consider detecting “no real change” (e.g. comparing normalized model ids) and forcing savings/percentage to 0 in those cases.
```ts
router.get("/recommendations/summary", async (req: Request, res: Response) => {
  try {
    const profile = req.query.profile as string | undefined;
    const limit = parseInt(req.query.limit as string) || 20;
```
limit is parsed without a radix and without clamping/validation. Elsewhere in this file (e.g. the earlier /sessions listing) limit is parsed with base 10 and clamped to a safe range. Consider matching that pattern here (e.g. parseInt(..., 10) + min/max bounds) to avoid negative/NaN/overly large limits affecting behavior/perf.
Suggested change:

```diff
-    const limit = parseInt(req.query.limit as string) || 20;
+    const rawLimit = parseInt(req.query.limit as string, 10);
+    const limit =
+      !Number.isNaN(rawLimit) ? Math.max(1, Math.min(rawLimit, 100)) : 20;
```
```ts
import { SessionDetail, SessionSummary } from "./sessions.js";
import { classifySession, classifySessionSummary, SessionComplexity } from "./modelClassifier.js";
```
This relative import uses a .js extension ("./sessions.js"), but the source file is sessions.ts and the project is compiled as module: "commonjs" without NodeNext-style module resolution. This is likely to fail type-check/build with "Cannot find module". Use the same extensionless style used across the backend (e.g. "./sessions").
Suggested change:

```diff
-import { SessionDetail, SessionSummary } from "./sessions.js";
-import { classifySession, classifySessionSummary, SessionComplexity } from "./modelClassifier.js";
+import { SessionDetail, SessionSummary } from "./sessions";
+import { classifySession, classifySessionSummary, SessionComplexity } from "./modelClassifier";
```
```diff
@@ -0,0 +1,148 @@
+import { SessionDetail, SessionSummary } from "./sessions.js";
```
This relative import uses a .js extension ("./sessions.js"), but the codebase generally uses extensionless relative imports and the backend tsconfig is module: "commonjs" (not NodeNext). To avoid module resolution/build issues, import "./sessions" instead.
Suggested change:

```diff
-import { SessionDetail, SessionSummary } from "./sessions.js";
+import { SessionDetail, SessionSummary } from "./sessions";
```
```ts
    return {
      sessionId: session.id,
      title: session.title,
      recommendation: {
        currentModel: session.model,
        recommendedModel: recommendation.model,
        complexity,
        confidence: 0.6, // Lower confidence without full detail
        reasons: [recommendation.reason],
        potentialSavings: {
          costUsd: savings,
          percentage: savingsPercentage,
        },
      },
    };
  })
  .filter((r) => r.recommendation.potentialSavings.costUsd > 0.001); // Filter out negligible savings

const totalSavings = recommendations.reduce((sum, r) => sum + r.recommendation.potentialSavings.costUsd, 0);

return {
  totalSessions: recentSessions.length,
  potentialTotalSavings: totalSavings,
  recommendations,
```
Bulk recommendations return potentialSavings.costUsd and percentage without rounding, while the per-session endpoint rounds (4dp for cost / 1dp for percentage) and the PR description calls out rounding as an edge case. Consider rounding here as well (and rounding potentialTotalSavings) to keep the API consistent and avoid long floating-point decimals in the UI.
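A minimal sketch of what consistent rounding could look like — `roundTo` is a hypothetical helper, and the 4dp/1dp precisions mirror the per-session convention described above:

```typescript
// Round to a fixed number of decimal places to avoid long floating-point
// tails in the JSON payload (hypothetical helper, not from this PR).
function roundTo(value: number, decimals: number): number {
  const factor = 10 ** decimals;
  return Math.round(value * factor) / factor;
}

// Applied to the bulk payload fields before returning:
const costUsd = roundTo(0.01234567, 4);  // cost to 4 decimal places
const percentage = roundTo(37.846, 1);   // percentage to 1 decimal place
```

The same helper could round `potentialTotalSavings` so the summary and per-session endpoints agree.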
- Add BulkRecommendationSummary, SessionRecommendation types
- Add API functions: getRecommendationSummary, getSessionRecommendation
- Create RecommendationsTab component with summary card + session list
- Wire tab into dashboard page with fetch effect
- Responsive desktop/mobile layouts, complexity badges, savings display
- Shows total potential savings, avg per session, individual recommendations
- SessionClient: stack stats on mobile, full-width messages, responsive timeline dots
- ProjectClient: hide dividers on mobile, stack stats, truncate model names, responsive timeline
- Both: sm: breakpoints, break-words for long text, min-h touch targets on buttons
- Pattern matches dashboard tabs (mobile cards, desktop layouts)
Frontend review ✅ Reviewed the UI integration:
Ready for merge from the frontend side.
…ndary thresholds, GPT/Gemini paths, savings precision
- Classifier: boundary thresholds (100/300 char, 0.3/1.0 tool ratio, 20/21 msg count, 30/31 min duration)
- Classifier: case-sensitive tool names (Read/Write/Edit), null content handling, confidence caps
- Recommendations: GPT-4 → gpt-3.5/gpt-4-turbo paths, Gemini pro → flash paths
- Recommendations: cheap model upgrade hints for complex tasks (haiku, gpt-3.5)
- Bulk: negligible savings filter, mixed model families, idle sessions, confidence level
- Savings: non-negative guarantee, percentage validation for known cost reductions

Coverage: 23 → 88 tests (4 test files)
GalDayan left a comment
🎨 Frontend Review — PR #73 (Model Recommendations)
Reviewed all 6 frontend files. Overall: solid work, merge-ready. A few notes:
✅ What's Good
- RecommendationsTab.tsx — Clean component structure. Skeleton loading states are well done (matching existing patterns). Empty state handles both zero-sessions and all-optimal cases. Nice touch.
- Responsive layouts — The dual desktop/mobile layout in RecommendationsTab is good UX. The responsive fixes in ProjectClient and SessionClient (flex-col on mobile, hidden dividers, break-words) are exactly the kind of polish we need.
- Types — `types.ts` additions are well-typed. `SessionComplexity` as a union type is clean. `BulkRecommendationSummary` shape matches the API contract.
- API layer — `api.ts` functions follow existing patterns (fetchJson, URLSearchParams). Good.
- Tab integration — The `useEffect` with cancellation token for recommendations fetch is correct. Only fetches when tab is active — no wasted calls.
🟡 Minor Observations (non-blocking)
- `formatModel()` could live in a shared util — It's useful outside RecommendationsTab (e.g., session detail views). Not urgent, but worth extracting later.
- Confidence dot — The inline `style={{ backgroundColor: ... }}` with ternary chains works but could be a `complexityConfig`-style lookup for consistency. Fine for now.
- `overflow-x-auto` on tab bar — Smart addition for the 5th tab. Good catch preventing horizontal overflow on narrow viewports.
- `getSessionRecommendation()` in api.ts — Exported but not used anywhere in this PR. Presumably for the per-session view later? No issue, just noting it.
- Average percentage calculation — Line ~140: `Math.round()` on the reduce result — the division by `recommendations.length` is outside `Math.round()`. Works correctly, just dense. A variable would improve readability.
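That readability suggestion could look like the following sketch (the `Rec` type is a trimmed stand-in for this PR's recommendation entries; the function name is hypothetical):

```typescript
// Trimmed stand-in for the recommendation entry shape used in this PR.
type Rec = { recommendation: { potentialSavings: { percentage: number } } };

// Name the intermediate sum instead of nesting the reduce inside Math.round().
function averageSavingsPercentage(recommendations: Rec[]): number {
  if (recommendations.length === 0) return 0;
  const totalPercentage = recommendations.reduce(
    (sum, r) => sum + r.recommendation.potentialSavings.percentage,
    0
  );
  return Math.round(totalPercentage / recommendations.length);
}
```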
🟢 Verdict
Ship it. Clean component, good responsive handling, proper TypeScript, follows existing patterns. The responsive fixes in ProjectClient/SessionClient are a nice bonus — those pages needed the mobile love.
— Anas 🎨
Summary
Implements the backend for AI-powered model recommendations (closes #51).
What's included
3-layer architecture:

- Classifier (`modelClassifier.ts`) — Analyzes session complexity (simple/moderate/complex) using heuristics
- Mapper (`modelRecommendations.ts`) — Maps complexity to cheaper model alternatives
- API Endpoints (`routes.ts`):
  - `GET /api/recommendations/:sessionId` — Per-session recommendation
  - `GET /api/recommendations/summary` — Bulk savings across all agents (the money view)

Edge cases handled
- Route ordering: `/summary` registered before the param route `/:sessionId`

Tests
- All passing; build clean under `tsc --noEmit`

What's next
cc @GalDayan