feat(auth): warm allowlisted providers during fill-first routing #2720
cookie223 wants to merge 8 commits into router-for-me:dev from
Conversation
Trigger a tiny background request for allowlisted secondary providers after fill-first selects the primary auth, so cooldown/reset windows can start without delaying the user request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This pull request targeted another branch. The base branch has been automatically changed.
Code Review
This pull request introduces an asynchronous warmup mechanism for secondary providers when using the FillFirstSelector, specifically targeting OpenAI. It adds state to the Manager to track warmup TTLs and logic to dispatch dummy requests to ensure secondary providers are ready. Review feedback suggests addressing a potential memory leak in the TTL tracking map and optimizing goroutine creation to reduce overhead and lock contention.
func (m *Manager) reserveMixedProviderWarmup(provider, authID string, now time.Time) bool {
	if m == nil {
		return false
	}
	provider = strings.ToLower(strings.TrimSpace(provider))
	authID = strings.TrimSpace(authID)
	if provider == "" || authID == "" {
		return false
	}
	key := provider + "|" + authID
	m.mixedWarmMu.Lock()
	defer m.mixedWarmMu.Unlock()
	if until, ok := m.mixedWarmUntil[key]; ok && until.After(now) {
		return false
	}
	m.mixedWarmUntil[key] = now.Add(mixedProviderWarmupTTL)
	return true
}
The mixedWarmUntil map is used to track warmup TTLs but lacks a cleanup mechanism. In a long-running process with many unique credentials, this map will grow indefinitely, leading to a memory leak. Consider implementing a periodic cleanup of expired entries or using a size-limited cache with TTL support.
sdk/cliproxy/auth/conductor.go
Outdated
go func(requestCtx context.Context, targetProviders []string, model string) {
	for _, providerKey := range targetProviders {
		if err := m.warmMixedProvider(requestCtx, providerKey, model); err != nil {
			logEntryWithRequestID(requestCtx).Debugf("mixed provider warmup skipped provider=%s model=%s error=%v", providerKey, model, err)
		}
	}
}(ctx, append([]string(nil), targets...), strings.TrimSpace(routeModel))
Spawning a goroutine for every pickNextMixed call can be inefficient, especially since warmMixedProvider immediately performs a pickNext call which acquires the manager's read lock. Under high load or during failure retry loops, this could lead to excessive goroutine creation and lock contention. Consider checking if a warmup is already in progress or recently completed for the target providers before spawning the goroutine, or at least before performing the pickNext call inside the goroutine.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d19511cc0a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
sdk/cliproxy/auth/conductor.go
Outdated
func (m *Manager) warmMixedProvider(ctx context.Context, provider, routeModel string) error {
	warmCtx := context.Background()
	if ctx != nil {
		warmCtx = context.WithoutCancel(ctx)
Preserve cancellation for async warmup HTTP calls
The warmup path detaches from the caller context using context.WithoutCancel, so these background HTTP calls have no cancellation or deadline signal from the originating request. If the secondary provider hangs or becomes slow to respond, warmup requests can remain blocked for a long time, and new warmups continue to be scheduled every TTL window, which can accumulate stuck goroutines/connections in production.
Prevent duplicate fill-first warmup goroutines per provider, give async warmups their own timeout, and prune expired warmup TTL entries to avoid unbounded state growth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ba1337cded
sdk/cliproxy/auth/conductor.go
Outdated
	}
	m.mu.Unlock()
}
m.maybeWarmOtherMixedProvidersAsync(ctx, eligibleProviders, providerKey, model)
Avoid warmup side effects during token-count routing
pickNextMixed is shared by Execute, ExecuteStream, and ExecuteCount, so calling maybeWarmOtherMixedProvidersAsync here also triggers warmup HTTP calls for token-count requests. In a fill-first mixed route (for example claude,openai), a /count_tokens request now sends an extra OpenAI chat completion in the background, which adds unexpected external traffic and billable token usage for what should be a read-only counting operation.
Restrict mixed provider warmups to execute and stream paths so count-only routing stays read-only and avoids unexpected upstream traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f45f291b25
sdk/cliproxy/auth/conductor.go
Outdated
for key, until := range m.mixedWarmUntil {
	if !until.After(now) || !strings.HasPrefix(key, provider+"|") {
		continue
	}
	return false
Scope warmup TTL gating to auth ID
tryStartMixedProviderWarmup currently skips new warmups if any active key matches provider|*, so a recent warmup for one credential (for example openai|authA) suppresses warmup for other credentials of the same provider (such as openai|authB) for the full TTL. In fill-first mixed routes that rely on multiple auths per provider, this leaves fallback auths cold and undermines the intended per-provider/auth dedup behavior, causing avoidable latency spikes when traffic shifts to a different auth.
Keep provider-level single-flight dedupe, but avoid suppressing warmups for other auths on the same provider while a different auth is still within TTL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abb5cf0903
sdk/cliproxy/auth/conductor.go
Outdated
if !m.reserveMixedProviderWarmup(provider, auth.ID, time.Now()) {
	return nil
}
models, _ := m.preparedExecutionModels(auth, routeModel)
Avoid advancing model pool cursor during warmup
Calling preparedExecutionModels in the warmup path mutates model-pool state for OpenAI-compatible auths: executionModelCandidates increments modelPoolOffsets through nextModelPoolOffset, so each warmup consumes a rotation slot before any real fallback traffic arrives. In mixed fill-first routes that use model alias pools, this changes which upstream model the next user request is sent to (and therefore can change cost/capability behavior) even though warmup is meant to be side-effect-free.
Keep mixed-provider warmup side-effect free by peeking pooled execution models without consuming alias-pool rotation state before real traffic arrives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbd7eee326
models, _ := m.preparedExecutionModelsWithoutAdvancing(auth, routeModel)
if len(models) == 0 {
	return &Error{Code: "model_not_found", Message: "warmup model is unavailable"}
}
warmReq, err := buildMixedProviderWarmupRequest(auth, provider, models[0])
Strip thinking suffix from warmup model payload
The warmup path takes models[0] directly from preparedExecutionModelsWithoutAdvancing and forwards it into the raw OpenAI payload, but those model candidates intentionally preserve thinking suffixes (for example gpt-4.1(8192)). Unlike the normal Execute path (which parses/removes suffixes before upstream translation), this warmup request sends the suffixed model string as-is, which OpenAI-compatible /chat/completions endpoints can reject as an unknown model; in practice, warmup silently fails for suffix-enabled routes and the fallback provider remains cold.
luispater left a comment
The mixed-provider warmup flow is headed in the right direction, but I found one blocking correctness issue in the new background request path.
Blocking:
warmMixedProvider() currently treats any HTTP response as a successful warmup. The executor contract returns a raw *http.Response, so 401/429/5xx responses do not come back as errors here. Because the TTL reservation is recorded before the request and is not cleared on a bad status, a failed warmup suppresses follow-up warmups for that auth for the next minute. In practice, that means the feature can silently stop warming a secondary provider exactly when the provider rejects the request.
Please make non-2xx responses fail the warmup path and avoid keeping the TTL reservation for unsuccessful attempts.
Test plan:
PATH="/opt/homebrew/opt/go/bin:$PATH" go test ./sdk/cliproxy/auth/...
Treat non-2xx warmup responses as failures, clear auth-scoped warmup TTL reservations on unsuccessful attempts, and strip thinking suffixes from warmup model payloads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8931be8f04
if _, ok := m.mixedWarmInFlight[provider]; ok {
	return false
}
m.mixedWarmInFlight[provider] = struct{}{}
return true
Check warmup TTL before starting async warmup
tryStartMixedProviderWarmup only gates on mixedWarmInFlight, so once a warmup finishes, every subsequent fill-first request within the 1-minute TTL still spawns a new goroutine, runs pickNext/model preparation, and only then exits when reserveMixedProviderWarmup rejects it. In high-QPS mixed routes this defeats the intended TTL deduplication and adds avoidable goroutine/CPU churn even though no warmup HTTP call is sent.
Pre-check auth-scoped warmup TTL before starting async warmup workers so fill-first routes skip duplicate background warmup setup without regressing prior warmup safety fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 61158cfff4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
sdk/cliproxy/auth/conductor.go
Outdated
	targets = append(targets, providerKey)
}
for _, providerKey := range targets {
	auth, _, err := m.pickNext(ctx, providerKey, routeModel, cliproxyexecutor.Options{}, nil)
Respect pinned auth when deciding warmup candidates
The warmup selector bypasses request metadata by calling pickNext with cliproxyexecutor.Options{} here, so pinned_auth_id is ignored for warmup selection. In pinned sessions (for example websocket follow-up requests that lock execution to one auth), mixed routing cannot actually fail over to secondary providers, but this code still emits background warmup HTTP calls to those providers, creating avoidable external traffic and billable usage with no chance of serving the request.
Preserve request metadata when preselecting warmup auths so pinned sessions do not send unnecessary fallback warmup traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- openai allowlist entry; deduplicate warmups per provider/auth with a short TTL so concurrent requests do not fan out

Test plan
- PATH="/opt/homebrew/opt/go/bin:$PATH" && go test ./sdk/cliproxy/auth/...
- PATH="/opt/homebrew/opt/go/bin:$PATH" && go build -o /tmp/cliproxyapi-test-build ./cmd/server && rm /tmp/cliproxyapi-test-build
- PATH="/opt/homebrew/opt/go/bin:$PATH" && go test ./... (currently fails in internal/runtime/executor/claude_executor_test.go on pre-existing system prompt block assertions unrelated to this change)

🤖 Generated with Claude Code