
feat(auth): warm allowlisted providers during fill-first routing#2720

Open
cookie223 wants to merge 8 commits into router-for-me:dev from cookie223:feat/fill-first-provider-warmup

Conversation

@cookie223

Summary

  • Trigger a tiny background warmup request for allowlisted secondary providers after fill-first mixed routing selects the primary auth.
  • Start with an OpenAI allowlist entry and deduplicate warmups per provider/auth with a short TTL so concurrent requests do not fan out.
  • Add conductor tests covering async warmup, round-robin skip, failure isolation, and TTL deduplication.

Test plan

  • PATH="/opt/homebrew/opt/go/bin:$PATH" go test ./sdk/cliproxy/auth/...
  • PATH="/opt/homebrew/opt/go/bin:$PATH" go build -o /tmp/cliproxyapi-test-build ./cmd/server && rm /tmp/cliproxyapi-test-build
  • PATH="/opt/homebrew/opt/go/bin:$PATH" go test ./... (currently fails in internal/runtime/executor/claude_executor_test.go on pre-existing system prompt block assertions unrelated to this change)

🤖 Generated with Claude Code

Trigger a tiny background request for allowlisted secondary providers after fill-first selects the primary auth so cooldown/reset windows can start without delaying the user request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot changed the base branch from main to dev April 12, 2026 00:18
@github-actions

This pull request targeted main.

The base branch has been automatically changed to dev.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an asynchronous warmup mechanism for secondary providers when using the FillFirstSelector, specifically targeting OpenAI. It adds state to the Manager to track warmup TTLs and logic to dispatch dummy requests to ensure secondary providers are ready. Review feedback suggests addressing a potential memory leak in the TTL tracking map and optimizing goroutine creation to reduce overhead and lock contention.

Comment on lines +3015 to +3032
func (m *Manager) reserveMixedProviderWarmup(provider, authID string, now time.Time) bool {
    if m == nil {
        return false
    }
    provider = strings.ToLower(strings.TrimSpace(provider))
    authID = strings.TrimSpace(authID)
    if provider == "" || authID == "" {
        return false
    }
    key := provider + "|" + authID
    m.mixedWarmMu.Lock()
    defer m.mixedWarmMu.Unlock()
    if until, ok := m.mixedWarmUntil[key]; ok && until.After(now) {
        return false
    }
    m.mixedWarmUntil[key] = now.Add(mixedProviderWarmupTTL)
    return true
}


Severity: medium

The mixedWarmUntil map is used to track warmup TTLs but lacks a cleanup mechanism. In a long-running process with many unique credentials, this map will grow indefinitely, leading to a memory leak. Consider implementing a periodic cleanup of expired entries or using a size-limited cache with TTL support.

Comment on lines +2901 to +2907
go func(requestCtx context.Context, targetProviders []string, model string) {
    for _, providerKey := range targetProviders {
        if err := m.warmMixedProvider(requestCtx, providerKey, model); err != nil {
            logEntryWithRequestID(requestCtx).Debugf("mixed provider warmup skipped provider=%s model=%s error=%v", providerKey, model, err)
        }
    }
}(ctx, append([]string(nil), targets...), strings.TrimSpace(routeModel))


Severity: medium

Spawning a goroutine for every pickNextMixed call can be inefficient, especially since warmMixedProvider immediately performs a pickNext call which acquires the manager's read lock. Under high load or during failure retry loops, this could lead to excessive goroutine creation and lock contention. Consider checking if a warmup is already in progress or recently completed for the target providers before spawning the goroutine, or at least before performing the pickNext call inside the goroutine.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d19511cc0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

func (m *Manager) warmMixedProvider(ctx context.Context, provider, routeModel string) error {
    warmCtx := context.Background()
    if ctx != nil {
        warmCtx = context.WithoutCancel(ctx)

P2: Preserve cancellation for async warmup HTTP calls

The warmup path detaches from the caller context using context.WithoutCancel, so these background HTTP calls have no cancellation or deadline signal from the originating request. If the secondary provider hangs or becomes slow to respond, warmup requests can remain blocked for a long time, and new warmups continue to be scheduled every TTL window, which can accumulate stuck goroutines/connections in production.


Prevent duplicate fill-first warmup goroutines per provider, give async warmups their own timeout, and prune expired warmup TTL entries to avoid unbounded state growth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba1337cded


        }
        m.mu.Unlock()
    }
    m.maybeWarmOtherMixedProvidersAsync(ctx, eligibleProviders, providerKey, model)

P1: Avoid warmup side effects during token-count routing

pickNextMixed is shared by Execute, ExecuteStream, and ExecuteCount, so calling maybeWarmOtherMixedProvidersAsync here also triggers warmup HTTP calls for token-count requests. In a fill-first mixed route (for example claude,openai), a /count_tokens request now sends an extra OpenAI chat completion in the background, which adds unexpected external traffic and billable token usage for what should be a read-only counting operation.


Restrict mixed provider warmups to execute and stream paths so count-only routing stays read-only and avoids unexpected upstream traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f45f291b25


Comment on lines +3059 to +3063
for key, until := range m.mixedWarmUntil {
    if !until.After(now) || !strings.HasPrefix(key, provider+"|") {
        continue
    }
    return false

P2: Scope warmup TTL gating to auth ID

tryStartMixedProviderWarmup currently skips new warmups if any active key matches provider|*, so a recent warmup for one credential (for example openai|authA) suppresses warmup for other credentials of the same provider (such as openai|authB) for the full TTL. In fill-first mixed routes that rely on multiple auths per provider, this leaves fallback auths cold and undermines the intended per-provider/auth dedup behavior, causing avoidable latency spikes when traffic shifts to a different auth.


Keep provider-level single-flight dedupe, but avoid suppressing warmups for other auths on the same provider while a different auth is still within TTL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abb5cf0903


if !m.reserveMixedProviderWarmup(provider, auth.ID, time.Now()) {
    return nil
}
models, _ := m.preparedExecutionModels(auth, routeModel)

P2: Avoid advancing model pool cursor during warmup

Calling preparedExecutionModels in the warmup path mutates model-pool state for OpenAI-compatible auths: executionModelCandidates increments modelPoolOffsets through nextModelPoolOffset, so each warmup consumes a rotation slot before any real fallback traffic arrives. In mixed fill-first routes that use model alias pools, this changes which upstream model the next user request is sent to (and therefore can change cost/capability behavior) even though warmup is meant to be side-effect-free.


Keep mixed-provider warmup side-effect free by peeking pooled execution models without consuming alias-pool rotation state before real traffic arrives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbd7eee326


Comment on lines +3010 to +3014
models, _ := m.preparedExecutionModelsWithoutAdvancing(auth, routeModel)
if len(models) == 0 {
    return &Error{Code: "model_not_found", Message: "warmup model is unavailable"}
}
warmReq, err := buildMixedProviderWarmupRequest(auth, provider, models[0])

P2: Strip thinking suffix from warmup model payload

The warmup path takes models[0] directly from preparedExecutionModelsWithoutAdvancing and forwards it into the raw OpenAI payload, but those model candidates intentionally preserve thinking suffixes (for example gpt-4.1(8192)). Unlike the normal Execute path (which parses/removes suffixes before upstream translation), this warmup request sends the suffixed model string as-is, which OpenAI-compatible /chat/completions endpoints can reject as an unknown model; in practice, warmup silently fails for suffix-enabled routes and the fallback provider remains cold.
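A sketch of the suggested normalization, assuming the suffix grammar from the review comment's example (a trailing parenthesized thinking budget such as gpt-4.1(8192)); the real project may parse suffixes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// stripThinkingSuffix removes a trailing "(...)" thinking-budget suffix,
// e.g. "gpt-4.1(8192)" -> "gpt-4.1", so the warmup payload carries a model
// name the upstream /chat/completions endpoint will accept.
func stripThinkingSuffix(model string) string {
	if i := strings.LastIndex(model, "("); i > 0 && strings.HasSuffix(model, ")") {
		return strings.TrimSpace(model[:i])
	}
	return model
}

func main() {
	fmt.Println(stripThinkingSuffix("gpt-4.1(8192)")) // suffix removed
	fmt.Println(stripThinkingSuffix("gpt-4.1"))       // plain names pass through unchanged
}
```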


Collaborator

@luispater luispater left a comment


The mixed-provider warmup flow is headed in the right direction, but I found one blocking correctness issue in the new background request path.

Blocking:

  • warmMixedProvider() currently treats any HTTP response as a successful warmup. The executor contract returns a raw *http.Response, so 401/429/5xx responses do not come back as errors here. Because the TTL reservation is recorded before the request and is not cleared on a bad status, a failed warmup suppresses follow-up warmups for that auth for the next minute. In practice, that means the feature can silently stop warming a secondary provider exactly when the provider rejects the request.

Please make non-2xx responses fail the warmup path and avoid keeping the TTL reservation for unsuccessful attempts.

Test plan:

  • PATH="/opt/homebrew/opt/go/bin:$PATH" go test ./sdk/cliproxy/auth/...

Treat non-2xx warmup responses as failures, clear auth-scoped warmup TTL reservations on unsuccessful attempts, and strip thinking suffixes from warmup model payloads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8931be8f04


Comment on lines +3117 to +3121
if _, ok := m.mixedWarmInFlight[provider]; ok {
    return false
}
m.mixedWarmInFlight[provider] = struct{}{}
return true

P2: Check warmup TTL before starting async warmup

tryStartMixedProviderWarmup only gates on mixedWarmInFlight, so once a warmup finishes, every subsequent fill-first request within the 1-minute TTL still spawns a new goroutine, runs pickNext/model preparation, and only then exits when reserveMixedProviderWarmup rejects it. In high-QPS mixed routes this defeats the intended TTL deduplication and adds avoidable goroutine/CPU churn even though no warmup HTTP call is sent.


Pre-check auth-scoped warmup TTL before starting async warmup workers so fill-first routes skip duplicate background warmup setup without regressing prior warmup safety fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61158cfff4


    targets = append(targets, providerKey)
}
for _, providerKey := range targets {
    auth, _, err := m.pickNext(ctx, providerKey, routeModel, cliproxyexecutor.Options{}, nil)

P2: Respect pinned auth when deciding warmup candidates

The warmup selector bypasses request metadata by calling pickNext with cliproxyexecutor.Options{} here, so pinned_auth_id is ignored for warmup selection. In pinned sessions (for example websocket follow-up requests that lock execution to one auth), mixed routing cannot actually fail over to secondary providers, but this code still emits background warmup HTTP calls to those providers, creating avoidable external traffic and billable usage with no chance of serving the request.


Preserve request metadata when preselecting warmup auths so pinned sessions do not send unnecessary fallback warmup traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
