
feat(auth): warm allowlisted providers during fill-first routing#2720

Open
cookie223 wants to merge 8 commits into router-for-me:dev from cookie223:feat/fill-first-provider-warmup

Conversation

@cookie223

Summary

  • Trigger a tiny background warmup request for allowlisted secondary providers after fill-first mixed routing selects the primary auth.
  • Start with an OpenAI allowlist entry and deduplicate warmups per provider/auth with a short TTL so concurrent requests do not fan out.
  • Add conductor tests covering async warmup, round-robin skip, failure isolation, and TTL deduplication.

Test plan

  • PATH="/opt/homebrew/opt/go/bin:$PATH" go test ./sdk/cliproxy/auth/...
  • PATH="/opt/homebrew/opt/go/bin:$PATH" go build -o /tmp/cliproxyapi-test-build ./cmd/server && rm /tmp/cliproxyapi-test-build
  • PATH="/opt/homebrew/opt/go/bin:$PATH" go test ./... (currently fails in internal/runtime/executor/claude_executor_test.go on pre-existing system prompt block assertions unrelated to this change)

🤖 Generated with Claude Code

Trigger a tiny background request for allowlisted secondary providers after fill-first selects the primary auth so cooldown/reset windows can start without delaying the user request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot changed the base branch from main to dev April 12, 2026 00:18
@github-actions

This pull request targeted main.

The base branch has been automatically changed to dev.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an asynchronous warmup mechanism for secondary providers when using the FillFirstSelector, specifically targeting OpenAI. It adds state to the Manager to track warmup TTLs and logic to dispatch dummy requests to ensure secondary providers are ready. Review feedback suggests addressing a potential memory leak in the TTL tracking map and optimizing goroutine creation to reduce overhead and lock contention.

Comment on lines +3015 to +3032
func (m *Manager) reserveMixedProviderWarmup(provider, authID string, now time.Time) bool {
    if m == nil {
        return false
    }
    provider = strings.ToLower(strings.TrimSpace(provider))
    authID = strings.TrimSpace(authID)
    if provider == "" || authID == "" {
        return false
    }
    key := provider + "|" + authID
    m.mixedWarmMu.Lock()
    defer m.mixedWarmMu.Unlock()
    if until, ok := m.mixedWarmUntil[key]; ok && until.After(now) {
        return false
    }
    m.mixedWarmUntil[key] = now.Add(mixedProviderWarmupTTL)
    return true
}


Severity: medium

The mixedWarmUntil map is used to track warmup TTLs but lacks a cleanup mechanism. In a long-running process with many unique credentials, this map will grow indefinitely, leading to a memory leak. Consider implementing a periodic cleanup of expired entries or using a size-limited cache with TTL support.

Comment on lines +2901 to +2907
go func(requestCtx context.Context, targetProviders []string, model string) {
    for _, providerKey := range targetProviders {
        if err := m.warmMixedProvider(requestCtx, providerKey, model); err != nil {
            logEntryWithRequestID(requestCtx).Debugf("mixed provider warmup skipped provider=%s model=%s error=%v", providerKey, model, err)
        }
    }
}(ctx, append([]string(nil), targets...), strings.TrimSpace(routeModel))


Severity: medium

Spawning a goroutine for every pickNextMixed call can be inefficient, especially since warmMixedProvider immediately performs a pickNext call which acquires the manager's read lock. Under high load or during failure retry loops, this could lead to excessive goroutine creation and lock contention. Consider checking if a warmup is already in progress or recently completed for the target providers before spawning the goroutine, or at least before performing the pickNext call inside the goroutine.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d19511cc0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

func (m *Manager) warmMixedProvider(ctx context.Context, provider, routeModel string) error {
    warmCtx := context.Background()
    if ctx != nil {
        warmCtx = context.WithoutCancel(ctx)

P2: Preserve cancellation for async warmup HTTP calls

The warmup path detaches from the caller context using context.WithoutCancel, so these background HTTP calls have no cancellation or deadline signal from the originating request. If the secondary provider hangs or becomes slow to respond, warmup requests can remain blocked for a long time, and new warmups continue to be scheduled every TTL window, which can accumulate stuck goroutines/connections in production.


Prevent duplicate fill-first warmup goroutines per provider, give async warmups their own timeout, and prune expired warmup TTL entries to avoid unbounded state growth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba1337cded


        }
        m.mu.Unlock()
    }
    m.maybeWarmOtherMixedProvidersAsync(ctx, eligibleProviders, providerKey, model)

P1: Avoid warmup side effects during token-count routing

pickNextMixed is shared by Execute, ExecuteStream, and ExecuteCount, so calling maybeWarmOtherMixedProvidersAsync here also triggers warmup HTTP calls for token-count requests. In a fill-first mixed route (for example claude,openai), a /count_tokens request now sends an extra OpenAI chat completion in the background, which adds unexpected external traffic and billable token usage for what should be a read-only counting operation.


Restrict mixed provider warmups to execute and stream paths so count-only routing stays read-only and avoids unexpected upstream traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f45f291b25


Comment on lines +3059 to +3063
for key, until := range m.mixedWarmUntil {
    if !until.After(now) || !strings.HasPrefix(key, provider+"|") {
        continue
    }
    return false

P2: Scope warmup TTL gating to auth ID

tryStartMixedProviderWarmup currently skips new warmups if any active key matches provider|*, so a recent warmup for one credential (for example openai|authA) suppresses warmup for other credentials of the same provider (such as openai|authB) for the full TTL. In fill-first mixed routes that rely on multiple auths per provider, this leaves fallback auths cold and undermines the intended per-provider/auth dedup behavior, causing avoidable latency spikes when traffic shifts to a different auth.


Keep provider-level single-flight dedupe, but avoid suppressing warmups for other auths on the same provider while a different auth is still within TTL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abb5cf0903


if !m.reserveMixedProviderWarmup(provider, auth.ID, time.Now()) {
    return nil
}
models, _ := m.preparedExecutionModels(auth, routeModel)

P2: Avoid advancing model pool cursor during warmup

Calling preparedExecutionModels in the warmup path mutates model-pool state for OpenAI-compatible auths: executionModelCandidates increments modelPoolOffsets through nextModelPoolOffset, so each warmup consumes a rotation slot before any real fallback traffic arrives. In mixed fill-first routes that use model alias pools, this changes which upstream model the next user request is sent to (and therefore can change cost/capability behavior) even though warmup is meant to be side-effect-free.


Keep mixed-provider warmup side-effect free by peeking pooled execution models without consuming alias-pool rotation state before real traffic arrives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbd7eee326


Comment on lines +3010 to +3014
models, _ := m.preparedExecutionModelsWithoutAdvancing(auth, routeModel)
if len(models) == 0 {
    return &Error{Code: "model_not_found", Message: "warmup model is unavailable"}
}
warmReq, err := buildMixedProviderWarmupRequest(auth, provider, models[0])

P2: Strip thinking suffix from warmup model payload

The warmup path takes models[0] directly from preparedExecutionModelsWithoutAdvancing and forwards it into the raw OpenAI payload, but those model candidates intentionally preserve thinking suffixes (for example gpt-4.1(8192)). Unlike the normal Execute path (which parses/removes suffixes before upstream translation), this warmup request sends the suffixed model string as-is, which OpenAI-compatible /chat/completions endpoints can reject as an unknown model; in practice, warmup silently fails for suffix-enabled routes and the fallback provider remains cold.
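A sketch of the suggested normalization, assuming the suffix grammar from the review comment's example (a trailing parenthesized thinking budget such as gpt-4.1(8192)); the real project may parse suffixes differently.

```go
package main

import (
	"fmt"
	"strings"
)

// stripThinkingSuffix removes a trailing "(...)" thinking-budget suffix,
// e.g. "gpt-4.1(8192)" -> "gpt-4.1", so the warmup payload carries a model
// name the upstream /chat/completions endpoint will accept.
func stripThinkingSuffix(model string) string {
	if i := strings.LastIndex(model, "("); i > 0 && strings.HasSuffix(model, ")") {
		return strings.TrimSpace(model[:i])
	}
	return model
}

func main() {
	fmt.Println(stripThinkingSuffix("gpt-4.1(8192)")) // suffix removed
	fmt.Println(stripThinkingSuffix("gpt-4.1"))       // plain names pass through unchanged
}
```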


Collaborator

@luispater luispater left a comment


The mixed-provider warmup flow is headed in the right direction, but I found one blocking correctness issue in the new background request path.

Blocking:

  • warmMixedProvider() currently treats any HTTP response as a successful warmup. The executor contract returns a raw *http.Response, so 401/429/5xx responses do not come back as errors here. Because the TTL reservation is recorded before the request and is not cleared on a bad status, a failed warmup suppresses follow-up warmups for that auth for the next minute. In practice, that means the feature can silently stop warming a secondary provider exactly when the provider rejects the request.

Please make non-2xx responses fail the warmup path and avoid keeping the TTL reservation for unsuccessful attempts.

Test plan:

  • PATH="/opt/homebrew/opt/go/bin:$PATH" go test ./sdk/cliproxy/auth/...

Treat non-2xx warmup responses as failures, clear auth-scoped warmup TTL reservations on unsuccessful attempts, and strip thinking suffixes from warmup model payloads.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8931be8f04


Comment on lines +3117 to +3121
if _, ok := m.mixedWarmInFlight[provider]; ok {
    return false
}
m.mixedWarmInFlight[provider] = struct{}{}
return true

P2: Check warmup TTL before starting async warmup

tryStartMixedProviderWarmup only gates on mixedWarmInFlight, so once a warmup finishes, every subsequent fill-first request within the 1-minute TTL still spawns a new goroutine, runs pickNext/model preparation, and only then exits when reserveMixedProviderWarmup rejects it. In high-QPS mixed routes this defeats the intended TTL deduplication and adds avoidable goroutine/CPU churn even though no warmup HTTP call is sent.


Pre-check auth-scoped warmup TTL before starting async warmup workers so fill-first routes skip duplicate background warmup setup without regressing prior warmup safety fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61158cfff4


    targets = append(targets, providerKey)
}
for _, providerKey := range targets {
    auth, _, err := m.pickNext(ctx, providerKey, routeModel, cliproxyexecutor.Options{}, nil)

P2: Respect pinned auth when deciding warmup candidates

The warmup selector bypasses request metadata by calling pickNext with cliproxyexecutor.Options{} here, so pinned_auth_id is ignored for warmup selection. In pinned sessions (for example websocket follow-up requests that lock execution to one auth), mixed routing cannot actually fail over to secondary providers, but this code still emits background warmup HTTP calls to those providers, creating avoidable external traffic and billable usage with no chance of serving the request.


Preserve request metadata when preselecting warmup auths so pinned sessions do not send unnecessary fallback warmup traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
