fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools") #421
Pull request overview
This PR updates the proxy’s load-balancing and sticky-session routing to proactively avoid routing Responses traffic to accounts that are already above a configured budget threshold, and to rebind durable codex_session affinity when the pinned account becomes budget-pressured and a healthier option exists.
Changes:
- Adjust load balancer selection to prefer below-threshold (“budget-safe”) accounts when any exist.
- Extend sticky-session reallocation logic to include durable `codex_session` mappings under budget pressure.
- Add unit + integration regressions and OpenSpec deltas documenting the new routing behavior.
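The budget-safe preference described in the list above can be sketched roughly as follows. This is an illustration only, not the code in `app/modules/proxy/load_balancer.py`: the `Account` shape, the function names, the least-used tie-break, and the 95% default threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    # Hypothetical shape; the proxy's real account model may differ.
    account_id: str
    usage_percent: float  # local primary usage, 0-100


def prefer_budget_safe(candidates: list[Account], threshold_percent: float) -> list[Account]:
    """Return only below-threshold accounts when any exist, else all candidates."""
    safe = [a for a in candidates if a.usage_percent < threshold_percent]
    return safe or list(candidates)


def select_account(
    candidates: list[Account],
    threshold_percent: float = 95.0,  # assumed default, not taken from the PR
) -> Optional[Account]:
    """Pick the least-used account from the preferred pool (illustrative policy)."""
    pool = prefer_budget_safe(candidates, threshold_percent)
    return min(pool, key=lambda a: a.usage_percent, default=None)
```

The wrapper only narrows the pool when at least one budget-safe account exists, so a fleet where every account is pressured still gets a selection instead of an empty result.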
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| app/modules/proxy/load_balancer.py | Implements budget-safe preference wrapper around account selection and enables durable codex_session rebind under budget pressure. |
| tests/unit/test_proxy_load_balancer_refresh.py | Adds unit coverage proving a budget-safe account is preferred over a pressured one. |
| tests/integration/test_proxy_sticky_sessions.py | Adds integration coverage for session_id-pinned Codex routing reallocation once the pinned account crosses the budget threshold. |
| openspec/changes/reallocate-codex-session-budget-pressure/tasks.md | Tracks implementation + verification tasks for the change set. |
| openspec/changes/reallocate-codex-session-budget-pressure/specs/sticky-session-operations/spec.md | Specifies durable codex_session rebind behavior under budget pressure. |
| openspec/changes/reallocate-codex-session-budget-pressure/specs/responses-api-compat/spec.md | Specifies budget-safe preference behavior for Responses routes. |
| openspec/changes/reallocate-codex-session-budget-pressure/proposal.md | Documents motivation, changes, and expected impact. |
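For the durable `codex_session` rebind covered by the sticky-session rows above, the decision can be read as: keep the pin unless the pinned account is at or above the budget threshold and a healthier candidate exists. The sketch below is a minimal illustration under that reading. The function name, the usage mapping, and the least-used fallback are hypothetical and not taken from the PR.

```python
from typing import Mapping, Optional


def resolve_sticky_account(
    pinned_id: Optional[str],
    usage_by_account: Mapping[str, float],
    threshold_percent: float,
) -> Optional[str]:
    """Keep the pinned account unless it is budget-pressured and a healthier one exists."""
    # Accounts still below the budget threshold count as healthy here.
    healthy = {aid: use for aid, use in usage_by_account.items() if use < threshold_percent}

    if pinned_id in usage_by_account:
        # Stay pinned when the pin is healthy, or when nothing healthier exists.
        if pinned_id in healthy or not healthy:
            return pinned_id

    # Pin is missing or pressured: rebind to the least-used healthy account.
    if healthy:
        return min(healthy, key=healthy.get)
    # No healthy account at all: fall back to the least-used one, if any.
    if usage_by_account:
        return min(usage_by_account, key=usage_by_account.get)
    return None
```

Staying pinned when no healthier candidate exists avoids churning the session mapping for no gain.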
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.
Works for me
Summary
- `codex_session` affinity once the pinned account crosses the configured budget threshold and a healthier candidate exists
- `/backend-api/codex/responses` and `/v1/responses` routes while stripping tool-only fields from compact requests
- `image_generation` requests

Why
Repeated `usage_limit_reached` failures were still occurring even while other accounts had headroom.

Live DB inspection showed the failures were being routed to accounts whose local primary usage at request time was already `97-100%`. The proxy was still allowing those near-exhausted `ACTIVE` accounts to win fresh selection, and durable backend `codex_session` affinity only reallocated under narrower conditions.

Separately, recent Codex Desktop builds now send newer built-in Responses tools such as `image_generation`. That exposed proxy-side compatibility issues including this concrete failure on full Responses requests:

`{"type":"error","status":400,"error":{"message":"Invalid request payload","type":"invalid_request_error","code":"invalid_request_error","param":"tools"}}`

The underlying issues were:
- `gpt-5.4` could still carry large inline `image_generation` outputs over upstream websocket, which made the proxy hit a local frame ceiling and fail before `response.completed`

Verification
- `uv run pytest tests/integration/test_proxy_sticky_sessions.py -q -k "test_proxy_stream_sticky_threads_reallocate_by_prompt_cache_key or test_proxy_codex_session_id_pins_responses_and_compact_without_sticky_threads or test_proxy_codex_session_id_reallocates_when_pinned_budget_exhausted or test_proxy_codex_session_id_compact_first_pins_followup_stream_without_sticky_threads or test_proxy_sticky_switches_when_pinned_rate_limited"`
- `uv run pytest tests/unit/test_proxy_load_balancer_refresh.py -q -k "prefers_budget_safe_account_when_any_exist"`
- `uv run pytest -q tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py -k "builtin_tool or builtin_tools or compact_strips_tool_fields or responses_accepts_builtin_tool_passthrough or v1_responses_forwards_builtin_tools or v1_chat_completions_rejects_builtin_tools or responses_accepts_builtin_tools"`
- `uv run pytest -q tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py -k "multi_replica_defaults or connect_responses_websocket_uses_websockets_transport"`
- `uv run pytest tests/unit/test_proxy_utils.py -q -k "resolve_stream_transport_prefers_http_for_image_generation_even_with_native_codex_headers or resolve_stream_transport_keeps_explicit_websocket_override_for_image_generation or stream_responses_auto_transport_prefers_http_for_image_generation_tool or stream_responses_auto_transport_uses_model_preference or stream_responses_auto_transport_uses_bootstrap_model_preference_when_registry_unloaded"`
- `uvx ruff check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http`
- `uvx ruff format --check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http`
- `openspec validate --specs`
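The transport tests named in the `tests/unit/test_proxy_utils.py` run suggest three rules: an explicit websocket override is kept, automatic selection prefers HTTP when an `image_generation` tool is present, and otherwise the model preference decides. A minimal sketch under those assumptions follows. The function names, the `"auto"`/`"http"`/`"websocket"` setting values, and the `model_prefers_websocket` flag are hypothetical and may not match the proxy's actual helpers.

```python
from typing import Any, Mapping, Sequence


def has_image_generation_tool(tools: Sequence[Mapping[str, Any]]) -> bool:
    """True when any Responses tool entry has type "image_generation"."""
    return any(tool.get("type") == "image_generation" for tool in tools)


def resolve_stream_transport_sketch(
    configured: str,  # assumed values: "auto", "http", "websocket"
    tools: Sequence[Mapping[str, Any]],
    model_prefers_websocket: bool,
) -> str:
    """Choose the upstream transport for a streaming Responses request."""
    # An explicit operator override always wins, even for image generation.
    if configured in ("http", "websocket"):
        return configured
    # In auto mode, large inline image outputs are kept off the websocket
    # path so they cannot hit a local frame-size ceiling.
    if has_image_generation_tool(tools):
        return "http"
    # Otherwise fall back to the per-model preference.
    return "websocket" if model_prefers_websocket else "http"
```

Letting the explicit override win keeps operator intent authoritative, while auto mode steers around the frame ceiling described under Why.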