fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools")#421

Open
mhughdo wants to merge 4 commits into Soju06:main from mhughdo:hung/reallocate-budget-safe-routing

@mhughdo mhughdo commented Apr 16, 2026

Summary

  • reallocate durable backend codex_session affinity once the pinned account crosses the configured budget threshold and a healthier candidate exists
  • prefer budget-safe Responses routing candidates over already-pressured candidates whenever any safe candidate exists
  • allow built-in Responses tools on full /backend-api/codex/responses and /v1/responses routes while stripping tool-only fields from compact requests
  • raise the default upstream Responses event/message limit to 16 MiB and make auto transport prefer upstream HTTP for image_generation requests
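The budget-safe preference in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code in app/modules/proxy/load_balancer.py: the `Candidate` type, the function names, the 95% default threshold, and the lowest-usage tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a routable account (not the project's model)."""

    account_id: str
    usage_percent: float  # local primary usage, 0-100


def prefer_budget_safe(
    candidates: list[Candidate], threshold_percent: float
) -> list[Candidate]:
    """Return only below-threshold candidates when any exist, else all of them."""
    safe = [c for c in candidates if c.usage_percent < threshold_percent]
    # Fall back to the full pool so routing still works when every account is pressured.
    return safe if safe else list(candidates)


def select_account(
    candidates: list[Candidate], threshold_percent: float = 95.0  # assumed default
) -> Optional[Candidate]:
    """Pick the least-used account from the preferred pool (assumed tie-break)."""
    pool = prefer_budget_safe(candidates, threshold_percent)
    if not pool:
        return None
    return min(pool, key=lambda c: c.usage_percent)
```

The point of the wrapper shape is that the existing selection strategy can stay unchanged; only the pool it sees is narrowed when a safe candidate exists.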

Why

Repeated usage_limit_reached failures were still occurring even while other accounts had headroom.

Live DB inspection showed that the failing requests had been routed to accounts whose local primary usage at request time was already 97-100%. The proxy still allowed those near-exhausted ACTIVE accounts to win fresh selection, and durable backend codex_session affinity was reallocated only under narrower conditions.
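The intended affinity behavior can be pictured with the sketch below. It is only an illustration under assumptions: `choose_session_account`, the usage mapping, and the "rebind to the least-used safe account" rule are hypothetical and do not mirror the real sticky-session code.

```python
from typing import Mapping, Optional


def choose_session_account(
    pinned_id: Optional[str],
    usage_by_account: Mapping[str, float],
    threshold_percent: float,
) -> Optional[str]:
    """Keep the pinned account unless it is pressured and a healthier one exists."""
    safe = {a: u for a, u in usage_by_account.items() if u < threshold_percent}
    pinned_usage = usage_by_account.get(pinned_id) if pinned_id is not None else None

    if pinned_usage is not None and pinned_usage < threshold_percent:
        return pinned_id  # the pin is still below the budget threshold
    if safe:
        # Pin is missing or pressured: rebind to the least-used safe account.
        return min(safe, key=safe.get)
    if pinned_usage is not None:
        return pinned_id  # nothing healthier exists, so keep the pin
    if usage_by_account:
        return min(usage_by_account, key=usage_by_account.get)
    return None
```

With a pinned account at 98% and another account at 40%, a 95% threshold rebinds the session; if every account is above the threshold, the pin is kept.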

Separately, recent Codex Desktop builds now send newer built-in Responses tools such as image_generation. That exposed proxy-side compatibility issues including this concrete failure on full Responses requests:

{"type":"error","status":400,"error":{"message":"Invalid request payload","type":"invalid_request_error","code":"invalid_request_error","param":"tools"}}
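As a rough sketch of the split between the two request shapes: full Responses requests keep built-in tools, while compact requests drop tool-only fields. The field list and the `sanitize_payload` helper are assumptions for illustration; the actual set of fields the proxy strips is defined in the PR's request models, not here.

```python
import copy
from typing import Any

# Assumed for illustration: which fields count as "tool-only".
TOOL_ONLY_FIELDS = ("tools", "tool_choice", "parallel_tool_calls")


def sanitize_payload(payload: dict[str, Any], *, compact: bool) -> dict[str, Any]:
    """Return a copy of the payload suited to the target upstream contract."""
    result = copy.deepcopy(payload)  # never mutate the caller's request
    if compact:
        # Compact requests do not carry tools, so drop every tool-only field.
        for field in TOOL_ONLY_FIELDS:
            result.pop(field, None)
    # Full Responses requests pass built-in tools through unchanged.
    return result
```

For example, a request carrying `{"type": "image_generation"}` in `tools` is forwarded as-is on the full route and loses `tools` and `tool_choice` on the compact route.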

The underlying issues were:

  • full Responses and compact requests did not sanitize tool payloads correctly for the different upstream contracts
  • websocket-preferred models such as gpt-5.4 could still carry large inline image_generation outputs over upstream websocket, which made the proxy hit a local frame ceiling and fail before response.completed
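The transport rule that addresses the second issue could look roughly like this. The function names, the string transport values, and the override parameter are hypothetical; the only behavior taken from this PR is that auto transport prefers HTTP for image_generation requests and that an explicit websocket override is still honored.

```python
from typing import Any, Optional


def has_image_generation_tool(payload: dict[str, Any]) -> bool:
    """True when the request declares the built-in image_generation tool."""
    tools = payload.get("tools") or []
    return any(
        isinstance(tool, dict) and tool.get("type") == "image_generation"
        for tool in tools
    )


def resolve_transport(
    payload: dict[str, Any],
    model_prefers_websocket: bool,
    explicit_override: Optional[str] = None,
) -> str:
    """Pick "http" or "websocket" for the upstream stream."""
    if explicit_override in ("http", "websocket"):
        return explicit_override  # an explicit setting always wins
    if has_image_generation_tool(payload):
        # Large inline image outputs are safer over HTTP than in websocket frames.
        return "http"
    return "websocket" if model_prefers_websocket else "http"
```

Under this sketch a websocket-preferred model still streams over websocket for ordinary requests; only requests that declare image_generation are moved to HTTP in auto mode.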

Verification

  • uv run pytest tests/integration/test_proxy_sticky_sessions.py -q -k "test_proxy_stream_sticky_threads_reallocate_by_prompt_cache_key or test_proxy_codex_session_id_pins_responses_and_compact_without_sticky_threads or test_proxy_codex_session_id_reallocates_when_pinned_budget_exhausted or test_proxy_codex_session_id_compact_first_pins_followup_stream_without_sticky_threads or test_proxy_sticky_switches_when_pinned_rate_limited"
  • uv run pytest tests/unit/test_proxy_load_balancer_refresh.py -q -k "prefers_budget_safe_account_when_any_exist"
  • uv run pytest -q tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py -k "builtin_tool or builtin_tools or compact_strips_tool_fields or responses_accepts_builtin_tool_passthrough or v1_responses_forwards_builtin_tools or v1_chat_completions_rejects_builtin_tools or responses_accepts_builtin_tools"
  • uv run pytest -q tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py -k "multi_replica_defaults or connect_responses_websocket_uses_websockets_transport"
  • uv run pytest tests/unit/test_proxy_utils.py -q -k "resolve_stream_transport_prefers_http_for_image_generation_even_with_native_codex_headers or resolve_stream_transport_keeps_explicit_websocket_override_for_image_generation or stream_responses_auto_transport_prefers_http_for_image_generation_tool or stream_responses_auto_transport_uses_model_preference or stream_responses_auto_transport_uses_bootstrap_model_preference_when_registry_unloaded"
  • uvx ruff check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http
  • uvx ruff format --check app/modules/proxy/load_balancer.py tests/integration/test_proxy_sticky_sessions.py tests/unit/test_proxy_load_balancer_refresh.py app/core/openai/requests.py app/core/openai/v1_requests.py app/core/config/settings.py app/core/clients/proxy.py tests/unit/test_openai_requests.py tests/integration/test_openai_compat_features.py tests/integration/test_proxy_compact.py tests/unit/test_settings_multi_replica.py tests/unit/test_proxy_websocket_client.py tests/unit/test_proxy_utils.py openspec/changes/reallocate-codex-session-budget-pressure openspec/changes/support-responses-builtin-tools openspec/changes/raise-upstream-event-size-limit openspec/changes/route-image-generation-over-http
  • openspec validate --specs

Copilot AI review requested due to automatic review settings April 16, 2026 14:12

Copilot AI left a comment


Pull request overview

This PR updates the proxy’s load-balancing and sticky-session routing to proactively avoid routing Responses traffic to accounts that are already above a configured budget threshold, and to rebind durable codex_session affinity when the pinned account becomes budget-pressured and a healthier option exists.

Changes:

  • Adjust load balancer selection to prefer below-threshold (“budget-safe”) accounts when any exist.
  • Extend sticky-session reallocation logic to include durable codex_session mappings under budget pressure.
  • Add unit + integration regressions and OpenSpec deltas documenting the new routing behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| app/modules/proxy/load_balancer.py | Implements budget-safe preference wrapper around account selection and enables durable codex_session rebind under budget pressure. |
| tests/unit/test_proxy_load_balancer_refresh.py | Adds unit coverage proving a budget-safe account is preferred over a pressured one. |
| tests/integration/test_proxy_sticky_sessions.py | Adds integration coverage for session_id-pinned Codex routing reallocation once the pinned account crosses the budget threshold. |
| openspec/changes/reallocate-codex-session-budget-pressure/tasks.md | Tracks implementation + verification tasks for the change set. |
| openspec/changes/reallocate-codex-session-budget-pressure/specs/sticky-session-operations/spec.md | Specifies durable codex_session rebind behavior under budget pressure. |
| openspec/changes/reallocate-codex-session-budget-pressure/specs/responses-api-compat/spec.md | Specifies budget-safe preference behavior for Responses routes. |
| openspec/changes/reallocate-codex-session-budget-pressure/proposal.md | Documents motivation, changes, and expected impact. |


Comment thread app/modules/proxy/load_balancer.py Outdated
Comment thread app/modules/proxy/load_balancer.py
@mhughdo mhughdo changed the title from "fix(proxy): prefer budget-safe responses routing" to "fix(proxy): prefer budget-safe routing and support image-generation compatibility" Apr 17, 2026
@mhughdo mhughdo requested a review from Copilot April 17, 2026 04:18

Copilot AI left a comment


Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated no new comments.


@mhughdo mhughdo changed the title from "fix(proxy): prefer budget-safe routing and support image-generation compatibility" to "fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools")" Apr 17, 2026
@cani1989

Works for me

3 participants