-
Notifications
You must be signed in to change notification settings - Fork 132
fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools") #421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
mhughdo
wants to merge
4
commits into
Soju06:main
Choose a base branch
from
mhughdo:hung/reallocate-budget-safe-routing
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
128213b
fix(proxy): prefer budget-safe responses routing
mhughdo 1302166
test(proxy): align budget-safe stickiness expectations
mhughdo 98cb65a
chore(proxy): clarify budget-safe routing naming
mhughdo 0db3341
fix(proxy): support responses image-generation compatibility
mhughdo File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
15 changes: 15 additions & 0 deletions
15
openspec/changes/raise-upstream-event-size-limit/proposal.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| # Proposal: raise-upstream-event-size-limit | ||
|
|
||
| ## Why | ||
|
|
||
| Recent Codex Desktop builds can request built-in tools such as `image_generation`, which may produce large upstream Responses events. The proxy currently caps upstream SSE events and websocket message frames at 2 MiB, which is too low for legitimate image payloads and causes local websocket `1009 message too big` disconnects before `response.completed`. | ||
|
|
||
| ## What Changes | ||
|
|
||
| - Raise the default upstream Responses event/message size limit from 2 MiB to 16 MiB. | ||
| - Keep the existing configuration knob (`max_sse_event_bytes`) so operators can still override the limit. | ||
|
|
||
| ## Impact | ||
|
|
||
| - Prevents local `1009` disconnects for large but valid Responses tool outputs. | ||
| - Aligns the default limit with the common 16 MiB websocket ceiling already assumed by the proxy's `response.create` budget logic. |
7 changes: 7 additions & 0 deletions
7
...spec/changes/raise-upstream-event-size-limit/specs/responses-api-compat/spec.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| ## MODIFIED Requirements | ||
| ### Requirement: Upstream Responses event size budget | ||
| The service SHALL allow upstream Responses SSE events and upstream websocket message frames up to 16 MiB by default before treating them as oversized. | ||
|
|
||
| #### Scenario: built-in tool output exceeds the old 2 MiB limit | ||
| - **WHEN** upstream Responses traffic includes a single SSE event or websocket message frame larger than 2 MiB but not larger than 16 MiB | ||
| - **THEN** the proxy continues processing the event instead of closing the upstream websocket locally with `1009 message too big` |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| ## 1. Implementation | ||
|
|
||
| - [x] 1.1 Raise the default upstream Responses event/message size limit to 16 MiB. | ||
|
|
||
| ## 2. Verification | ||
|
|
||
| - [x] 2.1 Add or update settings coverage for the new default. | ||
| - [x] 2.2 Run targeted pytest, ruff, and `openspec validate --specs`. |
18 changes: 18 additions & 0 deletions
18
openspec/changes/reallocate-codex-session-budget-pressure/proposal.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| ## Why | ||
|
|
||
| Backend Codex routes use durable `codex_session` stickiness when the client sends a `session_id` header. Today that stickiness preserves continuity too aggressively: if the pinned account remains `ACTIVE` but its short-window usage is already above the sticky reallocation threshold, selection still keeps routing the session there until upstream starts returning `usage_limit_reached`. | ||
|
|
||
| Separately, fresh selection still lets near-exhausted active accounts compete with budget-safe accounts until a hard failure occurs. In production this surfaces as repeated compact and websocket failures on accounts whose latest local primary-window usage is already in the `97-99%` range, even while other active accounts still have healthy short-window budget. | ||
|
|
||
| ## What Changes | ||
|
|
||
| - extend proactive sticky reallocation to durable backend `codex_session` mappings when the pinned account is above the configured budget threshold and a healthier candidate exists | ||
| - prefer budget-safe Responses routing candidates over already-pressured candidates whenever at least one budget-safe candidate exists | ||
| - keep existing durable `codex_session` behavior below that threshold | ||
| - add regression coverage for backend Codex responses + compact routing with `session_id` | ||
|
|
||
| ## Impact | ||
|
|
||
| - backend Codex sessions may rebind to a different account slightly earlier, before the pinned account hard-fails upstream on short-window budget exhaustion | ||
| - fresh Responses requests will prefer accounts that are still below the configured budget threshold instead of spending more attempts on near-exhausted accounts first | ||
| - OpenAI `/v1` routes still do not create durable `codex_session` mappings from `session_id`, but they do benefit from the same budget-safe fresh-selection preference |
9 changes: 9 additions & 0 deletions
9
...ges/reallocate-codex-session-budget-pressure/specs/responses-api-compat/spec.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| ## MODIFIED Requirements | ||
| ### Requirement: Responses routing prefers budget-safe accounts | ||
| When serving Responses routes, the service MUST prefer eligible accounts that are still below the configured budget threshold over eligible accounts already above that threshold. If no below-threshold candidate exists, the service MAY fall back to the pressured candidates. | ||
|
|
||
| #### Scenario: Fresh Responses request avoids a near-exhausted account | ||
| - **WHEN** `/backend-api/codex/responses`, `/backend-api/codex/responses/compact`, `/v1/responses`, or `/v1/responses/compact` selects among multiple eligible active accounts | ||
| - **AND** one candidate is above the configured budget threshold | ||
| - **AND** another candidate remains below that threshold | ||
| - **THEN** the below-threshold candidate is chosen first |
13 changes: 13 additions & 0 deletions
13
...eallocate-codex-session-budget-pressure/specs/sticky-session-operations/spec.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| ## MODIFIED Requirements | ||
| ### Requirement: Sticky sessions are explicitly typed | ||
| The system SHALL persist each sticky-session mapping with an explicit kind so durable Codex backend affinity, durable dashboard sticky-thread routing, and bounded prompt-cache affinity can be managed independently. | ||
|
|
||
| #### Scenario: Backend Codex session affinity is stored as durable | ||
| - **WHEN** a backend Codex request creates or refreshes stickiness from `session_id` | ||
| - **THEN** the stored mapping kind is `codex_session` | ||
|
|
||
| #### Scenario: Backend Codex session rebinds under budget pressure | ||
| - **WHEN** a backend Codex request resolves an existing `codex_session` mapping | ||
| - **AND** the pinned account is above the configured sticky reallocation budget threshold | ||
| - **AND** another eligible account remains below that threshold | ||
| - **THEN** selection rebinds the durable `codex_session` mapping to the healthier account before sending the request upstream |
11 changes: 11 additions & 0 deletions
11
openspec/changes/reallocate-codex-session-budget-pressure/tasks.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| ## 1. Implementation | ||
|
|
||
| - [x] 1.1 Update sticky selection so backend `codex_session` mappings reallocate when the pinned account is above the sticky budget threshold and a healthier candidate exists | ||
| - [x] 1.2 Preserve existing durable `codex_session` behavior when the pinned account is still below the threshold | ||
| - [x] 1.3 Prefer budget-safe Responses routing candidates over pressured candidates when any budget-safe option exists | ||
|
|
||
| ## 2. Verification | ||
|
|
||
| - [x] 2.1 Add integration coverage for backend Codex `session_id` routing that proves reallocation above threshold | ||
| - [x] 2.2 Run the affected sticky-session integration tests | ||
| - [x] 2.3 Add targeted selection coverage for fresh routing that proves a budget-safe account wins over a pressured one |
2 changes: 2 additions & 0 deletions
2
openspec/changes/route-image-generation-over-http/.openspec.yaml
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| schema: spec-driven | ||
| created: 2026-04-17 |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.