perf(executor): increase HTTP transport MaxIdleConnsPerHost#2707
perf(executor): increase HTTP transport MaxIdleConnsPerHost#2707Sotatek-HieuBui2 wants to merge 1 commit intorouter-for-me:devfrom
Conversation
|
This pull request targeted The base branch has been automatically changed to |
There was a problem hiding this comment.
Code Review
This pull request introduces HTTP transport tuning for the proxy-aware HTTP client, specifically configuring connection limits and enabling HTTP/2. The feedback identifies a potential performance regression if clients are not reused, as each now receives a unique transport instance. Additionally, there are concerns regarding the unconditional overriding of settings on user-provided transports and a race condition in the tests caused by modifying the global http.DefaultTransport while running parallel tests.
| httpClient := &http.Client{ | ||
| Transport: cloneDefaultTransport(), | ||
| } |
There was a problem hiding this comment.
By using cloneDefaultTransport(), each call to NewProxyAwareHTTPClient now creates a new http.Transport with its own connection pool. In the previous implementation, the global http.DefaultTransport was used when no proxy was configured, allowing connection reuse across different clients. This change is beneficial for performance only if the returned *http.Client is long-lived and reused. If clients are created per-request, this will lead to a performance regression due to the lack of connection pooling and potential socket exhaustion.
| if transport, ok := rt.(*http.Transport); ok { | ||
| cloned := transport.Clone() | ||
| applyTransportTuning(cloned) | ||
| httpClient.Transport = cloned |
There was a problem hiding this comment.
When a custom RoundTripper is provided via context, this logic clones it (if it's a *http.Transport) and overrides its settings with the new tuning parameters. While cloning is good for isolation, unconditionally overriding settings like MaxIdleConns or ForceAttemptHTTP2 on a user-provided transport might be unexpected. If the caller provided a specifically configured transport, their settings will be lost.
| orig := http.DefaultTransport | ||
| defer func() { http.DefaultTransport = orig }() | ||
|
|
||
| http.DefaultTransport = nil | ||
| t.Cleanup(func() { http.DefaultTransport = orig }) |
There was a problem hiding this comment.
Modifying the global http.DefaultTransport in a test can cause race conditions and flaky failures when other tests in the same process are running in parallel. Since other tests in this file (e.g., TestNewProxyAwareHTTPClient_DefaultTransportTuned) use t.Parallel(), they might observe http.DefaultTransport as nil while this test is running, leading to non-deterministic behavior. Additionally, using both defer and t.Cleanup to restore the same global variable is redundant; t.Cleanup is generally preferred in modern Go tests.
| orig := http.DefaultTransport | |
| defer func() { http.DefaultTransport = orig }() | |
| http.DefaultTransport = nil | |
| t.Cleanup(func() { http.DefaultTransport = orig }) | |
| orig := http.DefaultTransport | |
| t.Cleanup(func() { http.DefaultTransport = orig }) | |
| http.DefaultTransport = nil |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 79d69e8802
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| httpClient := &http.Client{ | ||
| Transport: cloneDefaultTransport(), | ||
| } |
There was a problem hiding this comment.
Preserve shared default transport for no-proxy clients
When neither auth/cfg proxy nor context round-tripper is set, this now assigns Transport: cloneDefaultTransport() to every new client. I checked executor call sites (for example internal/runtime/executor/gemini_executor.go) and they construct a fresh client per request, so this change creates a separate connection pool each time and prevents cross-request keep-alive reuse that previously came from Transport == nil (shared http.DefaultTransport). That can materially increase dial/TLS overhead and connection churn in production.
Useful? React with 👍 / 👎.
| if transport, ok := rt.(*http.Transport); ok { | ||
| cloned := transport.Clone() | ||
| applyTransportTuning(cloned) | ||
| httpClient.Transport = cloned |
There was a problem hiding this comment.
Reuse context transport instead of cloning per request
The context round-tripper path now clones *http.Transport before use, which defeats the shared-transport caching strategy in sdk/cliproxy/rtprovider.go (cached per proxy URL). Since this helper is called repeatedly, cloning here fragments pools and loses socket reuse across executions for the same auth/proxy, increasing outbound connection pressure and latency under load.
Useful? React with 👍 / 👎.
79d69e8 to
76c5a2c
Compare
|
Thank you for the detailed feedback! I've updated the PR to address the performance and safety concerns:
I've force-pushed a single clean commit with these improvements. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76c5a2c047
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Returns: | ||
| // - *http.Client: An HTTP client with configured proxy or transport | ||
| func NewProxyAwareHTTPClient(ctx context.Context, cfg *config.Config, auth *cliproxyauth.Auth, timeout time.Duration) *http.Client { | ||
| GlobalTuneDefaultTransport() |
There was a problem hiding this comment.
Avoid mutating default transport in request path
Calling GlobalTuneDefaultTransport() here makes the first executor request mutate http.DefaultTransport lazily, but this repository already performs startup HTTP traffic before executor requests (for example registry.StartModelsUpdater in cmd/server/main.go and fetchModelsFromRemote in internal/registry/model_updater.go using &http.Client{Timeout: ...} with nil Transport). Reconfiguring a transport after it has been used is unsafe and can race with in-flight/default-client requests, leading to nondeterministic behavior under startup load; the tuning should happen during early init (before any HTTP use) or on dedicated cloned transports only.
Useful? React with 👍 / 👎.
76c5a2c to
8087c74
Compare
|
Updated the approach to be even safer and more robust:
Force-pushed the updated commit. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8087c74106
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| // Default fallback: use shared tuned transport | ||
| httpClient.Transport = defaultTunedTransport |
There was a problem hiding this comment.
Preserve nil-transport fallback for Antigravity client
Setting a default transport unconditionally here changes the no proxy + no context round-tripper path from Transport == nil to always non-nil, which breaks the sentinel logic in internal/runtime/executor/antigravity_executor.go (newAntigravityHTTPClient) that uses nil to select the shared antigravityTransport singleton. After this change, that branch is never taken and the Antigravity path clones a new transport per call, fragmenting connection pools and increasing connection churn/latency under load.
Useful? React with 👍 / 👎.
Increased MaxIdleConns from 100 to 200 and MaxIdleConnsPerHost from 2 to 20 to improve connection reuse and reduce latency for HTTP-based executors like Gemini CLI, Qwen, and others that share the proxy client. This also enables HTTP/2 by default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8087c74 to
6194488
Compare
|
Finalized the implementation with the most robust and idiomatic approach:
This approach combines high performance with the highest level of safety and compatibility across the repository. |
luispater
left a comment
There was a problem hiding this comment.
Summary:
This tuning currently escapes the executor helper and changes process-wide HTTP transport defaults.
Blocking findings:
internal/runtime/executor/helps/proxy_helpers.go:15callsapplyTransportTuning(http.DefaultTransport)during package init. That change does not stay scoped toNewProxyAwareHTTPClient.sdk/proxyutil/proxy.go:71cloneshttp.DefaultTransport, so oncehelpsis imported, every laterproxyutil.BuildHTTPTransport(...)call inherits the new200/20pool settings as well.- I verified this in a detached PR worktree:
BuildHTTPTransport("direct")reports100/0without importinghelps, and200/20after importing it. So auth/management callers such asinternal/util/proxy.goandsdk/cliproxy/rtprovider.goare retuned too.
Test plan:
go test ./internal/runtime/executor/helps ./sdk/proxyutil ./sdk/cliproxy -count=1go build -o test-output ./cmd/server && rm -f test-output
Requested change:
Please keep the tuning local to the executor-specific client/transport construction instead of mutating http.DefaultTransport in package init, and add a regression test that proves unrelated proxyutil callers keep their previous defaults.
Increased MaxIdleConns from 100 to 200 and MaxIdleConnsPerHost from 2 to 20 to improve connection reuse and reduce latency for HTTP-based executors like Gemini CLI, Qwen, and others that share the proxy client. This also enables HTTP/2 by default.