API proxy for AI services with intelligent load balancing, automatic failover, and multi-protocol translation.
Accepts requests in Anthropic, OpenAI, Responses API, or Gemini format, and routes them to any supported upstream — translating between protocols automatically.
- Multi-Protocol Support — 9 upstream API types: Anthropic, OpenAI, Gemini, Responses, Codex, Gemini CLI, Antigravity, Claude Code, Kiro
- Automatic Protocol Translation — Clients speak one API format; the proxy translates to each upstream's native format
- Weighted Round-Robin Load Balancing — Distribute requests across multiple upstream services based on configured weights
- Per-Upstream Model Mapping — Each upstream can map client model names to its own model names
- Model Filtering — Only route requests to upstreams that support the requested model
- Circuit Breaker — Automatically mark failing upstreams as unavailable (3 consecutive failures per model), with auto-recovery after 30 minutes
- Automatic Failover — Seamlessly retry failed requests on alternative upstreams
- Model Fallback Chain — When all upstreams for a model are exhausted, automatically retry with a configured fallback model (e.g., claude-opus-4-6 → claude-opus-4-5 → claude-sonnet-4-5)
- OAuth Authentication — Support for OAuth-based upstreams (Codex, Gemini CLI, Antigravity, Claude Code, Kiro) with automatic token refresh
- Auth File Round-Robin — Multiple auth files per upstream, rotated in round-robin fashion
- Token Round-Robin — Multiple API tokens per upstream with round-robin rotation and automatic failover
- Streaming Support — Full support for streaming responses across all protocols
- Upstream Must-Stream Fallback — Support streaming-only upstreams while keeping compatibility with non-streaming clients
- Outbound Request Compression — Upstream request bodies use configurable `request_compression` (`zstd` default; `gzip`/`br`/`none` supported)
- Config Hot-Reload — Config file changes are watched and applied without restart
- Rotating File Logging — Optional file-based logging with automatic rotation by size/age
| API Type | Protocol | Authentication | Endpoint |
|---|---|---|---|
| `anthropic` | Anthropic Messages | API key (token) | `/v1/messages` |
| `openai` | OpenAI Chat Completions | API key (token) | `/v1/chat/completions` |
| `gemini` | Google Gemini | API key (token) | `/v1beta/models/*/generateContent` |
| `responses` | OpenAI Responses | API key (token) | `/v1/responses`, `/v1/responses/compact` |
| `codex` | ChatGPT Codex | OAuth (auth_files) | `chatgpt.com/backend-api/codex/responses` |
| `geminicli` | Gemini CLI | OAuth (auth_files) | `cloudcode-pa.googleapis.com` |
| `antigravity` | Antigravity | OAuth (auth_files) | `daily-cloudcode-pa.googleapis.com` |
| `claudecode` | Claude Code | OAuth (auth_files) | `api.anthropic.com` |
| `kiro` | Kiro (AWS CodeWhisperer/AmazonQ) | OAuth (auth_files) | `codewhisperer/q.amazonaws.com` |
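To illustrate the protocol-translation step between two of these API types, here is a heavily simplified, hypothetical sketch of converting an OpenAI Chat Completions request into the Anthropic Messages shape. The struct fields and the `toAnthropic` helper are illustrative only; the real translation also covers tools, streaming, and multi-part content.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified request shapes; the real protocols carry many more fields.
type chatMsg struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}
type openAIReq struct {
	Model    string    `json:"model"`
	Messages []chatMsg `json:"messages"`
}
type anthropicReq struct {
	Model     string    `json:"model"`
	MaxTokens int       `json:"max_tokens"`
	System    string    `json:"system,omitempty"`
	Messages  []chatMsg `json:"messages"`
}

// toAnthropic converts an OpenAI-style chat request into the Anthropic
// Messages shape: the system message moves to the top-level "system"
// field, and a default max_tokens is filled in (Anthropic requires it).
func toAnthropic(in openAIReq, defaultMaxTokens int) anthropicReq {
	out := anthropicReq{Model: in.Model, MaxTokens: defaultMaxTokens}
	for _, m := range in.Messages {
		if m.Role == "system" {
			out.System = m.Content
			continue
		}
		out.Messages = append(out.Messages, m)
	}
	return out
}

func main() {
	raw := `{"model":"claude-3-opus","messages":[` +
		`{"role":"system","content":"Be brief."},` +
		`{"role":"user","content":"Hello!"}]}`
	var req openAIReq
	json.Unmarshal([]byte(raw), &req)
	out, _ := json.Marshal(toAnthropic(req, 4096))
	fmt.Println(string(out)) // system prompt hoisted, max_tokens defaulted
}
```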
Build the binary:

```shell
go build -o aiproxy
```

Create a config.yaml file (see config.example.yaml for reference):
```yaml
# Server settings
bind: "127.0.0.1"
listen: ":8080"
default_max_tokens: 4096
upstream_request_timeout: 60 # seconds to wait for upstream response headers (default: 60)

# Rotating file log (optional, omit for stdout)
log:
  file: "/var/log/aiproxy/aiproxy.log"
  max_size: 100    # MB before rotation
  max_backups: 3   # old files to keep
  max_age: 28      # days to retain
  compress: false  # gzip old files

# Upstream services
upstreams:
  # API key-based upstream (single token)
  - name: "primary"
    base_url: "https://api.anthropic.com"
    token: "sk-ant-xxx"
    weight: 10
    api_type: "anthropic"
    request_compression: "zstd" # optional, default is zstd; set "none" to disable
    model_mappings:
      "claude-3-opus": "claude-3-opus-20240229"
      "claude-3-sonnet": "claude-3-sonnet-20240229"
    available_models:
      - "claude-3-opus"
      - "claude-3-sonnet"

  # API key-based upstream (multiple tokens, round-robin with failover)
  - name: "secondary"
    base_url: "https://api.anthropic.com"
    token:
      - "sk-ant-key1"
      - "sk-ant-key2"
    weight: 5
    api_type: "anthropic"

  # OAuth-based upstream (Gemini CLI / Antigravity / Codex / Claude Code / Kiro)
  - name: "Gemini CLI"
    weight: 5
    api_type: "geminicli"
    request_compression: "gzip"
    auth_files:
      - "/path/to/geminicli-auth1.json"
      - "/path/to/geminicli-auth2.json"
    model_mappings:
      "gemini-3-pro": "gemini-3-pro-preview"
    available_models:
      - "gemini-3-pro"

# Model fallback chain (optional)
# When all upstreams fail for a model, retry with the fallback model
model_fallback:
  "claude-opus-4-6": "claude-opus-4-5"
  "claude-opus-4-5": "claude-sonnet-4-5"
  "gpt-5.3-codex": "gpt-5.2-codex"
```

Compression behavior:
- Upstream `Accept-Encoding` is fixed to `gzip, zstd, br, identity` (not configurable).
- `request_compression` controls the outbound request body `Content-Encoding`.
- `request_compression` defaults to `zstd`; set it to `none`/`identity` to send plain request bodies.
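As an illustration of the outbound-body behavior, here is a sketch using stdlib gzip (the default zstd codec requires a third-party package, so only the `gzip` and `none`/`identity` modes are shown). `compressBody` and `roundTrip` are hypothetical helpers, not the proxy's API.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressBody encodes an outbound request body per the configured mode
// and reports the Content-Encoding header value to set; "none"/"identity"
// passes the body through unchanged with no header.
func compressBody(body []byte, mode string) ([]byte, string, error) {
	switch mode {
	case "gzip":
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write(body); err != nil {
			return nil, "", err
		}
		if err := zw.Close(); err != nil {
			return nil, "", err
		}
		return buf.Bytes(), "gzip", nil
	default: // "none" / "identity"
		return body, "", nil
	}
}

// roundTrip decompresses a gzip body, to verify the encoding is lossless.
func roundTrip(body []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	payload := []byte(`{"model":"claude-3-opus","messages":[]}`)
	compressed, enc, _ := compressBody(payload, "gzip")
	restored, _ := roundTrip(compressed)
	fmt.Println(enc, bytes.Equal(restored, payload)) // gzip true
}
```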
Start the proxy:
```shell
./aiproxy -config config.yaml
```

The proxy exposes the following client-facing endpoints (only explicitly listed paths are supported; arbitrary sub-paths are not forwarded):
```shell
# Anthropic Messages API
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "claude-3-opus", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello!"}]}'

# OpenAI Chat Completions API
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

# OpenAI Responses API
curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{"model": "gpt-4o", "input": "Hello!"}'

# Gemini API (compatible with Google AI Studio SDKs)
# /v1beta/models/{model}:generateContent
# /v1/models/{model}:generateContent
curl http://localhost:8080/v1beta/models/gemini-pro:generateContent \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: any-key" \
  -d '{"contents": [{"parts": [{"text": "Hello!"}]}]}'
```

The proxy watches config.yaml for changes and applies updates without restart. Not all fields can be hot-reloaded — some require a full restart to take effect.
Hot-reloadable (applied immediately):
| Field | Scope |
|---|---|
| `upstreams` | All sub-fields: name, enabled, base_url, token, weight, model_mappings, available_models, api_type, auth_files, request_compression, http_headers |
| `upstream_request_timeout` | Timeout for upstream response headers |
| `default_max_tokens` | Default max tokens for Gemini requests |
| `model_fallback` | Model fallback chain |
| `api-key` | Client authentication keys |
Requires restart:
| Field | Reason |
|---|---|
| `bind` | Server listen address is fixed after startup |
| `listen` | Server listen port is fixed after startup |
| `log` | Log output target (file, max_size, max_backups, max_age, compress) is configured once at startup |
- Request Reception — Client sends request in any supported format (Anthropic / OpenAI / Responses / Gemini)
- Model Filtering — Proxy filters upstreams that support the requested model
- Load Balancing — Weighted round-robin selects the next upstream
- Protocol Translation — Request is converted from the client's format to the upstream's native format
- Model Mapping — Client model name is mapped to upstream-specific model name
- Must-Stream Handling — If the upstream requires streaming, force `stream=true` and convert the response back
- Request Forwarding — Request is sent to the selected upstream
- Response Translation — Upstream response is converted back to the client's expected format
- Automatic Retry — On 4xx/5xx errors, automatically tries the next upstream
- Model Fallback — When all upstreams are exhausted, retry with the configured fallback model (full upstream re-rotation)
- Circuit Breaking — After 3 consecutive failures per model, upstream is marked unavailable for 30 minutes
This project is dual-licensed:
- Personal Use: GNU General Public License v3.0 (GPL-3.0)
  - Free for personal projects, educational purposes, and non-commercial use
- Commercial Use: Commercial License Required
  - For commercial or workplace use, please contact: missdeer@gmail.com
  - See LICENSE-COMMERCIAL for details
See LICENSE for the full GPL-3.0 license text.