API proxy for AI services with intelligent load balancing, automatic failover, and multi-protocol translation.
Accepts requests in Anthropic, OpenAI, Responses API, or Gemini format, and routes them to any supported upstream — translating between protocols automatically.
- Multi-Protocol Support — 9 upstream API types: Anthropic, OpenAI, Gemini, Responses, Codex, Gemini CLI, Antigravity, Claude Code, Kiro
- Automatic Protocol Translation — Clients speak one API format; the proxy translates to each upstream's native format
- Weighted Round-Robin Load Balancing — Distribute requests across multiple upstream services based on configured weights
- Per-Upstream Model Mapping — Each upstream can map client model names to its own model names
- Model Filtering — Only route requests to upstreams that support the requested model
- Circuit Breaker — Automatically mark failing upstreams as unavailable (3 consecutive failures per model), with auto-recovery after 30 minutes
- Automatic Failover — Seamlessly retry failed requests on alternative upstreams
- Model Fallback Chain — When all upstreams for a model are exhausted, automatically retry with a configured fallback model (e.g., claude-opus-4-6 → claude-opus-4-5 → claude-sonnet-4-5)
- OAuth Authentication — Support for OAuth-based upstreams (Codex, Gemini CLI, Antigravity, Claude Code, Kiro) with automatic token refresh
- Auth File Round-Robin — Multiple auth files per upstream, rotated in round-robin fashion
- Token Round-Robin — Multiple API tokens per upstream with round-robin rotation and automatic failover
- Streaming Support — Full support for streaming responses across all protocols
- Upstream Must-Stream Fallback — Support streaming-only upstreams while keeping compatibility with non-streaming clients
- Outbound Request Compression — Upstream request bodies use configurable `request_compression` (`zstd` default; `gzip`/`br`/`none` supported)
- Config Hot-Reload — Config file changes are watched and applied without restart
- Rotating File Logging — Optional file-based logging with automatic rotation by size/age
| API Type | Protocol | Authentication | Endpoint |
|---|---|---|---|
| `anthropic` | Anthropic Messages | API key (token) | `/v1/messages` |
| `openai` | OpenAI Chat Completions | API key (token) | `/v1/chat/completions` |
| `gemini` | Google Gemini | API key (token) | `/v1beta/models/*/generateContent` |
| `responses` | OpenAI Responses | API key (token) | `/v1/responses`, `/v1/responses/compact` |
| `codex` | ChatGPT Codex | OAuth (auth_files) | `chatgpt.com/backend-api/codex/responses` |
| `geminicli` | Gemini CLI | OAuth (auth_files) | `cloudcode-pa.googleapis.com` |
| `antigravity` | Antigravity | OAuth (auth_files) | `daily-cloudcode-pa.googleapis.com` |
| `claudecode` | Claude Code | OAuth (auth_files) | `api.anthropic.com` |
| `kiro` | Kiro (AWS CodeWhisperer/AmazonQ) | OAuth (auth_files) | `codewhisperer/q.amazonaws.com` |
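To illustrate the protocol-translation step between two of these API types, here is a heavily simplified, hypothetical sketch of converting an OpenAI Chat Completions request into the Anthropic Messages shape. The struct fields and the `toAnthropic` helper are illustrative only; the real translation also covers tools, streaming, and multi-part content.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified request shapes; the real protocols carry many more fields.
type chatMsg struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}
type openAIReq struct {
	Model    string    `json:"model"`
	Messages []chatMsg `json:"messages"`
}
type anthropicReq struct {
	Model     string    `json:"model"`
	MaxTokens int       `json:"max_tokens"`
	System    string    `json:"system,omitempty"`
	Messages  []chatMsg `json:"messages"`
}

// toAnthropic converts an OpenAI-style chat request into the Anthropic
// Messages shape: the system message moves to the top-level "system"
// field, and a default max_tokens is filled in (Anthropic requires it).
func toAnthropic(in openAIReq, defaultMaxTokens int) anthropicReq {
	out := anthropicReq{Model: in.Model, MaxTokens: defaultMaxTokens}
	for _, m := range in.Messages {
		if m.Role == "system" {
			out.System = m.Content
			continue
		}
		out.Messages = append(out.Messages, m)
	}
	return out
}

func main() {
	raw := `{"model":"claude-3-opus","messages":[` +
		`{"role":"system","content":"Be brief."},` +
		`{"role":"user","content":"Hello!"}]}`
	var req openAIReq
	json.Unmarshal([]byte(raw), &req)
	out, _ := json.Marshal(toAnthropic(req, 4096))
	fmt.Println(string(out)) // system prompt hoisted, max_tokens defaulted
}
```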
Build the binary:

```shell
go build -o aiproxy
```

Create a config.yaml file (see config.example.yaml for reference):
```yaml
# Server settings
bind: "127.0.0.1"
listen: ":8080"
default_max_tokens: 4096
upstream_request_timeout: 60 # seconds to wait for upstream response headers (default: 60)

# Rotating file log (optional, omit for stdout)
log:
  file: "/var/log/aiproxy/aiproxy.log"
  max_size: 100    # MB before rotation
  max_backups: 3   # old files to keep
  max_age: 28      # days to retain
  compress: false  # gzip old files

# Upstream services
upstreams:
  # API key-based upstream (single token)
  - name: "primary"
    base_url: "https://api.anthropic.com"
    token: "sk-ant-xxx"
    weight: 10
    api_type: "anthropic"
    request_compression: "zstd" # optional, default is zstd; set "none" to disable
    model_mappings:
      "claude-3-opus": "claude-3-opus-20240229"
      "claude-3-sonnet": "claude-3-sonnet-20240229"
    available_models:
      - "claude-3-opus"
      - "claude-3-sonnet"

  # API key-based upstream (multiple tokens, round-robin with failover)
  - name: "secondary"
    base_url: "https://api.anthropic.com"
    token:
      - "sk-ant-key1"
      - "sk-ant-key2"
    weight: 5
    api_type: "anthropic"

  # OAuth-based upstream (Gemini CLI / Antigravity / Codex / Claude Code / Kiro)
  - name: "Gemini CLI"
    weight: 5
    api_type: "geminicli"
    request_compression: "gzip"
    auth_files:
      - "/path/to/geminicli-auth1.json"
      - "/path/to/geminicli-auth2.json"
    model_mappings:
      "gemini-3-pro": "gemini-3-pro-preview"
    available_models:
      - "gemini-3-pro"

# Model fallback chain (optional)
# When all upstreams fail for a model, retry with the fallback model
model_fallback:
  "claude-opus-4-6": "claude-opus-4-5"
  "claude-opus-4-5": "claude-sonnet-4-5"
  "gpt-5.3-codex": "gpt-5.2-codex"
```

Compression behavior:
- Upstream `Accept-Encoding` is fixed to `gzip, zstd, br, identity` (not configurable).
- `request_compression` controls the outbound request body `Content-Encoding`.
- `request_compression` defaults to `zstd`; set it to `none`/`identity` to send plain request bodies.
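As an illustration of the outbound-body behavior, here is a sketch using stdlib gzip (the default zstd codec requires a third-party package, so only the `gzip` and `none`/`identity` modes are shown). `compressBody` and `roundTrip` are hypothetical helpers, not the proxy's API.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressBody encodes an outbound request body per the configured mode
// and reports the Content-Encoding header value to set; "none"/"identity"
// passes the body through unchanged with no header.
func compressBody(body []byte, mode string) ([]byte, string, error) {
	switch mode {
	case "gzip":
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write(body); err != nil {
			return nil, "", err
		}
		if err := zw.Close(); err != nil {
			return nil, "", err
		}
		return buf.Bytes(), "gzip", nil
	default: // "none" / "identity"
		return body, "", nil
	}
}

// roundTrip decompresses a gzip body, to verify the encoding is lossless.
func roundTrip(body []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	payload := []byte(`{"model":"claude-3-opus","messages":[]}`)
	compressed, enc, _ := compressBody(payload, "gzip")
	restored, _ := roundTrip(compressed)
	fmt.Println(enc, bytes.Equal(restored, payload)) // gzip true
}
```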
Start the proxy:
```shell
./aiproxy -config config.yaml
```

The proxy exposes the following client-facing endpoints (only explicitly listed paths are supported; arbitrary sub-paths are not forwarded):
```shell
# Anthropic Messages API
curl http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "claude-3-opus", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello!"}]}'

# OpenAI Chat Completions API
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]}'

# OpenAI Responses API
curl http://localhost:8080/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer any-key" \
  -d '{"model": "gpt-4o", "input": "Hello!"}'

# Gemini API (compatible with Google AI Studio SDKs)
# /v1beta/models/{model}:generateContent
# /v1/models/{model}:generateContent
curl http://localhost:8080/v1beta/models/gemini-pro:generateContent \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: any-key" \
  -d '{"contents": [{"parts": [{"text": "Hello!"}]}]}'
```

The proxy watches config.yaml for changes and applies updates without restart. Not all fields can be hot-reloaded — some require a full restart to take effect.
Hot-reloadable (applied immediately):
| Field | Scope |
|---|---|
| `upstreams` | All sub-fields: name, enabled, base_url, token, weight, model_mappings, available_models, api_type, auth_files, request_compression, http_headers |
| `upstream_request_timeout` | Timeout for upstream response headers |
| `default_max_tokens` | Default max tokens for Gemini requests |
| `model_fallback` | Model fallback chain |
| `api-key` | Client authentication keys |
Requires restart:
| Field | Reason |
|---|---|
| `bind` | Server listen address is fixed after startup |
| `listen` | Server listen port is fixed after startup |
| `log` | Log output target (file, max_size, max_backups, max_age, compress) is configured once at startup |
- Request Reception — Client sends request in any supported format (Anthropic / OpenAI / Responses / Gemini)
- Model Filtering — Proxy filters upstreams that support the requested model
- Load Balancing — Weighted round-robin selects the next upstream
- Protocol Translation — Request is converted from the client's format to the upstream's native format
- Model Mapping — Client model name is mapped to upstream-specific model name
- Must-Stream Handling — If the upstream requires streaming, force `stream=true` and convert the response back
- Request Forwarding — Request is sent to the selected upstream
- Response Translation — Upstream response is converted back to the client's expected format
- Automatic Retry — On 4xx/5xx errors, automatically tries the next upstream
- Model Fallback — When all upstreams are exhausted, retry with the configured fallback model (full upstream re-rotation)
- Circuit Breaking — After 3 consecutive failures per model, upstream is marked unavailable for 30 minutes
This project is dual-licensed:
- Personal Use: GNU General Public License v3.0 (GPL-3.0)
  - Free for personal projects, educational purposes, and non-commercial use
- Commercial Use: Commercial License Required
  - For commercial or workplace use, please contact: missdeer@gmail.com
  - See LICENSE-COMMERCIAL for details
See LICENSE for the full GPL-3.0 license text.