Overview #
AI Router Stack is a local proxy solution that enables intelligent routing of AI API requests between multiple providers. It provides automatic failover, cost optimization, and seamless provider switching without modifying application code.
The router listens on configurable ports and forwards requests to Anthropic Claude, MiniMax, or both based on routing rules. It supports four distinct routing modes, hot-swappable at runtime via simple commands.
Automatic failover between providers | Real-time mode switching | Health monitoring | Cost tracking | Parallel session support | Print-friendly documentation
Network Architecture #
The router operates as a local proxy between your applications and external AI providers. All traffic flows through localhost, ensuring data privacy while enabling advanced routing logic.
Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT APPLICATIONS │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Claude Code │ │ Cursor │ │ Custom │ │
│ │ (CLI/IDE) │ │ (IDE) │ │ Scripts │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼─────────────────┼─────────────────┼────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ AI ROUTER STACK │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Port 8787 (Dynamic) │ │
│ │ Port 8771 (Anthropic) │ │
│ │ Port 8772 (MiniMax) │ │
│ │ Port 8773 (Mixed) │ │
│ │ Port 8774 (Interactive) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Routing Engine │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────────────┐ │ │
│ │ │ Mode │ │ Health │ │ Cost Tracker │ │ │
│ │ │ Manager │ │ Monitor │ │ & Analytics │ │ │
│ │ └────────────┘ └────────────┘ └────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ ANTHROPIC │ │ MINIMAX │
│ API SERVER │ │ API SERVER │
│ api.anthropic.com│ │ api.minimax.io │
└─────────────────┘ └─────────────────┘
The router maintains persistent connections to both providers and implements connection pooling for optimal performance. Health checks run every 30 seconds to detect provider availability.
The Four Routing Modes #
Each mode optimizes for different use cases. Choose based on your priority: cost savings, reliability, or quality.
3.1 Anthropic (Pure) #
Direct passthrough to Anthropic Claude. Use this when you need guaranteed Claude-only responses or during MiniMax outages.
Port: 8771
Behavior
- 100% of requests go to Anthropic
- No cost optimization
- No failover (returns error on outage)
When you need deterministic Claude responses for compliance or testing purposes.
3.2 MiniMax (Pure) #
Direct passthrough to MiniMax API. Use this for cost-sensitive operations where response quality is secondary.
Port: 8772
Behavior
- 100% of requests go to MiniMax
- Maximum cost savings (~90% cheaper)
- No failover to Claude
Batch processing, initial drafts, non-critical automation where cost is the primary factor.
3.3 Mixed (Bidirectional Fallback) #
Intelligent routing with automatic failover in both directions. Primary provider is MiniMax; falls back to Claude on failure.
Port: 8773
Behavior
| Condition | Action |
|---|---|
| Normal operation | Route to MiniMax |
| MiniMax error/timeout | Retry once, then fallback to Claude |
| Claude also fails | Return error with both error messages |
Production workloads requiring cost optimization with reliability guarantees. This is the recommended mode for most use cases.
3.4 Interactive (Intelligent) #
AI-assisted routing decision. Claude analyzes the request and determines optimal provider based on task complexity, cost, and current load.
Port: 8774
Routing Logic
Request Analysis:
- Task complexity score (1-10)
- Estimated tokens required
- Required model capabilities
- Cost sensitivity threshold
Decision Matrix:
Complexity 1-3 → MiniMax (cost efficiency)
Complexity 4-6 → Mixed mode evaluation
Complexity 7-10 → Anthropic (quality priority)
Capability mismatch → Fallback to Claude
Interactive mode adds ~50-100ms latency due to routing analysis. For latency-critical applications, use Mixed mode instead.
Runtime Mode Switching #
Switch routing modes without restarting the router. Both dynamic and fixed port modes support hot-swapping.
4.1 Dynamic Port (8787) #
Single port that accepts mode changes via chat commands. Ideal for interactive sessions and Claude Code usage.
Configuration
export ANTHROPIC_API_KEY="sk-ant-..."
export MINIMAX_API_KEY="eyJ..."
# Start router
python router.py --port 8787 --mode mixed
Supported Modes
| Mode | Flag | Description |
|---|---|---|
| anthropic | --mode anthropic | Anthropic-only routing |
| minimax | --mode minimax | MiniMax-only routing |
| mixed | --mode mixed | Bidirectional fallback |
| interactive | --mode interactive | AI-assisted routing |
4.2 Fixed Ports (8771-8774) #
Each routing mode gets a dedicated port. Applications connect to specific ports based on their routing requirements.
| Port | Mode | Use Case |
|---|---|---|
| 8771 | Anthropic | Compliance, testing, Claude-specific features |
| 8772 | MiniMax | Cost-sensitive batch operations |
| 8773 | Mixed | Production with failover |
| 8774 | Interactive | Intelligent multi-objective routing |
python router.py --all-ports starts all four ports simultaneously for maximum flexibility.
In-Chat Commands #
Control the router directly from your conversation. Prefix commands with /router or use the shorthand /rr.
Available Commands
| Command | Description | Example |
|---|---|---|
/router mode [mode] | Switch routing mode | /router mode anthropic |
/router status | Show current configuration | /router status |
/router health | Run health check | /router health |
/router stats | Display cost and usage stats | /router stats |
/router logs [n] | Show recent logs | /router logs 50 |
/router help | Show command reference | /router help |
Commands return structured JSON when called programmatically, human-readable text in interactive mode.
Status Response Example
$ /router status
{
"mode": "mixed",
"primary": "minimax",
"fallback": "anthropic",
"uptime": "2h 34m",
"requests_total": 1247,
"requests_minimax": 1089,
"requests_anthropic": 158,
"cost_savings": "67.3%",
"health": {
"minimax": "ok (23ms)",
"anthropic": "ok (145ms)"
}
}
Health Check #
Automated health monitoring ensures requests only go to healthy endpoints. Checks run every 30 seconds by default.
Health Check Endpoint
GET http://localhost:8787/health
Response
{
"status": "healthy",
"providers": {
"anthropic": {
"status": "up",
"latency_ms": 142,
"last_check": "2026-06-23T10:30:00Z"
},
"minimax": {
"status": "up",
"latency_ms": 28,
"last_check": "2026-06-23T10:30:00Z"
}
},
"current_mode": "mixed",
"router_uptime": "24h 15m"
}
Failure Thresholds
| Condition | Threshold | Action |
|---|---|---|
| Timeout | 5 consecutive | Mark provider down |
| HTTP 5xx | 3 consecutive | Mark provider degraded |
| HTTP 429 | 2 consecutive | Rate limit backoff |
| Auth failure | 1 | Mark provider down, alert |
If both providers are down, the router returns 503 Service Unavailable with retry-after header. Monitor this condition closely.
Use-Case Examples #
7.1 Basic Claude Code #
Configure Claude Code to use the router for automatic cost optimization.
# 1. Start the router in mixed mode
python router.py --port 8787 --mode mixed
# 2. Configure Claude Code environment
export ANTHROPIC_API_BASE="http://localhost:8787/v1"
export ANTHROPIC_API_KEY="sk-ant-..." # Your actual key
export MINIMAX_API_KEY="eyJ..." # Your MiniMax key
# 3. Run Claude Code normally
claude-code
# 4. Monitor routing decisions
/router status
Claude Code now automatically routes requests through MiniMax when appropriate, falling back to Claude on errors. Cost savings of 50-70% typical.
7.2 Parallel Sessions with Different Backends #
Run multiple sessions simultaneously, each using different routing strategies.
# Terminal 1: Claude Code with Anthropic-only (compliance)
python router.py --port 8771 --mode anthropic
export ANTHROPIC_API_BASE="http://localhost:8771/v1"
claude-code --project compliance-audit
# Terminal 2: Claude Code with cost optimization
python router.py --port 8773 --mode mixed
export ANTHROPIC_API_BASE="http://localhost:8773/v1"
claude-code --project batch-refactor
# Terminal 3: Claude Code with AI-assisted routing
python router.py --port 8774 --mode interactive
export ANTHROPIC_API_BASE="http://localhost:8774/v1"
claude-code --project research-analysis
7.3 Automatic Failover #
Configure automatic failover for critical production workloads.
# Start router with health monitoring
python router.py \
--port 8787 \
--mode mixed \
--health-check 15s \
--failover-threshold 3 \
--alert-webhook "https://hooks.example.com/router"
# Health check will automatically:
# 1. Detect MiniMax outage within 15s
# 2. Mark MiniMax as unhealthy
# 3. Route all traffic to Claude
# 4. Send webhook alert
# 5. Continue checking MiniMax
# 6. Auto-recover when MiniMax returns
Expect 1-3 second delay during failover. The router retries once before switching, adding approximately 500ms to the first failed request.
7.4 Smart Cost Saving #
Use Interactive mode to balance cost and quality based on task analysis.
# Configure cost sensitivity
export COST_THRESHOLD="0.50" # Max $0.50 per request
export QUALITY_FLOOR="medium" # Minimum acceptable quality
# Start with Interactive routing
python router.py \
--port 8787 \
--mode interactive \
--cost-ceiling 10.00 \ # Hard cap: $10/hour
--prefer-minimax 0.8 # 80% preference for MiniMax
# Example routing decisions:
# - "Write unit tests" → MiniMax (cost: $0.02)
# - "Design system architecture" → Anthropic (quality needed)
# - "Debug authentication bug" → Mixed (balanced)
Cost Tracking
$ /router stats
Session Statistics:
Total Requests: 1,247
MiniMax Requests: 892 (71.5%)
Claude Requests: 355 (28.5%)
MiniMax Cost: $2.34
Claude Cost: $18.72
Original Claude: $64.18
Total Savings: 67.2% ($43.12 saved)
Average Latency: 89ms (MiniMax: 31ms, Claude: 156ms)
Quick Troubleshooting #
Common issues and their solutions. For extended debugging, see Health Check and In-Chat Commands.
| Symptom | Cause | Solution |
|---|---|---|
| Connection refused (8787) | Router not running | python router.py --port 8787 |
| All requests fail | Both providers down | /router health to diagnose |
| 401 Unauthorized | Invalid API key | Verify ANTHROPIC_API_KEY and MINIMAX_API_KEY |
| High latency | Network or provider issue | Check /router status for latencies |
| Mode not switching | Invalid mode name | Use: anthropic, minimax, mixed, interactive |
| Cost not saving | Stuck in Anthropic-only mode | /router mode mixed |
| WebSocket errors | Streaming not supported on port | Use port 8787 for streaming |
Run with --debug flag to enable verbose logging: python router.py --port 8787 --debug
Log Analysis
# View recent logs
/router logs 100
# Filter by provider
grep "minimax" ~/.ai-router/logs/router.log
# Find errors
grep -E "(ERROR|FATAL)" ~/.ai-router/logs/router.log
Hardening & Resilience #
Triple defense strategy ensures reliability for production deployments.
Defense Layer 1: Transport Security
- TLS 1.3 for all external connections
- Certificate pinning for provider endpoints
- Local traffic on localhost only (no exposure)
Defense Layer 2: Request Integrity
- Request/response validation
- Schema enforcement for API compatibility
- Timeout per-request: 30s default, configurable
Defense Layer 3: Operational Continuity
- Automatic failover between providers
- Circuit breaker pattern (open after 5 failures)
- Graceful degradation to error responses
- Webhook alerts for critical failures
python router.py \
--port 8787 \
--mode mixed \
--health-check 15s \
--timeout 30 \
--circuit-breaker 5 \
--alert-webhook "YOUR_WEBHOOK_URL" \
--log-level INFO
Environment Variables #
| Variable | Required | Default | Description |
|---|---|---|---|
ANTHROPIC_API_KEY | Yes | - | Anthropic API key (sk-ant-...) |
MINIMAX_API_KEY | Yes | - | MiniMax API key (eyJ...) |
ANTHROPIC_API_BASE | No | https://api.anthropic.com | Anthropic endpoint |
MINIMAX_API_BASE | No | https://api.minimax.io | MiniMax endpoint |
ROUTER_PORT | No | 8787 | Default listening port |
ROUTER_MODE | No | mixed | Default routing mode |
ROUTER_TIMEOUT | No | 30 | Request timeout (seconds) |
ROUTER_LOG_LEVEL | No | INFO | DEBUG, INFO, WARN, ERROR |
COST_CEILING | No | unlimited | Max cost per hour ($) |
ALERT_WEBHOOK | No | - | Webhook URL for alerts |
System Requirements #
Minimum Requirements
- Python 3.9+
- 2 GB RAM
- 100 MB disk space
- Internet connection (for API access)
Supported Operating Systems
- Linux (Ubuntu 20.04+, Debian 11+)
- macOS 12+ (Monterey and later)
- Windows 10/11 with WSL2
Python Dependencies
# All dependencies are embedded - no installation needed
# The router is distributed as a single executable
Valid Anthropic API key (supports Claude models) and MiniMax API key. Trial keys work for testing.
Do-Not List #
Critical restrictions and prohibited use cases.
1. Do not expose router ports to the internet (localhost only)
2. Do not hardcode API keys in configuration files committed to version control
3. Do not use in production without understanding the failover behavior
4. Do not ignore health check failures - they indicate real problems
5. Do not disable TLS verification even in development
6. Do not exceed rate limits - both providers will ban offending IPs
7. Do not use for malicious purposes, content generation violations, or illegal activities
Rate Limits
| Provider | Requests/min | Tokens/min |
|---|---|---|
| Anthropic | 50 (standard) | 100,000 |
| MiniMax | 100 | 200,000 |