AI Router Proxy - Operational Guide

Overview #

AI Router Stack is a local proxy solution that enables intelligent routing of AI API requests between multiple providers. It provides automatic failover, cost optimization, and seamless provider switching without modifying application code.

The router listens on configurable ports and forwards requests to Anthropic Claude, MiniMax, or both based on routing rules. It supports four distinct routing modes, hot-swappable at runtime via simple commands.

Key Features

  • Automatic failover between providers
  • Real-time mode switching
  • Health monitoring
  • Cost tracking
  • Parallel session support
  • Print-friendly documentation

Network Architecture #

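The router operates as a local proxy between your applications and external AI providers. Applications connect to the router over localhost only, so the router itself is never exposed on the network; the router then forwards each request to the selected external provider and applies its routing logic along the way.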

Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        CLIENT APPLICATIONS                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ Claude Code  │  │    Cursor    │  │    Custom    │           │
│  │  (CLI/IDE)   │  │    (IDE)     │  │   Scripts    │           │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘           │
└─────────┼─────────────────┼─────────────────┼────────────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      AI ROUTER STACK                             │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    Port 8787 (Dynamic)                    │   │
│  │                    Port 8771 (Anthropic)                 │   │
│  │                    Port 8772 (MiniMax)                   │   │
│  │                    Port 8773 (Mixed)                     │   │
│  │                    Port 8774 (Interactive)               │   │
│  └──────────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    Routing Engine                         │   │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────────────┐  │   │
│  │  │  Mode      │  │  Health    │  │  Cost Tracker      │  │   │
│  │  │  Manager   │  │  Monitor   │  │  & Analytics       │  │   │
│  │  └────────────┘  └────────────┘  └────────────────────┘  │   │
│  └──────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
          │                                    │
          ▼                                    ▼
┌─────────────────┐                  ┌─────────────────┐
│    ANTHROPIC    │                  │     MINIMAX     │
│   API SERVER    │                  │    API SERVER   │
│api.anthropic.com│                  │ api.minimax.io  │
└─────────────────┘                  └─────────────────┘

The router maintains persistent connections to both providers and implements connection pooling for optimal performance. Health checks run every 30 seconds to detect provider availability.

The Four Routing Modes #

Each mode optimizes for different use cases. Choose based on your priority: cost savings, reliability, or quality.

3.1 Anthropic (Pure) #

Direct passthrough to Anthropic Claude. Use this when you need guaranteed Claude-only responses or during MiniMax outages.

Port: 8771

Behavior

  • 100% of requests go to Anthropic
  • No cost optimization
  • No failover (returns error on outage)

Use Case

When you need guaranteed Claude-only responses for compliance or testing purposes.

3.2 MiniMax (Pure) #

Direct passthrough to MiniMax API. Use this for cost-sensitive operations where response quality is secondary.

Port: 8772

Behavior

  • 100% of requests go to MiniMax
  • Maximum cost savings (~90% cheaper)
  • No failover to Claude

Use Case

Batch processing, initial drafts, non-critical automation where cost is the primary factor.

3.3 Mixed (Bidirectional Fallback) #

Intelligent routing with automatic failover in both directions. Primary provider is MiniMax; falls back to Claude on failure.

Port: 8773

Behavior

Condition              Action
Normal operation       Route to MiniMax
MiniMax error/timeout  Retry once, then fall back to Claude
Claude also fails      Return an error containing both error messages

Best For

Production workloads requiring cost optimization with reliability guarantees. This is the recommended mode for most use cases.
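
The behavior table above can be sketched in a few lines of Python. This is a minimal illustration, not the router's actual implementation: the names route_mixed and AllProvidersFailed are hypothetical, the providers are modeled as plain callables, and any exception is assumed to count as a provider error or timeout.

```python
from typing import Callable, List


class AllProvidersFailed(Exception):
    """Raised when both the primary and the fallback provider fail."""


def route_mixed(
    request: dict,
    primary: Callable[[dict], dict],
    fallback: Callable[[dict], dict],
    primary_retries: int = 1,
) -> dict:
    """Try the primary provider, retry once, then fall back."""
    errors: List[str] = []
    # One initial attempt plus `primary_retries` retries against the primary.
    for attempt in range(1 + primary_retries):
        try:
            return primary(request)
        except Exception as exc:  # assumption: any exception is a provider failure
            errors.append(f"primary attempt {attempt + 1}: {exc}")
    try:
        return fallback(request)
    except Exception as exc:
        errors.append(f"fallback: {exc}")
    # Both providers failed: report every collected error message.
    raise AllProvidersFailed("; ".join(errors))


if __name__ == "__main__":
    def minimax_down(request: dict) -> dict:
        raise TimeoutError("MiniMax timed out")

    def claude_ok(request: dict) -> dict:
        return {"provider": "anthropic", "echo": request["prompt"]}

    print(route_mixed({"prompt": "hi"}, minimax_down, claude_ok))
```

Passing the providers in as callables keeps the fallback policy separate from the HTTP details, which makes the retry count easy to test in isolation.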

3.4 Interactive (Intelligent) #

AI-assisted routing. Claude analyzes each request and selects the best provider based on task complexity, cost, and current load.

Port: 8774

Routing Logic

Request Analysis:
  - Task complexity score (1-10)
  - Estimated tokens required
  - Required model capabilities
  - Cost sensitivity threshold

Decision Matrix:
  Complexity 1-3 → MiniMax (cost efficiency)
  Complexity 4-6 → Mixed mode evaluation
  Complexity 7-10 → Anthropic (quality priority)
  Capability mismatch → Fall back to Claude

Note

Interactive mode adds ~50-100ms latency due to routing analysis. For latency-critical applications, use Mixed mode instead.
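
The decision matrix above maps directly onto a small function. The sketch below is illustrative only: choose_route is a hypothetical name, and how the complexity score itself is produced is not specified in this guide, so it is taken as an input.

```python
def choose_route(complexity: int, capability_ok: bool = True) -> str:
    """Map a complexity score (1-10) to a routing target.

    `capability_ok` is False when the cheaper provider lacks a capability
    the request needs (the "capability mismatch" row of the matrix).
    """
    if not 1 <= complexity <= 10:
        raise ValueError(f"complexity must be between 1 and 10, got {complexity}")
    if not capability_ok:
        return "anthropic"  # capability mismatch: fall back to Claude
    if complexity <= 3:
        return "minimax"    # cost efficiency
    if complexity <= 6:
        return "mixed"      # defer to Mixed mode evaluation
    return "anthropic"      # quality priority


if __name__ == "__main__":
    for score in (2, 5, 9):
        print(score, choose_route(score))
```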

Runtime Mode Switching #

Switch routing modes without restarting the router. Both dynamic and fixed port modes support hot-swapping.

4.1 Dynamic Port (8787) #

Single port that accepts mode changes via chat commands. Ideal for interactive sessions and Claude Code usage.

Configuration

export ANTHROPIC_API_KEY="sk-ant-..."
export MINIMAX_API_KEY="eyJ..."

# Start router
python router.py --port 8787 --mode mixed

Supported Modes

Mode         Flag                Description
anthropic    --mode anthropic    Anthropic-only routing
minimax      --mode minimax      MiniMax-only routing
mixed        --mode mixed        Bidirectional fallback
interactive  --mode interactive  AI-assisted routing

4.2 Fixed Ports (8771-8774) #

Each routing mode gets a dedicated port. Applications connect to specific ports based on their routing requirements.

Port  Mode         Use Case
8771  Anthropic    Compliance, testing, Claude-specific features
8772  MiniMax      Cost-sensitive batch operations
8773  Mixed        Production with failover
8774  Interactive  Intelligent multi-objective routing

Startup Command

To start all four fixed ports simultaneously, run:

python router.py --all-ports

In-Chat Commands #

Control the router directly from your conversation. Prefix commands with /router or use the shorthand /rr.

Available Commands

Command              Description                   Example
/router mode [mode]  Switch routing mode           /router mode anthropic
/router status       Show current configuration    /router status
/router health       Run health check              /router health
/router stats        Display cost and usage stats  /router stats
/router logs [n]     Show recent logs              /router logs 50
/router help         Show command reference        /router help

Response Format

Commands return structured JSON when called programmatically and human-readable text when typed in an interactive session.
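
To make the command grammar concrete, here is a minimal Python sketch of a parser for the commands listed above. It is an assumption-laden illustration rather than the router's own parser: parse_router_command is a hypothetical name, the sketch requires an explicit mode for /router mode, and the default of 20 lines for /router logs is invented for the example.

```python
from typing import Optional, Tuple, Union

VALID_MODES = ("anthropic", "minimax", "mixed", "interactive")
COMMANDS = ("mode", "status", "health", "stats", "logs", "help")
DEFAULT_LOG_LINES = 20  # assumption: the guide does not state a default


def parse_router_command(text: str) -> Optional[Tuple[str, Union[str, int, None]]]:
    """Return (command, argument), or None if the text is not a router command."""
    parts = text.strip().split()
    if not parts or parts[0] not in ("/router", "/rr"):
        return None  # ordinary chat message, not a router command
    if len(parts) < 2 or parts[1] not in COMMANDS:
        raise ValueError(f"unknown router command: {text!r}")
    name, args = parts[1], parts[2:]
    if name == "mode":
        if len(args) != 1 or args[0] not in VALID_MODES:
            raise ValueError(f"mode must be one of {', '.join(VALID_MODES)}")
        return ("mode", args[0])
    if name == "logs":
        # int() raises ValueError for a non-numeric line count.
        return ("logs", int(args[0]) if args else DEFAULT_LOG_LINES)
    return (name, None)


if __name__ == "__main__":
    print(parse_router_command("/router mode anthropic"))
    print(parse_router_command("/rr logs 50"))
```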

Status Response Example

$ /router status
{
  "mode": "mixed",
  "primary": "minimax",
  "fallback": "anthropic",
  "uptime": "2h 34m",
  "requests_total": 1247,
  "requests_minimax": 1089,
  "requests_anthropic": 158,
  "cost_savings": "67.3%",
  "health": {
    "minimax": "ok (23ms)",
    "anthropic": "ok (145ms)"
  }
}

Health Check #

Automated health monitoring ensures requests only go to healthy endpoints. Checks run every 30 seconds by default.

Health Check Endpoint

GET http://localhost:8787/health

Response

{
  "status": "healthy",
  "providers": {
    "anthropic": {
      "status": "up",
      "latency_ms": 142,
      "last_check": "2026-06-23T10:30:00Z"
    },
    "minimax": {
      "status": "up",
      "latency_ms": 28,
      "last_check": "2026-06-23T10:30:00Z"
    }
  },
  "current_mode": "mixed",
  "router_uptime": "24h 15m"
}
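
A client script can poll this endpoint and react when a provider is not reported as "up". The following is a minimal sketch using only the Python standard library; fetch_health and unavailable_providers are hypothetical helper names, and the code assumes the response has the shape shown in the example above.

```python
import json
from typing import List
from urllib.request import urlopen


def fetch_health(url: str = "http://localhost:8787/health", timeout: float = 5.0) -> str:
    """Fetch the raw health JSON from a running router."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def unavailable_providers(health_json: str) -> List[str]:
    """Return the names of providers whose status is anything other than "up"."""
    payload = json.loads(health_json)
    providers = payload.get("providers", {})
    return sorted(
        name for name, info in providers.items() if info.get("status") != "up"
    )


if __name__ == "__main__":
    # Requires a router listening on port 8787.
    down = unavailable_providers(fetch_health())
    print("all providers up" if not down else f"unavailable: {', '.join(down)}")
```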

Failure Thresholds

Condition     Threshold      Action
Timeout       5 consecutive  Mark provider down
HTTP 5xx      3 consecutive  Mark provider degraded
HTTP 429      2 consecutive  Rate limit backoff
Auth failure  1              Mark provider down, alert

Critical

If both providers are down, the router returns 503 Service Unavailable with a Retry-After header. Monitor this condition closely.
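
The threshold table can be read as a small state machine that counts consecutive failures per provider. The sketch below is one possible interpretation, not the router's actual logic: FailureTracker and the failure-kind keys are hypothetical names, and it assumes that a success or a failure of a different kind resets the consecutive count.

```python
from typing import Optional

# (consecutive failures required, resulting action), taken from the table above.
THRESHOLDS = {
    "timeout": (5, "down"),
    "http_5xx": (3, "degraded"),
    "http_429": (2, "backoff"),
    "auth_failure": (1, "down"),
}


class FailureTracker:
    """Counts consecutive failures of the same kind for one provider."""

    def __init__(self) -> None:
        self.kind: Optional[str] = None
        self.count = 0

    def record_success(self) -> None:
        # Assumption: any success breaks the streak.
        self.kind, self.count = None, 0

    def record_failure(self, kind: str) -> Optional[str]:
        """Return the action to take once a threshold is reached, else None."""
        limit, action = THRESHOLDS[kind]
        if kind != self.kind:
            # Assumption: a different failure kind starts a new streak.
            self.kind, self.count = kind, 0
        self.count += 1
        return action if self.count >= limit else None


if __name__ == "__main__":
    tracker = FailureTracker()
    for _ in range(3):
        print(tracker.record_failure("http_5xx"))
```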

Use-Case Examples #

7.1 Basic Claude Code #

Configure Claude Code to use the router for automatic cost optimization.

# 1. Export provider keys in the shell that will run the router
export ANTHROPIC_API_KEY="sk-ant-..."  # Your actual key
export MINIMAX_API_KEY="eyJ..."        # Your MiniMax key

# 2. Start the router in mixed mode (runs in the foreground)
python router.py --port 8787 --mode mixed

# 3. In a second terminal, point Claude Code at the router
export ANTHROPIC_API_BASE="http://localhost:8787/v1"

# 4. Run Claude Code normally
claude-code

# 5. Monitor routing decisions (typed inside the Claude Code session)
/router status

Result

Claude Code now automatically routes requests through MiniMax when appropriate, falling back to Claude on errors. Cost savings of 50-70% are typical.

7.2 Parallel Sessions with Different Backends #

Run multiple sessions simultaneously, each using different routing strategies.

# Terminal 1: Claude Code with Anthropic-only (compliance)
python router.py --port 8771 --mode anthropic &   # run the router in the background
export ANTHROPIC_API_BASE="http://localhost:8771/v1"
claude-code --project compliance-audit

# Terminal 2: Claude Code with cost optimization
python router.py --port 8773 --mode mixed &
export ANTHROPIC_API_BASE="http://localhost:8773/v1"
claude-code --project batch-refactor

# Terminal 3: Claude Code with AI-assisted routing
python router.py --port 8774 --mode interactive &
export ANTHROPIC_API_BASE="http://localhost:8774/v1"
claude-code --project research-analysis

7.3 Automatic Failover #

Configure automatic failover for critical production workloads.

# Start router with health monitoring
python router.py \
  --port 8787 \
  --mode mixed \
  --health-check 15s \
  --failover-threshold 3 \
  --alert-webhook "https://hooks.example.com/router"

# Health check will automatically:
# 1. Detect MiniMax outage within 15s
# 2. Mark MiniMax as unhealthy
# 3. Route all traffic to Claude
# 4. Send webhook alert
# 5. Continue checking MiniMax
# 6. Auto-recover when MiniMax returns

Failover Latency

Expect a 1-3 second delay during failover. The router retries once before switching, which adds approximately 500ms to the first failed request.

7.4 Smart Cost Saving #

Use Interactive mode to balance cost and quality based on task analysis.

# Configure cost sensitivity
export COST_THRESHOLD="0.50"  # Max $0.50 per request
export QUALITY_FLOOR="medium" # Minimum acceptable quality

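# Start with Interactive routing
#   --cost-ceiling 10.00   hard cap: $10/hour
#   --prefer-minimax 0.8   80% preference for MiniMax
# (comments cannot follow a trailing backslash, so they are listed here)
python router.py \
  --port 8787 \
  --mode interactive \
  --cost-ceiling 10.00 \
  --prefer-minimax 0.8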

# Example routing decisions:
# - "Write unit tests" → MiniMax (cost: $0.02)
# - "Design system architecture" → Anthropic (quality needed)
# - "Debug authentication bug" → Mixed (balanced)

Cost Tracking

$ /router stats
Session Statistics:
  Total Requests:    1,247
  MiniMax Requests:   892 (71.5%)
  Claude Requests:    355 (28.5%)
  
  MiniMax Cost:       $2.34
  Claude Cost:        $18.72
  Original Claude:    $64.18
  
  Total Savings:     67.2% ($43.12 saved)
  
  Average Latency:    89ms (MiniMax: 31ms, Claude: 156ms)
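
The savings figure in the sample output can be checked by hand. The short Python sketch below reproduces the arithmetic, assuming that "Original Claude" means the estimated cost had every request gone to Claude; the function name savings is hypothetical.

```python
from typing import Tuple


def savings(minimax_cost: float, claude_cost: float, baseline_cost: float) -> Tuple[float, float]:
    """Return (dollars saved, percent saved) relative to the Claude-only baseline."""
    actual = minimax_cost + claude_cost   # what was actually spent
    saved = baseline_cost - actual        # difference against the baseline
    return round(saved, 2), round(100 * saved / baseline_cost, 1)


# Figures from the sample output above.
print(savings(2.34, 18.72, 64.18))  # (43.12, 67.2)
```

Actual spend is $2.34 + $18.72 = $21.06, so the saving is $64.18 - $21.06 = $43.12, or about 67.2% of the baseline.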

Quick Troubleshooting #

Common issues and their solutions. For extended debugging, see Health Check and In-Chat Commands.

Symptom                    Cause                            Solution
Connection refused (8787)  Router not running               python router.py --port 8787
All requests fail          Both providers down              /router health to diagnose
401 Unauthorized           Invalid API key                  Verify ANTHROPIC_API_KEY and MINIMAX_API_KEY
High latency               Network or provider issue        Check /router status for latencies
Mode not switching         Invalid mode name                Use: anthropic, minimax, mixed, interactive
No cost savings            Stuck in Anthropic-only mode     /router mode mixed
WebSocket errors           Streaming not supported on port  Use port 8787 for streaming

Debug Mode

Run with the --debug flag to enable verbose logging:

python router.py --port 8787 --debug

Log Analysis

# View recent logs
/router logs 100

# Filter by provider
grep "minimax" ~/.ai-router/logs/router.log

# Find errors
grep -E "(ERROR|FATAL)" ~/.ai-router/logs/router.log

Hardening & Resilience #

A three-layer defense strategy improves reliability for production deployments.

Defense Layer 1: Transport Security

  • TLS 1.3 for all external connections
  • Certificate pinning for provider endpoints
  • Local traffic on localhost only (no exposure)

Defense Layer 2: Request Integrity

  • Request/response validation
  • Schema enforcement for API compatibility
  • Timeout per-request: 30s default, configurable

Defense Layer 3: Operational Continuity

  • Automatic failover between providers
  • Circuit breaker pattern (open after 5 failures)
  • Graceful degradation to error responses
  • Webhook alerts for critical failures

Recommended Production Configuration

python router.py \
  --port 8787 \
  --mode mixed \
  --health-check 15s \
  --timeout 30 \
  --circuit-breaker 5 \
  --alert-webhook "YOUR_WEBHOOK_URL" \
  --log-level INFO
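
For readers unfamiliar with the circuit breaker pattern named in Defense Layer 3, the following Python sketch shows the general idea. It is a generic illustration, not the router's code: the class name CircuitBreaker is hypothetical, and the 30-second cooldown before a trial request is an assumption, since this guide only states that the breaker opens after 5 failures.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops sending requests to a provider after repeated failures."""

    def __init__(
        self,
        failure_threshold: int = 5,      # matches --circuit-breaker 5
        cooldown_seconds: float = 30.0,  # assumption: not specified in this guide
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown elapses, then allow a trial request.
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open after a failed trial) and restart the cooldown.
            self.opened_at = self.clock()


if __name__ == "__main__":
    breaker = CircuitBreaker()
    for _ in range(5):
        breaker.record_failure()
    print("request allowed:", breaker.allow_request())
```

Injecting the clock keeps the breaker deterministic under test, since no real waiting is needed to simulate the cooldown.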

Environment Variables #

Variable            Required  Default                    Description
ANTHROPIC_API_KEY   Yes       -                          Anthropic API key (sk-ant-...)
MINIMAX_API_KEY     Yes       -                          MiniMax API key (eyJ...)
ANTHROPIC_API_BASE  No        https://api.anthropic.com  Anthropic endpoint
MINIMAX_API_BASE    No        https://api.minimax.io     MiniMax endpoint
ROUTER_PORT         No        8787                       Default listening port
ROUTER_MODE         No        mixed                      Default routing mode
ROUTER_TIMEOUT      No        30                         Request timeout (seconds)
ROUTER_LOG_LEVEL    No        INFO                       DEBUG, INFO, WARN, ERROR
COST_CEILING        No        unlimited                  Max cost per hour ($)
ALERT_WEBHOOK       No        -                          Webhook URL for alerts
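
As an illustration of how these variables and defaults fit together, here is a minimal Python sketch that reads them into a typed configuration object. RouterConfig and load_config are hypothetical names and do not describe how router.py actually loads its settings; "unlimited" for COST_CEILING is represented as None.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RouterConfig:
    anthropic_api_key: str
    minimax_api_key: str
    anthropic_api_base: str = "https://api.anthropic.com"
    minimax_api_base: str = "https://api.minimax.io"
    port: int = 8787
    mode: str = "mixed"
    timeout: int = 30
    log_level: str = "INFO"
    cost_ceiling: Optional[float] = None   # None means unlimited
    alert_webhook: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> RouterConfig:
    """Build a RouterConfig from environment variables, applying the table defaults."""
    missing = [k for k in ("ANTHROPIC_API_KEY", "MINIMAX_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    ceiling = env.get("COST_CEILING")
    return RouterConfig(
        anthropic_api_key=env["ANTHROPIC_API_KEY"],
        minimax_api_key=env["MINIMAX_API_KEY"],
        anthropic_api_base=env.get("ANTHROPIC_API_BASE", "https://api.anthropic.com"),
        minimax_api_base=env.get("MINIMAX_API_BASE", "https://api.minimax.io"),
        port=int(env.get("ROUTER_PORT", "8787")),
        mode=env.get("ROUTER_MODE", "mixed"),
        timeout=int(env.get("ROUTER_TIMEOUT", "30")),
        log_level=env.get("ROUTER_LOG_LEVEL", "INFO"),
        cost_ceiling=float(ceiling) if ceiling else None,
        alert_webhook=env.get("ALERT_WEBHOOK") or None,
    )


if __name__ == "__main__":
    demo_env = {"ANTHROPIC_API_KEY": "sk-ant-...", "MINIMAX_API_KEY": "eyJ..."}
    print(load_config(demo_env).mode)
```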

System Requirements #

Minimum Requirements

  • Python 3.9+
  • 2 GB RAM
  • 100 MB disk space
  • Internet connection (for API access)

Supported Operating Systems

  • Linux (Ubuntu 20.04+, Debian 11+)
  • macOS 12+ (Monterey and later)
  • Windows 10/11 with WSL2

Python Dependencies

# All dependencies are embedded - no installation needed
# The router is distributed as a single executable

API Keys Required

A valid Anthropic API key (with access to Claude models) and a valid MiniMax API key are required. Trial keys work for testing.

Do-Not List #

Critical restrictions and prohibited use cases.

Never Do These

1. Do not expose router ports to the internet (localhost only)
2. Do not hardcode API keys in configuration files committed to version control
3. Do not use in production without understanding the failover behavior
4. Do not ignore health check failures - they indicate real problems
5. Do not disable TLS verification even in development
6. Do not exceed rate limits - providers may throttle or block clients that repeatedly exceed them
7. Do not use for malicious purposes, content generation violations, or illegal activities

Rate Limits

Provider   Requests/min   Tokens/min
Anthropic  50 (standard)  100,000
MiniMax    100            200,000
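
One way for client scripts to stay under the requests-per-minute limits above is a sliding-window limiter. The Python sketch below is a generic example of that technique, not a feature of the router; SlidingWindowLimiter is a hypothetical name, and token-per-minute limits are not covered.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allows at most `max_per_minute` requests in any rolling 60-second window."""

    def __init__(self, max_per_minute: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.stamps: Deque[float] = deque()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()
        if len(self.stamps) < self.max_per_minute:
            self.stamps.append(now)
            return True
        return False  # caller should wait or queue the request


# Limits taken from the table above.
LIMITERS = {
    "anthropic": SlidingWindowLimiter(50),
    "minimax": SlidingWindowLimiter(100),
}

if __name__ == "__main__":
    allowed = sum(LIMITERS["anthropic"].try_acquire() for _ in range(60))
    print("allowed in first burst:", allowed)
```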