LiteLLM Integration
- Overview
- Quick Start
- Architecture
- Basic Setup
- Production Setup
- Model Configuration
- Multi-Tenant Configuration
- Testing & Validation
- Troubleshooting
- Advanced Topics
LiteLLM acts as a universal LLM gateway that enables Claude Code to route requests to multiple non-Anthropic models while maintaining the same interface. This integration provides cost optimization, fallback capabilities, and multi-tenant support for enterprise deployments.
- Multi-Provider Support: Route to OpenAI, Azure, OpenRouter, Bedrock, Ollama, and 100+ providers
- Cost Optimization: Automatically use cheaper models for appropriate tasks
- High Availability: Fallback chains ensure resilience
- Multi-Tenancy: Isolated configurations and budgets per team
- Enterprise Features: Monitoring, audit logging, and compliance
# 1. Navigate to examples directory
cd examples/litellm
# 2. Start basic LiteLLM proxy
docker-compose -f docker-compose.basic.yml up -d
# 3. Configure Claude Code
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_AUTH_TOKEN=sk-1234567890
# 4. Test with Claude Code
claude --model test-model "Hello, world!"
# Check health
curl -s http://localhost:4000/health \
-H "Authorization: Bearer sk-1234567890" | jq .
# List available models
curl -s http://localhost:4000/models \
-H "Authorization: Bearer sk-1234567890" | jq .
┌──────────────┐      ┌──────────────┐      ┌─────────────────┐
│  Claude Code │─────▶│   LiteLLM    │─────▶│  LLM Providers  │
│   Clients    │      │    Proxy     │      │  (OpenAI, etc)  │
└──────────────┘      └──────────────┘      └─────────────────┘
                             │
                    ┌────────┴────────┐
                    │                 │
              ┌─────▼────┐     ┌──────▼──────┐
              │  Redis   │     │ PostgreSQL  │
              │  Cache   │     │  Database   │
              └──────────┘     └─────────────┘
- Claude Code sends a request to the LiteLLM endpoint
- LiteLLM authenticates and validates the request
- The router selects an appropriate model based on the configuration
- The request is forwarded to the selected provider
- The response is streamed back to Claude Code
- Usage is tracked and logged
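This flow can be exercised end to end from the command line. A minimal sketch, assuming the quick-start stack above and the litellm service name used in the troubleshooting commands further down:
# Send one request through the proxy and inspect the response
curl -s -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-1234567890" \
  -H "Content-Type: application/json" \
  -d '{"model": "test-model", "messages": [{"role": "user", "content": "ping"}]}' | jq .
# Confirm the proxy logged the routed provider call and usage
docker-compose -f docker-compose.basic.yml logs --tail=20 litellm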
- Docker 20.10+
- Docker Compose 2.0+
- API keys for desired providers
Create config/basic-config.yaml:
model_list:
  # OpenAI Models
  - model_name: "gpt-4o-mini"
    litellm_params:
      model: "openai/gpt-4o-mini"
      api_key: ${OPENAI_API_KEY}

  # OpenRouter Models
  - model_name: "qwen-coder"
    litellm_params:
      model: "openrouter/qwen/qwen-3-coder"
      api_key: ${OPENROUTER_API_KEY}

  # Local Models (Ollama)
  - model_name: "local-codellama"
    litellm_params:
      model: "ollama/codellama"
      api_base: http://localhost:11434

general_settings:
  master_key: ${LITELLM_MASTER_KEY}
  request_timeout: 600
Create a .env file:
LITELLM_MASTER_KEY=sk-your-secure-key
OPENAI_API_KEY=sk-your-openai-key
OPENROUTER_API_KEY=sk-or-your-openrouter-key
docker-compose -f docker-compose.basic.yml up -d
Add to your shell profile:
export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_AUTH_TOKEN=sk-your-secure-key
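Before launching Claude Code, confirm these variables point at a healthy proxy by hitting the same health endpoint used in the quick start:
# Should report healthy_endpoints with no unhealthy entries
curl -s "$ANTHROPIC_BASE_URL/health" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" | jq .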
# 1. Clone and navigate
git clone https://github.com/ruvnet/claude-flow.git
cd claude-flow/examples/litellm
# 2. Configure environment
cp .env.example .env
# Edit .env with your keys
# 3. Deploy full stack
./scripts/deploy.sh start
- 3x LiteLLM Proxies: Load-balanced for HA
- Nginx Load Balancer: Traffic distribution
- PostgreSQL: Configuration storage
- Redis: Response caching
- Prometheus + Grafana: Monitoring
- Loki: Log aggregation
services:
  nginx:          # Load balancer on port 4000
  litellm-1/2/3:  # Proxy instances
  postgres:       # Database on port 5432
  redis:          # Cache on port 6379
  prometheus:     # Metrics on port 9090
  grafana:        # Dashboards on port 3000
  loki:           # Logs on port 3100
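A quick way to confirm the stack came up cleanly (service names are those from the compose file above):
# All services should report a running/healthy state
docker-compose ps
# The load balancer answers on port 4000, just like the single-proxy setup
curl -s http://localhost:4000/health \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" | jq .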
Map Claude Code model names to providers:
model_list:
  # Map Claude's model name to Qwen
  - model_name: "claude-3-5-sonnet"
    litellm_params:
      model: "openrouter/qwen/qwen-3-coder"
      max_tokens: 65536

  # Map to OpenAI
  - model_name: "claude-3-opus"
    litellm_params:
      model: "openai/gpt-4-turbo"
      max_tokens: 8192
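With this mapping in place, a request for a Claude model name is served by the configured backend. A quick check (the exact response fields depend on the LiteLLM version, so inspect the model and usage entries rather than relying on a fixed shape):
# Ask for "claude-3-5-sonnet"; the router forwards it to openrouter/qwen/qwen-3-coder
curl -s -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-3-5-sonnet", "messages": [{"role": "user", "content": "Hello"}]}' \
  | jq '{model, usage}'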
Configure automatic fallbacks:
fallback_models:
  code_chain:
    - gpt-4o-mini       # Fast, cheap
    - qwen-coder        # Alternative
    - local-codellama   # Local fallback
  reasoning_chain:
    - gpt-4-turbo       # Primary
    - claude-3-opus     # Fallback
model_list:
  - model_name: "cheap-code"
    litellm_params:
      model: "openrouter/deepseek/deepseek-coder"
    model_info:
      cost_per_token: 0.000001   # Very cheap

  - model_name: "premium-reasoning"
    litellm_params:
      model: "openai/o3-pro"
    model_info:
      cost_per_token: 0.00015    # Expensive
./scripts/manage-tenants.sh create engineering \
  sk-eng-key \
  100 \
  "gpt-4o-mini,qwen-coder"
tenants/engineering.yaml:
tenant:
  id: engineering
  api_key: sk-eng-secure-key
  allowed_models:
    - gpt-4o-mini
    - qwen-coder
    - local-codellama
  budget:
    daily_limit: 100
    monthly_limit: 3000
  rate_limits:
    requests_per_minute: 60
    requests_per_hour: 1000
# Engineering team
export ANTHROPIC_AUTH_TOKEN=sk-eng-secure-key
claude --model gpt-4o-mini "Build feature"
# Research team
export ANTHROPIC_AUTH_TOKEN=sk-research-key
claude --model gpt-4-turbo "Analyze data"
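To verify isolation, request a model outside a tenant's allowed_models list. Assuming the example stack enforces the allow-list, the proxy should return an access error rather than a completion:
# gpt-4-turbo is not in the engineering tenant's allowed_models
curl -s -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer sk-eng-secure-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4-turbo", "messages": [{"role": "user", "content": "test"}]}' | jq .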
curl -X GET http://localhost:4000/health \
-H "Authorization: Bearer $LITELLM_MASTER_KEY"
Expected response:
{
"healthy_endpoints": [...],
"healthy_count": 2,
"unhealthy_count": 0
}
curl -X GET http://localhost:4000/models \
-H "Authorization: Bearer $LITELLM_MASTER_KEY"
curl -X POST http://localhost:4000/chat/completions \
-H "Authorization: Bearer $LITELLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
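Streaming can be tested the same way. The endpoint follows the OpenAI chat-completions convention, so setting "stream": true returns server-sent events:
# -N disables curl buffering so the event stream prints as it arrives
curl -N -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "stream": true, "messages": [{"role": "user", "content": "Count to five"}]}'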
# Concurrent requests test
for i in {1..10}; do
curl -X POST http://localhost:4000/chat/completions \
-H "Authorization: Bearer $LITELLM_MASTER_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "test-model", "messages": [{"role": "user", "content": "Test"}]}' &
done
wait
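For a rough latency baseline, curl's built-in timing is enough before reaching for a dedicated load-testing tool:
# Prints total request time for a single completion
curl -s -o /dev/null -w "total: %{time_total}s\n" \
  -X POST http://localhost:4000/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "test-model", "messages": [{"role": "user", "content": "Test"}]}'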
Problem: 401 Authentication Error
Solution:
# Check master key in .env
grep LITELLM_MASTER_KEY .env
# Verify in docker-compose.yml
docker-compose exec litellm env | grep MASTER_KEY
Problem: Model not found
Solution:
# List available models
curl http://localhost:4000/models \
-H "Authorization: Bearer $LITELLM_MASTER_KEY"
# Check config
docker-compose exec litellm cat /app/config.yaml
Problem: Cannot connect to LiteLLM
Solution:
# Check container status
docker-compose ps
# View logs
docker-compose logs litellm
# Restart services
docker-compose restart
Problem: High latency
Solution:
# Enable caching in config.yaml
general_settings:
  cache_enabled: true
  cache_ttl: 3600
# Add Redis for caching
services:
  redis:
    image: redis:7-alpine
Problem: 429 Too Many Requests
Solution:
# Adjust rate limits
rate_limits:
  requests_per_minute: 100
  requests_per_hour: 5000
Enable verbose logging:
# docker-compose.yml
environment:
  - LITELLM_LOG_LEVEL=DEBUG
  - LITELLM_SET_VERBOSE=true
View debug logs:
docker-compose logs -f litellm | grep DEBUG
# Stop services
docker-compose down
# Remove volumes (caution: deletes data)
docker-compose down -v
# Rebuild containers
docker-compose build --no-cache
# Scale proxies
docker-compose up -d --scale litellm=5
Add custom LLM providers:
model_list:
  - model_name: "custom-llm"
    litellm_params:
      model: "custom_provider/model"
      api_base: "https://api.custom.com"
      api_key: ${CUSTOM_API_KEY}
      custom_llm_provider: "custom"
      extra_headers:
        X-Custom-Header: "value"
Enable HTTPS:
# nginx.conf
server {
    listen 443 ssl;
    ssl_certificate /certs/cert.pem;
    ssl_certificate_key /certs/key.pem;
}
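After reloading nginx, a quick smoke test over TLS (use -k only for self-signed local certificates; in production, verify against your CA instead):
# Health check over HTTPS through the nginx listener
curl -sk https://localhost/health \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" | jq .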
Access dashboards:
- Grafana: http://localhost:3000
- Prometheus: http://localhost:9090
Key metrics to monitor:
- Request latency (p50, p95, p99)
- Token usage per model
- Error rates
- Cost per tenant
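Before building dashboards around these metrics, confirm Prometheus is actually scraping the proxy targets. This uses the standard Prometheus HTTP API, nothing LiteLLM-specific:
# Every scrape target should report health "up"
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, health}'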
# Backup database
docker-compose exec postgres \
pg_dump -U litellm > backup_$(date +%Y%m%d).sql
# Restore database
docker-compose exec -T postgres \
psql -U litellm < backup_20250807.sql
# Backup configuration
tar -czf config_backup.tar.gz config/ tenants/
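Backups are only useful if they run regularly. A sketch of a nightly crontab entry, with an illustrative install path and backup directory:
# Nightly at 02:00; note that % must be escaped as \% inside crontab entries
0 2 * * * cd /opt/claude-flow/examples/litellm && docker-compose exec -T postgres pg_dump -U litellm > /backups/litellm_$(date +\%Y\%m\%d).sql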
# Optimize for high throughput
router_settings:
  max_parallel_requests: 100
  request_timeout: 30
  enable_caching: true
# Worker configuration
environment:
  - MAX_WORKERS=8
  - WORKER_TIMEOUT=300
- Rotate API Keys:
./scripts/rotate-keys.sh
- Network Isolation:
networks:
  internal:
    internal: true
  external:
    external: true
- Rate Limiting:
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
limit_req zone=api burst=20 nodelay;
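With the limit_req settings above (10 r/s with a burst of 20), a quick way to confirm enforcement is to hammer a cheap endpoint and tally the status codes; requests beyond the burst should be rejected (503 by default, unless limit_req_status is set):
# Fire 40 rapid health checks and count the HTTP status codes
for i in {1..40}; do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:4000/health \
    -H "Authorization: Bearer $LITELLM_MASTER_KEY"
done | sort | uniq -c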
# Basic usage
claude --model gpt-4o-mini "Write a function"
# Streaming
claude --model qwen-coder --stream "Explain this code"
# With context
claude --model local-codellama --context file.py "Refactor this"
import os
import requests
# Configure
os.environ['ANTHROPIC_BASE_URL'] = 'http://localhost:4000'
headers = {'Authorization': f'Bearer {os.environ["LITELLM_MASTER_KEY"]}'}
# Make request
response = requests.post(
    f"{os.environ['ANTHROPIC_BASE_URL']}/chat/completions",
    headers=headers,
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
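The proxy returns the OpenAI chat-completions format, so the generated text is available at response.json()["choices"][0]["message"]["content"].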
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Wiki: Claude Flow Wiki