Codex-LB intelligently routes requests to ChatGPT accounts based on availability, usage, and configured strategies. This ensures optimal load distribution and prevents individual accounts from hitting rate limits.
## Routing Strategies

Codex-LB supports two primary routing strategies.

### Usage-Weighted Routing
Routes requests to accounts based on remaining capacity.

- Accounts with more remaining capacity receive more traffic
- Accounts near rate limits receive less traffic
- Weights are recalculated based on real-time usage
Best for:

- Maximizing throughput
- Avoiding rate limit errors
- Production environments with multiple accounts
### Round-Robin Routing

Distributes requests evenly across all available accounts.

- Each account receives requests in rotation
- No weighting based on usage or capacity
- Simpler algorithm with less overhead
Best for:

- Testing and development
- Accounts with similar quotas
- Simpler deployment scenarios
## Configuring Routing Strategy

### Select Routing Strategy

Choose your preferred routing strategy:
- Usage-weighted: Distributes traffic based on remaining capacity (recommended)
- Round-robin: Distributes traffic evenly across accounts
## Account Selection

### Eligible Accounts

For each incoming request, Codex-LB considers accounts that are:

- Active status: Account status is `active`
- Not rate limited: Account has not hit ChatGPT rate limits
- Fresh tokens: Access tokens are valid and not expired
- Available quota: Account has remaining usage capacity (for usage-weighted)
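The eligibility rules above can be sketched as a simple filter. This is a minimal sketch, not Codex-LB's actual schema: the `Account` fields (`status`, `token_expires_at`, `remaining_quota`) are assumed names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Account:
    name: str
    status: str                 # "active", "rate_limited", "quota_exceeded", "paused", "deactivated"
    token_expires_at: datetime  # access-token expiry
    remaining_quota: float      # fraction of quota left, 0.0-1.0

def eligible_accounts(accounts: list[Account], now: datetime) -> list[Account]:
    """Keep only accounts that can serve a request right now."""
    return [
        a for a in accounts
        if a.status == "active"         # excludes rate_limited/quota_exceeded/paused/deactivated
        and a.token_expires_at > now    # fresh, unexpired token
        and a.remaining_quota > 0       # still has usage capacity
    ]

now = datetime.now(timezone.utc)
later = now + timedelta(hours=1)
pool = [
    Account("a", "active", later, 0.8),
    Account("b", "rate_limited", later, 0.5),
    Account("c", "active", now - timedelta(minutes=1), 0.9),  # expired token
]
print([a.name for a in eligible_accounts(pool, now)])  # only "a" survives
```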
### Account Status Impact

Account status affects routing eligibility:

| Status | Eligible for Routing? | Notes |
|---|---|---|
| `active` | Yes | Normal operation |
| `rate_limited` | No | Temporarily excluded until limits reset |
| `quota_exceeded` | No | Excluded until quota resets |
| `paused` | No | Manually paused by admin |
| `deactivated` | No | Permanently excluded |
### Account Recovery

Accounts automatically recover from temporary states:

- Rate limited: After the rate limit window expires (typically 3-60 minutes)
- Quota exceeded: After the quota window resets (daily/weekly)
- Token expired: After automatic token refresh
## Model-Specific Restrictions

You can restrict which models an API key can access using the `allowed_models` field:

- Only listed models can be requested
- Other models return `403 Forbidden`
- Empty or `null` allows all models
Use cases:

- Restrict expensive models to production keys
- Limit test keys to cheaper models
- Enforce compliance requirements
## Example Configurations

- Production Key (All Models)
- Development Key (Budget Models)
- Premium Key (Latest Models)
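The original snippets for these keys are not reproduced here; the sketch below shows what such key objects might contain, assuming only the `allowed_models` semantics described above. The key names and model names are placeholders.

```python
# Production key: allowed_models of None → all models permitted
production_key = {"name": "prod", "allowed_models": None}

# Development key: restricted to cheaper models (placeholder model name)
development_key = {"name": "dev", "allowed_models": ["budget-model"]}

# Premium key: restricted to the latest models (placeholder model name)
premium_key = {"name": "premium", "allowed_models": ["latest-model"]}
```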
## Sticky Sessions

Sticky sessions ensure that requests with the same `prompt_cache_key` are routed to the same account.

- Improves prompt caching efficiency
- Reduces latency for multi-turn conversations
To enable:

- Dashboard Settings → “Sticky threads”
- Pass `prompt_cache_key` in requests

Benefits:

- Better prompt cache hit rates
- Lower costs for cached tokens
- Consistent experience for multi-turn conversations

Fallback behavior:

- If the sticky account becomes unavailable, requests are routed to another account
- Sticky sessions are reallocated if the account status changes
## Account Preferences

### Prefer Earlier Reset Accounts
- Prioritizes accounts that will reset sooner
- Helps distribute usage across reset windows
- Reduces risk of all accounts hitting limits simultaneously
Useful for:

- Managing accounts with different reset times
- Smoothing out traffic patterns
- Preventing simultaneous rate limit errors
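The preference amounts to a sort on reset time. A minimal sketch, where the `resets_at` field name is an assumption:

```python
from datetime import datetime, timezone

accounts = [
    {"name": "a", "resets_at": datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)},
    {"name": "b", "resets_at": datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)},
    {"name": "c", "resets_at": datetime(2025, 1, 1, 15, 0, tzinfo=timezone.utc)},
]

# Accounts that reset sooner come first, so their remaining quota is
# spent before the window refreshes it anyway.
preferred = sorted(accounts, key=lambda a: a["resets_at"])
print([a["name"] for a in preferred])  # ['b', 'a', 'c']
```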
## Load Balancer Behavior

### Selection Algorithm

**Step 1: Filter Accounts**
Identify accounts that are:
- Active status
- Not rate limited or quota exceeded
- Not paused or deactivated
- Have valid, unexpired tokens
**Step 2: Check Sticky Session**

If sticky sessions are enabled and a `prompt_cache_key` is provided:

- Check if a sticky session exists for this key
- If yes, prefer that account (if available)
- If the account is unavailable, reallocate to another account
**Step 3: Apply Routing Strategy**

Usage-weighted:

- Calculate remaining capacity for each account
- Weight selection probability by remaining capacity
- Accounts with more capacity are more likely to be selected

Round-robin:

- Select the next account in rotation
- Skip unavailable accounts
- Continue rotation from the last selected account
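Both branches can be sketched in a few lines. This is illustrative only: the exact weighting formula is not documented here, so probability proportional to remaining capacity is an assumption.

```python
import itertools
import random
from typing import Callable, Iterator

def pick_usage_weighted(capacities: dict[str, float], rng: random.Random) -> str:
    """Select an account with probability proportional to remaining capacity."""
    names = list(capacities)
    return rng.choices(names, weights=[capacities[n] for n in names], k=1)[0]

def round_robin(accounts: list[str], is_available: Callable[[str], bool]) -> Iterator[str]:
    """Yield accounts in rotation, skipping any that are unavailable."""
    for name in itertools.cycle(accounts):
        if is_available(name):
            yield name

rng = random.Random(0)
picks = [pick_usage_weighted({"a": 0.9, "b": 0.1}, rng) for _ in range(1000)]
print(picks.count("a") > picks.count("b"))   # True: more capacity, more traffic

rr = round_robin(["a", "b", "c"], lambda n: n != "b")   # "b" is rate limited
print([next(rr) for _ in range(4)])          # ['a', 'c', 'a', 'c']
```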
**Step 4: Apply Preferences**
If “prefer earlier reset” is enabled:
- Sort accounts by reset time
- Prefer accounts that reset sooner
### Retry Logic

If a request fails, Codex-LB automatically retries with a different account.

**Detect Failure**
Request fails due to:
- Rate limit error (429)
- Quota exceeded error (403)
- Token expiration (401)
- Network error
**Mark Account Status**

Update the account status based on the error:

- `rate_limit_exceeded` → `rate_limited`
- `quota_exceeded` → `quota_exceeded`
- `insufficient_quota` → `quota_exceeded`
- Token errors → `deactivated` (if permanent)
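The mapping above as a lookup. A sketch of the table, not Codex-LB source; the behavior for other transient errors is an assumption noted in the comments.

```python
# Upstream error code → account status applied by the load balancer
STATUS_FOR_ERROR = {
    "rate_limit_exceeded": "rate_limited",
    "quota_exceeded": "quota_exceeded",
    "insufficient_quota": "quota_exceeded",
}

def status_after_error(error_code: str, token_error_permanent: bool = False) -> str:
    if error_code in STATUS_FOR_ERROR:
        return STATUS_FOR_ERROR[error_code]
    if token_error_permanent:
        return "deactivated"   # permanent token errors take the account out of rotation
    return "active"            # assumption: other transient errors leave status unchanged

print(status_after_error("insufficient_quota"))  # quota_exceeded
```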
## Error Handling

### No Available Accounts

`503 Service Unavailable`
Causes:
- All accounts are rate limited or quota exceeded
- All accounts are paused or deactivated
- No accounts added to the load balancer
- All accounts have expired or invalid tokens
Solutions:

- Check account status in the dashboard
- Wait for rate limits to reset
- Add more accounts to increase capacity
- Reactivate paused accounts
### Model Not Allowed

`403 Forbidden`

Cause: Requested model is not in the API key's `allowed_models` list.

Solution: Update the API key's `allowed_models` or use a different model.
### Rate Limit Propagation

When an account hits a rate limit:

- Account status changes to `rate_limited`
- Account is excluded from routing
- Error details are recorded:
  - Error code (e.g., `rate_limit_exceeded`)
  - Error message from ChatGPT
  - Timestamp of failure
- Account automatically recovers after the rate limit window
## Monitoring

### Account Status

Monitor account status in the dashboard:

- Active: Available for routing
- Rate limited: Temporarily unavailable
- Quota exceeded: Quota exhausted
- Paused: Manually disabled
- Deactivated: Permanently disabled
### Usage Metrics

Track usage across accounts:

- Total requests: Number of requests routed to each account
- Token usage: Input/output/cached tokens per account
- Error rate: Percentage of failed requests per account
- Remaining capacity: Available quota for each account
### Rate Limit Headers

Response headers show account-level rate limits.

## Best Practices

### Account Management
- Multiple accounts: Add multiple accounts to increase capacity and reliability
- Diverse reset times: Add accounts at different times to stagger reset windows
- Monitor status: Check account status regularly and reactivate as needed
- Remove inactive: Delete deactivated accounts to reduce noise
### Routing Strategy

- Production: Use `usage_weighted` for optimal load distribution
- Development: Use `round_robin` for simplicity
- Sticky sessions: Enable for applications with prompt caching
- Prefer earlier reset: Enable for smoother traffic distribution
### Model Restrictions
- Budget control: Restrict expensive models to production keys
- Testing: Use cheaper models for development and testing
- Compliance: Enforce model restrictions for regulatory requirements
### Error Handling

- Implement retries: Client applications should retry on `503` errors
- Exponential backoff: Use exponential backoff for retries
- Fallback logic: Have fallback behavior when all accounts are unavailable
- Monitor alerts: Set up alerts for “no available accounts” errors
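A client-side sketch of the retry guidance above: retry on 503 with exponential backoff. `send_request` is a stand-in for your actual API call, not part of Codex-LB.

```python
import time

def with_backoff(send_request, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry on 503 with exponential backoff; surface the last 503 for fallback logic."""
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 503:                 # success or a non-retryable error
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))   # 0.5s, 1s, 2s, ...
    return status, body                   # still 503: trigger fallback behavior

# Simulated server: no accounts available for two calls, then healthy.
responses = iter([(503, "no accounts"), (503, "no accounts"), (200, "ok")])
print(with_backoff(lambda: next(responses), base_delay=0.01))  # (200, 'ok')
```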
## Advanced Configuration

### Custom Routing Logic

While Codex-LB provides built-in routing strategies, you can implement custom logic by:

- Monitoring account status via API
- Distributing requests across multiple Codex-LB instances
- Using external load balancers with health checks
### Account Pools

Organize accounts into pools for different use cases:

- Pool A: High-quota accounts for production
- Pool B: Lower-quota accounts for development
- Pool C: Specific accounts for certain models
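Codex-LB does not define pools natively here; one way to layer them on top is a simple map from use case to account group. Account ids and pool names below are placeholders.

```python
# Illustrative pool layout; account ids are placeholders.
POOLS = {
    "production":  ["acct-1", "acct-2"],   # high-quota accounts
    "development": ["acct-3"],             # lower-quota accounts
    "model-x":     ["acct-4"],             # accounts reserved for specific models
}

def pool_for(use_case: str) -> list[str]:
    """Fall back to the development pool for unknown use cases."""
    return POOLS.get(use_case, POOLS["development"])

print(pool_for("production"))   # ['acct-1', 'acct-2']
print(pool_for("ad-hoc"))       # ['acct-3']
```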
### Geographic Distribution

Distribute accounts across regions for lower latency:

- Deploy Codex-LB instances in multiple regions
- Add accounts with tokens from the same region
- Route requests to the nearest instance
## Troubleshooting

### Uneven traffic distribution

Cause: Some accounts have much more capacity than others.

Solution:

- Use `usage_weighted` routing to automatically balance based on capacity
- Add more accounts with similar quotas
- Enable “prefer earlier reset” to distribute across reset windows
### Sticky sessions not working

Cause: `sticky_threads_enabled` is disabled or `prompt_cache_key` is not provided.

Solution:

- Enable sticky threads in settings
- Pass `prompt_cache_key` in the request body
- Verify the key is consistent across related requests
### Accounts frequently rate limited

Cause: Not enough accounts for the request volume.

Solution:

- Add more accounts to increase total capacity
- Implement client-side rate limiting
- Use API key rate limits to control usage
- Monitor usage patterns and adjust
### Request fails even with available accounts

Cause: Model restrictions, API key limits, or network errors.

Solution:

- Check the API key's `allowed_models` configuration
- Verify API key rate limits
- Check Codex-LB logs for detailed error messages
- Test with a simple request to isolate the issue
## Next Steps

- Troubleshooting: Diagnose and resolve common issues
- API Reference: Explore the complete API documentation