# Configuration
InferShrink comes with sensible defaults, but every aspect of the routing and optimization engine can be customized.
## Overrides

You can pass a configuration dictionary directly to the `optimize()` function:
```python
client = optimize(openai.Client(), config={
    "tiers": {
        "tier1": {"models": ["gpt-4o-mini"], "max_complexity": "SIMPLE"},
        "tier2": {"models": ["gpt-4o"], "max_complexity": "MODERATE"},
        "tier3": {"models": ["gpt-4.5-preview"], "max_complexity": "COMPLEX"},
    },
    "compression": {
        "enabled": True,
        "min_tokens": 500,
        "skip_for": ["SECURITY_CRITICAL"],
    },
    "quality_floor": 0.95,
    "cost_tracking": True,
})
```
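If you build config dictionaries programmatically, it can help to sanity-check their shape before passing them in. The validator below is an illustrative sketch only; `validate_config` and `VALID_COMPLEXITIES` are hypothetical names, not part of InferShrink's API, which performs its own validation.

```python
# Illustrative sketch: a standalone validator for the config shape shown
# above. Names here are hypothetical, not InferShrink's own API.
VALID_COMPLEXITIES = {"SIMPLE", "MODERATE", "COMPLEX", "SECURITY_CRITICAL"}

def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for name, tier in config.get("tiers", {}).items():
        if not tier.get("models"):
            problems.append(f"{name}: 'models' must be a non-empty list")
        if tier.get("max_complexity") not in VALID_COMPLEXITIES:
            problems.append(f"{name}: unknown max_complexity {tier.get('max_complexity')!r}")
    floor = config.get("quality_floor", 1.0)
    if not 0.0 <= floor <= 1.0:
        problems.append("quality_floor must be between 0.0 and 1.0")
    return problems
```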
## Complexity Levels
InferShrink classifies every prompt into one of four complexity levels. These levels determine which model tier is selected.
| Level | Signals | Default Routing |
|---|---|---|
| SIMPLE | Short, no code, basic questions (<500 tokens) | Tier 1 (cheapest) |
| MODERATE | Some code, medium length, summarization | Tier 2 |
| COMPLEX | Heavy code, multi-step reasoning, long prompts | Tier 3 (most capable) |
| SECURITY_CRITICAL | Passwords, API keys, financial data | Never downgraded |
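The real classifier is internal to InferShrink, but a rough heuristic along these lines illustrates how the signals in the table might map to levels. The function name, patterns, and thresholds below are assumptions for illustration, not the library's actual logic:

```python
import re

# Hypothetical sketch of a signal-based classifier; the real classifier
# is internal to InferShrink. Patterns and thresholds are assumptions.
SECRET_PATTERN = re.compile(r"(password|api[_ ]?key|credit card|ssn)", re.I)
CODE_PATTERN = re.compile(r"```|\bdef |\bclass |[{};]")

def classify(prompt: str) -> str:
    tokens = len(prompt.split())  # crude whitespace token estimate
    if SECRET_PATTERN.search(prompt):
        return "SECURITY_CRITICAL"   # never downgraded, per the table
    has_code = bool(CODE_PATTERN.search(prompt))
    if tokens > 1500 or (has_code and tokens > 400):
        return "COMPLEX"
    if has_code or tokens > 500:
        return "MODERATE"
    return "SIMPLE"
```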
## Same-Provider Routing
By default, InferShrink routes requests only within the same provider, avoiding data-privacy issues and API-key confusion. We never switch providers silently.
```text
gpt-4o → gpt-4o-mini          ✅ (same provider)
claude-opus → claude-sonnet   ✅ (same provider)
gpt-4o → claude-sonnet        ❌ (never happens)
```
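A same-provider guard can be sketched as a model-name prefix check. The prefix table and function names below are assumptions for the demo, not how InferShrink identifies providers internally:

```python
# Illustrative sketch: infer a provider from the model-name prefix and
# refuse cross-provider routes. The prefix table is an assumption.
PROVIDER_PREFIXES = {
    "gpt": "openai",
    "claude": "anthropic",
    "gemini": "google",
}

def provider_of(model: str) -> str:
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    return "unknown"

def can_route(source: str, target: str) -> bool:
    """True only when both models belong to the same known provider."""
    src, dst = provider_of(source), provider_of(target)
    return src == dst and src != "unknown"
```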
## License Key
For paid plans, configure your license key via environment variables or inline config.
```shell
# Via environment
export INFERSHRINK_LICENSE_KEY=ls_live_xxxxx
```

```python
# Or inline
client = optimize(openai.Client(), config={"license_key": "ls_live_xxxxx"})
```