# Configuration
InferShrink comes with sensible defaults, but every aspect of the routing and optimization engine can be customized.
## Overrides

You can pass a configuration dictionary directly to the `optimize()` function:
```python
client = optimize(openai.Client(), config={
    "tiers": {
        "tier1": {"models": ["gpt-4o-mini"], "max_complexity": "SIMPLE"},
        "tier2": {"models": ["gpt-4o"], "max_complexity": "MODERATE"},
        "tier3": {"models": ["gpt-4.5-preview"], "max_complexity": "COMPLEX"},
    },
    "compression": {
        "enabled": True,
        "min_tokens": 500,
        "skip_for": ["SECURITY_CRITICAL"],
    },
    "quality_floor": 0.95,
    "cost_tracking": True,
})
```
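If you build config dictionaries programmatically, it can help to sanity-check their shape before passing them in. The validator below is an illustrative sketch only; `validate_config` and `VALID_COMPLEXITIES` are hypothetical names, not part of InferShrink's API, which performs its own validation.

```python
# Illustrative sketch: a standalone validator for the config shape shown
# above. Names here are hypothetical, not InferShrink's own API.
VALID_COMPLEXITIES = {"SIMPLE", "MODERATE", "COMPLEX", "SECURITY_CRITICAL"}

def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for name, tier in config.get("tiers", {}).items():
        if not tier.get("models"):
            problems.append(f"{name}: 'models' must be a non-empty list")
        if tier.get("max_complexity") not in VALID_COMPLEXITIES:
            problems.append(f"{name}: unknown max_complexity {tier.get('max_complexity')!r}")
    floor = config.get("quality_floor", 1.0)
    if not 0.0 <= floor <= 1.0:
        problems.append("quality_floor must be between 0.0 and 1.0")
    return problems
```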
## Complexity Levels
InferShrink classifies every prompt into one of four complexity levels. These levels determine which model tier is selected.
| Level | Signals | Default Routing |
|---|---|---|
| SIMPLE | Short, no code, basic questions (<500 tokens) | Tier 1 (cheapest) |
| MODERATE | Some code, medium length, summarization | Tier 2 |
| COMPLEX | Heavy code, multi-step reasoning, long prompts | Tier 3 (most capable) |
| SECURITY_CRITICAL | Passwords, API keys, financial data | Never downgraded |
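The real classifier is internal to InferShrink, but a rough heuristic along these lines illustrates how the signals in the table might map to levels. The function name, patterns, and thresholds below are assumptions for illustration, not the library's actual logic:

```python
import re

# Hypothetical sketch of a signal-based classifier; the real classifier
# is internal to InferShrink. Patterns and thresholds are assumptions.
SECRET_PATTERN = re.compile(r"(password|api[_ ]?key|credit card|ssn)", re.I)
CODE_PATTERN = re.compile(r"```|\bdef |\bclass |[{};]")

def classify(prompt: str) -> str:
    tokens = len(prompt.split())  # crude whitespace token estimate
    if SECRET_PATTERN.search(prompt):
        return "SECURITY_CRITICAL"   # never downgraded, per the table
    has_code = bool(CODE_PATTERN.search(prompt))
    if tokens > 1500 or (has_code and tokens > 400):
        return "COMPLEX"
    if has_code or tokens > 500:
        return "MODERATE"
    return "SIMPLE"
```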
## Same-Provider Routing
By default, InferShrink routes requests only within the same provider, avoiding data-privacy issues and API-key confusion. We never switch providers silently.
```text
gpt-4o → gpt-4o-mini          ✅ (same provider)
claude-opus → claude-sonnet   ✅ (same provider)
gpt-4o → claude-sonnet        ❌ (never happens)
```
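A same-provider guard can be sketched as a model-name prefix check. The prefix table and function names below are assumptions for the demo, not how InferShrink identifies providers internally:

```python
# Illustrative sketch: infer a provider from the model-name prefix and
# refuse cross-provider routes. The prefix table is an assumption.
PROVIDER_PREFIXES = {
    "gpt": "openai",
    "claude": "anthropic",
    "gemini": "google",
}

def provider_of(model: str) -> str:
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model.startswith(prefix):
            return provider
    return "unknown"

def can_route(source: str, target: str) -> bool:
    """True only when both models belong to the same known provider."""
    src, dst = provider_of(source), provider_of(target)
    return src == dst and src != "unknown"
```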
## License Key
For paid plans, configure your license key via environment variables or inline config.
```shell
# Via environment
export INFERSHRINK_LICENSE_KEY=ls_live_xxxxx
```

```python
# Or inline
client = optimize(openai.Client(), config={"license_key": "ls_live_xxxxx"})
```