Intelligent AI router across OpenAI, Anthropic, and Google. Every request is classified and routed to the cheapest model that can handle it. Same quality, significantly lower cost.
Up to 60% cost reduction
<1ms classification
3 AI providers
13 AI models
Routes across OpenAI, Anthropic, and Google
Three steps. Five minutes. Immediate savings.
Swap your OpenAI base URL with ours. Your existing code, prompts, and tools keep working exactly as before.
Our classifier analyzes each request's complexity and picks the cheapest model across OpenAI, Anthropic, and Google. Simple questions go to fast, cheap models.
Your dashboard shows exactly how much you're saving, which models handled which requests, and your cache hit rate.
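To make the classification step concrete, here is a minimal sketch of "score the request, then pick the cheapest model that qualifies". It is an illustration under stated assumptions, not Thermly's actual classifier: the model names, prices, complexity tiers, and the word-count heuristic are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1m_tokens: float  # placeholder price, not a real quote
    max_complexity: int        # highest complexity tier this model is trusted with


# Hypothetical catalog; names and numbers are illustrative only.
CATALOG = [
    Model("small-fast", 0.15, 1),
    Model("mid-tier", 1.00, 2),
    Model("large", 2.50, 3),
]


def classify(prompt: str) -> int:
    """Toy complexity score: 1 = simple, 2 = moderate, 3 = complex."""
    words = len(prompt.split())
    looks_like_code = "def " in prompt or "class " in prompt
    if looks_like_code or words > 200:
        return 3
    if words > 30:
        return 2
    return 1


def route(prompt: str) -> Model:
    """Return the cheapest model whose tier covers the prompt's complexity."""
    tier = classify(prompt)
    candidates = [m for m in CATALOG if m.max_complexity >= tier]
    return min(candidates, key=lambda m: m.cost_per_1m_tokens)


print(route("Hello!").name)
```

A real classifier would use richer signals than word count, but the shape stays the same: filter the catalog down to adequate models, then take the minimum by cost.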
Built for teams that want to use AI without overpaying.
Every request is classified and routed to the cheapest adequate model. Simple queries go to GPT-4o Mini and complex ones to GPT-4o, with automatic fallbacks across providers.
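Automatic fallback can be pictured as trying providers in preference order until one succeeds. The sketch below assumes a hypothetical `ProviderError` exception and stub provider functions; it shows the pattern only and is not a description of how Thermly implements it.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return (name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real API calls.
def flaky_provider(prompt):
    raise ProviderError("rate limited")


def healthy_provider(prompt):
    return f"echo: {prompt}"


used, reply = complete_with_fallback(
    "Hello!",
    [("primary", flaky_provider), ("backup", healthy_provider)],
)
print(used, reply)
```

Collecting the per-provider errors means that if every provider fails, the final exception still says why each one was skipped.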
Identical requests are served from cache at zero cost. No duplicate API spend. Configurable TTL.
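One common way to build a cache like this is to hash the normalized request and store the response with an expiry time. The sketch below assumes that approach; the `ResponseCache` class, its key scheme, and the 300-second default TTL are hypothetical and chosen for illustration.

```python
import hashlib
import json
import time


class ResponseCache:
    """In-memory response cache with a configurable time-to-live."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, response)

    @staticmethod
    def key_for(model, messages):
        """Hash the request so identical requests map to the same key."""
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: drop it and report a miss
            return None
        return response

    def put(self, key, response):
        self._entries[key] = (self._clock() + self.ttl_seconds, response)


cache = ResponseCache(ttl_seconds=60)
key = ResponseCache.key_for("auto", [{"role": "user", "content": "Hello!"}])
if cache.get(key) is None:
    cache.put(key, "Hi there!")      # in practice, the upstream model's response
print(cache.get(key))
```

Serializing with `sort_keys=True` keeps the key stable even when the same message arrives with its fields in a different order.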
Works with any tool that uses the OpenAI API format. Drop-in replacement: just change the base URL. Streaming support coming soon.
See your savings, request history, model breakdown, and cache hit rate in a beautiful dashboard.
API keys are SHA-256 hashed. Per-key rate limiting. Security headers on every response. No plaintext secrets.
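As a rough illustration of what "hashed keys" and "per-key rate limiting" can look like, the sketch below stores only the SHA-256 digest of each key and counts requests per digest in a fixed time window. The function names, the fixed-window strategy, and the limits are assumptions made for this example; the only detail taken from this page is the `sk-thermo-` key prefix.

```python
import hashlib
import hmac
import secrets
import time


def new_api_key(prefix="sk-thermo-"):
    """Return (plaintext_key, sha256_hex). Only the hash would be stored."""
    key = prefix + secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode("utf-8")).hexdigest()


def verify_api_key(presented, stored_hash):
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_hash)


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each time window."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._windows = {}  # key_hash -> (window_start, count)

    def allow(self, key_hash):
        now = self._clock()
        start, count = self._windows.get(key_hash, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0            # window elapsed: start a new one
        if count >= self.limit:
            self._windows[key_hash] = (start, count)
            return False
        self._windows[key_hash] = (start, count + 1)
        return True


key, key_hash = new_api_key()
limiter = FixedWindowLimiter(limit=2)
print(verify_api_key(key, key_hash), limiter.allow(key_hash))
```

Keying the limiter on the hash rather than the plaintext key means the plaintext never has to be kept in memory beyond the initial verification.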
Access OpenAI, Anthropic, and Google models through one API key. 13 models available, with automatic fallbacks across providers.
Uses the OpenAI-compatible API format. Works with any SDK or tool that supports it.
Before (single provider, one model)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
)

# Paying premium prices for EVERY request,
# even simple ones that don't need GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

After (3 providers, 13 models, auto-routed)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.thermly.net/v1",
    api_key="sk-thermo-your-key",
)

# Routes to the cheapest model that fits:
# GPT-4o Mini, Gemini Flash, Claude Haiku...
response = client.chat.completions.create(
    model="auto",  # We pick the best model
    messages=[{"role": "user", "content": "Hello!"}],
)

Three small changes: the base URL, the API key, and the model name. That's it. Your "Hello!" goes to GPT-4o Mini instead of GPT-4o, saving you up to 95% per request.
Most AI requests don't need the most expensive model. Thermly automatically routes each request to the cheapest model that can handle it. Start saving today.