⚡ OpenAI-compatible API: zero code changes

Premium AI quality. A fraction of the cost.

Intelligent AI router across OpenAI, Anthropic, and Google. Every request is classified and routed to the cheapest model that can handle it. Same quality, significantly lower cost.

Up to 60% cost reduction

<1 ms classification

3 AI providers

13 AI models

Routes across OpenAI, Anthropic, and Google

How It Works

Three steps. Five minutes. Immediate savings.

1. Replace your base URL

Point your OpenAI client's base URL at ours. Your existing code, prompts, and tools keep working exactly as before.

2. We route intelligently

Our classifier scores each request's complexity and picks the cheapest model across OpenAI, Anthropic, and Google that can handle it. Simple questions go to fast, cheap models.

3. Watch your costs drop

Your dashboard shows exactly how much you're saving, which models handled which requests, and your cache hit rate.
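Step 2 can be pictured with a toy sketch. Thermly's actual classifier is not described on this page, so everything below is an assumption made for illustration: the keyword heuristic, the score thresholds, the relative costs, and the `Tier`, `classify`, and `route` names. Only the idea (score the request, then take the cheapest model whose ceiling covers that score) comes from the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A hypothetical routing tier; not Thermly's real configuration."""
    model: str
    relative_cost: float   # made-up cost units, not real prices
    max_complexity: int    # highest score this tier is trusted with


# Assumed tiers for illustration only.
TIERS = [
    Tier("gpt-4o-mini", 1.0, 1),
    Tier("gpt-4o", 15.0, 3),
]

# Assumed keyword hints that a prompt needs deeper reasoning.
HARD_HINTS = ("prove", "refactor", "step by step", "analyze", "debug")


def classify(prompt: str) -> int:
    """Return a crude complexity score from 0 (trivial) to 3 (hard)."""
    text = prompt.lower()
    score = 0
    if len(text.split()) > 50:          # long prompts
        score += 1
    if any(hint in text for hint in HARD_HINTS):
        score += 1
    if prompt.count("\n") > 5:          # multi-line input such as pasted code
        score += 1
    return score


def route(prompt: str) -> str:
    """Pick the cheapest tier whose ceiling covers the prompt's score."""
    score = classify(prompt)
    for tier in sorted(TIERS, key=lambda t: t.relative_cost):
        if score <= tier.max_complexity:
            return tier.model
    # No tier covers the score: use the most capable one.
    return max(TIERS, key=lambda t: t.max_complexity).model
```

In this sketch `route("Hello!")` scores 0 and lands on the cheapest tier, while a long prompt that asks to debug something step by step scores higher and is sent to the larger model.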

Why Thermly?

Built for teams that want to use AI without overpaying.

🔀

Smart Routing

Every request is classified and routed to the cheapest adequate model. Simple queries go to GPT-4o Mini, complex ones to GPT-4o, with automatic fallbacks across providers.
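Automatic fallback can be sketched as an ordered list of candidates that are tried until one succeeds. This is a minimal illustration, not Thermly's implementation: `ProviderError`, `complete_with_fallback`, and the callables passed in are hypothetical stand-ins for real provider calls.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a (hypothetical) provider call that fails."""


def complete_with_fallback(
    prompt: str,
    candidates: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (model_name, call) pair in order.

    Returns (model_name, reply) from the first call that succeeds.
    Raises ProviderError if every candidate fails.
    """
    errors = []
    for name, call in candidates:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all candidates failed: " + "; ".join(errors))
```

The caller orders `candidates` from cheapest to most expensive, so a failure only costs more when the cheaper options are unavailable.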

💾

Response Caching

Identical requests are served from cache at zero cost. No duplicate API spend. Configurable TTL.
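One way to picture caching of identical requests with a configurable TTL is an in-memory map keyed by a hash of the request body. The `ResponseCache` class and its methods are hypothetical names for this sketch; how Thermly actually keys and stores entries is not stated here.

```python
import hashlib
import json
import time


class ResponseCache:
    """Toy in-memory cache: identical requests share one entry until it expires."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (expires_at, response)

    @staticmethod
    def key_for(request: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict):
        """Return the cached response, or None on a miss or an expired entry."""
        key = self.key_for(request)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self.clock() >= expires_at:
            del self._store[key]    # drop the stale entry
            return None
        return response

    def put(self, request: dict, response) -> None:
        """Store a response under the request's hash with a fresh expiry time."""
        self._store[self.key_for(request)] = (self.clock() + self.ttl, response)
```

A hit skips the upstream call entirely, which is where the zero-cost claim comes from; a shorter TTL trades some of that saving for fresher answers.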

🔑

OpenAI Compatible

Works with any tool that uses the OpenAI API format. Drop-in replacement: just change the base URL. Streaming support coming soon.

📊

Real-Time Dashboard

See your savings, request history, model breakdown, and cache hit rate in a beautiful dashboard.

🔒

Secure by Default

API keys are SHA-256 hashed. Per-key rate limiting. Security headers on every response. No plaintext secrets.
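The two mechanisms named above can be sketched with the standard library. The page confirms only that keys are SHA-256 hashed and rate limited per key; the `KeyStore` and `TokenBucket` classes, the token-bucket algorithm itself, and the capacity and refill numbers are assumptions chosen for illustration.

```python
import hashlib
import hmac
import time


def hash_key(api_key: str) -> str:
    """Return the SHA-256 hex digest of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


class KeyStore:
    """Keeps only digests, so the plaintext key is never stored."""

    def __init__(self):
        self._hashes = set()

    def register(self, api_key: str) -> None:
        self._hashes.add(hash_key(api_key))

    def is_valid(self, api_key: str) -> bool:
        candidate = hash_key(api_key)
        # Constant-time comparison against each stored digest.
        return any(hmac.compare_digest(candidate, h) for h in self._hashes)


class TokenBucket:
    """Per-key limiter: each key gets `capacity` tokens, refilled over time."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill = refill_per_second
        self.clock = clock
        self._state = {}  # key_hash -> (tokens, last_seen)

    def allow(self, key_hash: str) -> bool:
        """Spend one token for this key if available; return whether the request may proceed."""
        now = self.clock()
        tokens, last = self._state.get(key_hash, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill)
        if tokens < 1:
            self._state[key_hash] = (tokens, now)
            return False
        self._state[key_hash] = (tokens - 1, now)
        return True
```

A plain, unsalted SHA-256 digest is a reasonable choice here only because API keys are long random strings; it would not be a suitable way to store human-chosen passwords.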

🌐

Multi-Provider

Access OpenAI, Anthropic, and Google models through one API key. 13 models available, with automatic fallbacks across providers.

Integration in 30 Seconds

Uses the OpenAI-compatible API format. Works with any SDK or tool that supports it.

Before (single provider, one model)

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
)

# Paying premium prices for EVERY request
# even simple ones that don't need GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Hello!"}],
)

After (3 providers, 13 models, auto-routed)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.thermly.net/v1",
    api_key="sk-thermo-your-key",
)

# Routes to cheapest model that fits:
# GPT-4o Mini, Gemini Flash, Claude Haiku...
response = client.chat.completions.create(
    model="auto",  # We pick the best model
    messages=[{"role": "user",
               "content": "Hello!"}],
)

Only the client configuration and the model name change. That's it. Your "Hello!" goes to GPT-4o Mini instead of GPT-4o, saving you up to 95% per request.
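The per-request figure is simple arithmetic on the price gap between two models. The sketch below uses hypothetical prices picked only to make the numbers easy to follow; they are not quoted from any provider's price list, and the function names are illustrative.

```python
def savings_fraction(premium_price: float, cheap_price: float) -> float:
    """Fraction saved on one request by using the cheap model instead."""
    return 1.0 - cheap_price / premium_price


def blended_savings(share_routed_cheap: float, per_request_savings: float) -> float:
    """Overall savings when only part of the traffic goes to the cheap model.

    Assumes requests are of similar size and the rest stay on the premium model.
    """
    return share_routed_cheap * per_request_savings


# Hypothetical prices per million tokens (placeholders, not real pricing).
PREMIUM_PRICE = 10.00
CHEAP_PRICE = 0.50

per_request = savings_fraction(PREMIUM_PRICE, CHEAP_PRICE)
overall = blended_savings(0.6, per_request)
print(f"per request: {per_request:.0%}, overall with 60% rerouted: {overall:.0%}")
```

With these made-up prices a rerouted request costs 95% less, and rerouting 60% of traffic lowers the overall bill by 57%. Real savings depend on current provider pricing and on how much of your traffic is simple enough to reroute.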

Stop overpaying for AI

Most AI requests don't need the most expensive model. Thermly automatically routes each request to the cheapest model that can handle it. Start saving today.

$1 free credit on signup · No credit card required