OpenAI-compatible API — zero code changes

Premium AI quality. A fraction of the cost.

Intelligent AI router across OpenAI, Anthropic, and Google. Every request is classified and routed to the cheapest model that can handle it. Same quality, 40-60% lower cost.

40-60%

Cost Reduction

<50ms

Routing Overhead

3

AI Providers

9+

AI Models

Routes across

OpenAI

Anthropic

Google

How It Works

Three steps. Five minutes. Immediate savings.

1

Replace your base URL

Swap your OpenAI base URL with ours. Your existing code, prompts, and tools keep working exactly as before.

2

We route intelligently

Our classifier analyzes each request's complexity and picks the cheapest model across OpenAI, Anthropic, and Google that can handle it. Simple questions go to fast, cheap models; complex ones go to more capable models.

3

Watch your costs drop

Your dashboard shows exactly how much you're saving, which models handled which requests, and your cache hit rate.
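The routing step above can be sketched as a toy heuristic classifier. Everything here is illustrative — the model names, prices, tiers, and keyword heuristic are assumptions for the sketch, not Thermly's actual classifier:

```python
# Illustrative sketch of complexity-based routing. Model names, prices
# (USD per 1K input tokens), and tiers are assumptions, not real pricing.
MODELS = [  # cheapest first; each entry: (name, price, max complexity tier)
    ("gpt-4o-mini",      0.00015, 1),
    ("claude-3-haiku",   0.00025, 1),
    ("gemini-1.5-flash", 0.00035, 2),
    ("claude-3-sonnet",  0.003,   3),
    ("gpt-4o",           0.005,   3),
]

COMPLEX_HINTS = ("prove", "refactor", "analyze", "multi-step")

def classify(prompt):
    """Return a complexity tier: 1 = simple, 2 = moderate, 3 = complex."""
    if any(hint in prompt.lower() for hint in COMPLEX_HINTS):
        return 3
    return 1 if len(prompt) < 200 else 2

def route(prompt):
    """Pick the cheapest model whose tier covers the prompt."""
    tier = classify(prompt)
    for name, _price, max_tier in MODELS:  # scanned cheapest-first
        if max_tier >= tier:
            return name
    return MODELS[-1][0]  # fall back to the most capable model

print(route("Hello!"))                        # lands on a cheap tier-1 model
print(route("Please prove this theorem"))     # lands on a tier-3 model
```

Because the list is ordered cheapest-first, the first model that clears the required tier is also the cheapest adequate one — the same invariant the production router needs to maintain.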

Why Thermly?

Built for teams that want to use AI without overpaying.

🔀

Smart Routing

Every request is classified and routed to the cheapest adequate model. Simple queries go to Haiku, complex ones to Sonnet.

💾

Response Caching

Identical requests are served from cache in under 5ms. No duplicate API spend. Configurable TTL.
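The caching behavior can be sketched as a TTL map keyed on a hash of the request body. The hashing scheme, in-memory store, and 300-second TTL are assumptions for illustration, not Thermly's implementation:

```python
# Sketch of TTL response caching keyed on the request body (illustrative).
import hashlib
import json
import time

CACHE = {}         # key -> (expiry timestamp, cached response)
TTL_SECONDS = 300  # configurable TTL (illustrative default)

def cache_key(model, messages):
    """Identical requests serialize identically, so they share a key."""
    payload = json.dumps({"model": model, "messages": messages},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def get_cached(key):
    """Return the cached response, or None if missing/expired."""
    entry = CACHE.get(key)
    if entry and entry[0] > time.time():
        return entry[1]          # cache hit: no upstream API spend
    CACHE.pop(key, None)         # evict expired entries lazily
    return None

def put_cached(key, response):
    CACHE[key] = (time.time() + TTL_SECONDS, response)

key = cache_key("auto", [{"role": "user", "content": "Hello!"}])
put_cached(key, "Hi there!")
print(get_cached(key))  # served from cache, no duplicate API call
```

Sorting the JSON keys before hashing is what makes "identical requests" robust: two payloads with the same fields in a different order still hit the same cache entry.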

🔑

OpenAI Compatible

Works with any tool that uses the OpenAI API format. Drop-in replacement — just change the base URL.

📊

Real-Time Dashboard

See your savings, request history, model breakdown, and cache hit rate in a beautiful dashboard.

🔒

Enterprise Security

API keys are SHA-256 hashed. Rate limiting per key. Admin-only endpoints. No plaintext secrets.
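Hashed key storage can be sketched as follows. The `sk-thermo-` prefix comes from the integration example further down; the storage layout and issuance flow are assumptions for the sketch:

```python
# Sketch of SHA-256 hashed API key storage (illustrative, not Thermly's
# actual schema). Only hashes are persisted; plaintext keys are never stored.
import hashlib
import secrets

STORED_HASHES = set()  # stand-in for a database column of key hashes

def hash_key(api_key):
    return hashlib.sha256(api_key.encode()).hexdigest()

def register_key():
    """Issue a new key: shown to the user once, only its hash is kept."""
    key = "sk-thermo-" + secrets.token_hex(16)
    STORED_HASHES.add(hash_key(key))
    return key

def verify_key(api_key):
    """Incoming requests are checked by hash, so a database leak
    never exposes usable plaintext keys."""
    return hash_key(api_key) in STORED_HASHES

key = register_key()
print(verify_key(key))         # accepted
print(verify_key("sk-bogus"))  # rejected
```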

🌐

Multi-Provider

Access OpenAI, Anthropic, and Google models through one API. We pick the best provider for each request.

Integration in 30 Seconds

Thermly speaks the OpenAI API format, so it works with any SDK or tool that supports it.

Before (single provider, one model)

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
)

# Paying premium prices for EVERY request
# even simple ones that don't need GPT-4o
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Hello!"}],
)

After (3 providers, 9+ models, auto-routed)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.thermly.net/v1",
    api_key="sk-thermo-your-key",
)

# Routes to cheapest model that fits:
# Haiku, Gemini Flash, GPT-4o Mini, Sonnet...
response = client.chat.completions.create(
    model="auto",  # We pick the best model
    messages=[{"role": "user",
               "content": "Hello!"}],
)

Three lines changed — the base URL, the key, and the model. That's it. Your "Hello!" goes to Haiku (about $0.0001) instead of GPT-4o (about $0.005).

Stop overpaying for AI

Most AI requests don't need the most expensive model. Thermly automatically routes each request to the cheapest model that can handle it. Start saving today.

No credit card required. Free tier included.