API Reference

OpenAI-compatible API. Change your base URL, keep everything else.

Quick Start

Thermly is a drop-in replacement for the OpenAI API. Just change two lines:

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.thermly.net/v1",  # ← Add this
    api_key="sk-thermo-your-key-here",      # ← Your Thermly key
)

response = client.chat.completions.create(
    model="auto",  # Thermly picks the cheapest adequate model
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Get your API key from the Dashboard → API Keys page.

Authentication

All API requests require an API key sent in the Authorization header:

bash
Authorization: Bearer sk-thermo-your-key-here

Note: Your API key is shown only once when created. Store it securely. If compromised, revoke it immediately from the dashboard and create a new one.
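One way to keep the key out of source code is to read it from an environment variable and build the header from it. This is a minimal sketch: the variable name THERMLY_API_KEY and the helper auth_headers are illustrative assumptions, not part of the Thermly API.

```python
from __future__ import annotations

import os


def auth_headers(api_key: str | None = None) -> dict[str, str]:
    """Build the Authorization header for a Thermly request."""
    # THERMLY_API_KEY is an assumed variable name; use whatever your deployment defines.
    key = api_key or os.environ.get("THERMLY_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass api_key or set THERMLY_API_KEY")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dictionary as the request headers in whichever HTTP client you use.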

Endpoint

POST https://api.thermly.net/v1/chat/completions

This is the only endpoint you need. It's fully compatible with the OpenAI Chat Completions API format.

Models

Smart Routing (recommended)

Let Thermly pick the best model for each request. This is where the cost savings come from.

| Model | Description |
|---|---|
| auto | Analyzes your request and routes to the cheapest model that can handle it. Best for most use cases. |
| fast | Always uses the fastest, cheapest model. Best for simple tasks. |
| balanced | Mid-tier model. Balance of cost and quality. |
| best | Always uses the most capable model. Best for complex tasks. |

Direct Models (passthrough)

Specify a model by name to bypass smart routing. Your request goes directly to that provider.

| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o3-mini |
| Anthropic | claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4, claude-3-haiku |
| Google | gemini-2.5-flash, gemini-2.5-pro |

Note: We also accept common aliases like claude-haiku, gemini-flash, or claude-3-5-sonnet-20241022. They automatically resolve to the correct model.
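If your application logs or budgets requests by routing mode, it can help to know whether a given model value triggers smart routing or passthrough. The sketch below assumes only what this section states: the four routing names trigger smart routing, and any other name is sent directly to a provider. The helper name routing_mode is hypothetical.

```python
# The four smart-routing names documented above.
ROUTING_MODELS = {"auto", "fast", "balanced", "best"}


def routing_mode(model: str) -> str:
    """Return "smart" for a routing name, "direct" for any other model name."""
    # Aliases such as "claude-haiku" are resolved by the service, so this
    # helper treats every non-routing name as a direct (passthrough) request.
    return "smart" if model in ROUTING_MODELS else "direct"
```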

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model to use. Use "auto" for smart routing, or a specific model name. |
| messages | array | Yes | Array of message objects with role and content. Max 100 messages. |
| temperature | float | No | Sampling temperature, 0.0 to 2.0. Higher = more creative. Default: provider default. |
| max_tokens | integer | No | Maximum tokens in the response. Max: 16,384. |
| top_p | float | No | Nucleus sampling, 0.0 to 1.0. |
| stop | string or array | No | Stop sequences. Up to 4 sequences. |
| tools | array | No | Function calling tools. Max 64 tools. |
| stream | boolean | No | Not yet supported. Coming soon. |
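Checking a request body against these limits before sending it can save a round trip that would end in a 422. The following is a client-side sketch only: the limits are copied from the parameter list above, the function name validate_request is hypothetical, and the server may apply additional rules not covered here.

```python
from __future__ import annotations

MAX_MESSAGES = 100
MAX_OUTPUT_TOKENS = 16_384
MAX_STOP_SEQUENCES = 4
MAX_TOOLS = 64


def validate_request(body: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if not isinstance(body.get("model"), str):
        problems.append("model is required and must be a string")
    messages = body.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    elif len(messages) > MAX_MESSAGES:
        problems.append(f"messages exceeds {MAX_MESSAGES} entries")
    temperature = body.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")
    top_p = body.get("top_p")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be between 0.0 and 1.0")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and max_tokens > MAX_OUTPUT_TOKENS:
        problems.append(f"max_tokens exceeds {MAX_OUTPUT_TOKENS}")
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > MAX_STOP_SEQUENCES:
        problems.append(f"stop allows at most {MAX_STOP_SEQUENCES} sequences")
    tools = body.get("tools")
    if isinstance(tools, list) and len(tools) > MAX_TOOLS:
        problems.append(f"tools allows at most {MAX_TOOLS} entries")
    if body.get("stream"):
        problems.append("stream is not yet supported")
    return problems
```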

Response Format

Responses follow the standard OpenAI chat completion format:

json
{
  "id": "thermo-a1b2c3d4e5f6",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  },
  "x_thermostat": {
    "actual_model": "gpt-4o-mini",
    "complexity_level": 1,
    "cached": false
  }
}

The x_thermostat field is Thermly-specific metadata. It tells you which model actually handled the request and whether the response was cached.
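To illustrate reading that metadata, here is a small sketch that works on the decoded JSON body. How an SDK exposes unknown top-level fields varies by client and version, so the example uses a plain dictionary. The helper name routing_info is hypothetical, and SAMPLE is a trimmed copy of the response shown above.

```python
import json

# Trimmed copy of the example response above.
SAMPLE = """
{
  "id": "thermo-a1b2c3d4e5f6",
  "model": "gpt-4o-mini",
  "x_thermostat": {
    "actual_model": "gpt-4o-mini",
    "complexity_level": 1,
    "cached": false
  }
}
"""


def routing_info(response: dict) -> dict:
    """Extract the Thermly-specific metadata, tolerating a missing field."""
    meta = response.get("x_thermostat") or {}
    return {
        "actual_model": meta.get("actual_model"),
        "complexity_level": meta.get("complexity_level"),
        "cached": bool(meta.get("cached", False)),
    }


info = routing_info(json.loads(SAMPLE))
print(info["actual_model"], info["cached"])
```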

Code Examples

Python

python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.thermly.net/v1",
    api_key="sk-thermo-your-key-here",
)

# Smart routing - Thermly picks the best model
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how DNS works."},
    ],
)
print(response.choices[0].message.content)

# Direct model - specify exactly which model to use
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
)
print(response.choices[0].message.content)

JavaScript / Node.js

javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.thermly.net/v1",
  apiKey: "sk-thermo-your-key-here",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

curl

bash
curl https://api.thermly.net/v1/chat/completions \
  -H "Authorization: Bearer sk-thermo-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Error Codes

All errors follow the same JSON format:

json
{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type",
    "code": "error_code"
  }
}

| Code | Meaning | What to do |
|---|---|---|
| 401 | Invalid or missing API key | Check your API key is correct and active |
| 402 | Insufficient credits | Add credits from the dashboard billing page |
| 422 | Invalid request parameters | Check your request body matches the parameter specs |
| 429 | Rate limit exceeded | Wait and retry. Check the Retry-After header |
| 503 | All models unavailable | The upstream provider is down. Retry after a few seconds |
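The error body and the status codes can be combined into a single lookup in client code. This is a sketch under stated assumptions: the helper name explain_error is hypothetical, the action strings are copied from the error code list above, and treating 429 and 503 as retryable follows from the "retry" guidance given for those two codes.

```python
# Suggested actions, taken from the error code list above.
ACTIONS = {
    401: "Check your API key is correct and active",
    402: "Add credits from the dashboard billing page",
    422: "Check your request body matches the parameter specs",
    429: "Wait and retry. Check the Retry-After header",
    503: "Retry after a few seconds",
}
RETRYABLE = {429, 503}  # the two codes whose guidance says to retry


def explain_error(status: int, body: dict) -> dict:
    """Combine the HTTP status with the JSON error body into one record."""
    error = body.get("error") or {}
    return {
        "status": status,
        "message": error.get("message", ""),
        "type": error.get("type"),
        "code": error.get("code"),
        "action": ACTIONS.get(status, "Unlisted status; inspect the message"),
        "retryable": status in RETRYABLE,
    }
```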

Rate Limits

Each API key has a per-minute request limit. If you exceed it, you'll receive a 429 response with a Retry-After header.

| Limit | Value |
|---|---|
| Requests per minute | 60 RPM per API key |
| Max messages per request | 100 |
| Max message length | 50,000 characters |
| Max output tokens | 16,384 |

Need higher limits? Contact us.
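A client can honor the Retry-After header with a short retry loop. The sketch below makes several assumptions: the header carries a number of seconds (if it does not parse, the code falls back to exponential backoff), send is a caller-supplied function returning (status, headers, body), and the names retry_delay and send_with_retry are hypothetical.

```python
import time


def retry_delay(headers, attempt, default=1.0, cap=60.0):
    """Seconds to wait before the next attempt."""
    # Plain dicts are case-sensitive; most HTTP clients provide a
    # case-insensitive header mapping, which also works here.
    value = headers.get("Retry-After")
    try:
        delay = float(value)
    except (TypeError, ValueError):
        # Header missing or not a number: back off 1s, 2s, 4s, ...
        delay = default * (2 ** attempt)
    return max(0.0, min(delay, cap))


def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        sleep(retry_delay(headers, attempt))
```

The sleep parameter is injectable so the loop can be exercised in tests without real waiting.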

Caching

When using smart routing models (auto, fast, balanced, best), identical requests may return cached responses at zero cost. Cached responses are indicated by x_thermostat.cached: true in the response.

Cache hits: Free. No credits are deducted.
Cache TTL: 1-24 hours depending on query type. Time-sensitive queries have a shorter TTL.
Bypass cache: Send the header x-no-cache: true to force a fresh response.
Direct models: Requests to specific models (e.g., gpt-4o) are never cached.
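As an illustration of the bypass header and the cached flag, the sketch below builds the extra header and computes a cache hit rate over a batch of decoded responses. The helper names cache_headers and cache_hit_rate are hypothetical; only the x-no-cache header and the x_thermostat.cached field come from this page.

```python
def cache_headers(bypass: bool = False) -> dict:
    """Extra request headers: x-no-cache forces a fresh response."""
    return {"x-no-cache": "true"} if bypass else {}


def cache_hit_rate(responses: list) -> float:
    """Fraction of decoded responses flagged x_thermostat.cached == true."""
    if not responses:
        return 0.0
    hits = sum(
        1 for r in responses
        if (r.get("x_thermostat") or {}).get("cached", False)
    )
    return hits / len(responses)
```

With the openai Python client, per-request headers can be supplied through the extra_headers argument of chat.completions.create, for example extra_headers=cache_headers(bypass=True).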

Support

Questions? Issues? Reach out to support@thermly.net or visit your dashboard.