API Reference

OpenAI-compatible API. Change your base URL, keep everything else.

Quick Start

Thermly is a drop-in replacement for the OpenAI API. Just change two lines:

python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.thermly.net/v1",  # ← Add this
    api_key="sk-thermo-your-key-here",      # ← Your Thermly key
)

response = client.chat.completions.create(
    model="auto",  # Thermly picks the cheapest adequate model
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Get your API key from the Dashboard → API Keys page.

Authentication

All API requests require an API key sent in the Authorization header:

bash

Authorization: Bearer sk-thermo-your-key-here

Note: Your API key is shown only once when created. Store it securely. If compromised, revoke it immediately from the dashboard and create a new one.
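Because the key is shown only once, one common approach is to keep it out of source code and load it from the environment. The sketch below builds the Authorization header that way. The variable name THERMLY_API_KEY and the helper auth_headers are assumptions made for this example, not part of the Thermly API.

```python
import os


def auth_headers(env_var: str = "THERMLY_API_KEY") -> dict:
    """Build the Authorization header from an environment variable.

    THERMLY_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set {env_var} before calling the API")
    return {"Authorization": f"Bearer {api_key}"}
```

The same environment value can be passed as api_key when constructing the OpenAI client shown in Quick Start.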

Endpoint

POST https://api.thermly.net/v1/chat/completions

This is the only endpoint you need. It's fully compatible with the OpenAI Chat Completions API format.

Models

Smart Routing (recommended)

Let Thermly pick the best model for each request. This is where the cost savings come from.

Model	Description
`auto`	Analyzes your request and routes to the cheapest model that can handle it. Best for most use cases.
`fast`	Always uses the fastest, cheapest model. Best for simple tasks.
`balanced`	Mid-tier model. Balance of cost and quality.
`best`	Always uses the most capable model. Best for complex tasks.

Direct Models (passthrough)

Specify a model by name to bypass smart routing. Your request goes directly to that model's provider.

Provider	Model
OpenAI	`gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`, `o1`, `o3-mini`
Anthropic	`claude-sonnet-4-5`, `claude-haiku-4-5`, `claude-opus-4`, `claude-3-haiku`
Google	`gemini-2.5-flash`, `gemini-2.5-pro`

Note: We also accept common aliases like claude-haiku, gemini-flash, or claude-3-5-sonnet-20241022. They automatically resolve to the correct model.

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model to use. Use `"auto"` for smart routing, or a specific model name.
`messages`	array	Yes	Array of message objects with `role` and `content`. Max 100 messages.
`temperature`	float	No	Sampling temperature, 0.0 to 2.0. Higher values produce more varied output. Defaults to the provider's default.
`max_tokens`	integer	No	Maximum tokens in the response. Max: 16,384.
`top_p`	float	No	Nucleus sampling, 0.0 to 1.0.
`stop`	string \| array	No	Stop sequences. Up to 4 sequences.
`tools`	array	No	Function calling tools. Max 64 tools.
`stream`	boolean	No	Not yet supported. Coming soon.
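The limits in the table above can be checked on the client before a request is sent, which avoids spending a round trip on a 422. The following is a minimal sketch of such a check. The function validate_request and its messages are hypothetical, and the server may apply rules that this sketch does not cover.

```python
MAX_MESSAGES = 100
MAX_OUTPUT_TOKENS = 16_384
MAX_STOP_SEQUENCES = 4
MAX_TOOLS = 64


def validate_request(payload: dict) -> list:
    """Return human-readable problems with a request body (empty list if none)."""
    problems = []

    if not isinstance(payload.get("model"), str):
        problems.append("model is required and must be a string")

    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    elif len(messages) > MAX_MESSAGES:
        problems.append(f"messages has {len(messages)} items, max is {MAX_MESSAGES}")

    # Both sampling parameters are optional floats with a closed range.
    for name, low, high in (("temperature", 0.0, 2.0), ("top_p", 0.0, 1.0)):
        value = payload.get(name)
        if value is not None and not (
            isinstance(value, (int, float)) and low <= value <= high
        ):
            problems.append(f"{name} must be between {low} and {high}")

    # Assumption: max_tokens must be positive; the table only states the maximum.
    max_tokens = payload.get("max_tokens")
    if max_tokens is not None and not (
        isinstance(max_tokens, int) and 0 < max_tokens <= MAX_OUTPUT_TOKENS
    ):
        problems.append(f"max_tokens must be a positive integer up to {MAX_OUTPUT_TOKENS}")

    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > MAX_STOP_SEQUENCES:
        problems.append(f"stop accepts at most {MAX_STOP_SEQUENCES} sequences")

    if len(payload.get("tools") or []) > MAX_TOOLS:
        problems.append(f"tools accepts at most {MAX_TOOLS} entries")

    if payload.get("stream"):
        problems.append("stream is not yet supported")

    return problems
```

An empty list means the body passed these local checks. It does not guarantee that the server will accept the request.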

Response Format

Responses follow the standard OpenAI chat completion format:

json

{
  "id": "thermo-a1b2c3d4e5f6",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  },
  "x_thermostat": {
    "actual_model": "gpt-4o-mini",
    "complexity_level": 1,
    "cached": false
  }
}

The x_thermostat field is Thermly-specific metadata. It tells you which model actually handled the request and whether the response was cached.
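As an illustration, the metadata can be read from a response that has already been parsed into a dictionary, for example the JSON body returned by curl. The helper routing_info is a hypothetical name used only in this sketch. It treats a missing x_thermostat block as "not cached", which is an assumption rather than documented behavior.

```python
def routing_info(response: dict) -> dict:
    """Extract the Thermly-specific routing metadata from a parsed response."""
    # Assumption: fall back to an empty dict if the field is absent.
    meta = response.get("x_thermostat") or {}
    return {
        "actual_model": meta.get("actual_model"),
        "cached": bool(meta.get("cached", False)),
    }


# Trimmed copy of the sample response shown above.
sample = {
    "model": "gpt-4o-mini",
    "x_thermostat": {
        "actual_model": "gpt-4o-mini",
        "complexity_level": 1,
        "cached": False,
    },
}
info = routing_info(sample)
print(info["actual_model"], info["cached"])
```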

Code Examples

Python

python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.thermly.net/v1",
    api_key="sk-thermo-your-key-here",
)

# Smart routing - Thermly picks the best model
response = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain how DNS works."},
    ],
)
print(response.choices[0].message.content)

# Direct model - specify exactly which model to use
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
)
print(response.choices[0].message.content)

JavaScript / Node.js

javascript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.thermly.net/v1",
  apiKey: "sk-thermo-your-key-here",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

curl

bash

curl https://api.thermly.net/v1/chat/completions \
  -H "Authorization: Bearer sk-thermo-your-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Error Codes

All errors follow the same JSON format:

json

{
  "error": {
    "message": "Human-readable error description",
    "type": "error_type",
    "code": "error_code"
  }
}

Code	Meaning	What to do
401	Invalid or missing API key	Check your API key is correct and active
402	Insufficient credits	Add credits from the dashboard billing page
422	Invalid request parameters	Check your request body matches the parameter specs
429	Rate limit exceeded	Wait and retry. Check the `Retry-After` header
503	All models unavailable	The upstream provider is down. Retry after a few seconds
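The 429 and 503 rows both suggest retrying, so a small retry helper may be useful. The sketch below assumes Retry-After carries a number of seconds and falls back to exponential backoff when the header is missing or in another format. The names retry_delay and call_with_retries, and the send callable that returns (status, headers, body), are hypothetical and stand in for whatever HTTP client you use.

```python
import time

RETRYABLE_STATUSES = {429, 503}


def retry_delay(status: int, headers: dict, attempt: int):
    """Return seconds to wait before retrying, or None if no retry applies."""
    if status not in RETRYABLE_STATUSES:
        return None
    raw = headers.get("Retry-After")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # not a plain number of seconds; use backoff instead
    return float(2 ** attempt)  # 1s, 2s, 4s, ...


def call_with_retries(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        delay = retry_delay(status, headers, attempt)
        if delay is None or attempt == max_attempts - 1:
            return status, body
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Errors such as 401, 402, and 422 are returned immediately, since retrying them without changing the request or the account would not help.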

Rate Limits

Each API key has a per-minute request limit. If you exceed it, you'll receive a 429 response with a Retry-After header.

Limit	Value
Requests per minute	60 RPM per API key
Max messages per request	100
Max message length	50,000 characters
Max output tokens	16,384
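To stay under the per-minute limit rather than react to 429 responses, a client can pace its own requests. The following is a minimal sliding-window sketch. The class MinuteRateLimiter is hypothetical, and how the server actually measures its one-minute window is not documented here, so treat the local count as an approximation.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Track recent request times and report how long to wait before the next one."""

    def __init__(self, max_requests: int = 60, window: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of requests inside the window

    def wait_time(self) -> float:
        """Return seconds to wait before another request fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Record that a request was just sent."""
        self._sent.append(self.clock())
```

A typical loop would sleep for wait_time() seconds, send the request, and then call record().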

Need higher limits? Contact us.

Caching

When using smart routing models (auto, fast, balanced, best), identical requests may return cached responses at zero cost. Cached responses are indicated by x_thermostat.cached: true in the response.

Cache hits: Free. No credits are deducted.

Cache TTL: 1-24 hours depending on query type. Time-sensitive queries have a shorter TTL.

Bypass cache: Send the header x-no-cache: true to force a fresh response.

Direct models: Requests to specific models (e.g., gpt-4o) are never cached.
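As a sketch of the cache bypass, the request below is built with Python's standard library and adds the x-no-cache header only when asked. The helper build_request and its bypass_cache flag are hypothetical names for this example. The endpoint and header names come from this page.

```python
import json
import urllib.request

ENDPOINT = "https://api.thermly.net/v1/chat/completions"


def build_request(payload: dict, api_key: str, bypass_cache: bool = False):
    """Build (but do not send) a chat completion request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if bypass_cache:
        # Ask for a fresh response instead of a cached one.
        headers["x-no-cache"] = "true"
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request. When the cache is bypassed, the x_thermostat.cached field in the response should then read false.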

Support

Questions? Issues? Reach out to support@thermly.net or visit your dashboard.