Models

13 models across 3 providers. Use smart routing or pick a specific model.

Smart Routing

Don't know which model to use? Let us pick for you. Just set model="auto" and we handle the rest.

autoRecommended

Auto

We analyze your request and pick the cheapest model that can handle it. Best for most use cases.

Best for: Everyone — save money automatically

fastCheapest

Fast

Always uses the fastest, cheapest model. Great for simple questions, chat, and quick lookups.

Best for: Simple tasks, high volume

balancedValue

Balanced

A good balance of cost and quality. Handles most tasks well without premium pricing.

Best for: General-purpose work

bestPremium

Best

Always uses the most capable model. For tasks that need the highest quality output.

Best for: Complex coding, research, analysis

All Models

Specify a model by name to bypass smart routing. Your request goes directly to that provider.

OpenAI7 models

GPT-4o

Flagship

gpt-4o

Complex tasks, coding, detailed analysis

Context: 128K tokens

GPT-4o Mini

Fast

gpt-4o-mini

Quick answers, simple tasks, chat

Context: 128K tokens

GPT-4 Turbo

Flagship

gpt-4-turbo

Long documents, detailed work

Context: 128K tokens

GPT-4

Legacy

gpt-4

Reliable general-purpose tasks

Context: 8K tokens

GPT-3.5 Turbo

Legacy

gpt-3.5-turbo

Basic tasks, high volume

Context: 16K tokens

Reasoning

o1

Deep reasoning, math, logic problems

Context: 200K tokens

o3-mini

Reasoning

o3-mini

Reasoning tasks, smaller and faster

Context: 200K tokens

Anthropic4 models

Claude Sonnet 4.5

Flagship

claude-sonnet-4-5

Nuanced writing, analysis, coding

Context: 200K tokens

Claude Haiku 4.5

Fast

claude-haiku-4-5

Quick responses, summaries

Context: 200K tokens

Claude Opus 4

Flagship

claude-opus-4

Most capable Claude, complex research

Context: 200K tokens

Claude 3 Haiku

Legacy

claude-3-haiku

Basic tasks, previous generation

Context: 200K tokens

Google2 models

Gemini 2.5 Flash

Fast

gemini-2.5-flash

Large documents, fast processing

Context: 1M tokens

Gemini 2.5 Pro

Flagship

gemini-2.5-pro

Complex tasks, massive context window

Context: 1M tokens

Not sure which model to choose?

Use auto and let Thermly handle it. We'll analyze each request and route it to the most cost-effective model that can deliver great results.

Get Started Free