Models — choosing & switching
At a glance
- 3 LLM providers: OpenAI, Anthropic (Claude), Google (Gemini)
- Single entry point: LLMService.call() → LLMFactory → provider-specific execute()
- Model choice is per-node: different nodes in the same workflow can use different models
- Default for agent templates: Gemini 2.5 Flash (cost-optimized for high-volume logistics)
Why this matters
Model choice affects cost, latency, accuracy, and data residency. When a customer asks “which model powers our agent?” or “can we switch to an on-prem model?”, you need to know what’s available and what the trade-offs are. Switching between the supported cloud models is a config change, not a code change; moving to an on-prem model also requires the on-prem deployment mode.
Supported models
OpenAI
| Model | Use case |
|---|---|
| GPT-5.4 | Highest capability — complex reasoning, multi-step analysis |
| GPT-5.4-mini | Balanced cost/performance for production workloads |
| GPT-5.4-nano | Lowest cost — classification, simple extraction |
| GPT-4.1 | Legacy — still used in some existing workflows |
| GPT-4o | Multimodal — image/document analysis |
| GPT-4o-mini | Cost-optimized multimodal |
Anthropic (Claude)
| Model | Use case |
|---|---|
| Claude 3 Opus | Highest reasoning — complex agent workflows |
| Claude 3 Sonnet | Balanced — production agent workloads |
| Claude 3 Haiku | Fast and cheap — classification, routing |
Google (Gemini)
| Model | Use case |
|---|---|
| Gemini 2.5 Flash | Platform default. Fast, cheap, good enough for most logistics tasks |
| Gemini 2.5 Pro | Higher capability for complex analysis |
| Gemini 3 Flash Preview | Next-gen — used in address intelligence agents |
| Gemini 3.1 Flash Lite Preview | Ultra-lightweight |
| Gemini 3.1 Pro Preview | Used for rain detection, address intelligence supervisor |
How the LLM service works
All three providers inherit from BaseLLM and implement a standard execute() method. The factory routes based on the model name prefix.
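The routing described above can be sketched roughly as follows. This is a minimal illustration, not the code in the agent-platform repo: only the names BaseLLM, LLMFactory, LLMService.call() and execute() come from this page. The provider class names, the prefix table, the method signatures, and the model ID strings are all assumptions, and the provider calls are stubbed out.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Common provider interface (signature is an assumption)."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def execute(self, prompt: str) -> str:
        """Send the prompt to the provider and return the text response."""


# Hypothetical provider classes; each returns a tagged string
# instead of calling a real API.
class OpenAILLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[openai:{self.model}] {prompt}"


class AnthropicLLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[anthropic:{self.model}] {prompt}"


class GeminiLLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[gemini:{self.model}] {prompt}"


# Assumed prefix table; the real prefixes may differ.
PREFIX_TO_PROVIDER = {
    "gpt-": OpenAILLM,
    "claude-": AnthropicLLM,
    "gemini-": GeminiLLM,
}


class LLMFactory:
    @staticmethod
    def create(model: str) -> BaseLLM:
        """Pick the provider class whose prefix matches the model name."""
        name = model.lower()
        for prefix, provider_cls in PREFIX_TO_PROVIDER.items():
            if name.startswith(prefix):
                return provider_cls(model)
        raise ValueError(f"No provider registered for model {model!r}")


class LLMService:
    @staticmethod
    def call(model: str, prompt: str) -> str:
        """Single entry point: resolve the provider, then execute."""
        return LLMFactory.create(model).execute(prompt)
```

In this sketch, a model name with an unknown prefix fails fast with a ValueError instead of silently falling back to a default provider. Whether the real factory behaves that way is not confirmed here.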
Decision matrix
| Factor | Cloud (GPT/Claude/Gemini) | On-prem (Llama/Mistral) |
|---|---|---|
| Accuracy | Highest — GPT-5.4, Claude Opus lead on complex reasoning and tool use | Improving but trails on multi-step tool orchestration |
| Latency | 1-5s typical (depends on model + prompt length) | Depends on hardware — can match cloud with good GPUs |
| Cost | Pay-per-token. Gemini Flash is cheapest; GPT-5.4 is most expensive | Capex for hardware, no per-token cost |
| Data residency | Data transits to cloud provider (Azure, GCP, AWS) | Everything stays on-prem |
| Switching effort | Config change — update model name in workflow node | Requires on-prem deployment mode |
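The cost row can be made concrete with a rough break-even estimate: the number of tokens at which pay-per-token spend equals the hardware outlay. The sketch below is a simplification that ignores power, hosting, and operations costs for on-prem. The function name is hypothetical, and the figures in the usage note are placeholders, not real prices.

```python
def break_even_tokens(hardware_capex_usd: float,
                      cloud_price_per_million_tokens_usd: float) -> float:
    """Tokens at which cumulative cloud spend equals the on-prem capex.

    Simplified: on-prem running costs (power, ops) are not included.
    """
    if cloud_price_per_million_tokens_usd <= 0:
        raise ValueError("cloud price must be positive")
    return hardware_capex_usd / cloud_price_per_million_tokens_usd * 1_000_000
```

For example, with a placeholder capex of 50,000 USD and a placeholder cloud price of 1.00 USD per million tokens, break_even_tokens(50_000, 1.0) gives 50 billion tokens. Below that volume, pay-per-token is cheaper under these assumptions.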
How to switch a model
Model assignment happens at the workflow node level. To switch:
- Open the agent in the Dashboard.
- Select the node you want to change.
- Update the model field to the new model name.
- Save the workflow.
No code deployment needed — the LLMFactory routes to the correct provider at runtime based on the model name.
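Conceptually, the Dashboard steps above amount to changing one field on one node. The sketch below shows that idea on an in-memory workflow. The workflow structure, node IDs, field names, and model ID strings are all hypothetical; the real workflow schema is not documented on this page.

```python
import copy

# Hypothetical workflow definition: each node carries its own model field.
workflow = {
    "name": "control_tower_supervisor",
    "nodes": [
        {"id": "classify_incident", "model": "gemini-2.5-flash"},
        {"id": "draft_response", "model": "gemini-2.5-flash"},
    ],
}


def set_node_model(workflow: dict, node_id: str, model: str) -> dict:
    """Return a copy of the workflow with one node's model replaced."""
    updated = copy.deepcopy(workflow)
    for node in updated["nodes"]:
        if node["id"] == node_id:
            node["model"] = model
            return updated
    raise KeyError(f"No node with id {node_id!r}")


# Only the selected node changes; the other node keeps its model.
updated = set_node_model(workflow, "draft_response", "gemini-2.5-pro")
```

Because assignment is per-node, the two nodes in `updated` now run on different models within the same workflow.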
What agent templates use by default
| Agent template | Default model | Why |
|---|---|---|
| control_tower_supervisor | Gemini 2.5 Flash | Cost-optimized for high-volume incident processing |
| control_tower_driver / dispatch / customer | Gemini 2.5 Flash | Voice agents need fast response |
| rain_bot_supervisor | Gemini 3.1 Pro Preview | Video analysis requires strong reasoning |
| address_intelligence_supervisor | Gemini 3.1 Pro Preview | Google Maps grounding works best with Gemini |
| address_intelligence_customer | Gemini 3 Flash Preview | Voice agent — needs speed |
| All default templates | Gemini 2.5 Flash | Cheapest viable option for blank-slate agents |
Cost tracking
The platform tracks token usage and cost per task:
- tokens_in / tokens_out on each task record
- cost_usd aggregated at the job level
- Visible in the Dashboard job detail view
- Feeds into Langfuse for cost analytics
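The job-level roll-up can be pictured as a sum over task records. This is a minimal sketch only: the field names tokens_in, tokens_out, and cost_usd come from this page, while the TaskRecord class, the job_id field, the per-task cost_usd value, the summarize_job helper, and the sample numbers are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    job_id: str        # hypothetical link from task to job
    tokens_in: int
    tokens_out: int
    cost_usd: float    # assumed to be stored per task before aggregation


def summarize_job(tasks: list[TaskRecord], job_id: str) -> dict:
    """Sum token counts and cost over all tasks belonging to one job."""
    job_tasks = [t for t in tasks if t.job_id == job_id]
    return {
        "job_id": job_id,
        "tokens_in": sum(t.tokens_in for t in job_tasks),
        "tokens_out": sum(t.tokens_out for t in job_tasks),
        "cost_usd": sum(t.cost_usd for t in job_tasks),
    }


# Sample values are invented for the example.
tasks = [
    TaskRecord("job-1", tokens_in=1200, tokens_out=300, cost_usd=0.0021),
    TaskRecord("job-1", tokens_in=800, tokens_out=150, cost_usd=0.0012),
    TaskRecord("job-2", tokens_in=500, tokens_out=100, cost_usd=0.0007),
]
summary = summarize_job(tasks, "job-1")
```

Here `summary` covers only the two job-1 tasks, so the job-2 record does not contribute to its totals.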
Sources
- agent-platform repo: app/core/llm/providers/
- agent-platform repo: data/llm_models.json
- agent-platform repo: data/agent/agents.json
- See Deployment modes for on-prem model options
- See Observability for cost monitoring in Langfuse
Changelog
- 26 May 2026: Full content from GitHub repo exploration. Provider list, model catalog, decision matrix.