Models — choosing & switching

At a glance

3 LLM providers: OpenAI, Anthropic (Claude), Google (Gemini)
Single entry point: LLMService.call() → LLMFactory → provider-specific execute()
Model choice is per-node — different nodes in the same workflow can use different models
Default for agent templates: Gemini 2.5 Flash (cost-optimized for high-volume logistics)

Why this matters

Model choice affects cost, latency, accuracy, and data residency. When a customer asks “which model powers our agent?” or “can we switch to an on-prem model?”, you need to know what’s available and what the trade-offs are. Switching between supported cloud models is a config change, not a code change; moving to an on-prem model additionally requires the on-prem deployment mode.

Supported models

OpenAI

Model	Use case
GPT-5.4	Highest capability — complex reasoning, multi-step analysis
GPT-5.4-mini	Balanced cost/performance for production workloads
GPT-5.4-nano	Lowest cost — classification, simple extraction
GPT-4.1	Legacy — still used in some existing workflows
GPT-4o	Multimodal — image/document analysis
GPT-4o-mini	Cost-optimized multimodal

Anthropic (Claude)

Model	Use case
Claude 3 Opus	Highest reasoning — complex agent workflows
Claude 3 Sonnet	Balanced — production agent workloads
Claude 3 Haiku	Fast and cheap — classification, routing

Google (Gemini)

Model	Use case
Gemini 2.5 Flash	Platform default. Fast, cheap, good enough for most logistics tasks
Gemini 2.5 Pro	Higher capability for complex analysis
Gemini 3 Flash Preview	Next-gen — used in address intelligence agents
Gemini 3.1 Flash Lite Preview	Ultra-lightweight
Gemini 3.1 Pro Preview	Used for rain detection, address intelligence supervisor

How the LLM service works

All three providers inherit from BaseLLM and implement a standard execute() method. The factory routes based on the model name prefix.
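
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the code in app/core/llm/providers/: the provider class names, the prefix strings ("gpt-", "claude-", "gemini-"), the model ID spellings, and the method signatures are all assumptions, and the execute() bodies are stand-ins for real API calls.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Common provider interface (assumed shape; the real BaseLLM may differ)."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def execute(self, prompt: str) -> str:
        """Send the prompt to the provider and return the completion text."""


class OpenAILLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[openai:{self.model}] {prompt}"  # stand-in for a real API call


class AnthropicLLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[anthropic:{self.model}] {prompt}"  # stand-in


class GeminiLLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[gemini:{self.model}] {prompt}"  # stand-in


class LLMFactory:
    # Hypothetical prefix table; the actual prefixes are defined in the repo.
    _PREFIXES: dict[str, type[BaseLLM]] = {
        "gpt-": OpenAILLM,
        "claude-": AnthropicLLM,
        "gemini-": GeminiLLM,
    }

    @classmethod
    def create(cls, model: str) -> BaseLLM:
        """Pick the provider class whose prefix matches the model name."""
        name = model.lower()
        for prefix, provider in cls._PREFIXES.items():
            if name.startswith(prefix):
                return provider(model)
        raise ValueError(f"No provider registered for model {model!r}")


class LLMService:
    @staticmethod
    def call(model: str, prompt: str) -> str:
        """Single entry point: resolve the provider, then execute."""
        return LLMFactory.create(model).execute(prompt)
```

In this sketch, a call such as LLMService.call("gemini-2.5-flash", "Summarize this incident") would reach GeminiLLM.execute(), and an unrecognized name raises ValueError rather than silently falling back to a default provider.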

Decision matrix

Factor	Cloud (GPT/Claude/Gemini)	On-prem (Llama/Mistral)
Accuracy	Highest — GPT-5.4, Claude Opus lead on complex reasoning and tool use	Improving but trails on multi-step tool orchestration
Latency	1-5s typical (depends on model + prompt length)	Depends on hardware — can match cloud with good GPUs
Cost	Pay-per-token. Gemini Flash is cheapest; GPT-5.4 is most expensive	Capex for hardware, no per-token cost
Data residency	Data transits to cloud provider (Azure, GCP, AWS)	Everything stays on-prem
Switching effort	Config change — update model name in workflow node	Requires on-prem deployment mode

How to switch a model

Model assignment happens at the workflow node level. To switch:

1. Open the agent in the Dashboard.
2. Select the node you want to change.
3. Update the model field to the new model name.
4. Save the workflow.

No code deployment needed — the LLMFactory routes to the correct provider at runtime based on the model name.
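
To make the "config change" concrete, here is a small sketch of what a per-node model switch might look like as data. The workflow schema (a "nodes" list with "id" and "model" keys), the node IDs, and the model ID spellings are hypothetical; the Dashboard performs the equivalent edit through its UI.

```python
import copy

# Hypothetical workflow definition: two nodes, both on the platform default.
workflow = {
    "nodes": [
        {"id": "classify", "model": "gemini-2.5-flash"},
        {"id": "analyze", "model": "gemini-2.5-flash"},
    ]
}


def switch_model(workflow: dict, node_id: str, new_model: str) -> dict:
    """Return a copy of the workflow with one node's model field replaced."""
    updated = copy.deepcopy(workflow)  # leave the original definition untouched
    for node in updated["nodes"]:
        if node["id"] == node_id:
            node["model"] = new_model
            return updated
    raise KeyError(f"No node with id {node_id!r}")


# Only the "analyze" node moves to a higher-capability model.
updated = switch_model(workflow, "analyze", "gpt-5.4")
```

Only the targeted node changes, which mirrors the per-node model choice: the "classify" node keeps its cheaper model while "analyze" is upgraded.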

What agent templates use by default

Agent template	Default model	Why
control_tower_supervisor	Gemini 2.5 Flash	Cost-optimized for high-volume incident processing
control_tower_driver / dispatch / customer	Gemini 2.5 Flash	Voice agents need fast response
rain_bot_supervisor	Gemini 3.1 Pro Preview	Video analysis requires strong reasoning
address_intelligence_supervisor	Gemini 3.1 Pro Preview	Google Maps grounding works best with Gemini
address_intelligence_customer	Gemini 3 Flash Preview	Voice agent — needs speed
All default templates	Gemini 2.5 Flash	Cheapest viable option for blank-slate agents
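
The table above amounts to a lookup with a platform-wide fallback. A minimal sketch, assuming a plain dictionary keyed by template name (the actual structure of data/agent/agents.json is not shown here, and default_model_for is a hypothetical helper):

```python
PLATFORM_DEFAULT = "Gemini 2.5 Flash"

# Only templates that deviate from the platform default need an entry.
TEMPLATE_DEFAULTS = {
    "rain_bot_supervisor": "Gemini 3.1 Pro Preview",
    "address_intelligence_supervisor": "Gemini 3.1 Pro Preview",
    "address_intelligence_customer": "Gemini 3 Flash Preview",
}


def default_model_for(template: str) -> str:
    """Return the template's default model, falling back to the platform default."""
    return TEMPLATE_DEFAULTS.get(template, PLATFORM_DEFAULT)
```

With this shape, control_tower_supervisor and any blank-slate template resolve to Gemini 2.5 Flash without needing their own entries.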

Cost tracking

The platform tracks token usage and cost per task:

tokens_in / tokens_out on each task record
cost_usd aggregated at the job level
Visible in the Dashboard job detail view
Feeds into Langfuse for cost analytics
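
As an illustration of how per-task token counts could roll up into a job-level cost_usd, consider the sketch below. The TaskRecord shape, the helper names, and especially the prices are assumptions: the per-million-token rates are placeholders, not real provider pricing, and the platform's actual aggregation logic may differ.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens. NOT real provider pricing.
PRICE_PER_MILLION = {
    "example-model": {"in": 1.00, "out": 2.00},
}


@dataclass
class TaskRecord:
    model: str
    tokens_in: int
    tokens_out: int


def task_cost_usd(task: TaskRecord) -> float:
    """Cost of one task: input and output tokens are priced separately."""
    price = PRICE_PER_MILLION[task.model]
    total = task.tokens_in * price["in"] + task.tokens_out * price["out"]
    return total / 1_000_000


def job_cost_usd(tasks: list[TaskRecord]) -> float:
    """Job-level cost_usd as the sum of its tasks' costs."""
    return sum(task_cost_usd(t) for t in tasks)
```

Because input and output tokens are priced separately, a job with few prompts but long generations can cost more than one with the opposite profile, which is one reason to keep both counters on each task record.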

Sources

agent-platform repo: app/core/llm/providers/
agent-platform repo: data/llm_models.json
agent-platform repo: data/agent/agents.json
See Deployment modes for on-prem model options
See Observability for cost monitoring in Langfuse

Changelog

26 May 2026: Full content from GitHub repo exploration. Provider list, model catalog, decision matrix.
