Models — choosing & switching

At a glance

  • 3 LLM providers: OpenAI, Anthropic (Claude), Google (Gemini)
  • Single entry point: LLMService.call() → LLMFactory → provider-specific execute()
  • Model choice is per-node — different nodes in the same workflow can use different models
  • Default for agent templates: Gemini 2.5 Flash (cost-optimized for high-volume logistics)

Why this matters

Model choice affects cost, latency, accuracy, and data residency. When a customer asks “which model powers our agent?” or “can we switch to an on-prem model?”, you need to know what’s available and what the trade-offs are. Switching between the supported cloud models is straightforward: it’s a config change, not a code change. Moving to an on-prem model additionally requires the on-prem deployment mode.

Supported models

OpenAI

Model | Use case
GPT-5.4 | Highest capability: complex reasoning, multi-step analysis
GPT-5.4-mini | Balanced cost/performance for production workloads
GPT-5.4-nano | Lowest cost: classification, simple extraction
GPT-4.1 | Legacy: still used in some existing workflows
GPT-4o | Multimodal: image/document analysis
GPT-4o-mini | Cost-optimized multimodal

Anthropic (Claude)

Model | Use case
Claude 3 Opus | Highest reasoning: complex agent workflows
Claude 3 Sonnet | Balanced: production agent workloads
Claude 3 Haiku | Fast and cheap: classification, routing

Google (Gemini)

Model | Use case
Gemini 2.5 Flash | Platform default. Fast, cheap, good enough for most logistics tasks
Gemini 2.5 Pro | Higher capability for complex analysis
Gemini 3 Flash Preview | Next-gen: used in address intelligence agents
Gemini 3.1 Flash Lite Preview | Ultra-lightweight
Gemini 3.1 Pro Preview | Used for rain detection, address intelligence supervisor

How the LLM service works

All three providers inherit from BaseLLM and implement a standard execute() method. The factory routes based on the model name prefix.
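
The routing described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the names BaseLLM, LLMFactory, LLMService.call() and execute() come from this page, while the provider class names, the create() method, the prefix strings ("gpt-", "claude-", "gemini-"), and the echo-style return values are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Common interface that every provider implements."""

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def execute(self, prompt: str) -> str:
        """Send the prompt to the provider and return its text response."""


# Hypothetical provider classes. A real implementation would call the
# provider's API inside execute(); here each one just echoes its input.
class OpenAILLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[openai:{self.model}] {prompt}"


class AnthropicLLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[anthropic:{self.model}] {prompt}"


class GeminiLLM(BaseLLM):
    def execute(self, prompt: str) -> str:
        return f"[gemini:{self.model}] {prompt}"


class LLMFactory:
    # Assumed prefixes; the real routing table may differ.
    _PREFIXES = {
        "gpt-": OpenAILLM,
        "claude-": AnthropicLLM,
        "gemini-": GeminiLLM,
    }

    @classmethod
    def create(cls, model: str) -> BaseLLM:
        """Pick the provider class whose prefix matches the model name."""
        name = model.lower()
        for prefix, provider_cls in cls._PREFIXES.items():
            if name.startswith(prefix):
                return provider_cls(model)
        raise ValueError(f"No provider registered for model {model!r}")


class LLMService:
    @staticmethod
    def call(model: str, prompt: str) -> str:
        """Single entry point: resolve the provider, then execute."""
        return LLMFactory.create(model).execute(prompt)
```

Under these assumptions, callers only ever pass a model name to LLMService.call(); an unrecognized name such as "llama-3" raises ValueError rather than silently falling back to a default provider.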

Decision matrix

Factor | Cloud (GPT/Claude/Gemini) | On-prem (Llama/Mistral)
Accuracy | Highest: GPT-5.4 and Claude Opus lead on complex reasoning and tool use | Improving but trails on multi-step tool orchestration
Latency | 1-5 s typical (depends on model + prompt length) | Depends on hardware; can match cloud with good GPUs
Cost | Pay-per-token. Gemini Flash is cheapest; GPT-5.4 is most expensive | Capex for hardware, no per-token cost
Data residency | Data transits to cloud provider (Azure, GCP, AWS) | Everything stays on-prem
Switching effort | Config change: update model name in workflow node | Requires on-prem deployment mode

How to switch a model

Model assignment happens at the workflow node level. To switch:

  1. Open the agent in the Dashboard.
  2. Select the node you want to change.
  3. Update the model field to the new model name.
  4. Save the workflow.

No code deployment needed — the LLMFactory routes to the correct provider at runtime based on the model name.
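
To make the "config change, not a code change" point concrete, here is a small sketch that treats a workflow as plain data and swaps the model on one node. The workflow structure, the field names ("nodes", "id", "model"), the node ids, and the model identifier strings are all hypothetical; the Dashboard's real storage format is not documented on this page.

```python
import copy

# Hypothetical workflow definition; field names are illustrative only.
workflow = {
    "name": "control_tower_supervisor",
    "nodes": [
        {"id": "triage", "model": "gemini-2.5-flash"},
        {"id": "summarize", "model": "gemini-2.5-flash"},
    ],
}


def switch_model(workflow: dict, node_id: str, new_model: str) -> dict:
    """Return a copy of the workflow with one node's model replaced."""
    updated = copy.deepcopy(workflow)
    for node in updated["nodes"]:
        if node["id"] == node_id:
            node["model"] = new_model
            return updated
    raise KeyError(f"No node with id {node_id!r}")


# Only the "summarize" node changes; "triage" keeps its original model.
updated = switch_model(workflow, "summarize", "gpt-5.4-mini")
```

Because model choice is per-node, the two nodes in the updated workflow now run on different providers without any change to the surrounding code.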

What agent templates use by default

Agent template | Default model | Why
control_tower_supervisor | Gemini 2.5 Flash | Cost-optimized for high-volume incident processing
control_tower_driver / dispatch / customer | Gemini 2.5 Flash | Voice agents need fast response
rain_bot_supervisor | Gemini 3.1 Pro Preview | Video analysis requires strong reasoning
address_intelligence_supervisor | Gemini 3.1 Pro Preview | Google Maps grounding works best with Gemini
address_intelligence_customer | Gemini 3 Flash Preview | Voice agent, needs speed
All default templates | Gemini 2.5 Flash | Cheapest viable option for blank-slate agents

Cost tracking

The platform tracks token usage and cost per task:

  • tokens_in / tokens_out on each task record
  • cost_usd aggregated at the job level
  • Visible in the Dashboard job detail view
  • Feeds into Langfuse for cost analytics
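
As a rough illustration of how per-task token counts could roll up into a job-level cost_usd, consider the sketch below. The field names tokens_in, tokens_out, and cost_usd come from this page; the TaskRecord class, the job_id field, the job_cost_usd() helper, and especially the price table are assumptions. The prices are placeholders chosen for easy arithmetic, not real provider pricing.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens. NOT real provider pricing.
PRICE_PER_MTOK = {"example-model": {"in": 1.00, "out": 2.00}}


@dataclass
class TaskRecord:
    """Hypothetical task record carrying the token counters."""
    job_id: str
    model: str
    tokens_in: int
    tokens_out: int

    @property
    def cost_usd(self) -> float:
        # Cost = input tokens * input price + output tokens * output price.
        price = PRICE_PER_MTOK[self.model]
        total = self.tokens_in * price["in"] + self.tokens_out * price["out"]
        return total / 1_000_000


def job_cost_usd(tasks: list[TaskRecord], job_id: str) -> float:
    """Sum the cost of every task that belongs to the given job."""
    return sum(t.cost_usd for t in tasks if t.job_id == job_id)


tasks = [
    TaskRecord("job-1", "example-model", 1_000_000, 500_000),
    TaskRecord("job-1", "example-model", 500_000, 250_000),
    TaskRecord("job-2", "example-model", 2_000_000, 0),
]
```

With the placeholder prices above, job-1 comes to 3.00 USD (2.00 + 1.00) and job-2 to 2.00 USD. Whether the platform computes cost per task and sums it, or prices the job's total tokens in one step, is not stated here.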

Changelog

  • 26 May 2026: Full content from GitHub repo exploration. Provider list, model catalog, decision matrix.