Deployment modes

At a glance

  • Three deployment modes: cloud (default), on-prem, hybrid.
  • Cloud runs on AWS ECS with a three-service split: API, worker, scheduler.
  • Model choice drives deployment mode: closed-source LLMs require cloud inference; open-source models (Llama, Mistral) make on-prem possible.
  • Data residency is the most common reason customers push for on-prem or hybrid.

Why this matters

“Where does our data go?” is the first question in every enterprise security review. Having a crisp answer — with a diagram — saves weeks of back-and-forth. Know the three modes and when to recommend each.

The three modes

Cloud (default)

The standard deployment for most customers.

Component | Where it runs
Agent-platform API | AWS ECS (desiredCount >= 2)
Worker service | AWS ECS
Scheduler service | AWS ECS
LLM provider | Azure OpenAI (default)
Vector DB | Pinecone
Observability | New Relic + Elasticsearch + Langfuse

Best for: customers who are comfortable with cloud, need the fastest time-to-value, and want access to the best models (GPT-4o, Claude Sonnet).

On-prem

The full stack runs inside the customer’s infrastructure.

Component | Where it runs
Agent-platform | Customer’s Kubernetes or VMs
LLM | Open-source (Llama, Mistral), fine-tuned
Vector DB | FAISS (local)
Observability | Customer’s monitoring stack

Best for: banking (BDO), government, defense, or any customer with strict data-residency requirements that prohibit cloud LLM calls.

Trade-off: model quality. Open-source models are catching up but still trail GPT-4o and Claude Sonnet on complex reasoning and tool use.

Hybrid

Platform runs on-prem; LLM calls route to cloud. Data in transit is encrypted; data at rest stays on-prem.

Best for: customers who need data residency for stored data but accept that LLM inference happens in a cloud provider’s environment (with appropriate DPAs and SOC-2 coverage).
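The split between the three modes can be written down as a small lookup, which is handy when answering "what leaves our infrastructure?" in a security review. This is a minimal sketch, not platform code: `Location`, `ModeProfile`, `MODES`, and `cloud_side` are illustrative names, and the three tracked concerns (platform, LLM inference, data at rest) are a simplification of the component tables above.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Location(Enum):
    CLOUD = "cloud"
    ON_PREM = "on-prem"


@dataclass(frozen=True)
class ModeProfile:
    """Where each concern lives for one deployment mode (illustrative only)."""
    platform: Location
    llm_inference: Location
    data_at_rest: Location


# Assumption: each mode is reduced to the three concerns named in this page.
MODES = {
    "cloud": ModeProfile(Location.CLOUD, Location.CLOUD, Location.CLOUD),
    "on-prem": ModeProfile(Location.ON_PREM, Location.ON_PREM, Location.ON_PREM),
    "hybrid": ModeProfile(Location.ON_PREM, Location.CLOUD, Location.ON_PREM),
}


def cloud_side(mode: str) -> list[str]:
    """Return the concerns that run or are stored outside customer infrastructure."""
    profile = MODES[mode]
    return [f.name for f in fields(profile) if getattr(profile, f.name) is Location.CLOUD]


print(cloud_side("hybrid"))  # ['llm_inference']
```

For hybrid, the only concern on the cloud side is LLM inference, which is the point to make when discussing DPAs with the customer.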

Choosing the right mode

Common customer situations

Customer says | Recommend | Why
“We’re fine with cloud” | Cloud | Fastest, best models
“Our security team won’t approve cloud LLMs” | On-prem | Full control, open-source models
“Data must stay in our VPC but we want GPT-4o” | Hybrid | Best of both: data stays local, LLM calls route to Azure
“We’re a bank in the Philippines” (BDO pattern) | Start hybrid, assess on-prem | Banking regulators care about data at rest; sending data in transit for inference is usually acceptable with DPAs
“We’re in the EU / Middle East” | Cloud (regional) or hybrid | Check the specific regulation; cloud with regional hosting often suffices
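The first three rows of this table reduce to two yes/no questions, which can be sketched as a small function. This is a rough first-pass heuristic under stated assumptions, not an official decision procedure: `recommend_mode` and its parameter names are hypothetical, and the regional and banking rows still need a case-by-case regulatory check.

```python
def recommend_mode(cloud_llm_allowed: bool, data_at_rest_must_stay_local: bool) -> str:
    """Return a starting recommendation: 'cloud', 'hybrid', or 'on-prem'.

    Assumption: the customer's position can be reduced to two booleans.
    """
    if not cloud_llm_allowed:
        # Security will not approve cloud LLM calls: only open-source models on-prem fit.
        return "on-prem"
    if data_at_rest_must_stay_local:
        # Stored data stays local, but inference may go to a cloud provider.
        return "hybrid"
    # No residency constraint: fastest path with the best models.
    return "cloud"


print(recommend_mode(cloud_llm_allowed=True, data_at_rest_must_stay_local=True))  # hybrid
```

The order of the checks matters: a ban on cloud LLM calls rules out both cloud and hybrid, so it is tested first.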

Infrastructure details (cloud mode)

The agent-platform runs as three ECS services:

Service | Purpose
API | Handles inbound requests (webhooks, REST), routes to supervisor agent
Worker | Executes agent workflows, tool calls, LLM inference
Scheduler | Runs scheduled/cron-based workflows (e.g., Maya’s monitoring loops)

Each runs with desiredCount >= 2 for high availability. Health checks are monitored via New Relic with automated alerting.
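The desiredCount >= 2 rule is easy to check mechanically. The sketch below assumes the service descriptions are already available as a list of dictionaries with `serviceName` and `desiredCount` keys (the same field names ECS uses when describing services); the service names and counts shown are made up for illustration, and `under_replicated` is a hypothetical helper.

```python
MIN_DESIRED_COUNT = 2  # high-availability floor stated for the three services


def under_replicated(services, minimum=MIN_DESIRED_COUNT):
    """Return the names of services whose desiredCount is below the minimum."""
    return [
        svc["serviceName"]
        for svc in services
        if svc.get("desiredCount", 0) < minimum
    ]


# Hypothetical snapshot of the three services; real values would come from ECS.
services = [
    {"serviceName": "api", "desiredCount": 2},
    {"serviceName": "worker", "desiredCount": 3},
    {"serviceName": "scheduler", "desiredCount": 1},
]

print(under_replicated(services))  # ['scheduler']
```

A check like this could run in CI against the service definitions so that a single-instance service is caught before deployment.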

Changelog

  • 26 May 2026: Full content from Slack engineering discussions and architecture research.