AgentFleet architecture
At a glance
- LangGraph-based orchestration sits in the middle. Models, memory, tools, and observability hang off it.
- Single integration layer in front of customer systems (TMS/WMS, ERPs, TOS, telephony). One MCP server per integration.
- Stateful, audited, model-agnostic. Cloud, on-prem, or hybrid.
Why this matters
Every conversation with a customer eventually reaches “how does this actually work?” If you can sketch this diagram on a whiteboard and explain each block in one sentence, you can answer 80% of architecture questions in discovery and security reviews. The remaining 20% is covered by the security & compliance and deployment pages.
The diagram
Reading the diagram
Channels in → Supervisor routes → Agents do the work → Models reason, memory remembers, tools act → Observability records everything.
1. Channels
Voice (via SIP trunking to telephony providers like Avaya, Unifonic, Exotel), WhatsApp, email, and event triggers (e.g. an order status change in the TMS). Same agent, multiple channels; context is preserved across them.
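One way to picture "same agent, multiple channels" is a channel-neutral envelope plus a context store keyed by customer. The sketch below is illustrative only: `InboundEvent`, `ConversationContext`, `ContextStore`, and every field name are hypothetical and not taken from the agent-platform repo.

```python
from dataclasses import dataclass, field


@dataclass
class InboundEvent:
    """Channel-neutral envelope; all field names here are hypothetical."""
    channel: str       # "voice", "whatsapp", "email", or "event"
    customer_id: str   # stable key that ties the channels together
    text: str          # transcript, message body, or event description


@dataclass
class ConversationContext:
    customer_id: str
    history: list = field(default_factory=list)


class ContextStore:
    """Keeps one context per customer, regardless of channel."""

    def __init__(self):
        self._contexts = {}

    def record(self, event: InboundEvent) -> ConversationContext:
        # Reuse the existing context if this customer has been seen before.
        if event.customer_id not in self._contexts:
            self._contexts[event.customer_id] = ConversationContext(event.customer_id)
        ctx = self._contexts[event.customer_id]
        ctx.history.append((event.channel, event.text))
        return ctx


store = ContextStore()
store.record(InboundEvent("voice", "cust-42", "Where is my shipment?"))
ctx = store.record(InboundEvent("whatsapp", "cust-42", "Any update?"))
print(ctx.history)
```

The WhatsApp message lands in the same context as the earlier call because both carry the same `customer_id`. How the real platform resolves identity across channels is not covered here.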
2. Orchestration
The orchestration layer is built on LangGraph, which is open source. A supervisor agent owns the conversation and delegates sub-tasks to specialist agents based on SOPs and policies the customer encodes. The workflow spans multiple nodes and holds state across turns.
Why LangGraph (and not LangChain alone): we need graph-based state machines, not just chains. An agent can loop back, branch, or wait for human input.
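To make "graph-based state machine" concrete, here is a minimal sketch in plain Python. It deliberately does not use the LangGraph API; the node names, the `intent` and `approved` fields, and the `run` helper are all assumptions made for illustration. It shows the three behaviours named above: branching in the supervisor, looping back to it, and pausing for human input.

```python
from typing import Callable, Dict

State = dict
Node = Callable[[State], str]  # each node returns the name of the next node

END = "END"
WAIT = "WAIT_FOR_HUMAN"


def supervisor(state: State) -> str:
    # Branch: pick a specialist based on the (hypothetical) intent field.
    if state.get("resolved"):
        return END
    return "refund_agent" if state["intent"] == "refund" else "tracking_agent"


def tracking_agent(state: State) -> str:
    state["reply"] = "Your shipment is out for delivery."
    state["resolved"] = True
    return "supervisor"  # loop back to the supervisor


def refund_agent(state: State) -> str:
    if not state.get("approved"):
        return WAIT  # pause until a human approves
    state["reply"] = "Refund issued."
    state["resolved"] = True
    return "supervisor"


NODES: Dict[str, Node] = {
    "supervisor": supervisor,
    "tracking_agent": tracking_agent,
    "refund_agent": refund_agent,
}


def run(state: State, start: str = "supervisor", max_steps: int = 20) -> State:
    """Run until END or a human-input pause; state persists between calls."""
    current = state.pop("resume_at", start)
    for _ in range(max_steps):
        nxt = NODES[current](state)
        if nxt == END:
            state["status"] = "done"
            return state
        if nxt == WAIT:
            state["status"] = "waiting_for_human"
            state["resume_at"] = current  # remember where to pick up
            return state
        current = nxt
    raise RuntimeError("step limit reached")


state = run({"intent": "refund"})   # pauses at refund_agent
state["approved"] = True            # a human approves out of band
state = run(state)                  # resumes and finishes
```

A plain chain cannot express the pause-and-resume step or the return to the supervisor, which is the point of the paragraph above.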
3. Models
Model-agnostic. The platform routes a given task to the right model:
| Task | Typical choice | Why |
|---|---|---|
| Conversational reasoning | GPT-4o, Claude Sonnet | Strongest tool use + reasoning |
| Cheap classification, summarisation | GPT-4o-mini, Gemini Flash | About 10× cheaper; quality is sufficient for these tasks |
| Voice transcription | Whisper | Quality + language coverage |
| Vision (PODs, invoices) | GPT-4o vision, Gemini | OCR + layout understanding |
| Rule-based (e.g. address normalisation) | Proprietary | Deterministic, fast, no LLM needed |
| Sensitive / on-prem | Llama, Mistral (fine-tuned) | Run inside customer’s VPC |
See Models — choosing & switching for the decision matrix.
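The table can be read as a lookup from task type to an ordered list of candidate models. The sketch below encodes it that way. The task keys, the `choose_model` function, and the fall-through to the second candidate when the first is unavailable are assumptions for illustration, not the platform's actual routing logic.

```python
# Task keys are hypothetical labels; model names come from the table above.
ROUTES = {
    "conversational_reasoning": ["GPT-4o", "Claude Sonnet"],
    "classification": ["GPT-4o-mini", "Gemini Flash"],
    "summarisation": ["GPT-4o-mini", "Gemini Flash"],
    "voice_transcription": ["Whisper"],
    "vision": ["GPT-4o vision", "Gemini"],
    "rule_based": ["Proprietary"],
    "sensitive_on_prem": ["Llama", "Mistral"],
}


def choose_model(task: str, unavailable=frozenset()) -> str:
    """Return the first candidate for the task that is not marked unavailable."""
    for model in ROUTES[task]:
        if model not in unavailable:
            return model
    raise LookupError(f"no available model for task {task!r}")


print(choose_model("classification"))
print(choose_model("classification", unavailable={"GPT-4o-mini"}))
```

Keeping the mapping in data rather than in code is one way to stay model-agnostic: switching a model is a table edit, not an agent change.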
4. Memory
- Short-term: the current conversation. Held in process. Cleared on session end.
- Long-term: vector DB (Pinecone default; FAISS for on-prem). Stores customer SOPs, product docs, historical context. Queried via RAG.
See Memory layers.
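The two layers can be sketched as follows. This is a toy: the bag-of-words `embed` function and the in-memory `LongTermMemory` class stand in for a real embedding model and for Pinecone or FAISS, and none of the class or method names come from the platform.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real deployment would use a model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Stand-in for the vector DB: stores documents, returns the closest ones."""

    def __init__(self):
        self._docs = []

    def add(self, text: str) -> None:
        self._docs.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


class Session:
    """Short-term memory: turns held in process and cleared on session end."""

    def __init__(self, long_term: LongTermMemory):
        self.turns = []
        self.long_term = long_term

    def build_prompt(self, user_message: str) -> dict:
        self.turns.append(user_message)
        # RAG step: retrieve long-term context relevant to this turn.
        retrieved = self.long_term.search(user_message, k=1)
        return {"retrieved": retrieved, "conversation": list(self.turns)}

    def end(self) -> None:
        self.turns.clear()


memory = LongTermMemory()
memory.add("Refunds above 500 USD need manager approval")
memory.add("Deliveries are attempted three times before return to origin")

session = Session(memory)
prompt = session.build_prompt("how many times are deliveries attempted")
session.end()
```

The long-term store outlives the session; only `turns` is dropped when the session ends.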
5. Tools (MCP)
Every external action the agent takes goes through a tool. Tools are exposed via MCP servers — one per system. This is the integration layer: adding a new customer system means writing one MCP server, after which every agent can use it.
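The "write one server, every agent can use it" property can be sketched with a small registry. This is plain Python and does not use the MCP SDK or wire protocol; `ToolServer`, `ToolRegistry`, and `get_order_status` are hypothetical names, and the tool returns canned data instead of calling a real TMS.

```python
class ToolServer:
    """Stand-in for one MCP server wrapping one customer system."""

    def __init__(self, system: str):
        self.system = system
        self._tools = {}

    def tool(self, fn):
        # Decorator: expose a function as a named tool on this server.
        self._tools[fn.__name__] = fn
        return fn

    def list_tools(self) -> list:
        return sorted(self._tools)

    def call(self, name: str, **kwargs):
        return self._tools[name](**kwargs)


class ToolRegistry:
    """Shared by all agents, so a new server is visible to every agent."""

    def __init__(self):
        self._servers = {}

    def register(self, server: ToolServer) -> None:
        self._servers[server.system] = server

    def catalogue(self) -> dict:
        return {name: s.list_tools() for name, s in self._servers.items()}

    def call(self, system: str, tool: str, **kwargs):
        return self._servers[system].call(tool, **kwargs)


tms = ToolServer("tms")


@tms.tool
def get_order_status(order_id: str) -> dict:
    # Hypothetical tool; a real server would query the TMS API here.
    return {"order_id": order_id, "status": "in_transit"}


registry = ToolRegistry()
registry.register(tms)
result = registry.call("tms", "get_order_status", order_id="A1")
```

Agents only ever see the registry, so onboarding another system means registering one more server and leaving agent code alone.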
6. Observability
Every reasoning step, tool call, and decision is logged. The AgentFleet Dashboard surfaces real-time metrics: latency, accuracy, escalation rate, HITL queue length. Anomalies trigger alerts.
See Observability & monitoring.
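As a rough picture of what "logged" can mean, the sketch below records each step as a structured event and derives one of the dashboard metrics, escalation rate, from the log. The event schema, the `kind` values, and the definition of escalation rate (share of conversations with at least one escalation) are assumptions, not the platform's actual format.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class AuditEvent:
    conversation_id: str
    kind: str     # hypothetical values: "reasoning", "tool_call", "decision", "escalation"
    detail: str
    timestamp: float = field(default_factory=time.time)


class AuditLog:
    def __init__(self):
        self.events = []

    def record(self, conversation_id: str, kind: str, detail: str) -> str:
        event = AuditEvent(conversation_id, kind, detail)
        self.events.append(event)
        return json.dumps(asdict(event))  # one JSON line per event

    def escalation_rate(self) -> float:
        """Share of conversations that contain at least one escalation."""
        conversations = {e.conversation_id for e in self.events}
        escalated = {
            e.conversation_id for e in self.events if e.kind == "escalation"
        }
        return len(escalated) / len(conversations) if conversations else 0.0


log = AuditLog()
log.record("c1", "tool_call", "get_order_status(order_id='A1')")
log.record("c1", "escalation", "customer asked for a human")
log.record("c2", "decision", "answered from SOP")
rate = log.escalation_rate()
```

Because every event carries a conversation id, the same log can serve both the audit trail and the aggregate metrics.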
What’s not in the diagram
Three things newcomers commonly assume are part of the platform but aren’t:
- Customer’s CRM (Salesforce, HubSpot). We integrate, we don’t replace.
- Customer’s TOS / OMS / ERP. Same.
- The conversational front-end for end-users. The customer’s existing channels (their phone tree, their WhatsApp Business number, their email) front Shipsy — we don’t ship a chat widget.
Try it yourself
Open the agent-platform repo and trace one agent (Clara is the most documented — see agents/clara) from inbound call to outbound response. Map each step to a block in the diagram above. If you can’t map a step, that’s a gap in this page — file an issue.
For a guided walkthrough using Claude Code, see Querying the repo with Claude Code.
Sources
- Carrix proposal deck, “Architecture and Key Components” section
- BDO Unibank deck, same section
- agent-platform repo
Changelog
- 26 May 2026: Initial draft.