The agent-platform repo

At a glance

Repo: github.com/shipsy/agent-platform
Stack: FastAPI + LangGraph + LangChain + PostgreSQL + Redis + TaskIQ
Language: Python 3.13 (container runs 3.11-slim)
Three processes: API server, TaskIQ worker, TaskIQ scheduler
277 active branches as of May 2026

Why this matters

Every agent Shipsy deploys — WISMO, Clara, Atlas, freight quotation, address intelligence — runs on this single repo. Understanding its layout means you can trace any agent behavior from API call to LLM response to tool execution. You don’t need to be an engineer to navigate it; you need to know where things live.

Repository structure

agent-platform/
├── app/                        # Main application code
│   ├── api/                    # REST API routes
│   │   ├── dashboard/          # Dashboard endpoints (agent mgmt, jobs, HITL, policies)
│   │   ├── internal/           # Internal API (agent execution, creation, webhooks)
│   │   ├── internal_dashboard/ # Shipsy-employee-only ops (builder library)
│   │   ├── webhook/            # External vendor callbacks
│   │   ├── callback/           # External tool execution (HMAC auth)
│   │   └── public/             # Health checks (no auth)
│   ├── auth/                   # Authentication (ProjectX employee auth, API key, HMAC)
│   ├── communication/          # Multi-channel engine (7 channels, 10+ providers)
│   ├── config/                 # App config, rate limiting, response formatting
│   ├── core/                   # Domain logic — the heart of the platform
│   │   ├── agents/             # Agent base classes and types
│   │   ├── llm/                # LLM service + provider implementations
│   │   ├── mcp/                # MCP integration layer
│   │   ├── memory/             # MemoryStore (key-value with approval workflow)
│   │   ├── middleware/         # Node-level middleware (14 types)
│   │   ├── orchestration/      # AgentOrchestrator + node builders
│   │   ├── policy/             # Policy framework (5 categories)
│   │   └── tools/              # Tool service + registry
│   ├── db/                     # SQLAlchemy models, DAOs, checkpointer, sessions
│   ├── functions/              # Tool implementations (Python function tools)
│   ├── integrations/           # External: ProjectX, LIA, Atlas, geocoding
│   ├── mcp/                    # MCP server (FastMCP, mounted at /mcp)
│   ├── middleware/             # HTTP middleware (rate limiting, CORS, logging, New Relic)
│   ├── models/                 # Pydantic models (state, agent, workflow, job, task, policy)
│   ├── platform/               # Utilities (caching, Elasticsearch, encryption, webhooks)
│   ├── services/               # Business services (agent, job, task, HITL, policy)
│   ├── triggers/               # Workflow trigger engine
│   └── worker/                 # TaskIQ worker (background jobs, follow-ups, timeouts)
├── data/                       # Static configuration catalogs
│   ├── agent/agents.json       # 12 agent templates
│   ├── tool/tools.json         # 18 registered tools
│   ├── prompts/                # 16 system prompt files
│   ├── guardrail/              # 4 guardrail types
│   ├── sop/                    # Standard Operating Procedures
│   └── llm_models.json         # Supported LLM models
├── agents-helper/              # Internal dev docs (5 files)
├── sdd/                        # Software Design Documents (50+ feature HLDs)
├── scripts/                    # DB migration + table creation
├── tests/                      # Unit, integration, middleware, service tests
├── Dockerfile                  # Production container
├── Jenkinsfile                 # CI/CD → AWS ECR → ECS Fargate
├── docker-entrypoint.sh        # Starts all 3 processes
├── main.py                     # Application entrypoint
└── requirements.txt            # Python dependencies

Key architectural concepts

Agents are database-driven DAGs

Agents are not hardcoded Python classes. They are workflow definitions stored in PostgreSQL: directed acyclic graphs (DAGs) of nodes and edges. Clara, Vera, Nexa, Maya, and Atlas are customer-facing brand names for workflows built from reusable templates.
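To make the "workflow as data" idea concrete, here is a minimal sketch of what such a definition could look like and how it can be checked for cycles. The field names (`nodes`, `edges`, `id`, `type`, `source`, `target`) and the node ids are assumptions for illustration, not the platform's actual schema.

```python
from collections import deque

# Hypothetical workflow definition; the field names and node ids are
# assumptions, not the real schema stored in PostgreSQL.
workflow = {
    "nodes": [
        {"id": "start", "type": "start"},
        {"id": "classify", "type": "llm"},
        {"id": "lookup", "type": "tool_node"},
        {"id": "end", "type": "end"},
    ],
    "edges": [
        {"source": "start", "target": "classify"},
        {"source": "classify", "target": "lookup"},
        {"source": "lookup", "target": "end"},
    ],
}


def topological_order(definition):
    """Return node ids in a valid execution order, or raise if there is a cycle."""
    indegree = {node["id"]: 0 for node in definition["nodes"]}
    successors = {node_id: [] for node_id in indegree}
    for edge in definition["edges"]:
        successors[edge["source"]].append(edge["target"])
        indegree[edge["target"]] += 1

    # Kahn's algorithm: repeatedly take nodes with no unprocessed predecessors.
    ready = deque(node_id for node_id, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node_id = ready.popleft()
        order.append(node_id)
        for target in successors[node_id]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(indegree):
        raise ValueError("workflow contains a cycle, so it is not a DAG")
    return order


print(topological_order(workflow))  # ['start', 'classify', 'lookup', 'end']
```

Because the definition is plain data, a new agent amounts to a new row rather than a new class, which is the property the paragraph above describes.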

Node types

Type	What it does
`start`	Entry point — receives input data
`agent`	Sub-agent with tools, system message, structured output
`llm`	Direct LLM call (no tool access)
`tool_node`	Execute a registered tool
`router`	Conditional branching (JMESPath-based edge conditions)
`condition`	Conditional logic
`prompt`	Prompt template node
`agent_graph`	Inline subgraph (nested workflow)
`end`	Terminal node
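
As a rough illustration of how a `router` node could pick its outgoing edge, the sketch below evaluates edge conditions against the current state. The platform uses JMESPath expressions; this example swaps in a simple dotted-path lookup so it runs with the standard library only. The edge fields (`target`, `path`, `equals`) and the state shape are hypothetical.

```python
def lookup(state, path):
    """Resolve a dotted path such as 'order.status' against nested dicts."""
    current = state
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical outgoing edges of a router node. The first edge whose
# condition matches wins; the edge without a condition is listed last
# and acts as the fallback.
edges = [
    {"target": "escalate", "path": "order.status", "equals": "delayed"},
    {"target": "notify", "path": "order.status", "equals": "delivered"},
    {"target": "end"},
]


def route(state, outgoing_edges):
    """Return the target node id of the first matching edge."""
    for edge in outgoing_edges:
        if "path" not in edge:
            return edge["target"]
        if lookup(state, edge["path"]) == edge["equals"]:
            return edge["target"]
    raise LookupError("no edge matched and no fallback edge was defined")


next_node = route({"order": {"status": "delayed"}}, edges)
```

With the state shown, `next_node` is `"escalate"`; a state without `order.status` falls through to the unconditional edge.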

Execution model

1. API receives trigger → creates Job (status: QUEUED)
2. TaskIQ worker picks up job → status: RUNNING
3. AgentOrchestrator builds LangGraph StateGraph from workflow DAG
4. Each node executes through 14 middleware layers
5. Tasks created for each node execution (tracking cost, tokens, latency)
6. On completion → status: SUCCESS, FAILED, or INTERRUPTED (when the job pauses for HITL review)
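
The lifecycle above can be sketched as a small state machine plus a middleware chain. This is a simplified model, not the orchestrator's code: the transition table (including INTERRUPTED back to RUNNING for a resumed job) is an assumption, and `wrap` and `tracing` are hypothetical helpers that stand in for the 14 real middleware types.

```python
from enum import Enum


class JobStatus(Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    INTERRUPTED = "INTERRUPTED"


# Assumed transitions; INTERRUPTED -> RUNNING models a resumed HITL job.
ALLOWED = {
    JobStatus.QUEUED: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.SUCCESS, JobStatus.FAILED, JobStatus.INTERRUPTED},
    JobStatus.INTERRUPTED: {JobStatus.RUNNING},
    JobStatus.SUCCESS: set(),
    JobStatus.FAILED: set(),
}


def transition(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def wrap(node_fn, middlewares):
    """Wrap node_fn so the first middleware in the list is the outermost layer."""
    wrapped = node_fn
    for middleware in reversed(middlewares):
        wrapped = middleware(wrapped)
    return wrapped


def tracing(label, log):
    """Build a hypothetical middleware that records entry and exit."""
    def middleware(next_fn):
        def run(state):
            log.append(f"{label}:before")
            result = next_fn(state)
            log.append(f"{label}:after")
            return result
        return run
    return middleware


log = []
node = wrap(
    lambda state: {**state, "done": True},
    [tracing("retry", log), tracing("guardrail", log)],
)
result = node({"input": "where is my order?"})
status = transition(transition(JobStatus.QUEUED, JobStatus.RUNNING), JobStatus.SUCCESS)
```

After the call, `log` reads `retry:before`, `guardrail:before`, `guardrail:after`, `retry:after`, which shows how each layer sees the node both before and after it runs.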

API routes

Prefix	Auth	Purpose
`/api/internal`	API key + Shipsy employee	Agent execution, creation, HITL actions
`/api/dashboard`	ProjectX user auth	Agent management, jobs, policies, activity
`/api/internal-dashboard`	Shipsy employee Basic Auth	Builder library, metadata
`/api/webhook`	Per-provider HMAC	External vendor callbacks
`/api/callback`	Provider-specific	External tool execution
`/api/public`	None	Health checks
`/mcp`	API key (provisional)	MCP streamable HTTP
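
For readers unfamiliar with HMAC auth, the sketch below shows the general technique a webhook receiver uses to verify a signed request body. The choice of SHA-256, the hex encoding, and the function names are assumptions; each provider's actual scheme lives in the repo's auth code and may differ.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare signatures in constant time to avoid timing side channels."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Hypothetical shared secret and payload, for illustration only.
secret = b"per-provider-shared-secret"
body = b'{"event": "delivered", "awb": "ABC123"}'
signature = sign(secret, body)

accepted = is_valid_signature(secret, body, signature)
rejected = is_valid_signature(secret, body + b" ", signature)
```

The signature must be computed over the exact bytes received, which is why a single extra space in the body is enough to fail verification.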

Running locally

Run each of the following in its own terminal (Redis, the API server, the TaskIQ worker, and optionally the TaskIQ scheduler):

# 1. Redis
redis-server
 
# 2. FastAPI server
uvicorn main:app --reload
 
# 3. TaskIQ worker (processes background jobs)
taskiq worker app.worker.broker:broker \
    --ack-type when_executed \
    --max-async-tasks 10 \
    --log-level INFO
 
# 4. (Optional) TaskIQ scheduler (periodic tasks)
taskiq scheduler app.worker.broker:scheduler

Access points:

API: http://localhost:8000
Swagger docs: http://localhost:8000/docs
Health check: http://localhost:8000/health
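
Once the processes are running, a quick smoke check can confirm the API is reachable. This is a minimal sketch using only the standard library; the `is_healthy` helper and its injectable `opener` parameter are illustrative, and it assumes the health endpoint returns HTTP 200 when the server is up.

```python
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # taken from the access points above


def is_healthy(url=HEALTH_URL, opener=urllib.request.urlopen, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both derive from OSError, so this also
        # covers refused connections, timeouts, and non-2xx responses.
        return False
```

Calling `is_healthy()` from a Python shell returns `False` rather than raising if the server has not started yet, which makes it easy to use in a retry loop.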

Key files to know

File	Why it matters
`app/core/orchestration/agent_orchestrator.py`	Core graph builder + executor — `register_agent()`, `run_agent()`, `resume_job()`
`app/models/state.py`	LangGraph state definition — every agent shares this State class
`data/agent/agents.json`	12 pre-built agent templates
`data/tool/tools.json`	18 registered tools
`data/prompts/*.txt`	System prompts for each agent persona
`data/guardrail/guardrails.json`	Safety guardrail definitions
`app/mcp/server.py`	MCP server integration
`main.py`	App startup, middleware registration, route mounting
`docker-entrypoint.sh`	How the 3 processes start in production
`agents-helper/`	Internal dev docs: base standards, LangGraph patterns, communication, testing

How to add an agent

1. Create a workflow definition (via Dashboard or API) with nodes and edges.
2. Each node references a node type and optionally an agent template from data/agent/agents.json.
3. Assign tools from data/tool/tools.json to agent nodes.
4. Write a system prompt (or use an existing one from data/prompts/).
5. Configure policies (retry, HITL, follow-up) at the node or workflow level.
6. Trigger via POST /api/trigger/workflow/{workflow_id}/run.
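
The final trigger step could look roughly like the sketch below, which only builds the HTTP request so it can be inspected before sending. The base URL, the `X-API-Key` header name, the payload shape, and the workflow id are all assumptions; check the Swagger docs for the real contract.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local dev server; adjust per environment


def build_trigger_request(workflow_id, payload, api_key):
    """Build (but do not send) the POST request that starts a workflow run."""
    url = f"{BASE_URL}/api/trigger/workflow/{workflow_id}/run"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
        method="POST",
    )


# Hypothetical workflow id and input payload.
request = build_trigger_request("wf_123", {"input": {"awb": "ABC123"}}, "dev-key")
```

Passing `request` to `urllib.request.urlopen` would send it; keeping construction separate makes the URL and body easy to log or test first.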

How to add an MCP tool

1. Add the tool definition to data/tool/tools.json with external_safe: true.
2. Implement the Python function in app/functions/.
3. The MCP server auto-registers all external_safe tools on startup.
4. The tool is then available over MCP at /mcp and via the HMAC external API.
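
The auto-registration step boils down to filtering the tool catalog on the `external_safe` flag. The sketch below illustrates that filter on a made-up excerpt; the assumption that tools.json is a list of objects with a `name` field, and the tool names themselves, are hypothetical.

```python
import json

# Hypothetical excerpt of a tool catalog; only the external_safe flag
# comes from the text above, everything else is assumed.
TOOLS_JSON = """
[
  {"name": "track_shipment", "external_safe": true},
  {"name": "update_internal_note", "external_safe": false},
  {"name": "geocode_address"}
]
"""


def external_safe_tools(raw_json):
    """Return names of tools explicitly marked external_safe: true."""
    tools = json.loads(raw_json)
    # A missing flag is treated as unsafe, so tools are opt-in by default.
    return [tool["name"] for tool in tools if tool.get("external_safe") is True]


print(external_safe_tools(TOOLS_JSON))  # ['track_shipment']
```

Treating an absent flag as unsafe is a design choice in this sketch: it means a tool is never exposed externally by accident.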

Sources

agent-platform repo
agents-helper/ directory (5 internal dev docs)
sdd/ directory (50+ Software Design Documents)
See Architecture overview for how this fits the bigger picture

Changelog

26 May 2026: Full content from GitHub repo exploration. Directory structure, execution model, API routes, key files.
