Memory layers

At a glance

Short-term: LangGraph State object — lives for the duration of one job execution
Checkpointing: AsyncPostgresSaver — persists state to PostgreSQL for resume/retry/HITL
Long-term: MemoryStore — key-value store with proposal/approval workflow
Vector DB: Pinecone (cloud) or FAISS (on-prem) for semantic retrieval

Why this matters

Memory is what separates a one-shot LLM call from an agent that maintains context across a conversation, resumes after interruption, and remembers a customer from yesterday’s call. When scoping a deployment, the memory architecture determines what the agent can “know” and for how long.

The three layers

Layer 1: LangGraph State (short-term)

Every agent execution shares a common State class (app/models/state.py):

Field	Type	Purpose
`messages`	List[BaseMessage]	Conversation history (inherited from `MessagesState`)
`extras`	Dict[str, Any]	Arbitrary key-value store for node-to-node data passing
`responses`	List[Dict]	Agent responses collected during execution
`input_data`	Dict[str, Any]	Original input parameters
`data`	Dict[str, Any]	Accumulated data from tool calls and LLM responses
`execution_variables`	Dict	Job context: org_id, workflow_id, ticket_id, customer info

State uses OverwriteLastValue channels so that when parallel nodes write to the same field in one step, the later write replaces the earlier one instead of raising a concurrent-update error.
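The field table and the overwrite behavior can be illustrated with a small standard-library sketch. It is a stand-in, not the code in app/models/state.py: `SketchState` and `merge_parallel_updates` are hypothetical names, `messages` holds plain strings rather than `BaseMessage` objects, and the merge assumes only that an overwrite-style channel keeps the last value written.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Dict, List


@dataclass
class SketchState:
    """Hypothetical mirror of the documented State fields (not the real class)."""
    messages: List[str] = field(default_factory=list)
    extras: Dict[str, Any] = field(default_factory=dict)
    responses: List[Dict[str, Any]] = field(default_factory=list)
    input_data: Dict[str, Any] = field(default_factory=dict)
    data: Dict[str, Any] = field(default_factory=dict)
    execution_variables: Dict[str, Any] = field(default_factory=dict)


def merge_parallel_updates(
    state: SketchState, updates: List[Dict[str, Any]]
) -> SketchState:
    """Apply updates from parallel nodes in order; the last writer of a field wins."""
    merged: Dict[str, Any] = {}
    for update in updates:
        merged.update(update)  # a later update overwrites an earlier one
    return replace(state, **merged)  # returns a new state, the old one is untouched


state = SketchState(input_data={"ticket_id": "T-1"})

# Two parallel nodes both write `data` in the same step.
new_state = merge_parallel_updates(
    state,
    [
        {"data": {"source": "node_a"}},
        {"data": {"source": "node_b"}, "extras": {"score": 0.9}},
    ],
)
```

Here `new_state.data` holds the value written by the second node, while fields that no node touched (such as `input_data`) carry over unchanged.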

Layer 2: Checkpointing (persistence for resume)

The platform uses LangGraph’s AsyncPostgresSaver backed by psycopg_pool.AsyncConnectionPool:

What it stores: Full state snapshot after each node execution
Thread ID: Equals the job ID — each job gets its own checkpoint stream
Singleton: One checkpointer instance per process
Used for:
- HITL resume: When a job is interrupted for human approval, the checkpoint allows resume_job() to pick up exactly where it left off
- Retry: Failed jobs can restart from the last successful checkpoint
- Follow-up context: resume_with_context() injects new information into a paused job
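The checkpoint-then-resume pattern described above can be sketched without a database. This is a minimal illustration under stated assumptions, not the platform's implementation: `InMemoryCheckpointer` stands in for AsyncPostgresSaver, `start_job_sketch` and `resume_job_sketch` are hypothetical helpers (the real resume_job() and resume_with_context() signatures are not shown in the source), and the `needs_approval` flag is an invented way to signal an interrupt.

```python
import copy
from typing import Any, Callable, Dict, List, Optional, Tuple

StateDict = Dict[str, Any]
Node = Callable[[StateDict], StateDict]


class InMemoryCheckpointer:
    """Hypothetical stand-in for a Postgres-backed saver: one snapshot stream per thread ID."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Tuple[int, StateDict]]] = {}

    def put(self, thread_id: str, next_step: int, state: StateDict) -> None:
        # Deep-copy so later mutations cannot alter a stored snapshot.
        self._streams.setdefault(thread_id, []).append((next_step, copy.deepcopy(state)))

    def latest(self, thread_id: str) -> Optional[Tuple[int, StateDict]]:
        stream = self._streams.get(thread_id)
        if not stream:
            return None
        step, state = stream[-1]
        return step, copy.deepcopy(state)


def _run_from(job_id: str, nodes: List[Node], cp: InMemoryCheckpointer,
              step: int, state: StateDict) -> StateDict:
    while step < len(nodes):
        state = {**state, **nodes[step](state)}
        step += 1
        cp.put(job_id, step, state)  # full snapshot after each node
        if state.get("needs_approval"):
            break  # pause here until a human responds
    return state


def start_job_sketch(job_id: str, nodes: List[Node], cp: InMemoryCheckpointer,
                     input_data: StateDict) -> StateDict:
    # The job ID doubles as the thread ID, so each job has its own stream.
    return _run_from(job_id, nodes, cp, 0, {"input_data": input_data})


def resume_job_sketch(job_id: str, nodes: List[Node], cp: InMemoryCheckpointer,
                      context: Optional[StateDict] = None) -> StateDict:
    saved = cp.latest(job_id)
    if saved is None:
        raise KeyError(f"no checkpoint for job {job_id}")
    step, state = saved
    # Inject follow-up context and clear the interrupt flag before continuing.
    state = {**state, **(context or {}), "needs_approval": False}
    return _run_from(job_id, nodes, cp, step, state)


calls: List[str] = []


def draft_reply(state: StateDict) -> StateDict:
    calls.append("draft_reply")
    return {"draft": f"Refund for {state['input_data']['ticket_id']}", "needs_approval": True}


def send_reply(state: StateDict) -> StateDict:
    calls.append("send_reply")
    return {"sent": state.get("approved", False)}


nodes = [draft_reply, send_reply]
cp = InMemoryCheckpointer()
paused = start_job_sketch("job-42", nodes, cp, {"ticket_id": "T-1"})
resumed = resume_job_sketch("job-42", nodes, cp, context={"approved": True})
```

The first call stops after `draft_reply`; the resume call loads the latest snapshot for `job-42` and runs only `send_reply`, so the drafting node is not executed a second time.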

Layer 3: Long-term memory

MemoryStore (app/core/memory/MemoryStore.py):

Simple key-value store with a proposal/approval workflow
Functions: propose_memory_update(), approve_memory_update(), reject_memory_update(), get_memory_state()
Agents can propose storing information (e.g., customer preferences); a human or policy can approve/reject
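A proposal/approval workflow of this kind could look roughly like the sketch below. The four function names come from the list above; everything else is an assumption made for illustration, including the `SketchMemoryStore` class, the integer proposal IDs, the argument lists, and the rule that only approved values appear in the memory state. The real signatures in app/core/memory/MemoryStore.py may differ.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Proposal:
    key: str
    value: Any
    status: str = "pending"  # pending -> approved | rejected


class SketchMemoryStore:
    """Hypothetical key-value store where writes need approval before they land."""

    def __init__(self) -> None:
        self._approved: Dict[str, Any] = {}
        self._proposals: Dict[int, Proposal] = {}
        self._ids = itertools.count(1)

    def propose_memory_update(self, key: str, value: Any) -> int:
        """Record a proposed write and return its ID; nothing is stored yet."""
        proposal_id = next(self._ids)
        self._proposals[proposal_id] = Proposal(key, value)
        return proposal_id

    def approve_memory_update(self, proposal_id: int) -> None:
        """Commit a pending proposal to long-term memory."""
        proposal = self._pending(proposal_id)
        proposal.status = "approved"
        self._approved[proposal.key] = proposal.value

    def reject_memory_update(self, proposal_id: int) -> None:
        """Discard a pending proposal without changing memory."""
        self._pending(proposal_id).status = "rejected"

    def get_memory_state(self) -> Dict[str, Any]:
        """Return a copy of the approved memory only."""
        return dict(self._approved)

    def _pending(self, proposal_id: int) -> Proposal:
        proposal = self._proposals[proposal_id]
        if proposal.status != "pending":
            raise ValueError(f"proposal {proposal_id} is already {proposal.status}")
        return proposal


store = SketchMemoryStore()
channel_id = store.propose_memory_update("preferred_channel", "email")
nickname_id = store.propose_memory_update("nickname", "Bob")

before = store.get_memory_state()          # nothing approved yet
store.approve_memory_update(channel_id)    # a human or policy accepts this one
store.reject_memory_update(nickname_id)    # and declines this one
after = store.get_memory_state()
```

Keeping proposals separate from approved memory means an agent can never write to long-term memory directly; a reviewer, human or policy, is always in the path.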

Vector DB (for semantic retrieval):

Cloud: Pinecone — managed vector database, no infrastructure to maintain
On-prem: FAISS — Facebook’s local vector search library, runs entirely on customer hardware
Used for: RAG (retrieval-augmented generation), knowledge base search, document similarity
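To make "semantic retrieval" concrete, the sketch below ranks stored vectors by cosine similarity to a query vector. It uses a brute-force scan in plain Python and calls neither the Pinecone nor the FAISS API; the three-dimensional vectors, document IDs, and the `top_k` helper are invented for illustration, and real embeddings have hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" for three knowledge-base documents (hypothetical values).
index = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.0],
    "api_limits": [0.0, 0.1, 0.9],
}

# A query whose embedding points mostly along the first axis.
results = top_k([0.8, 0.2, 0.0], index, k=2)
ranked_ids = [doc_id for doc_id, _ in results]
```

A vector database does the same ranking, but with index structures that avoid scanning every vector, which is what makes it practical at knowledge-base scale.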

Worked example: HITL resume flow

Pinecone vs FAISS

Factor	Pinecone (cloud)	FAISS (on-prem)
Hosting	Managed SaaS	Runs on customer hardware
Scaling	Auto-scales	Manual — need to size GPU/CPU
Cost	Pay per vector stored + queries	No per-query cost, but hardware capex
Latency	~50-100ms p95	Depends on hardware — can be faster
Data residency	Data stored in Pinecone’s cloud	Data stays on-prem
Best for	Cloud deployments, fast time-to-value	Banking, government, strict data residency

Sources

agent-platform repo: app/models/state.py
agent-platform repo: app/db/checkpointer.py
agent-platform repo: app/core/memory/MemoryStore.py
See Deployment modes for Pinecone vs FAISS in each deployment mode
See The agent-platform repo for the full directory structure

Changelog

26 May 2026: Full content from GitHub repo exploration. State model, checkpointing, MemoryStore, vector DB comparison.
