Memory layers

At a glance

  • Short-term: LangGraph State object — lives for the duration of one job execution
  • Checkpointing: AsyncPostgresSaver — persists state to PostgreSQL for resume/retry/HITL
  • Long-term: MemoryStore — key-value store with proposal/approval workflow
  • Vector DB: Pinecone (cloud) or FAISS (on-prem) for semantic retrieval

Why this matters

Memory is what separates a one-shot LLM call from an agent that maintains context across a conversation, resumes after interruption, and remembers a customer from yesterday’s call. When scoping a deployment, the memory architecture determines what the agent can “know” and for how long.

The three layers

Layer 1: LangGraph State (short-term)

Every agent execution shares a common State class (app/models/state.py):

Field               | Type              | Purpose
--------------------|-------------------|----------------------------------------------------------
messages            | List[BaseMessage] | Conversation history (inherited from MessagesState)
extras              | Dict[str, Any]    | Arbitrary key-value store for node-to-node data passing
responses           | List[Dict]        | Agent responses collected during execution
input_data          | Dict[str, Any]    | Original input parameters
data                | Dict[str, Any]    | Accumulated data from tool calls and LLM responses
execution_variables | Dict              | Job context: org_id, workflow_id, ticket_id, customer info

State uses OverwriteLastValue channels so that when parallel nodes write to the same field in one step, the last write wins instead of raising a concurrent-update error.
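
The last-write-wins behaviour can be illustrated with a minimal plain-Python sketch. It is not the real State class or LangGraph's channel API: the TypedDict only mirrors the field names listed above, and merge_last_value and the two example updates are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List, TypedDict


class State(TypedDict, total=False):
    """Plain-Python stand-in that mirrors the documented field names."""
    messages: List[Any]              # BaseMessage objects in the real class
    extras: Dict[str, Any]
    responses: List[Dict[str, Any]]
    input_data: Dict[str, Any]
    data: Dict[str, Any]
    execution_variables: Dict[str, Any]


def merge_last_value(state: State, updates: List[Dict[str, Any]]) -> State:
    """Apply updates from parallel nodes; the last write to a field wins."""
    merged = dict(state)             # copy so the input state is not mutated
    for update in updates:           # updates are applied in list order
        merged.update(update)        # a later write replaces an earlier one
    return merged  # type: ignore[return-value]


state: State = {"data": {}, "extras": {}}

# Two hypothetical parallel nodes both write to `data` in the same step.
parallel_updates = [
    {"data": {"source": "crm_lookup"}},
    {"data": {"source": "ticket_lookup"}, "extras": {"priority": "high"}},
]

result = merge_last_value(state, parallel_updates)
```

Note that the whole field is replaced rather than merged, so a node that needs to keep earlier values must read them and write them back itself.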

Layer 2: Checkpointing (persistence for resume)

The platform uses LangGraph’s AsyncPostgresSaver backed by psycopg_pool.AsyncConnectionPool:

  • What it stores: Full state snapshot after each node execution
  • Thread ID: Equals the job ID — each job gets its own checkpoint stream
  • Singleton: One checkpointer instance per process
  • Used for:
    • HITL resume: When a job is interrupted for human approval, the checkpoint allows resume_job() to pick up exactly where it left off
    • Retry: Failed jobs can restart from the last successful checkpoint
    • Follow-up context: resume_with_context() injects new information into a paused job
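
The retry behaviour described above can be modelled with a small in-memory sketch. It stands in for AsyncPostgresSaver rather than using it: InMemoryCheckpointer, run_job, and the two example nodes are hypothetical, and the real platform persists snapshots to PostgreSQL instead of a dict.

```python
import copy
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]
Node = Tuple[str, Callable[[State], State]]


class InMemoryCheckpointer:
    """Keeps one snapshot stream per thread ID (here, the job ID)."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Tuple[str, State]]] = {}

    def put(self, thread_id: str, node_name: str, state: State) -> None:
        snapshot = copy.deepcopy(state)  # full state snapshot
        self._streams.setdefault(thread_id, []).append((node_name, snapshot))

    def latest(self, thread_id: str) -> Optional[Tuple[str, State]]:
        stream = self._streams.get(thread_id)
        return stream[-1] if stream else None


def run_job(job_id: str, nodes: List[Node],
            checkpointer: InMemoryCheckpointer, initial: State) -> State:
    """Run nodes in order, skipping any already checkpointed for this job."""
    names = [name for name, _ in nodes]
    last = checkpointer.latest(job_id)
    if last is None:
        state, start = dict(initial), 0
    else:
        state, start = copy.deepcopy(last[1]), names.index(last[0]) + 1
    for name, fn in nodes[start:]:
        state = fn(state)
        checkpointer.put(job_id, name, state)  # snapshot after each node
    return state


calls: List[str] = []


def fetch(state: State) -> State:
    calls.append("fetch")
    return {**state, "ticket": "loaded"}


def enrich(state: State) -> State:
    calls.append("enrich")
    if calls.count("enrich") == 1:     # fail on the first attempt only
        raise RuntimeError("transient failure")
    return {**state, "enriched": True}


nodes: List[Node] = [("fetch", fetch), ("enrich", enrich)]
checkpointer = InMemoryCheckpointer()

try:
    run_job("job-123", nodes, checkpointer, {})
except RuntimeError:
    pass                               # first attempt fails inside `enrich`

# The retry restarts after `fetch`, the last successful checkpoint.
final_state = run_job("job-123", nodes, checkpointer, {})
```

In LangGraph itself the thread ID is supplied through the run config as {"configurable": {"thread_id": ...}}, which is how a job ID can select its own checkpoint stream.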

Layer 3: Long-term memory

MemoryStore (app/core/memory/MemoryStore.py):

  • Simple key-value store with a proposal/approval workflow
  • Functions: propose_memory_update(), approve_memory_update(), reject_memory_update(), get_memory_state()
  • Agents can propose storing information (e.g., customer preferences); a human or policy can approve/reject
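
A minimal sketch of the proposal/approval workflow follows. The four function names come from the list above, but their signatures, the class wrapper (MemoryStoreSketch), the Proposal record, and the example keys are assumptions made for illustration; the real MemoryStore may look quite different.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Proposal:
    key: str
    value: Any
    status: str = "pending"          # pending | approved | rejected


class MemoryStoreSketch:
    """Key-value store where writes only land after approval."""

    def __init__(self) -> None:
        self._memory: Dict[str, Any] = {}
        self._proposals: Dict[int, Proposal] = {}
        self._ids = itertools.count(1)

    def propose_memory_update(self, key: str, value: Any) -> int:
        proposal_id = next(self._ids)
        self._proposals[proposal_id] = Proposal(key, value)
        return proposal_id           # nothing is written to memory yet

    def approve_memory_update(self, proposal_id: int) -> None:
        proposal = self._pending(proposal_id)
        proposal.status = "approved"
        self._memory[proposal.key] = proposal.value

    def reject_memory_update(self, proposal_id: int) -> None:
        self._pending(proposal_id).status = "rejected"

    def get_memory_state(self) -> Dict[str, Any]:
        return dict(self._memory)    # copy so callers cannot mutate the store

    def _pending(self, proposal_id: int) -> Proposal:
        proposal = self._proposals[proposal_id]
        if proposal.status != "pending":
            raise ValueError(f"proposal {proposal_id} is already {proposal.status}")
        return proposal


store = MemoryStoreSketch()
p1 = store.propose_memory_update("customer:42:preferred_channel", "email")
p2 = store.propose_memory_update("customer:42:nickname", "VIP")

before = store.get_memory_state()    # proposals are not visible yet

store.approve_memory_update(p1)      # a human or policy accepts this one
store.reject_memory_update(p2)       # and declines this one

after = store.get_memory_state()
```

Keeping proposals separate from approved memory means an agent can suggest a fact without it influencing later jobs until someone, or some policy, signs off.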

Vector DB (for semantic retrieval):

  • Cloud: Pinecone — managed vector database, no infrastructure to maintain
  • On-prem: FAISS — Facebook’s local vector search library, runs entirely on customer hardware
  • Used for: RAG (retrieval-augmented generation), knowledge base search, document similarity
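
Semantic retrieval boils down to nearest-neighbour search over embedding vectors. The sketch below shows that idea with a brute-force cosine-similarity scan in plain Python. It does not use the Pinecone or FAISS APIs, and the three-dimensional vectors and document IDs are invented for illustration; real embeddings come from an embedding model and have hundreds of dimensions or more.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: Dict[str, List[float]], query: List[float],
           k: int = 2) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(doc_id, cosine(vec, query)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "knowledge base": document ID -> embedding (made-up values).
index = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "password_reset": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]              # embedding of a refund-related question
top = search(index, query, k=2)
top_ids = [doc_id for doc_id, _ in top]
```

A vector database performs the same lookup with indexing structures that avoid scanning every vector, which is what makes it practical at knowledge-base scale.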

Worked example: HITL resume flow
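
A minimal sketch of the flow, assuming a three-step job that pauses for approval before sending a reply. Only the name resume_job() comes from this page; its signature, start_job, the step functions, and the in-memory CHECKPOINTS dict are hypothetical stand-ins for the real graph and the PostgreSQL checkpointer.

```python
import copy
from typing import Any, Dict, List

State = Dict[str, Any]
CHECKPOINTS: Dict[str, Dict[str, Any]] = {}   # job_id -> latest checkpoint
RUN_LOG: List[str] = []                       # which steps actually executed


class Interrupted(Exception):
    """Raised by a step that needs a human decision before continuing."""


def draft_refund(state: State) -> State:
    return {**state, "draft": "Refund 40 EUR"}


def await_approval(state: State) -> State:
    if "approved" not in state:
        raise Interrupted("needs human approval")
    return state


def send_reply(state: State) -> State:
    outcome = "sent" if state["approved"] else "discarded"
    return {**state, "outcome": outcome}


STEPS = [draft_refund, await_approval, send_reply]


def _run(job_id: str, state: State, start: int) -> Dict[str, Any]:
    for i in range(start, len(STEPS)):
        RUN_LOG.append(STEPS[i].__name__)
        try:
            state = STEPS[i](state)
        except Interrupted:
            # Save where we stopped so the same step runs again on resume.
            CHECKPOINTS[job_id] = {"next": i, "state": copy.deepcopy(state),
                                   "status": "interrupted"}
            return CHECKPOINTS[job_id]
        CHECKPOINTS[job_id] = {"next": i + 1, "state": copy.deepcopy(state),
                               "status": "running"}
    CHECKPOINTS[job_id]["status"] = "completed"
    return CHECKPOINTS[job_id]


def start_job(job_id: str, input_data: State) -> Dict[str, Any]:
    return _run(job_id, dict(input_data), 0)


def resume_job(job_id: str, decision: State) -> Dict[str, Any]:
    """Merge the human decision into the saved state and continue."""
    checkpoint = CHECKPOINTS[job_id]
    state = {**checkpoint["state"], **decision}
    return _run(job_id, state, checkpoint["next"])


paused = start_job("job-7", {"ticket_id": "T-1"})   # stops at await_approval
done = resume_job("job-7", {"approved": True})      # continues from there
```

The draft step runs once: on resume the job restarts at the interrupted step with the saved state plus the reviewer's decision, rather than from the beginning.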

Pinecone vs FAISS

Factor         | Pinecone (cloud)                      | FAISS (on-prem)
---------------|---------------------------------------|--------------------------------------------
Hosting        | Managed SaaS                          | Runs on customer hardware
Scaling        | Auto-scales                           | Manual; CPU/GPU must be sized up front
Cost           | Pay per vector stored + queries       | No per-query cost, but hardware capex
Latency        | ~50-100 ms p95                        | Depends on hardware; can be faster
Data residency | Data stored in Pinecone’s cloud       | Data stays on-prem
Best for       | Cloud deployments, fast time-to-value | Banking, government, strict data residency

Changelog

  • 26 May 2026: Full content from GitHub repo exploration. State model, checkpointing, MemoryStore, vector DB comparison.