
RAG, memory & vector DBs

How an agent remembers — within a conversation and across conversations.

Two kinds of memory

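| | Short-term | Long-term |
|---|---|---|
| Scope | Current conversation | Across all conversations |
| Storage | In-process (RAM) | Vector DB (Pinecone, FAISS, etc.) |
| Cleared when | Session ends | Never (unless explicitly deleted) |
| Used for | “What did the user say 3 turns ago?” | “What’s the customer’s standard SOP for refunds?” |
| Cost | Free to store, but the history consumes context-window tokens on every call | Storage, plus retrieval on every query |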

Most agents need both.
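A minimal sketch of the two lifetimes, assuming nothing beyond the table above. `ShortTermMemory` and `LongTermMemory` are hypothetical names. The dict-backed long-term store is only a placeholder for a real vector DB such as Pinecone or FAISS: it shows how long records live, not how they are retrieved.

```python
from __future__ import annotations

from collections import deque


class ShortTermMemory:
    """Current conversation only, held in process memory (RAM)."""

    def __init__(self, max_turns: int = 20) -> None:
        # deque(maxlen=...) drops the oldest turn once full, which keeps
        # the history sent to the model bounded (max_turns is an assumed knob).
        self.turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def last(self, n: int) -> list[tuple[str, str]]:
        # "What did the user say 3 turns ago?" is a plain lookup, no retrieval.
        return list(self.turns)[-n:]

    def end_session(self) -> None:
        # Short-term memory does not outlive the session.
        self.turns.clear()


class LongTermMemory:
    """Hypothetical placeholder for a vector DB (Pinecone, FAISS, etc.).

    It models only the lifetime: records persist across sessions until
    someone deletes them. Similarity search is deliberately left out.
    """

    def __init__(self) -> None:
        self.records: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        self.records[doc_id] = text

    def get(self, doc_id: str) -> str | None:
        return self.records.get(doc_id)

    def delete(self, doc_id: str) -> None:
        # The only way long-term memory gets cleared.
        self.records.pop(doc_id, None)
```

The split mirrors the table. Ending a session wipes the buffer, and the long-term store is untouched until `delete` is called.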

What RAG is

Retrieval-Augmented Generation. Before the LLM generates a response, retrieve relevant context from a knowledge store and put it in the prompt.

Why this matters: without RAG, the model only knows what was in its training data (which has a cutoff months or years in the past) and what’s in the prompt right now. RAG lets the agent pull in customer-specific SOPs, product docs, recent updates, and prior conversations.
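The retrieve-then-prompt loop can be sketched in Python under loud assumptions:

- Word overlap stands in for embedding similarity.
- The documents are made-up strings.
- The LLM call itself is left out, because it depends on the model provider.

`retrieve` and `build_prompt` are hypothetical helpers, not part of any library.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Retrieval step. A real agent would embed the query and search a
    vector DB; ranking by shared words is a stand-in so this runs anywhere."""
    q = tokenize(query)
    scored = [(len(q & tokenize(d)), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]  # drop docs with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Augmentation step: put the retrieved context into the prompt, right
    before the question, with an explicit instruction to use it."""
    context_block = "\n".join(f"- {c}" for c in context) or "- (no relevant documents found)"
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )
```

The string returned by `build_prompt` is what would be sent to the model for the generation step. Placing the context next to the question and telling the model to rely on it are the same levers listed under “LLM ignores retrieved context” below.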

Vector DBs in 60 seconds

A vector DB stores embeddings (numerical fingerprints of text) and lets you find “most similar” entries fast. Conceptually: a search engine that finds things by meaning, not keywords.
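“Finding by meaning” usually comes down to ranking stored vectors by cosine similarity to the query vector. Below is a brute-force sketch in plain Python. The three-dimensional “embeddings” are made up; real embedding models output hundreds or thousands of dimensions. Vector DBs perform the same ranking, but use index structures so they don’t have to score every vector on every query (FAISS, for example, offers both exact and approximate indexes).

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float], index: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Exact nearest-neighbour search: score every stored vector, keep the best k."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings, one per document.
index = {
    "refund-sop": [0.9, 0.1, 0.0],
    "shipping-sop": [0.1, 0.9, 0.0],
    "holiday-list": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of "how do refunds work?"

print([doc_id for doc_id, _ in top_k(query, index)])
# → ['refund-sop', 'shipping-sop', 'holiday-list']
```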

| Vector DB | When Shipsy uses it |
|---|---|
| Pinecone | Default for cloud deployments |
| FAISS | Default for on-prem / open-source deployments |
| Graph DB | When relationships between entities matter more than similarity |

See Memory layers for how Shipsy wires this up.

When RAG fails

| Failure mode | What to do |
|---|---|
| Retrieved chunks are irrelevant | Improve chunking strategy; use hybrid keyword+vector search |
| Right info exists but isn’t retrieved | Re-index; tune similarity threshold; add metadata filters |
| LLM ignores retrieved context | Move retrieved content closer to the user query; use explicit instructions |
| Retrieved content contradicts itself | Source-of-truth deduplication; recency filters |
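Several of these fixes can be combined in one retrieval function. Below is a sketch under stated assumptions. `hybrid_search`, `Chunk`, and its metadata fields are hypothetical names, and `alpha` and `min_score` are placeholders to tune, not recommended values. The function applies four fixes in turn:

- **Metadata filter:** only the current customer’s chunks are considered.
- **Hybrid score:** a weighted blend of keyword and vector scores.
- **Similarity threshold:** low-scoring chunks are dropped.
- **Source-of-truth dedup:** chunks that share a `doc_id` are treated as versions of one source, and only the most recently updated one is kept.

In production you would normally push the filter down into the vector DB query. Some systems also blend rankings (for example with reciprocal rank fusion) instead of raw scores.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    vector: list[float]
    customer: str  # hypothetical metadata field used for filtering
    doc_id: str    # chunks sharing a doc_id are versions of one source
    updated: date  # hypothetical metadata field used for recency


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear in the text (a crude keyword signal)."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(
    query: str,
    query_vec: list[float],
    chunks: list[Chunk],
    customer: str,
    alpha: float = 0.5,      # weight on vector vs keyword score (assumed)
    min_score: float = 0.3,  # similarity threshold (assumed; tune per corpus)
    k: int = 3,
) -> list[Chunk]:
    # 1. Metadata filter, blended score, threshold.
    candidates: list[tuple[float, Chunk]] = []
    for chunk in chunks:
        if chunk.customer != customer:
            continue
        score = (alpha * cosine(query_vec, chunk.vector)
                 + (1 - alpha) * keyword_score(query, chunk.text))
        if score >= min_score:
            candidates.append((score, chunk))

    # 2. Source-of-truth dedup: keep only the newest chunk per doc_id.
    newest: dict[str, tuple[float, Chunk]] = {}
    for score, chunk in candidates:
        kept = newest.get(chunk.doc_id)
        if kept is None or chunk.updated > kept[1].updated:
            newest[chunk.doc_id] = (score, chunk)

    # 3. Rank the survivors by score and return the top k.
    ranked = sorted(newest.values(), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k]]
```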

What RAG is not

  • Not a replacement for fine-tuning. Different tools. RAG = “here’s the current state of the world.” Fine-tuning = “here’s how I want you to behave.”
  • Not a fix for a bad prompt. Garbage in, garbage out, even with great retrieval.
  • Not free. Embedding costs, storage costs, query latency. Budget for them.
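For a back-of-the-envelope budget, assume float32 vectors (4 bytes per dimension). For example, 1,000,000 chunks at a hypothetical 1,536 dimensions take up 1,000,000 × 1,536 × 4 = 6,144,000,000 bytes. That is about 6.1 GB of raw vectors, before index overhead and metadata. The helper names below are illustrative. The embedding price is left as a parameter to fill in from your provider; no real price is assumed.

```python
def index_size_bytes(num_chunks: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 (4 bytes per dimension).

    Ignores index structures and metadata, so treat it as a lower bound.
    """
    return num_chunks * dims * bytes_per_value


def embedding_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Cost to embed a corpus once; every full re-index pays it again.

    price_per_million_tokens comes from your provider's pricing page.
    """
    return total_tokens / 1_000_000 * price_per_million_tokens


print(index_size_bytes(1_000_000, 1536) / 1e9)  # → 6.144 (GB)
```

Query latency doesn’t reduce to a formula like this. Measure it against your own index.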

Sources

Changelog

  • 26 May 2026: Initial draft.