RAG, memory & vector DBs

How an agent remembers — within a conversation and across conversations.

Two kinds of memory

	Short-term	Long-term
Scope	Current conversation	Everything ever
Storage	In-process (RAM)	Vector DB (Pinecone, FAISS, etc.)
Cleared when	Session ends	Never (unless you delete)
Used for	“What did the user say 3 turns ago?”	“What’s the customer’s SOP for refunds?”
Cost	No storage cost, but replayed turns use context-window tokens	Storage + retrieval per query

Most agents need both.
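A minimal sketch of the two layers side by side. The class names, the `max_turns` limit, and the dict-backed long-term store are illustrative assumptions, not Shipsy's implementation; a real long-term layer would sit on a vector DB.

```python
from collections import deque


class ShortTermMemory:
    """Holds recent turns of the current conversation in process memory."""

    def __init__(self, max_turns: int = 20):
        # Oldest turns fall off automatically once max_turns is reached.
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def turns_ago(self, n: int) -> tuple[str, str]:
        """Return the turn n steps back (1 = most recent)."""
        return self.turns[-n]

    def clear(self) -> None:
        """Call this when the session ends."""
        self.turns.clear()


class LongTermMemory:
    """Stand-in for a persistent store: entries stay until explicitly deleted."""

    def __init__(self):
        self.records: dict[str, str] = {}

    def upsert(self, key: str, text: str) -> None:
        self.records[key] = text

    def delete(self, key: str) -> None:
        self.records.pop(key, None)


short_term = ShortTermMemory(max_turns=3)
long_term = LongTermMemory()
long_term.upsert("refund-sop", "placeholder SOP text")

for i in range(5):
    short_term.add("user", f"message {i}")

short_term.clear()  # session over: short-term is gone, long-term survives
```

The point of the split: ending a session wipes `short_term` but leaves `long_term` untouched.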

What RAG is

Retrieval-Augmented Generation. Before the LLM generates a response, retrieve relevant context from a knowledge store and put it in the prompt.

Why this matters: without RAG, the model only knows what was in its training data (which stops at a cutoff months or years in the past) and what’s in the prompt right now. RAG lets the agent pull in customer-specific SOPs, product docs, recent updates, and prior conversations.
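The retrieve-then-generate flow can be sketched in a few lines. Everything here is an assumption for illustration: `retrieve` is a toy word-overlap ranker standing in for a vector DB query, the documents are placeholders, and the final LLM call is left out.

```python
def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        store,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Put the retrieved context in the prompt, right before the question."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Placeholder knowledge store (not real SOP content).
knowledge = [
    "refund policy placeholder text",
    "shipping label format notes",
    "refund escalation placeholder steps",
]

question = "refund policy"
prompt = build_prompt(question, retrieve(question, knowledge))
# `prompt` is what would be sent to the LLM.
```

Swapping `retrieve` for a real vector DB query changes nothing else in the flow.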

Vector DBs in 60 seconds

A vector DB stores embeddings (numerical fingerprints of text) and lets you find “most similar” entries fast. Conceptually: a search engine that finds things by meaning, not keywords.
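To make “most similar” concrete, here is a brute-force sketch using cosine similarity. `TinyVectorIndex` is a hypothetical name, and the two-dimensional vectors are made-up placeholders; real embeddings come from an embedding model and have hundreds or thousands of dimensions. Production vector DBs typically use approximate nearest-neighbour indexes instead of scanning every entry.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorIndex:
    """Stores (id, vector) pairs and returns the k most similar to a query."""

    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, item_id: str, vector: list[float]) -> None:
        self.items.append((item_id, vector))

    def query(self, vector: list[float], k: int = 3) -> list[tuple[str, float]]:
        scored = [(item_id, cosine(vector, v)) for item_id, v in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


index = TinyVectorIndex()
index.add("doc-a", [1.0, 0.0])
index.add("doc-b", [0.0, 1.0])
index.add("doc-c", [0.9, 0.1])

nearest = index.query([1.0, 0.0], k=2)  # doc-a first, then doc-c
```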

Store	When Shipsy uses it
Pinecone	Default for cloud deployments
FAISS	Default for on-prem / open-source deployments
Graph DB	When relationships between entities matter more than similarity

See Memory layers for how Shipsy wires this up.

When RAG fails

Failure mode	What to do
Retrieved chunks are irrelevant	Improve chunking strategy; use hybrid keyword+vector search
Right info exists but isn’t retrieved	Re-index; tune similarity threshold; add metadata filters
LLM ignores retrieved context	Move retrieved content closer to the user query; use explicit instructions
Retrieved content contradicts itself	Source-of-truth deduplication; recency filters
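Several of the fixes above (hybrid keyword+vector search, a similarity threshold, metadata filters) can be combined in one ranking step. This is a sketch under assumptions: `Chunk`, `hybrid_rank`, the `alpha` weight, and the `min_score` cutoff are hypothetical names and values, and `vector_score` is assumed to arrive from the vector DB already scaled to 0..1.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector_score: float  # similarity reported by the vector DB, assumed 0..1
    metadata: dict = field(default_factory=dict)


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear in the chunk text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


def hybrid_rank(query, chunks, alpha=0.5, min_score=0.3, required=None):
    """Blend vector and keyword scores; drop filtered or low-scoring chunks."""
    required = required or {}
    results = []
    for chunk in chunks:
        # Metadata filter: skip chunks that miss any required key/value.
        if any(chunk.metadata.get(k) != v for k, v in required.items()):
            continue
        score = alpha * chunk.vector_score + (1 - alpha) * keyword_score(query, chunk.text)
        # Similarity threshold: weak matches never reach the prompt.
        if score >= min_score:
            results.append((score, chunk))
    results.sort(key=lambda pair: pair[0], reverse=True)
    return results


# Placeholder chunks; "acme" is a hypothetical customer id.
candidates = [
    Chunk("refund steps for damaged parcels", 0.6, {"customer": "acme"}),
    Chunk("driver onboarding checklist", 0.7, {"customer": "acme"}),
    Chunk("refund steps for late parcels", 0.9, {"customer": "other"}),
]
ranked = hybrid_rank("refund steps", candidates, required={"customer": "acme"})
```

Here the onboarding chunk has the higher vector score of the two "acme" chunks, but the keyword match pushes the refund chunk to the top, and the third chunk is filtered out by metadata despite its high similarity.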

What RAG is not

Not a replacement for fine-tuning. Different tools. RAG = “here’s the current state of the world.” Fine-tuning = “here’s how I want you to behave.”
Not a fix for a bad prompt. Garbage in, garbage out, even with great retrieval.
Not free. Embedding costs, storage costs, query latency. Budget for them.
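A rough way to budget is to add up the three cost lines separately. The function below is a sketch: its name, its parameters, and every number in the example call are placeholders, not real vendor prices. Plug in your provider's current rates.

```python
def monthly_rag_cost(
    corpus_tokens: int,
    queries_per_month: int,
    tokens_per_query: int,
    embed_price_per_1k_tokens: float,
    storage_price_per_month: float,
) -> dict:
    """Estimate RAG spend: indexing, query embedding, and storage."""
    # Embedding the corpus (repeated on every full re-index).
    indexing = corpus_tokens / 1000 * embed_price_per_1k_tokens
    # Each query must be embedded before it can be searched.
    query_embedding = (
        queries_per_month * tokens_per_query / 1000 * embed_price_per_1k_tokens
    )
    return {
        "indexing": indexing,
        "query_embedding": query_embedding,
        "storage": storage_price_per_month,
        "total": indexing + query_embedding + storage_price_per_month,
    }


# Placeholder inputs for illustration only.
estimate = monthly_rag_cost(
    corpus_tokens=1_000_000,
    queries_per_month=10_000,
    tokens_per_query=50,
    embed_price_per_1k_tokens=0.01,
    storage_price_per_month=20.0,
)
```

This leaves out two things worth tracking separately: the extra prompt tokens that retrieved chunks add to each LLM call, and the latency each retrieval adds to a response.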

Sources

Pinecone docs on RAG
See Memory layers for Shipsy-specific implementation

Changelog

26 May 2026: Initial draft.
