RAG, memory & vector DBs
How an agent remembers — within a conversation and across conversations.
Two kinds of memory
| | Short-term | Long-term |
|---|---|---|
| Scope | Current conversation | Everything ever |
| Storage | In-process (RAM) | Vector DB (Pinecone, FAISS, etc.) |
| Cleared when | Session ends | Never (unless you delete) |
| Used for | “What did the user say 3 turns ago?” | “What’s the customer’s SOP for refunds?” |
| Cost | No storage cost, but history replayed in the prompt still uses tokens | Embedding + storage + retrieval per query |
Most agents need both.
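A minimal sketch of the split in the table above, assuming nothing about Shipsy's actual implementation. `ShortTermMemory` and `LongTermMemory` are hypothetical names, the plain dict only stands in for a vector DB, and the conversation text is made up.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ShortTermMemory:
    """Holds the most recent turns of the current conversation in RAM."""

    def __init__(self, max_turns: int = 20) -> None:
        # Oldest turns fall off automatically once max_turns is reached.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def turns_ago(self, n: int) -> Tuple[str, str]:
        """Return the turn from n turns back (1 = most recent)."""
        return self.turns[-n]

    def end_session(self) -> None:
        # Short-term memory is cleared when the session ends.
        self.turns.clear()


class LongTermMemory:
    """Stand-in for a vector DB: records persist until explicitly deleted."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}

    def upsert(self, key: str, text: str) -> None:
        self.records[key] = text

    def delete(self, key: str) -> None:
        self.records.pop(key, None)


short = ShortTermMemory()
long_term = LongTermMemory()

short.add("user", "My order is late.")
short.add("agent", "Sorry about that. What is the order ID?")
short.add("user", "It is 1042.")
long_term.upsert("refund-sop", "Refunds need manager approval.")  # made-up text

three_back = short.turns_ago(3)  # "What did the user say 3 turns ago?"
short.end_session()              # the long-term record survives this call
```

The cap on `max_turns` is an illustrative choice: it keeps the in-process history bounded so that replaying it into the prompt does not grow without limit.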
What RAG is
Retrieval-Augmented Generation. Before the LLM generates a response, retrieve relevant context from a knowledge store and put it in the prompt.
Why this matters: without RAG, the model only knows what was in its training data (which stops at a cutoff months or years in the past) and what’s in the prompt right now. RAG lets the agent pull in customer-specific SOPs, product docs, recent updates, and prior conversations.
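The retrieve-then-generate flow can be sketched roughly as follows. This is an illustration only: `build_rag_prompt` and `toy_retrieve` are hypothetical helpers, the documents are made-up sample text, and the word-overlap scoring is a placeholder for a real vector search. The LLM call itself is left out.

```python
import re
from typing import Callable, List

# Made-up sample knowledge store.
DOCS = [
    "Refund requests are approved within 7 days.",
    "Drivers must scan each parcel at pickup.",
    "Support hours are 9am to 6pm on weekdays.",
]


def tokens(text: str) -> set:
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(query: str, k: int) -> List[str]:
    """Placeholder retriever: rank documents by shared words with the query."""
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    k: int = 3,
) -> str:
    """Retrieve k chunks, then place them in the prompt ahead of the question."""
    chunks = retrieve(question, k)
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_rag_prompt("How long does a refund take?", toy_retrieve, k=2)
print(prompt)  # this string is what would be sent to the LLM
```

Passing the retriever in as a function keeps the prompt assembly independent of the store behind it, so the same code would work with any backend that returns a list of text chunks.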
Vector DBs in 60 seconds
A vector DB stores embeddings (numerical fingerprints of text) and lets you find “most similar” entries fast. Conceptually: a search engine that finds things by meaning, not keywords.
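To make "find the most similar entries" concrete, here is a small brute-force sketch using cosine similarity. The three-dimensional vectors are hand-written stand-ins for real embeddings, and `TinyVectorIndex` is a hypothetical class. A production vector DB would use an approximate nearest-neighbour index rather than scanning every entry.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorIndex:
    """Brute-force stand-in for a vector DB."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, List[float], str]] = []

    def add(self, item_id: str, vector: List[float], text: str) -> None:
        self.items.append((item_id, vector, text))

    def search(self, query_vec: Sequence[float], k: int = 3):
        """Return the k entries closest to query_vec as (score, id, text)."""
        scored = [(cosine(query_vec, vec), item_id, text)
                  for item_id, vec, text in self.items]
        scored.sort(key=lambda hit: hit[0], reverse=True)
        return scored[:k]


index = TinyVectorIndex()
# Toy vectors; a real system would get these from an embedding model.
index.add("refund-sop", [0.9, 0.1, 0.0], "How refunds are approved")
index.add("pickup-sop", [0.0, 0.2, 0.9], "How parcels are scanned at pickup")
index.add("support-hours", [0.1, 0.9, 0.1], "When support is available")

# Pretend this is the embedding of "can I get my money back?"
results = index.search([0.8, 0.2, 0.0], k=2)
for score, item_id, text in results:
    print(f"{score:.2f}  {item_id}  {text}")
```

Note that the query shares no keywords with the stored text; the match comes entirely from the vectors pointing in a similar direction.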
| Store | When Shipsy uses it |
|---|---|
| Pinecone | Default for cloud deployments |
| FAISS | Default for on-prem / open-source deployments |
| Graph DB | When relationships between entities matter more than similarity |
See Memory layers for how Shipsy wires this up.
When RAG fails
| Failure mode | What to do |
|---|---|
| Retrieved chunks are irrelevant | Improve chunking strategy; use hybrid keyword+vector search |
| Right info exists but isn’t retrieved | Re-index; tune similarity threshold; add metadata filters |
| LLM ignores retrieved context | Move retrieved content closer to the user query; use explicit instructions |
| Retrieved content contradicts itself | Source-of-truth deduplication; recency filters |
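Three of the remedies in the table (similarity threshold, metadata filters, recency filters) can be applied as a post-processing step on raw search hits. The sketch below is one possible shape for that step. The `Hit` fields, the `filter_hits` helper, the 0.75 threshold, and the sample data are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Hit:
    doc_key: str    # which document the chunk belongs to
    text: str
    score: float    # similarity from the vector search, higher is closer
    customer: str   # metadata attached at indexing time
    updated: date


def filter_hits(hits: List[Hit], customer: str, min_score: float = 0.75) -> List[Hit]:
    # Metadata filter plus similarity threshold: drop other customers'
    # content and weak matches.
    kept = [h for h in hits if h.customer == customer and h.score >= min_score]

    # Recency filter: when several versions of the same document match,
    # keep only the most recently updated one to avoid contradictions.
    newest: Dict[str, Hit] = {}
    for h in kept:
        current = newest.get(h.doc_key)
        if current is None or h.updated > current.updated:
            newest[h.doc_key] = h

    return sorted(newest.values(), key=lambda h: h.score, reverse=True)


# Made-up hits, as they might come back from a vector search.
hits = [
    Hit("refund-sop", "Refunds within 14 days.", 0.91, "acme", date(2024, 1, 10)),
    Hit("refund-sop", "Refunds within 7 days.", 0.88, "acme", date(2025, 3, 2)),
    Hit("refund-sop", "Refunds within 30 days.", 0.95, "globex", date(2025, 1, 1)),
    Hit("pickup-sop", "Scan each parcel.", 0.40, "acme", date(2025, 2, 1)),
]

clean = filter_hits(hits, customer="acme")
```

The threshold is a tuning knob: set it too high and the right information is never retrieved, too low and irrelevant chunks leak into the prompt.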
What RAG is not
- Not a replacement for fine-tuning. Different tools. RAG = “here’s the current state of the world.” Fine-tuning = “here’s how I want you to behave.”
- Not a fix for a bad prompt. Garbage in, garbage out, even with great retrieval.
- Not free. Embedding costs, storage costs, query latency. Budget for them.
Sources
- Pinecone docs on RAG
- See Memory layers for Shipsy-specific implementation
Changelog
- 26 May 2026: Initial draft.