LLMs in 15 minutes

The just-enough mental model for talking to customers about AI without saying anything embarrassing.

What an LLM is, in one sentence

A statistical model trained on a very large body of text to predict the next token (roughly, the next word) given everything that came before. Scale that up and you get something that can carry on a coherent conversation, write code, summarise documents, and reason through problems.

That’s it. Everything else is an implication of this.
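To make "predict the next word" concrete, here is a toy sketch. It is not how a real LLM works internally (those are neural networks that operate on tokens and look at the whole context); it only counts which word tends to follow which in a tiny made-up corpus. The function names and the corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(model: dict[str, Counter], word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, purely for illustration.
corpus = (
    "the parcel is out for delivery . "
    "the parcel is delayed . "
    "the parcel is out for delivery ."
)
model = train_bigram(corpus)

print(predict_next(model, "is"))      # "out" follows "is" twice, "delayed" once
print(predict_next(model, "unseen"))  # None: no data, so no prediction
```

The toy model picks "out" after "is" because that is the most common continuation it has seen, not because it knows where the parcel is. That gap between plausible and true is the root of the properties listed next.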

What follows from “predict the next word”

Property	Why
Sometimes confidently wrong (hallucinates)	Predicting plausible text ≠ predicting true text
Better with context	More input → more constraints on the prediction
Has a finite “context window”	Compute and memory costs grow steeply with input length
Can be steered with examples (few-shot)	Pattern-matching is what it does
Priced per million tokens, not per query	Cost tracks compute, and compute tracks how much text goes in and out
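As an illustration of the few-shot row above, the sketch below assembles a prompt that shows the model two worked examples before asking it to handle a new message. The function name, the "Message / Category" layout, and the sample tickets are all hypothetical; any consistent pattern would serve the same purpose.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Lay out instruction, worked examples, then the new case in the same pattern."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Message: {text}")
        lines.append(f"Category: {label}")
        lines.append("")
    # The prompt ends mid-pattern so the most likely continuation is a category.
    lines.append(f"Message: {query}")
    lines.append("Category:")
    return "\n".join(lines)


# Hypothetical examples, invented for this sketch.
examples = [
    ("My parcel arrived with a torn box.", "damage"),
    ("The driver never showed up yesterday.", "missed_delivery"),
]
prompt = build_few_shot_prompt(
    "Classify each customer message into a category.",
    examples,
    "The package was left out in the rain and is soaked.",
)
print(prompt)
```

Because the prompt stops right after "Category:", the model's next-word prediction is steered toward a label in the same style as the examples.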

Closed vs open-source

	Closed-source (OpenAI, Anthropic, Google)	Open-source (Llama, Mistral, Qwen)
Quality on hard tasks	Higher today	Catching up
Cost per million tokens	Higher	Lower (especially self-hosted)
Data residency	Their cloud	Your cloud / on-prem
Fine-tuning	Limited (their API)	Full control
Latency	Network round-trip	Can be local
When to pick	Customer is fine with cloud, wants best quality	Customer needs on-prem or data residency, or cost-sensitive at scale

Shipsy uses both. See Models — choosing & switching for how the platform routes between them.

Three numbers to know

Context window — how much input the model can see at once. GPT-4o: 128K tokens. Claude Sonnet: 200K. Gemini 1.5: 1M+. (A token ≈ ¾ of a word.)
Cost per million tokens — input is cheap, output is ~3-5× more. Plan budgets accordingly.
Latency — first-token latency vs. full-response latency. For voice agents, first-token matters most.
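A rough budgeting sketch using the numbers above. The prices below are placeholders chosen only so that output costs 4× input (inside the 3-5× range); they are not any vendor's real price list, and the word-to-token conversion is the ¾ rule of thumb, which varies by language and tokenizer.

```python
def estimate_tokens(word_count: int) -> int:
    """Rule of thumb from the text: one token is about 3/4 of a word."""
    return round(word_count / 0.75)


def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call, given prices quoted per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# ASSUMED prices, for illustration only.
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens
OUTPUT_PRICE_PER_M = 12.00  # USD per million output tokens

input_tokens = estimate_tokens(1500)   # a 1,500-word document
output_tokens = estimate_tokens(300)   # a 300-word summary

per_call = cost_usd(input_tokens, output_tokens, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
print(f"{input_tokens} in, {output_tokens} out, ${per_call:.4f} per call")
print(f"${per_call * 10_000:.2f} for 10,000 calls")
```

Under these assumed prices the 400 output tokens cost almost as much as the 2,000 input tokens, which is why long generated answers dominate a budget faster than long inputs do.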

What LLMs are bad at

Math beyond simple arithmetic (use a tool / Python)
Anything requiring up-to-the-second data (use a tool / API)
Following long, complex instructions perfectly (break it down, add structure)
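Doing the same thing twice the same way (output is sampled, so identical prompts can give different answers; lower the temperature and constrain the output format)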
Knowing what they don’t know (they’ll often guess)

The job of an agent is to compose LLMs with tools and structure to compensate for these.
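A minimal sketch of that composition for the first item in the list (math). It assumes the model has been prompted to reply with a structured "tool call" instead of working out the number itself; the dictionary shape, `TOOLS` registry, and function names are hypothetical and do not describe any particular vendor's or platform's API.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator_tool(expression: str):
    """Evaluate +, -, *, / on numbers without using eval()."""
    def _eval(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


TOOLS = {"calculator": calculator_tool}


def run_tool_call(call: dict):
    """Dispatch a (hypothetical) structured tool call to real code."""
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        raise KeyError(f"unknown tool: {call.get('tool')!r}")
    return tool(call["input"])


# Pretend the model answered with this instead of guessing the product.
model_output = {"tool": "calculator", "input": "1249 * 37"}
print(run_tool_call(model_output))  # 46213
```

The model decides which tool to use and with what input; ordinary code produces the answer. The same pattern covers live data (swap the calculator for an API call).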

What CS folks actually need to remember

The model isn’t magic. It’s a really good text-prediction engine wrapped in helpful API plumbing.
Quality, cost, latency, and data residency trade off against each other. Pick the balance that fits the customer’s situation.
Hallucination is mitigated by grounding (RAG), guardrails, and human-in-the-loop. See RAG, memory & vector DBs and Guardrails.
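Grounding can be pictured as "look it up first, then ask the model to answer only from what was found". The sketch below is a deliberate simplification: it ranks documents by shared words, whereas real RAG systems typically use embeddings and a vector database. The documents, question, and function names are made up for illustration.

```python
def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_prompt(question: str, documents: list[str]) -> str:
    """Put the retrieved text in the prompt and tell the model to stick to it."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Invented knowledge-base snippets, not real policy.
docs = [
    "Refund window for damaged parcels is 7 days from delivery.",
    "Support hours are 9am to 6pm on weekdays.",
]
question = "What is the refund window for damaged parcels?"
print(build_grounded_prompt(question, docs))
```

The explicit "say you don't know" instruction also addresses the last item in the weaknesses list: it gives the model a sanctioned alternative to guessing.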

Changelog

26 May 2026: Initial draft.
