AI Concepts
Plain-English answers to the questions every AI engineer keeps getting asked.
A growing reference library for the patterns that actually show up in production LLM systems: tokens and cost math, prompting as code, RAG and retrieval, agents and tool use, evaluation, and the production layer (latency, caching, routing, security, observability). Use it alongside the roadmap, or as a quick lookup before an interview.
Foundations: working with LLMs
11 topicsTokens and the context window: the unit and the budget
Models do not see characters or words. They see tokens, and the context window is the budget you have to spend on them.
The model is not the chat product
ChatGPT and Claude.ai do a lot of work the raw model does not. Knowing the difference saves you from reinventing it badly.
System, user, assistant: the three roles in every chat call
Every API call is a list of role-tagged messages. The roles are not decoration; the model treats each one differently.
Temperature, top-p, top-k: three knobs people keep confusing
Three sampling parameters with overlapping effects. Only two of them earn their place in production.
Streaming vs blocking: the UX trick that changes nothing about cost
Streaming makes your AI feature feel twice as fast without changing the work the model does.
Token cost math: estimating the bill before you ship
Input tokens and output tokens cost different amounts. Five minutes with a calculator avoids most cost surprises.
TTFT vs total latency: two numbers, two different problems
Total latency is what the bill cares about. Time-to-first-token is what the user feels.
Embeddings and cosine similarity: turning text into a number you can compare
An embedding is a vector. Cosine similarity asks 'do these two vectors point the same way?' That is the whole story behind half of modern AI search.
Picking a model: the honest map of the big four
Claude, GPT, Gemini, Llama. Each one is strong somewhere and weak somewhere else. There is no universally right pick.
Rate limits, retries, and backoff: the boring layer that keeps you online
Every provider has limits. The difference between a flaky feature and a reliable one is a hundred lines of retry logic.
Playground vs production: why the prompt that worked breaks in code
Provider playgrounds quietly do five things your code does not. Knowing which ones saves a day of debugging.
No topics match these filters
Try a different search term or clear the filters.