AI Concepts

AI Engineering Concepts

Plain-English answers to the questions every AI engineer keeps getting asked.

A growing reference library for the patterns that actually show up in production LLM systems: tokens and cost math, prompting as code, RAG and retrieval, agents and tool use, evaluation, and the production layer (latency, caching, routing, security, observability). Use it alongside the roadmap, or as a quick lookup before an interview.

11 topics, 1 section, all 11 live now.

Foundations: working with LLMs

11 topics
#1 Live

Tokens and the context window: the unit and the budget

Models do not see characters or words. They see tokens, and the context window is the budget you have to spend on them.

#2 Live

The model is not the chat product

ChatGPT and Claude.ai do a lot of work the raw model does not. Knowing the difference saves you from reinventing it badly.

#3 Live

System, user, assistant: the three roles in every chat call

Every API call is a list of role-tagged messages. The roles are not decoration; the model treats each one differently.

#4 Live

Temperature, top-p, top-k: three knobs people keep confusing

Three sampling parameters with overlapping effects. Only two of them earn their place in production.

#5 Live

Streaming vs blocking: the UX trick that changes nothing about cost

Streaming makes your AI feature feel noticeably faster without changing the work the model does.

#6 Live

Token cost math: estimating the bill before you ship

Input tokens and output tokens cost different amounts. Five minutes with a calculator avoids most cost surprises.
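
The calculator step can be sketched in a few lines. This is a minimal illustration, not any provider's pricing: the function name `estimate_cost_usd`, the per-million-token prices, and the traffic numbers are all placeholder assumptions, so substitute the figures from your provider's current price sheet.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float,
                      output_price_per_mtok: float) -> float:
    """Cost of one request, with prices quoted in USD per million tokens."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Placeholder numbers, assumed for illustration only.
INPUT_PRICE = 3.0          # USD per million input tokens (hypothetical)
OUTPUT_PRICE = 15.0        # USD per million output tokens (hypothetical)
REQUESTS_PER_MONTH = 100_000

per_request = estimate_cost_usd(2_000, 500, INPUT_PRICE, OUTPUT_PRICE)
monthly = per_request * REQUESTS_PER_MONTH
print(f"per request: ${per_request:.4f}, per month: ${monthly:,.2f}")
```

Keeping input and output prices as separate parameters makes it obvious which side of the call dominates the bill when the prompt or the response grows.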

#7 Live

TTFT vs total latency: two numbers, two different problems

Total latency is what the bill cares about. Time-to-first-token is what the user feels.

#8 Live

Embeddings and cosine similarity: turning text into a number you can compare

An embedding is a vector. Cosine similarity asks 'do these two vectors point the same way?' That is the whole story behind half of modern AI search.
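
As a rough sketch of the "point the same way" idea, cosine similarity is the dot product of two vectors divided by the product of their lengths. The toy two-dimensional vectors below are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return cos(angle) between a and b: 1 same direction, 0 orthogonal, -1 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (illustrative, not real embeddings).
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Because the result depends only on direction, scaling a vector (as in the first pair) does not change the score.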

#9 Live

Picking a model: the honest map of the big four

Claude, GPT, Gemini, Llama. Each one is strong somewhere and weak somewhere else. There is no universally right pick.

#10 Live

Rate limits, retries, and backoff: the boring layer that keeps you online

Every provider has limits. The difference between a flaky feature and a reliable one is a hundred lines of retry logic.
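
The core of that retry logic can be sketched as exponential backoff with jitter. Everything here is an assumption for illustration: `RateLimitError` stands in for whatever exception your client library raises on an HTTP 429, and `call_with_backoff`, its defaults, and the injectable `sleep` parameter are hypothetical names, not part of any SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_backoff(fn: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, ceiling] so that many
            # clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A production version would usually also honor a server-provided retry hint when one is present and retry only on errors that are safe to repeat.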

#11 Live

Playground vs production: why the prompt that worked breaks in code

Provider playgrounds quietly do five things your code does not. Knowing which ones saves a day of debugging.