An LLM application that runs an observe-plan-act loop, using tool
calls to gather information and take actions over multiple steps
rather than producing a single response.
B
Byte-Pair Encoding (BPE)
A tokenization algorithm that repeatedly merges the most frequent
adjacent byte or character pair, building a vocabulary of subword
units used to split text into tokens.
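The merge rule above can be sketched in a few lines. This is a toy
illustration of a single training step on characters, not a
production tokenizer:

```python
from collections import Counter

def bpe_merge_step(tokens):
    """One BPE step: merge every occurrence of the most frequent
    adjacent pair, extending the vocabulary by one subword unit."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    best = max(pairs, key=pairs.get)
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, best

# Training starts from individual characters.
tokens, pair = bpe_merge_step(list("banana"))
# "an" is merged everywhere: ['b', 'an', 'an', 'a']
```

Repeating this step builds the full subword vocabulary.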
C
confabulation
Another term for hallucination; emphasizes that the model is
producing coherent-sounding but fabricated content rather than
retrieving stored facts. The two words are used interchangeably
in the literature.
context window
The maximum number of tokens a language model can process in a
single call, including both the prompt (system message,
conversation history, retrieved documents) and the response.
D
data dictionary
A document or table that describes each column in a dataset: its
name, data type, units, allowed values, and meaning. A good data
dictionary lets a new user understand the dataset without reading
the original collection code or paper.
E
embedding
A numeric vector that represents a piece of text in a
high-dimensional space such that semantically similar texts have
similar vectors; used in retrieval-augmented generation to find
relevant documents.
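"Similar vectors" is usually measured with cosine similarity. A
minimal sketch with made-up 3-dimensional vectors (real embeddings
have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": semantically close texts get close vectors.
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
invoice = [0.05, 0.9, 0.4]
# cosine_similarity(cat, kitten) is far higher than
# cosine_similarity(cat, invoice)
```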
exploratory data analysis
The process of summarizing, visualizing, and investigating a
dataset to understand its structure, identify anomalies, and
uncover patterns before formal modeling or hypothesis testing.
exponential backoff
A retry strategy in which the delay between successive attempts
increases exponentially (e.g., 1s, 2s, 4s, 8s) to reduce load on a
service that has returned a rate-limit or server error.
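A minimal retry loop implementing this strategy; the callable and
parameter names are illustrative:

```python
import random
import time

def call_with_backoff(request, max_attempts=5, base_delay=1.0):
    """Retry a zero-argument callable, sleeping longer after each
    failure: base_delay, then 2x, 4x, 8x... plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            delay = base_delay * (2 ** attempt)
            # jitter spreads out retries from many clients
            time.sleep(delay + random.uniform(0, base_delay))
```

The jitter term prevents many clients from retrying in lockstep
after a shared outage.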
F
few-shot prompting
A prompting technique that includes one or more example
input-output pairs in the prompt to show the model the desired
format or behavior before posing the actual query.
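A few-shot prompt for a hypothetical sentiment task might look like
this, with two example pairs followed by the actual query:

```python
# Two worked examples, then the real input with the answer left blank.
prompt = """Classify the sentiment of each review as positive or negative.

Review: The battery lasts all day.
Sentiment: positive

Review: It broke after a week.
Sentiment: negative

Review: Setup was quick and painless.
Sentiment:"""
```

The model completes the final "Sentiment:" line in the format the
examples established.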
fine-tuning
Additional training of a pre-trained model on a smaller, curated
dataset to improve its performance on a specific task or domain;
includes supervised fine-tuning (SFT) and reinforcement learning
from human feedback (RLHF).
frontier model
One of the most capable large language models currently available,
typically from a major AI lab and trained at enormous scale; the
term is relative and shifts as new models are released.
G
H
hallucination
The generation of plausible-sounding but factually incorrect text
by a language model, also called confabulation; occurs because the
model generates the statistically most likely continuation of a
prompt rather than verifying facts.
I
J
JSON schema
A vocabulary for describing the structure and constraints of JSON
documents, including required fields, data types, and allowed
values; used in MCP tool definitions to specify the expected
format of inputs and outputs.
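A sketch of what such a schema looks like, for a hypothetical
weather-lookup tool, with a minimal check of one constraint (real
validators such as the jsonschema package enforce all of them):

```python
# Hypothetical input schema: one required string field, one optional
# field restricted to an enumeration of allowed values.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["metric", "imperial"]},
    },
    "required": ["city"],
}

def missing_required(schema, payload):
    """Report required fields absent from a payload."""
    return [key for key in schema.get("required", []) if key not in payload]
```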
JSON-RPC
A lightweight remote procedure call protocol that encodes requests
and responses as JSON; used by the Model Context Protocol to let
LLM clients call tools on a local or remote server.
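The wire format is plain JSON. A sketch of an MCP-style tool-call
exchange (the tool name and arguments are hypothetical):

```python
import json

# A JSON-RPC 2.0 request asking a server to invoke one of its tools.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "query_database", "arguments": {"sql": "SELECT 1"}},
}
# The response echoes the same id so the client can pair them.
response = {"jsonrpc": "2.0", "id": 1, "result": {"rows": [[1]]}}

wire = json.dumps(request)  # what actually travels over the transport
```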
K
L
linter
A tool that analyzes source code without running it and reports
style violations, unused imports, undefined variables, and other
common mistakes.
Lorenz curve
A plot of the cumulative share of something held by the bottom x
fraction of the population (sorted from least to most); a
perfectly equal distribution produces the 45-degree diagonal,
while any inequality bows the curve below it.
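Computing the curve's points is straightforward; a minimal sketch:

```python
def lorenz_points(values):
    """Cumulative share held by the bottom fraction, sorted
    from least to most; returns (population share, value share)."""
    vals = sorted(values)
    total = sum(vals)
    points, running = [(0.0, 0.0)], 0.0
    for i, v in enumerate(vals, start=1):
        running += v
        points.append((i / len(vals), running / total))
    return points

# A perfectly equal distribution traces the diagonal: the bottom
# quarter holds a quarter, the bottom half holds half, and so on.
equal = lorenz_points([1, 1, 1, 1])
# An unequal one bows below it: the bottom 75% here holds nothing.
unequal = lorenz_points([0, 0, 0, 4])
```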
M
MCP server
A program that exposes tools, data sources, or services to AI
models in a standardized way, allowing the model to interact with
external systems like databases and APIs.
Model Context Protocol (MCP)
An open standard using JSON-RPC that allows LLM applications to
connect to external tools and data sources — such as databases,
file systems, and web search — through a common interface.
model parameters
The numeric weights of a neural network, learned during training;
model size is typically measured in the number of parameters, with
current frontier models having hundreds of billions.
N
O
P
pre-training
The initial training phase in which a language model learns to
predict the next token on a large text corpus; produces a
general-purpose model that is subsequently refined by fine-tuning.
precision and recall
Two measures of a classifier's performance on a particular class.
Precision is the fraction of positive predictions that are
correct; recall is the fraction of actual positives that were
found. A classifier can improve one at the expense of the other,
so both are needed to evaluate performance honestly.
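The two fractions fall out of simple set arithmetic; a minimal
sketch with made-up item ids:

```python
def precision_recall(predicted, actual):
    """Precision and recall for the positive class, given the sets
    of items predicted positive and actually positive."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# 3 of the 4 flagged items are truly positive (precision 0.75),
# but only 3 of the 6 actual positives were found (recall 0.5).
p, r = precision_recall({1, 2, 3, 4}, {1, 2, 3, 5, 6, 7})
```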
Q
Q-Q plot
A graph that assesses whether a dataset follows a theoretical
distribution by plotting its quantiles against the expected
quantiles of that distribution, with points falling along a
straight line indicating a good fit.
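For a normal Q-Q plot, the pairing can be computed with the
standard library alone; the plotting-position formula below is one
common choice among several:

```python
from statistics import NormalDist

def qq_points(sample):
    """Pair each sorted sample value with the matching
    standard-normal quantile; plotting the pairs gives a normal
    Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    nd = NormalDist()
    # (i - 0.5) / n is a common plotting-position formula
    theoretical = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))
```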
R
rate limit
A constraint imposed by an API provider on how many requests or
tokens a client may submit per unit of time; requests beyond the
limit receive HTTP 429 errors and should be retried with
exponential backoff.
reinforcement learning from human feedback (RLHF)
A fine-tuning technique in which human raters rank model outputs
and the model is updated to produce higher-ranked responses; used
to improve helpfulness and reduce harmful outputs.
retrieval-augmented generation (RAG)
A technique that retrieves relevant document chunks from an
external source at query time and inserts them into the prompt,
grounding the model's response in accurate source text and
reducing hallucination.
role prompting
A prompting technique that instructs the model to adopt a specific
persona or area of expertise (e.g., "You are an expert
statistician") to improve domain-specific responses.
S
self-attention
A mechanism in transformer models in which each token's
representation is updated as a weighted combination of all tokens
in the context window, with the weights computed from learned
parameters; this lets the model capture long-range dependencies
in text.
skill
A Markdown file containing a system prompt that specializes an LLM
application's behavior for a recurring task. Claude Code stores
skills in ~/.claude/, where they are available across all projects.
sycophancy
The tendency of an LLM to agree with or flatter the user rather than
provide accurate information; a side effect of training with reinforcement
learning from human feedback, where raters tend to prefer agreeable
responses over correct but contradictory ones.
T
temperature
A parameter that scales the probability distribution over next
tokens before sampling; temperature 0 makes output deterministic,
temperature 1 samples proportionally, and values above 1 produce
more varied and less coherent output.
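The scaling is just a division before the softmax; a minimal
sketch (temperature 0 is the limiting case of always picking the
highest-probability token, so it is handled as argmax rather than
by dividing):

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax into
    probabilities; lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = apply_temperature(logits, 0.5)  # low T: more peaked
flat = apply_temperature(logits, 2.0)   # high T: closer to uniform
```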
token
The basic unit of text processed by a language model; roughly 4
characters of English on average, produced by splitting input text
with a tokenization algorithm such as Byte-Pair Encoding.
tokenization
The process of splitting input text into tokens before feeding it
to a language model, typically using an algorithm such as
Byte-Pair Encoding; different models use different tokenizers and
may produce different token counts for the same text.
top-p sampling
A sampling strategy, also called nucleus sampling, that restricts
token selection to the smallest set of tokens whose cumulative
probability exceeds a threshold p (e.g., 0.9), preventing very
low-probability tokens from being chosen.
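The selection rule can be sketched directly from the definition;
a toy example with a four-token distribution:

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p; `probs` maps token -> probability."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    # renormalize so the surviving probabilities sum to 1
    total = sum(kept.values())
    return {t: pr / total for t, pr in kept.items()}

# With p=0.9 the 0.05-probability tail token is dropped.
filtered = top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05})
```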
training cutoff
The date after which new information is not reflected in a model's
weights; events, publications, and API changes after this date are
unknown to the model, which may still generate confident-sounding
text about them by extrapolating from prior patterns.
transformer
A neural network architecture based on stacked self-attention
layers that underlies most modern large language models; each
layer applies self-attention followed by a feed-forward network,
allowing the model to process all tokens in the context window in
parallel.
U
V
vector database
A database that stores documents as embedding vectors and supports
fast retrieval of the most semantically similar documents for a
given query; used as the external knowledge store in
retrieval-augmented generation.
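At its core the retrieval step is a nearest-neighbor search over
embeddings; a brute-force sketch with a toy in-memory "index"
(real vector databases replace the linear scan with approximate
nearest-neighbor index structures):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def top_k(query, index, k=2):
    """Return the ids of the k stored embeddings most similar to
    the query vector."""
    return sorted(index, key=lambda doc: cosine(query, index[doc]),
                  reverse=True)[:k]

# Toy 2-dimensional index: doc id -> embedding.
index = {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0], "doc3": [0.9, 0.1]}
# top_k([1.0, 0.0], index) retrieves doc1 and doc3
```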