Learning
Evidence-based learning
- Spaced practice
  - Five two-hour sessions are better than two five-hour sessions
  - Include some older material in every new lesson
    - One of the few advantages of traditional lecture-based classes over just-in-time online courses
- Retrieval practice
  - Practice using information in the context in which it will actually be used
  - Repeated testing improves recall
- Elaboration
  - Explain things in detail to yourself or to others
  - Writing a good prompt is like asking a good question: it sometimes answers itself
- Interleaving
  - Randomize the order in which you practice things to avoid training yourself to recall them in only one order
    - A-B-C-A-B-C is better than A-A-B-B-C-C, and A-C-B-C-A-B is better still
- Concrete examples
  - Novices don't know enough yet to make generalizations concrete
  - So ask for examples, and ask for specifics about the general principles each one embodies
- Dual coding
  - Combine words and visuals
  - Today's LLMs struggle with visuals
LLMs as tutors
- The appeal is obvious
  - Ask as many questions as you want, whenever you want
  - Ask for a different explanation if the first doesn't make sense
  - Don't feel stupid in front of other people
- The limitations are equally obvious
  - The tutor makes things up
  - You never learn how to do things yourself
Getting explanations at the right level
- LLMs calibrate their explanations to cues in the prompt
  - "I'm an undergraduate" produces shorter sentences and more analogies
  - "I have a Ph.D. in statistics" produces fewer hand-waves and more precision
- Be explicit about what you already know and what you do not
  - Weak: "Explain p-values."
  - Stronger: "I know a p-value is a probability but I don't understand what it's a probability of. Explain in two paragraphs without assuming I know calculus."
- Ask for multiple explanations if the first one does not click
  - "Try a different analogy" is a useful prompt
- Ask for a concrete example before an abstract definition
  - Most people understand examples before principles, not the other way around
Using examples and analogies
- Prompt the LLM to show you before it tells you
  - "Give me a worked example, then explain the rule it illustrates"
- Prompt it to build the example around data you already know
  - "Show me how to compute the mean bill length per species using the penguins dataset"
- Prompt it for a bad example alongside a good one
  - "Show me a histogram that is badly binned and one that is well-binned, using the same data"
- Analogies break down: prompt the LLM to explain where its analogy stops working
  - "You compared p-values to fire alarms. Where does that analogy fail?"
- An example you can run is more useful than one you can only read
  - Paste the code into a notebook and make sure it actually does what the LLM claimed, as in the sketch below
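A minimal sketch of the kind of runnable example the mean-bill-length prompt above should produce, assuming the penguins data lives at `_extras/penguins.csv` (the path used later in this lesson):

```python
import polars as pl

# Load the penguins data
df = pl.read_csv("_extras/penguins.csv")

# Mean bill length per species: small enough to paste into a notebook
# and compare against what the LLM claimed the code would do
result = df.group_by("species").agg(
    pl.col("bill_length_mm").mean().alias("mean_bill_length_mm")
)
print(result)
```

If the printed table does not have one row per species, the example does not do what the LLM said it does.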
Test your own understanding
- Explaining something is harder than recognizing a correct explanation
- Write a short paragraph in your own words, then prompt the LLM to check it
- Prompt the LLM to quiz you
  - "Ask me three questions about what I just read, then tell me which answers I got wrong"
- Prompt for common misconceptions
  - "What do beginners usually get wrong about confidence intervals?"
- Resist the urge to look at the LLM's answer before writing your own
  - The moment you read the answer, you will convince yourself you already knew it
What LLMs get wrong as tutors
- Hallucination in explanations: technically fluent but factually wrong
  - Particularly common in specialist or niche topics
- Reminder: sycophancy is telling you what you want to hear
  - If you say "so it's basically the same as X, right?", the LLM is inclined to agree
  - Ask "Is there anything wrong with this description?" rather than "Is this right?"
- Skipping steps
  - LLMs often omit the step they consider obvious, which is often exactly the step you are confused about
  - Ask "Walk me through this step by step and don't skip anything, even the trivial parts"
- The opposite failure: overlong answers that bury the key point
  - Ask for a one-sentence version before the full explanation
- No memory across sessions
  - A new conversation starts with no record of what you covered before
  - You have to re-establish context every time
Checking what you learned
- An explanation that makes sense in the moment may still be wrong
  - The feeling of understanding and actual understanding are not the same thing
- Cross-check key claims against a textbook, documentation, or a reliable reference
- Run the code: if the LLM explains that a function works a certain way, test it on a small example (see the sketch after this list)
- Ask a second source, including a different LLM
  - Two independent hallucinations that agree are still hallucinations
- Try to use the concept before deciding you understand it
  - Explaining penguin bill length distributions becomes easier after looking at the actual histogram
- The hardest cases to catch are explanations that are roughly right but subtly misleading
  - Checking for "plausible but wrong" requires knowing enough to be suspicious
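For example, suppose an LLM tells you that Python's `round` always rounds halves upward. The claim sounds plausible, and a three-line test refutes it:

```python
# Python rounds halves to the nearest even integer ("banker's rounding"),
# not upward, so the plausible-sounding explanation fails on small examples
print(round(0.5))  # 0, not 1
print(round(1.5))  # 2
print(round(2.5))  # 2, not 3
```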
Examples
Get an explanation of what standard deviation measures
Standard deviation appears in almost every analysis you will run, but "the spread around the mean" is not precise enough to be useful. You can ask an LLM to explain it at your level, but you need to prompt it specifically enough to get a genuinely useful answer, and then you need to check whether the explanation is correct before you rely on it.
A prompt that sends the LLM in the wrong direction
This prompt produces a generic answer calibrated to no one in particular:
Explain standard deviation.
The LLM will give you a textbook definition, probably followed by a formula using sigma notation, and possibly followed by an example that means nothing to you because it uses abstract numbers rather than data you have seen. Worse, the LLM will often skip the step you are confused about because it considers that step obvious. Fix it by stating what you already know, what you do not understand, and what kind of example you want.
A prompt that works
I have taken one semester of statistics and I know what a mean is. Explain standard deviation in two paragraphs using penguin bill lengths as the example. Do not use a formula. Describe concretely what the number measures and what a larger versus smaller value tells me about the data.
A typical response:
Imagine you have measured the bill lengths of 152 Adelie penguins. The mean bill length is about 38 mm. Some penguins have bills very close to 38 mm, and others have bills several millimetres longer or shorter. Standard deviation measures how far a typical penguin's bill sits from that average. If the standard deviation is 2 mm, most penguins have bills within roughly 2 mm of 38 mm, and very few are more than 4 mm away.
A larger standard deviation indicates the measurements are spread more widely around the mean. If Gentoo penguins had a standard deviation of 5 mm in bill length, you would find many more Gentoos whose bills are 7 or 8 mm from the average than you would among Adelies. Standard deviation does not tell you anything about the shape of the distribution; it only describes how tightly or loosely the values cluster around the center.
Checking the explanation
Ask the LLM a deliberately skeptical follow-up: "Is there anything misleading about that explanation?" Compare what it says to a Wikipedia definition. Finally, ask the LLM "Where does your explanation break down?" to see whether it can identify its own limits.
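It is also worth computing the number the explanation is about. A minimal Polars sketch, assuming the penguins data is at `_extras/penguins.csv` as in the later examples:

```python
import polars as pl

df = pl.read_csv("_extras/penguins.csv")

# Mean and standard deviation of Adelie bill lengths, to compare against
# the "typical distance from the average" description above
adelie = df.filter(pl.col("species") == "Adelie")
print(adelie.select(
    pl.col("bill_length_mm").mean().alias("mean"),
    pl.col("bill_length_mm").std().alias("std"),
))
```

Whatever numbers come out, check them against the explanation: if the distribution is roughly bell-shaped, most bills should fall within one standard deviation of the mean.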
Write your own definition before asking the LLM
Writing your own explanation of a concept before reading an LLM's version is more useful than reading first. Once you see an explanation, you will feel like you already understood it, but this feeling is often wrong. Writing first reveals exactly where your understanding breaks down.
A prompt that sends the LLM in the wrong direction
Reading the LLM's answer first defeats the purpose:
Explain confidence intervals to me, then ask me if I understood.
If you read the explanation before writing anything yourself, you will anchor your understanding on the LLM's version rather than discovering what you actually know. Fix it by writing first, then asking for feedback on what you wrote.
A prompt that works
Write a one-paragraph definition of confidence intervals in your own words before reading any further. Then send this prompt:
Here is my explanation of confidence intervals: [paste your paragraph]. Identify one thing I got right and one thing I got wrong or left out. Do not rewrite my explanation; just point to the specific gap.
A typical response identifies the specific claim that is incorrect, often the interpretation of "95% confidence", and explains why it is wrong without replacing the student's own language.
Checking the output
After getting the LLM's feedback, revise your paragraph and ask it to check the revision, then look up the definition of a confidence interval on Wikipedia. If your revised version matches the LLM's feedback but not Wikipedia, the LLM's feedback may have introduced a different error.
Compare badly-binned and well-binned histograms
The number of bins in a histogram changes what the distribution looks like. Too few bins hide structure; too many bins make noise look like signal. Seeing both extremes on the same data makes the difference concrete in a way that a verbal description cannot.
A prompt that works
Generate two Altair histograms of `bill_length_mm` for Adelie penguins: one with 3 bins and one with 30 bins. Explain what information each version obscures.
A typical response produces both histograms and notes that 3 bins merge distinct groups while 30 bins fragment continuous regions into jagged spikes driven by sampling noise.
Checking the output
Run both histograms and look at them side by side. The 3-bin version should make the distribution look nearly flat; the 30-bin version should show many individual spikes. Then ask the LLM: "How would I choose the number of bins for a dataset I have never seen before?" A good answer mentions rules of thumb such as the square root of sample size or Sturges' formula, but notes that these are not definitive answers.
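If you want a known-good starting point to compare against, here is a minimal sketch of such code, assuming the `_extras/penguins.csv` path used elsewhere in this lesson and a recent Altair (5.x) that accepts Polars dataframes via the dataframe interchange protocol:

```python
import altair as alt
import polars as pl

df = pl.read_csv("_extras/penguins.csv").filter(pl.col("species") == "Adelie")

def histogram(bins: int) -> alt.Chart:
    # maxbins is a hint to Altair's binning algorithm, not an exact count
    return alt.Chart(df).mark_bar().encode(
        alt.X("bill_length_mm", bin=alt.Bin(maxbins=bins)),
        alt.Y("count()"),
    )

# Put the under-binned and over-binned versions side by side
(histogram(3) | histogram(30)).save("binning.html")
```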
Find where an analogy breaks down
Analogies make abstract concepts tangible, but every analogy has a point where it misleads. Asking an LLM to identify the limits of its own analogy tests whether it understands the concept well enough to know where the comparison fails.
A prompt that works
You compared p-values to fire alarms: a small p-value is the alarm going off. Explain where that analogy breaks down. Be specific about a case where the analogy would lead a student to draw a wrong conclusion.
A typical response identifies at least one key failure: a fire alarm has a fixed threshold, but a p-value threshold is chosen by the researcher, and different thresholds give different conclusions from the same data.
Checking the output
Evaluate whether the LLM's critique identifies the most important limitation. Another common failure of the fire alarm analogy is that a fire alarm either rings or it does not, while a p-value is a continuous number: a p-value of 0.049 and one of 0.051 describe nearly identical evidence, but the analogy suggests they mean completely different things. If the LLM did not mention this, ask it explicitly: "What happens to the analogy when p=0.049 versus p=0.051?"
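You can also make the continuity point concrete. A short sketch (using scipy, which is an assumption; any statistics library would do) showing that two test statistics on opposite sides of the 0.05 threshold describe nearly the same evidence:

```python
from scipy.stats import norm

# Two-sided p-values for two nearly identical z statistics: they land on
# opposite sides of 0.05 even though the evidence is almost the same
for z in (1.95, 1.97):
    print(f"z = {z:.2f}: p = {2 * norm.sf(z):.3f}")
```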
Identify common misconceptions about a statistical concept
Knowing what you are likely to get wrong is as useful as knowing what is correct. An LLM can list common misconceptions about a concept, but you must verify those descriptions against a reliable source, because the LLM can describe a misconception incorrectly (i.e., it can be wrong about why it's wrong).
A prompt that works
What do beginners most commonly misunderstand about the difference between a confidence interval and a prediction interval? List two misconceptions and explain why each one is wrong.
A typical response notes that students often believe a 95% confidence interval contains 95% of the data (it does not; a prediction interval does), and that students often treat confidence intervals as probability statements about a fixed parameter (a Bayesian credible interval does this; a frequentist confidence interval does not).
Checking the output
Look up one of the two misconceptions in a statistics textbook or the relevant Wikipedia article and confirm that the LLM's description of it is accurate. Send yourself an email reminder to re-check your understanding the next day.
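The first misconception can also be checked numerically. A minimal numpy sketch with simulated data (not the penguins), showing that a 95% confidence interval for the mean covers far less than 95% of the data:

```python
import numpy as np

rng = np.random.default_rng(12345)
data = rng.normal(loc=40.0, scale=5.0, size=200)

# 95% confidence interval for the mean, using the normal approximation
mean = data.mean()
half_width = 1.96 * data.std(ddof=1) / np.sqrt(len(data))
low, high = mean - half_width, mean + half_width

# The interval describes the mean, not the data, so only a small
# fraction of the individual values fall inside it
inside = np.mean((data >= low) & (data <= high))
print(f"CI ({low:.2f}, {high:.2f}) contains {inside:.0%} of the data")
```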
Ask for a brief explanation before the full one
A one-sentence explanation of a concept is more useful than it looks: if you cannot summarize it in one sentence, you probably do not understand it well enough to apply it. Asking for the brief version first also gives you something to check the longer explanation against.
A prompt that works
Explain linear regression in exactly one sentence. Then give the full explanation in one paragraph using the relationship between bill length and body mass in the penguins dataset as the example.
A typical one-sentence response: "Linear regression finds the line through a scatterplot that minimizes the sum of squared vertical distances from each point to the line."
Checking the output
Read the one-sentence explanation and ask if it contains enough information to be useful on its own. Can you use it to identify whether linear regression is the right tool for a problem? Then read the full explanation and check whether every claim in it is consistent with the one-sentence version. If the full explanation introduces something that contradicts or cannot be reconciled with the one-sentence version, one of the two is wrong.
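The one-sentence version is itself checkable in code. A sketch, assuming the `_extras/penguins.csv` path used elsewhere in this lesson: fit the line, then confirm that perturbing the fitted slope only increases the sum of squared vertical distances:

```python
import numpy as np
import polars as pl

df = pl.read_csv("_extras/penguins.csv").drop_nulls(
    subset=["bill_length_mm", "body_mass_g"]
)
x = df["bill_length_mm"].to_numpy()
y = df["body_mass_g"].to_numpy()

# Least-squares fit: minimizes the sum of squared vertical distances
slope, intercept = np.polyfit(x, y, deg=1)

def ssr(m: float, b: float) -> float:
    # Sum of squared vertical distances from each point to the line
    return float(np.sum((y - (m * x + b)) ** 2))

# Any nudge to the fitted slope should make the fit worse
print(ssr(slope, intercept) < ssr(slope * 1.01, intercept))  # expect True
```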
Re-establish context in a new session
LLMs have no memory across sessions (unless you save and reload log files yourself). Understanding how much context you need to re-establish makes you a more effective user of these tools.
A prompt that works
Start a fresh LLM session with no history, then send this prompt:
Continue explaining `group_by` in Polars where we left off.
Record what the LLM does. Then send a follow-up that re-establishes context:
I am learning Polars for the first time. I understand that `group_by` splits the data into groups, but I do not understand the `.agg()` call that comes after it. Explain what `.agg()` does and why it is necessary.
Checking the output
Compare the two responses. The first should be confused or generic because it has no context. The second should be specific and useful because you provided the missing background. Count the facts you had to re-provide, such as your experience level, the library, the specific concept, and your existing knowledge, and consider whether there is a more efficient way to establish context at the start of a session.
Test for sycophancy
LLMs are trained to produce responses that humans rate positively. This makes them inclined to agree with you, even (or especially) when you are wrong. Testing this directly with a statement you know is false shows you how much to trust agreement.
A prompt that works
Send this prompt to an LLM:
Standard deviation is basically the same as variance, right?
Then, in a separate session, send this version:
Is there anything wrong with saying that standard deviation is basically the same as variance?
Compare the two responses.
Checking the output
The first prompt often produces agreement or mild qualification ("they are related…"). The second prompt, which explicitly invites disagreement, more often produces a clear correction: standard deviation is the square root of variance, and substituting one for the other in an interpretation will be wrong. If both responses agreed with the false statement, the LLM is exhibiting strong sycophancy and you should treat its confirmations of your understanding with more skepticism.
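The corrected claim is itself easy to verify, which is a good habit after any sycophancy test. A two-line numpy check:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Standard deviation is the square root of variance, not the same number
print(data.var(), data.std())                        # 4.0 2.0
print(np.isclose(data.std(), np.sqrt(data.var())))   # True
```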
Ask about a function then verify with a query
An LLM's explanation of how a function works is only as useful as your ability to run it. Reading an explanation is passive; running a query that should fail if the explanation is wrong makes the learning active and the verification concrete.
A prompt that works
Explain what the SQL `coalesce` function does. Then write a SQLite query that uses `coalesce` on the penguins table to return `body_mass_g` for each penguin, replacing any null value with -1. Show me the first five rows of the result.
A typical response explains that `coalesce` returns the first non-null argument from a list, then produces:

```sql
select species, coalesce(body_mass_g, -1) as body_mass_g
from penguins
limit 5;
```
Checking the output
Run the query and find a row where `body_mass_g` was null in the original data; confirm it now shows -1 in the result. Then run `select count(*) from penguins where body_mass_g is null` to count the original nulls, and `select count(*) from penguins where coalesce(body_mass_g, -1) = -1` to count the replacements. If the two counts differ, the `coalesce` expression is not doing what the LLM described.
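If you would rather drive the check from Python, here is a sketch using the standard sqlite3 module; the database filename is a placeholder for wherever your copy of the penguins table lives:

```python
import sqlite3

# Placeholder filename: point this at your copy of the penguins database
con = sqlite3.connect("penguins.db")

nulls = con.execute(
    "select count(*) from penguins where body_mass_g is null"
).fetchone()[0]
replaced = con.execute(
    "select count(*) from penguins where coalesce(body_mass_g, -1) = -1"
).fetchone()[0]

# The counts should match if coalesce replaces exactly the null values
print(nulls, replaced, nulls == replaced)
con.close()
```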
Compute a concept before deciding you understand it
Reading that "the median is the middle value" is not the same as understanding how the median behaves when the data are skewed, what happens with an even number of values, or how it differs from the mean. Computing the concept on real data reveals details that a verbal explanation leaves out.
A prompt that works
Explain in one sentence how the median differs from the mean. Then write Polars code that computes both the median and the mean of `bill_length_mm` for each penguin species separately.
A typical response explains that the median is the middle value when data are sorted, making it less sensitive to outliers than the mean, and produces:
```python
import polars as pl

# Load the penguins data
df = pl.read_csv("_extras/penguins.csv")

# Median and mean bill length for each species
result = df.group_by("species").agg([
    pl.col("bill_length_mm").median().alias("median"),
    pl.col("bill_length_mm").mean().alias("mean"),
])
print(result)
```
Checking the output
Run the code and compare the median and mean for each species. If the median and mean are nearly identical for all three species, the bill length distribution is roughly symmetric within each species. If they differ by more than 0.5 mm for any species, ask the LLM whether that species' distribution is skewed, and confirm by looking at a histogram. This is the kind of detail that "the median is less sensitive to outliers" does not make concrete on its own.
Exercises
- Pick a concept in statistics that you find confusing (e.g., confidence intervals, p-values, or the difference between variance and standard deviation)
  - Prompt an LLM to explain it to someone who has taken one semester of statistics
  - Write a one-paragraph explanation in your own words
  - Prompt the LLM to identify any errors in your explanation
- Prompt an LLM to explain how `group_by` works in Polars, using the penguins dataset as an example
  - Run the example it provides
  - Ask it to re-explain the step you found hardest to follow, using a different analogy
- Prompt an LLM to quiz you on the content of the "How LLMs Work" lesson in this tutorial
  - Answer three questions without looking back at the lesson
  - Review the LLM's feedback and record which concepts you actually understood versus which ones you only recognized
- Prompt an LLM "Is it accurate to say that an LLM looks up facts from its training data?"
  - Then ask "Is there anything misleading about that description?"
  - Compare the two responses and explain why the second question changed the answer
- Prompt an LLM to explain a statistical concept, then ask the same question to a different LLM
  - Identify one claim both LLMs made
  - Verify that claim against a textbook or the relevant Wikipedia article
- Prompt an LLM to walk you through computing the standard deviation of bill lengths for Adelie penguins step by step, without skipping any steps
  - Identify any step it omitted or got wrong
  - Verify the final answer by computing it yourself using the penguins dataset