What this lesson covers.
This module is one of 42 in the curriculum. Below is the canonical interactive lesson (tabs, cards, and diagrams from the source repo), rendered inside the course shell. There is no audio narration for this module; it ships as text and interactive lesson only.
If you prefer to read first and try the demos afterward, the interactive lesson sits below this section.
The lesson itself.
Diffusion Math, Slowly
Forward + reverse process · score matching · why noise schedules matter
Noising a clean image, one step at a time
The forward process q(x_t | x_{t-1}) = N(x_t; √(1-β_t) x_{t-1}, β_t I) adds a small amount of Gaussian noise at each timestep. After T = 1000 steps with a properly chosen noise schedule, the image is practically indistinguishable from pure Gaussian noise. The forward process has no learnable parameters: it is a fixed corruption schedule the model never has to predict.
The model only has to learn one thing: predict the noise
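As a rough illustration of the forward process above, here is a minimal Python sketch using only the standard library. It assumes the linear DDPM schedule (β from 0.0001 to 0.02 over T = 1000 steps) and treats the image as a flat list of pixel values in [-1, 1]. The function names are illustrative, and the one-shot jump from x_0 to x_t via ᾱ_t (the running product of 1 - β_s) is not spelled out in the lesson text; it is the standard DDPM shortcut, and it is what makes (noisy image, noise) training pairs cheap to build.

```python
import math
import random

def linear_betas(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear schedule with the DDPM endpoints quoted in this lesson."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def cumulative_alpha_bar(betas):
    """alpha_bar[t] = product of (1 - beta_s) for all s up to and including t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def forward_step(x_prev, beta_t, rng):
    """One draw from q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    keep, spread = math.sqrt(1.0 - beta_t), math.sqrt(beta_t)
    return [keep * v + spread * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_jump(x0, t, alpha_bar, rng):
    """Sample x_t straight from x_0; also return the noise eps that was mixed in."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [a * v + b * e for v, e in zip(x0, eps)], eps

rng = random.Random(0)
betas = linear_betas()
alpha_bar = cumulative_alpha_bar(betas)

x0 = [rng.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in "image", pixels in [-1, 1]

# Step-by-step noising, exactly as the transition kernel describes.
x = x0
for beta_t in betas:
    x = forward_step(x, beta_t, rng)

# One-shot sample at the last timestep (same distribution as x), plus the noise used.
x_T, eps = forward_jump(x0, len(betas) - 1, alpha_bar, rng)
signal_left = math.sqrt(alpha_bar[-1])  # fraction of x0 that survives at t = T
```

Under these assumptions `signal_left` comes out well below 1%, which is the sense in which x_T is effectively pure noise.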
The reverse process p_θ(x_{t-1} | x_t) is what the U-Net learns. At each timestep, given the noisy image x_t, the network predicts the noise ε that was added. Subtracting a scaled fraction of that prediction gives a slightly less noisy image x_{t-1}; repeat T times. The loss is simply L = ||ε - ε_θ(x_t, t)||², the mean squared error between the true noise and the predicted noise.
Why noise prediction is equivalent to learning the data distribution
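The training loss and the denoising step described above can be sketched in a few lines of standard-library Python. This is only a sketch under stated assumptions: `zero_predictor` is a hypothetical placeholder for the U-Net ε_θ, the schedule is the linear DDPM one, timesteps are 0-indexed in the code, and the step uses the DDPM ancestral update with σ_t² = β_t, which is one common choice and is not specified in the lesson text.

```python
import math
import random

def noise_prediction_loss(eps_true, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)

def reverse_step(x_t, eps_pred, t, betas, alpha_bar, rng):
    """One DDPM-style ancestral step from x_t to x_{t-1} (t is 0-indexed).

    Assumes sigma_t^2 = beta_t; no fresh noise is added on the final step (t == 0).
    """
    beta_t = betas[t]
    scale = beta_t / math.sqrt(1.0 - alpha_bar[t])  # fraction of predicted noise removed
    inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - beta_t)
    mean = [inv_sqrt_alpha * (x - scale * e) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean
    sigma = math.sqrt(beta_t)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

def zero_predictor(x_t, t):
    """Hypothetical stand-in for the U-Net eps_theta(x_t, t): always predicts zero noise."""
    return [0.0 for _ in x_t]

rng = random.Random(0)
T = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
alpha_bar, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)

# Build one training pair (x_t, eps) at a random timestep.
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]
t = rng.randrange(T)
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = [math.sqrt(alpha_bar[t]) * v + math.sqrt(1.0 - alpha_bar[t]) * e
       for v, e in zip(x0, eps)]

loss_untrained = noise_prediction_loss(eps, zero_predictor(x_t, t))  # positive for the placeholder
loss_perfect = noise_prediction_loss(eps, eps)                       # zero for a perfect predictor

# Sanity check at t = 0: fed the true noise, the step recovers x0 up to float error.
eps0 = [rng.gauss(0.0, 1.0) for _ in x0]
x_first = [math.sqrt(alpha_bar[0]) * v + math.sqrt(1.0 - alpha_bar[0]) * e
           for v, e in zip(x0, eps0)]
x0_recovered = reverse_step(x_first, eps0, 0, betas, alpha_bar, rng)
```

In a real training loop the placeholder would be a network whose parameters are updated to drive this loss down over many random (x_0, t, ε) draws; the sanity check only shows that the update is consistent with the forward process when the noise prediction is exact.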
The score is ∇_x log p(x), the gradient of the log-probability of the data distribution with respect to x. Predicting the noise amounts to estimating the score of the noised data, up to a timestep-dependent scale factor. Score matching is a well-studied technique going back to Hyvärinen 2005. Diffusion models are score-based models in disguise. This connection unifies DDPM with the earlier NCSN family of models.
Linear vs cosine vs sigmoid: the choice that affects everything
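The link between noise prediction and the score can be written out in a few lines. The notation ᾱ_t for the running product of (1 - β_s) is standard DDPM notation and an assumption here, since the lesson text does not define it. Because the forward process is Gaussian, the score of the noised sample given the clean image is a rescaled copy of the noise:

```latex
\begin{aligned}
\bar\alpha_t &= \prod_{s=1}^{t} (1 - \beta_s),
\qquad
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \varepsilon,
\quad \varepsilon \sim \mathcal{N}(0, I) \\
q(x_t \mid x_0) &= \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1 - \bar\alpha_t) I\right) \\
\nabla_{x_t} \log q(x_t \mid x_0)
  &= -\frac{x_t - \sqrt{\bar\alpha_t}\, x_0}{1 - \bar\alpha_t}
   = -\frac{\varepsilon}{\sqrt{1 - \bar\alpha_t}} \\
\varepsilon_\theta(x_t, t) &\approx -\sqrt{1 - \bar\alpha_t}\; \nabla_{x_t} \log q(x_t)
\end{aligned}
```

The last line is the denoising score matching argument: a predictor that minimizes squared error against the per-sample noise converges, in expectation, to the rescaled score of the marginal noised distribution q(x_t), not just the conditional one.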
β_t controls how fast information gets destroyed. The original DDPM paper used a linear schedule from β_1 = 0.0001 to β_T = 0.02. iDDPM (Nichol & Dhariwal 2021) showed that a cosine schedule preserves more signal in the early timesteps, leading to better samples. Modern systems use a variety of variance-preserving and variance-exploding schedules. Practical takeaway: the schedule choice can change FID by several points.
Further reading.
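To make the linear versus cosine comparison concrete, the sketch below computes the remaining signal level ᾱ_t (the running product of 1 - β_s) under both schedules. The linear endpoints are the ones quoted above; the cosine formula and its offset s = 0.008 follow Nichol & Dhariwal (2021). The function names and the 0.999 clipping default are illustrative choices, and the sigmoid schedule is left out because the lesson text gives no formula for it.

```python
import math

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level under the linear DDPM schedule."""
    out, running = [], 1.0
    for i in range(T):
        beta = beta_start + i * (beta_end - beta_start) / (T - 1)
        running *= 1.0 - beta
        out.append(running)
    return out

def cosine_alpha_bar(T=1000, s=0.008):
    """Cumulative signal level under the cosine schedule.

    alpha_bar(t) = f(t) / f(0), with f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2).
    """
    def f(t):
        return math.cos(((t / T + s) / (1.0 + s)) * math.pi / 2.0) ** 2
    return [f(t) / f(0) for t in range(1, T + 1)]

def betas_from_alpha_bar(alpha_bar, max_beta=0.999):
    """Recover per-step betas: beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped."""
    betas, prev = [], 1.0
    for a in alpha_bar:
        betas.append(min(1.0 - a / prev, max_beta))
        prev = a
    return betas

T = 1000
linear = linear_alpha_bar(T)
cosine = cosine_alpha_bar(T)
cosine_betas = betas_from_alpha_bar(cosine)

# Print the remaining signal level alpha_bar_t at a few timesteps (1-indexed t).
for t in (100, 250, 500, 750, 1000):
    print(f"t={t:4d}  linear={linear[t - 1]:.4f}  cosine={cosine[t - 1]:.4f}")
```

Under these assumptions the cosine schedule still holds about half of the signal level at the midpoint t = 500, while the linear schedule has dropped below a tenth, which is the "preserves more signal" effect in numbers.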
The canonical references for this module. External links open in a new tab.