Demo · Module 14 · Interactive

Drag noise away. Watch an image step out of static.

Stylized reverse diffusion process
T=1000 (pure noise) → T=0 (final image)
About 50 denoising steps with a fast sampler in a real model

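What you're seeing, and what a real model adds. The image starts as pure Gaussian noise (T=1000). A diffusion model is trained to look at the noisy image at a given timestep and predict the noise it contains. Subtract a scaled version of that prediction and you have a slightly less noisy image; repeat down to T=0 and you arrive at a coherent image. The original DDPM sampler walks through all 1000 timesteps, while faster samplers such as DDIM jump through a strided subset and typically need only a few dozen steps, commonly 20 to 50. This demo fakes the denoising by interpolating between pure noise and a chosen target. A real diffusion model would consult a U-Net (or DiT) at every step, and its early steps would lay out the image's overall structure while the late steps add fine detail.

Text conditioning is how the model knows what to denoise toward. In U-Net models such as Stable Diffusion 1.x, the prompt ("a cat") is encoded by a CLIP text encoder and injected via cross-attention at every step; DiT-based models such as Stable Diffusion 3 and FLUX instead mix text and image tokens in joint attention. Modern models (Stable Diffusion 3, FLUX, ERNIE-Image) also run the whole loop in a compressed latent space rather than in pixel space. With an autoencoder that shrinks each side by 8, as in Stable Diffusion, a 256×256×3 image becomes a 32×32 latent with 4 channels (Stable Diffusion 1.x and 2.x) or 16 channels (Stable Diffusion 3 and FLUX). Denoising that much smaller tensor at every step is a large part of what makes these models fast enough to run on a single GPU.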
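The predict, subtract, repeat loop can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: it uses the linear β schedule from the DDPM paper (β from 1e-4 to 0.02 over 1000 training timesteps), the deterministic DDIM update (η = 0) over 50 evenly strided timesteps, and, in place of a trained U-Net or DiT, a hypothetical `oracle_noise_predictor` that cheats by knowing the target image. DDPM's own sampler would also add fresh noise at each step. All function names here are illustrative, not a library API.

```python
import numpy as np

def linear_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def oracle_noise_predictor(x_t, alpha_bar_t, target):
    """HYPOTHETICAL stand-in for a trained U-Net/DiT.

    It cheats by knowing the clean target, so it returns exactly the noise
    separating x_t from sqrt(alpha_bar_t) * target.
    """
    return (x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)

def ddim_sample(target, num_steps=50, num_train_steps=1000, seed=0):
    alpha_bars = linear_alpha_bars(num_train_steps)
    # Visit an evenly strided subset of the training timesteps, high to low.
    stride = num_train_steps // num_steps
    timesteps = list(range(0, num_train_steps, stride))[::-1]  # 980, 960, ..., 0
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # alpha_bar of the next, less noisy timestep; 1.0 means fully clean.
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = oracle_noise_predictor(x, a_t, target)
        # Remove the predicted noise to estimate the clean image...
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # ...then re-noise that estimate to the next timestep's level (DDIM, eta = 0).
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x, len(timesteps)

target = np.linspace(-1.0, 1.0, 16).reshape(4, 4)  # a tiny 4x4 "image"
result, steps_taken = ddim_sample(target)
print(steps_taken, np.abs(result - target).max())
```

Because the oracle's noise estimate is exact, the loop lands on the target in 50 steps. A trained network only approximates the noise, so the sample ends up wherever its predictions steer it.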
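The demo's shortcut can be written as a plain blend of noise and target. This sketch assumes a linear mix in t/T (the page does not say exactly how it interpolates), and `fake_denoise` is a hypothetical helper. Nothing is predicted here; the target is simply faded in. For comparison, the real forward process mixes a clean image and noise as √ᾱ_t·x₀ + √(1−ᾱ_t)·ε, which keeps the variance constant for unit-variance data instead of following a straight line.

```python
import random

def fake_denoise(noise, target, t, T=1000):
    """Linearly blend two equal-length pixel lists.

    Hypothetical helper, ASSUMING a linear blend: t == T returns pure noise,
    t == 0 returns the target, and values in between mix the two.
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    w = t / T  # fraction of noise remaining at timestep t
    return [w * n + (1.0 - w) * x for n, x in zip(noise, target)]

rng = random.Random(0)
target = [0.0, 0.25, 0.5, 1.0]                 # a tiny "image" of four pixels
noise = [rng.gauss(0.0, 1.0) for _ in target]  # pure Gaussian noise
for t in (1000, 600, 200, 0):                  # slider positions from the demo
    print(t, [round(v, 3) for v in fake_denoise(noise, target, t)])
```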
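Cross-attention is what lets the prompt steer every step: each spatial position of the noisy latent forms a query, and the encoded text tokens supply keys and values. Below is a minimal single-head NumPy sketch, assuming random projection weights in place of learned ones and toy sizes (16 image tokens, 5 text tokens). Real U-Nets use multiple heads, an output projection, and a residual connection, all omitted here; `cross_attention` is an illustrative name, not a library function.

```python
import numpy as np

def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    image_tokens: (n_img, d_img) flattened latent positions (the queries)
    text_tokens:  (n_txt, d_txt) encoded prompt tokens (keys and values)
    """
    q = image_tokens @ w_q                          # (n_img, d)
    k = text_tokens @ w_k                           # (n_txt, d)
    v = text_tokens @ w_v                           # (n_txt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (n_img, n_txt)
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each image token's weights sum to 1
    return weights @ v, weights                     # (n_img, d), (n_img, n_txt)

rng = np.random.default_rng(0)
d_img, d_txt, d = 8, 12, 4                          # toy sizes, not real model dims
image_tokens = rng.standard_normal((16, d_img))     # e.g. a 4x4 latent, flattened
text_tokens = rng.standard_normal((5, d_txt))       # stand-in for an encoded prompt
w_q = rng.standard_normal((d_img, d))
w_k = rng.standard_normal((d_txt, d))
w_v = rng.standard_normal((d_txt, d))
out, weights = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` says how much one latent position draws on each prompt token, which is how "cat" can influence some regions more than others.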
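The pixel-to-latent arithmetic is easy to check. A minimal sketch, assuming an autoencoder with 8× spatial downsampling as in Stable Diffusion; `latent_shape` and `values_per_step_ratio` are hypothetical helpers, and the ratio counts tensor elements only, so it is a rough proxy for savings rather than a measure of actual compute.

```python
def latent_shape(height, width, channels, downsample=8):
    """Shape (h, w, c) of the latent for an autoencoder that shrinks each side by `downsample`."""
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (height // downsample, width // downsample, channels)

def values_per_step_ratio(height, width, latent_channels, downsample=8, image_channels=3):
    """How many times fewer numbers the denoiser handles in latent space than in pixel space."""
    h, w, c = latent_shape(height, width, latent_channels, downsample)
    return (height * width * image_channels) / (h * w * c)

print(latent_shape(256, 256, 4))            # → (32, 32, 4), as in Stable Diffusion 1.x/2.x
print(values_per_step_ratio(256, 256, 4))   # → 48.0 (196,608 pixel values vs 4,096 latent values)
print(values_per_step_ratio(256, 256, 16))  # → 12.0 with a 16-channel latent (Stable Diffusion 3, FLUX)
```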
