What this lesson covers.
This module is one of 42 in the curriculum. Below is the canonical interactive lesson (tabs, cards, and diagrams from the source repo) rendered inside the course shell. An audio narration runs alongside it: the sticky player at the top of the page plays the full Module 14 clip.
The lesson itself.
Layer 15 — VAE & Stable Diffusion
Unblurring the Static to Generate Art
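Imagine taking a picture and sprinkling a little more static onto it at every step, until only noise is left. As you do this, you make the AI watch. Its only job is to guess what the picture looked like one step before you added the latest bit of static. In other words, it learns to "un-smudge" the noise. This is called a Diffusion Model.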
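To make the "adding static" idea concrete, here is a minimal NumPy sketch. It assumes a linear noise schedule with 1000 steps and the noise-prediction training target used in the DDPM formulation; the lesson's own demo may be built differently, and the names `make_alpha_bars` and `add_noise` are made up for illustration.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """How much of the original picture survives after each step (assumed linear schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t by mixing the clean picture with Gaussian static."""
    noise = rng.standard_normal(x0.shape)
    a = alpha_bars[t - 1]  # t counts from 1 to num_steps
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return xt, noise

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a real picture
xt, noise = add_noise(x0, t=500, alpha_bars=alpha_bars, rng=rng)
# During training, a network is shown (xt, t) and asked to predict `noise`.
```

Predicting the static that was added carries the same information as guessing the cleaner picture, because either one can be computed from the other once `xt` and the schedule are known.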
2. Then, click "Reverse Diffusion (AI Un-smudge)" to watch the AI mathematically peel away the noise and generate a brand new image from scratch!
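The "peeling away" can be sketched as a loop that starts from pure static and removes a little predicted noise at each step. This is a rough sketch of DDPM-style sampling, not the code behind the demo button. The trained network is replaced by a hypothetical stand-in, `make_oracle`, which cheats by already knowing the target picture, so the loop can run without any training.

```python
import numpy as np

def reverse_diffusion(predict_noise, shape, betas, rng):
    """Walk from pure static back to a picture, one small step at a time."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from pure noise
    for i in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, i)  # the model's guess at the static in x
        x = (x - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])
        if i > 0:
            # every step except the last re-injects a little fresh randomness
            x = x + np.sqrt(betas[i]) * rng.standard_normal(shape)
    return x

def make_oracle(target, betas):
    """Hypothetical stand-in for a trained network: it already knows the answer."""
    alpha_bars = np.cumprod(1.0 - betas)
    def predict_noise(x, i):
        return (x - np.sqrt(alpha_bars[i]) * target) / np.sqrt(1.0 - alpha_bars[i])
    return predict_noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear schedule
target = rng.uniform(-1.0, 1.0, size=(8, 8))
image = reverse_diffusion(make_oracle(target, betas), target.shape, betas, rng)
```

With the oracle, the loop simply lands back on `target`. A real trained network has no fixed target to copy, so different starting static leads to different, brand new images.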
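Running diffusion directly on every pixel of a large image is slow. Enter the Variational Autoencoder (VAE). It acts a bit like a ZIP file compressor, except that it is lossy: it shrinks the giant image down into a tiny mathematical representation in a "Latent Space", and a matching decoder turns that representation back into pixels. The AI adds and removes static in this tiny representation instead of the huge image. Because there are far fewer numbers to process, generation becomes much faster, fast enough to run on a consumer graphics card.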
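A quick back-of-the-envelope calculation shows how much smaller the latent is. The figures below (a 512×512 RGB image, 8× downsampling per side, 4 latent channels) are the ones commonly quoted for Stable Diffusion v1 and are an assumption here, since the lesson does not state them; `latent_shape` and `num_values` are illustrative helpers.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the VAE latent for an image of the given size (assumed SD v1 settings)."""
    return (height // downsample, width // downsample, latent_channels)

def num_values(shape):
    """Total count of numbers stored in an array of this shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count

pixel_shape = (512, 512, 3)
z_shape = latent_shape(512, 512)
ratio = num_values(pixel_shape) / num_values(z_shape)
print(z_shape, ratio)  # (64, 64, 4) 48.0
```

Under these assumptions, every denoising step touches about 48 times fewer numbers than it would on raw pixels.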
To tell the AI what to draw, we use an extra system called CLIP. CLIP connects text to images. When you type "A cyberpunk dog", CLIP translates those words into a list of numbers (an embedding) that works like a mathematical compass. As the AI un-smudges the static, it follows the CLIP compass at every step, shifting its brushstrokes so the final image matches your words!
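One common way to make the un-smudging "follow the compass" is classifier-free guidance. The lesson does not name this technique, so treat it as an assumption about how the steering could work. The model predicts the noise twice, once with the text embedding and once without, and the difference between the two is exaggerated. The function name and the default `guidance_scale=7.5` are illustrative choices.

```python
import numpy as np

def guided_noise(eps_uncond, eps_text, guidance_scale=7.5):
    """Push the noise prediction further in the direction the prompt suggests."""
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)

# Toy predictions standing in for two passes through a real network.
eps_uncond = np.array([0.0, 1.0])  # prediction with no prompt
eps_text = np.array([1.0, 1.0])    # prediction given the prompt embedding
steered = guided_noise(eps_uncond, eps_text, guidance_scale=2.0)
print(steered)  # [2. 1.]
```

A scale of 1 uses the text-conditioned prediction as is, and larger values follow the prompt more aggressively, usually at some cost to variety.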
Try it: diffusion denoising.
Drag the T-slider from pure noise (T=1000) down to a coherent image (T=0). Try different target prompts.