Demo · Module 24 · Interactive
Walk the latent space.
Swap the conditioning.
[Interactive figure. Panels: Latent space (clusters: cats, dogs, interiors, landscapes, portraits) · Generated output (same z + different conditioning) · ControlNet (structural conditioning)]
What the latent walk is doing. In a latent diffusion model like Stable Diffusion, the model doesn't generate pixels directly. It generates a much smaller latent tensor (typically 4×64×64 for a 512×512 output, i.e. 4 channels at 1/8 the resolution on each side), and a separate VAE decoder turns that latent into pixels. Dragging through the latent space traverses a roughly smooth manifold: nearby points decode to similar images.

What ControlNet adds. A ControlNet is a separate, trainable copy of the U-Net's encoder (plus its middle block), trained to take a structural condition (Canny edges, depth map, pose skeleton, segmentation mask) and inject it into the generation process at every denoising step. Its outputs pass through zero-initialized 1×1 convolutions and are added to the frozen U-Net's skip connections, so fine-tuning starts from the unchanged base model. The same latent point, conditioned on different controls, produces different images, each sharing the structural skeleton of its condition. This is what lets you say "I want this composition / pose / depth, but draw it as a watercolor": the latent (together with the text prompt, in a real pipeline) carries content and style, the ControlNet carries structure, and the two combine.

What this demo simulates. A 2D projection of latent space with hand-placed cluster labels (a real Stable Diffusion latent has 4×64×64 = 16,384 dimensions, not 2). The generated image is a stylized representation of how the chosen latent and condition would manifest.
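A minimal Python sketch of the latent shape arithmetic and of one way to walk between two latents. It assumes the Stable Diffusion 1.x layout described above (4 latent channels, 8× spatial downsampling) and uses spherical linear interpolation (slerp) between two Gaussian noise latents, a common choice for latent walks. The demo does not say how it interpolates; `latent_shape` and `slerp` are illustrative helpers, not a library API.

```python
import numpy as np

# Assumed SD 1.x-style VAE geometry: 4 latent channels, 8x downsampling per side.
LATENT_CHANNELS = 4
DOWNSAMPLE = 8


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Shape of the latent tensor the diffusion model works in for an HxW image."""
    return (LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def slerp(t: float, z0: np.ndarray, z1: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Spherical linear interpolation; the angle is measured on the flattened latents."""
    a, b = z0.ravel(), z1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:  # nearly parallel: fall back to plain linear interpolation
        return (1 - t) * z0 + t * z1
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * z0 + (np.sin(t * omega) / so) * z1


shape = latent_shape(512, 512)
print(shape, int(np.prod(shape)))  # (4, 64, 64) 16384

rng = np.random.default_rng(0)
z_a = rng.standard_normal(shape)
z_b = rng.standard_normal(shape)
# Eight evenly spaced points along the walk; in a real pipeline each would be
# denoised and then decoded to pixels by the VAE.
walk = [slerp(t, z_a, z_b) for t in np.linspace(0.0, 1.0, 8)]
```

Why slerp rather than a straight line: a 16,384-dimensional standard Gaussian sample almost always has a norm close to √16384 = 128, so noise latents sit near a thin shell. The linear midpoint of two such (nearly orthogonal) samples has a norm of about 128/√2 ≈ 90, off that shell, whereas slerp stays near it.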
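The ControlNet injection path can be illustrated with a toy NumPy sketch. It does not use a real U-Net: the feature maps are random arrays with made-up shapes, `ZeroConv` and `controlled_skip` are hypothetical names, and the "training" step is just a manual weight change. What it does show is the zero-convolution idea from the ControlNet design: the 1×1 convolution that joins the trainable copy to the frozen U-Net starts at exactly zero, so the control branch adds nothing until training moves its weights.

```python
import numpy as np

C, H, W = 8, 16, 16  # hypothetical feature-map size at one U-Net resolution
rng = np.random.default_rng(0)


def conv1x1(x, weight, bias):
    """1x1 convolution: a per-pixel linear mix of channels. x has shape (C, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


class ZeroConv:
    """1x1 conv whose weight and bias are initialized to exactly zero."""

    def __init__(self, channels):
        self.weight = np.zeros((channels, channels))
        self.bias = np.zeros(channels)

    def __call__(self, x):
        return conv1x1(x, self.weight, self.bias)


def controlled_skip(frozen_skip, control_features, zero_conv):
    """Frozen U-Net skip features plus the zero-conv'd output of the trainable copy."""
    return frozen_skip + zero_conv(control_features)


frozen_skip = rng.standard_normal((C, H, W))       # stand-in for frozen encoder features
control_features = rng.standard_normal((C, H, W))  # stand-in for the copy's output given, e.g., an edge map
zc = ZeroConv(C)

# At initialization the control branch contributes nothing, so the
# pretrained model's behavior is untouched when fine-tuning starts.
at_init = controlled_skip(frozen_skip, control_features, zc)

# Stand-in for training updates: once the weights move off zero,
# the structural signal starts steering the features.
zc.weight = 0.1 * rng.standard_normal((C, C))
after_training = controlled_skip(frozen_skip, control_features, zc)
```

Zero initialization does not block learning: the gradient of the output with respect to the zero-conv weight depends on the control features, which are nonzero, so the first update already moves the weight off zero. In the real architecture this addition happens at several encoder resolutions and at the middle block, on every denoising step.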
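The demo does not say how its 2D map was built, and its cluster labels are hand-placed. If you wanted a 2D view of real latents, one common choice is principal component analysis (PCA), sketched here with NumPy on synthetic data. `pca_2d` is a hypothetical helper, and the two "clusters" are random arrays standing in for groups of 4×64×64 latents, not real Stable Diffusion outputs.

```python
import numpy as np


def pca_2d(latents: np.ndarray) -> np.ndarray:
    """Project N latents of any shape onto their top-2 principal components -> (N, 2)."""
    X = latents.reshape(len(latents), -1).astype(np.float64)
    X = X - X.mean(axis=0)  # center each of the 16,384 dimensions
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T


rng = np.random.default_rng(0)
# Synthetic data: 50 latents scattered around each of two made-up centers.
centers = rng.standard_normal((2, 4, 64, 64))
latents = np.concatenate(
    [c + 0.5 * rng.standard_normal((50, 4, 64, 64)) for c in centers]
)
coords = pca_2d(latents)  # one (x, y) point per latent, ready to plot
```

Any 2D view of a 16,384-dimensional space discards every direction except the top two, so points that look close on the map can still be far apart in the full latent.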