Demo · Module 24 · Interactive

Walk the latent space.
Swap the conditioning.

Drag the orange dot in latent space
Same point + different control = different image
Toggle between 4 ControlNet modes

LATENT SPACE · drag the dot (0.50, 0.50)

Clusters: cats · dogs · interiors · landscapes · portraits

GENERATED OUTPUT · same z + different conditioning

ControlNet · structural conditioning

Quick-jumps to interesting latent points

Walk through latent space

What the latent walk is doing. In a latent diffusion model like Stable Diffusion, the model doesn't generate pixels directly. It generates a much smaller latent tensor (typically 4×64×64 for a 512×512 output), and a separate VAE decoder turns that latent into pixels. Dragging the dot moves you through this latent space, where nearby points tend to produce similar images.

What ControlNet adds. A ControlNet is a trainable copy of the U-Net's encoder that runs alongside the frozen base model. It takes a structural condition (Canny edges, depth map, pose skeleton, segmentation mask) and injects it into the generation process at every denoising step. The same latent point, conditioned on different controls, produces different images, each following the structure of its condition. This is what lets you say "I want this composition / pose / depth, but draw it as a watercolor": roughly speaking, the latent carries appearance and style, the ControlNet carries structure, and the two combine.

What this demo simulates. The demo shows a 2D projection of latent space with hand-placed cluster labels; a real Stable Diffusion latent has 4×64×64 = 16,384 dimensions, not 2. The generated image is a stylized representation of how the chosen latent and condition would manifest.
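To make the latent walk concrete, here is a minimal NumPy sketch of moving between two latent points of the 4×64×64 shape mentioned above. It is an illustration under stated assumptions, not the demo's actual code: the demo does not say how it interpolates, so the choice of spherical interpolation (slerp) and the names `slerp` and `latent_walk` are hypothetical. Slerp is a common choice for Gaussian latents because a straight line between two samples passes through points with a lower norm than either endpoint. No model or VAE decoder is involved here; in a real pipeline each point on the path would be denoised and decoded into an image.

```python
import numpy as np

# Latent shape for a 512x512 output: channels x height x width.
LATENT_SHAPE = (4, 64, 64)


def slerp(z0, z1, t):
    """Spherically interpolate between two latents of the same shape.

    t = 0 returns z0, t = 1 returns z1. Assumption: slerp is used here
    for illustration; the demo does not specify its interpolation.
    """
    a, b = z0.ravel(), z1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return (1.0 - t) * z0 + t * z1
    s = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / s) * z0 + (np.sin(t * omega) / s) * z1


def latent_walk(z0, z1, steps):
    """Return `steps` latents evenly spaced in t from z0 to z1."""
    return [slerp(z0, z1, t) for t in np.linspace(0.0, 1.0, steps)]


rng = np.random.default_rng(0)
z_start = rng.standard_normal(LATENT_SHAPE)
z_end = rng.standard_normal(LATENT_SHAPE)

path = latent_walk(z_start, z_end, steps=8)

# Number of dimensions the walk actually moves through.
n_dims = int(np.prod(LATENT_SHAPE))
print(n_dims)  # 4 * 64 * 64 = 16384

# Distance covered by each step of the walk.
step_sizes = [
    float(np.linalg.norm(path[i + 1] - path[i])) for i in range(len(path) - 1)
]
```

Each consecutive pair in `path` is a small move in latent space, which is why decoded frames from such a walk tend to change gradually rather than jump.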
