Modality-first curriculum · 1 foundation · 4 tracks · 29 lessons · 125 audio clips

One foundation.
Four modalities.
Text · Image · Video · Voice.

Approach
Five foundation layers (representation, math, architecture, training, inference) feed four modality tracks. The same transformer eats different tokens; the same attention math powers different generators.
Read order
Start with the foundation strips below. Then pick a track. Tracks share prerequisites — finishing one makes the others easier.
Format
Self-paced web modules with sticky audio narration on every lesson. Two interactive demos, one pre-rendered 3D embedding video, three hero diagrams.
Honesty
Text track is mature. Image track is partial (4 of ~9 lessons). Video and Voice tracks each have one canonical lesson and visible "Coming soon" slots showing where future content lands.
§1 · 21 modules
Common Foundation

The five layers every
modality builds on.

Modules are ordered pedagogically, not numerically: representation → math → architecture → training → inference. Finish all five strips and you have the core concepts needed to read a modern transformer paper in any modality. After that, pick a track.

§1A Representation 3 modules · how the world becomes numbers
§1B Math underneath 2 modules · the loss function and what falls out of it
§1C Architecture 6 modules · attention, projections, the assembly line
§1D Training 6 modules · pretraining, alignment, fine-tuning
§1E Inference 4 modules · the decoding loop and where models actually run
§2–§5 · 4 tracks
Modality tracks

Pick a modality. Same
foundation underneath, different output.

Each track inherits the foundation strips above. Cross-listed lessons (orange-bordered) appear in two tracks because the same lesson serves both. Future-module slots are visible-but-disabled cards showing where new lessons would fit.

§6 · 3 surfaces
Playground

Demos you can
actually poke.

Two interactive single-page demos plus one pre-rendered 3D video. Open them standalone or follow them from the lessons that embed them.