Module 17 · Inference · 8 min

The full pipeline as one highway.

Where every previous module fits.

Reading time: 8 min
Audio: narration available
Prerequisites: Foundation
Source: Track A · Gemini
§ 1

What this lesson covers.

This module is one of 42 in the curriculum. Below is the canonical interactive lesson (tabs, cards, and diagrams from the source repo), rendered inside the course shell. An audio narration runs alongside it: the sticky player at the top of the page plays the full Module 17 clip.

If you prefer to read first and play with the demos after, the interactive lesson sits below this section. If you'd rather hear it narrated while you scroll, hit play on the sticky audio bar at the top — or just let it autoplay.

§ 2

The lesson itself.

Interactive lesson · ported from Gemini track
Click tabs to navigate · hover cards for details

The Inference Highway

Generating text requires a forward pass through the entire transformer for every new token. See why naive generation is slow, how the KV cache prevents redundant work, and how speculative decoding uses a smaller draft model to propose several tokens ahead for the larger model to verify.
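To make the KV cache idea concrete, here is a minimal sketch of single-head attention in pure Python. The projection matrices `W_Q`, `W_K`, `W_V`, the helper names, and the toy inputs are all hypothetical stand-ins chosen for illustration, not anything taken from the lesson's source repo. The sketch compares recomputing keys and values for the whole context at each step against appending one key/value pair per step to a cache.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(m, v):
    return [dot(row, v) for row in m]

def attend(q, keys, values):
    """Softmax attention of one query over every key/value seen so far."""
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

# Hypothetical toy projection matrices (assumption: 2-dimensional embeddings).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.5], [-0.5, 0.5]]
W_V = [[1.0, 2.0], [0.0, 1.0]]

def last_output_no_cache(xs):
    """Naive path: re-project K and V for every token in the context."""
    keys = [matvec(W_K, x) for x in xs]
    values = [matvec(W_V, x) for x in xs]
    return attend(matvec(W_Q, xs[-1]), keys, values)

class KVCache:
    """Cached path: project only the newest token and reuse the rest."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, x):
        self.keys.append(matvec(W_K, x))
        self.values.append(matvec(W_V, x))
        return attend(matvec(W_Q, x), self.keys, self.values)

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
cache = KVCache()
cached = [cache.step(x) for x in xs]
uncached = [last_output_no_cache(xs[:t + 1]) for t in range(len(xs))]
```

Both paths produce the same attention output at every step; the cached path simply avoids re-projecting tokens it has already seen. A real model keeps one such cache per layer and per attention head.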

No KV Cache: Redundant Computation
KV Cache: State Reuse
Speculative Decoding: Draft & Validate
Standard Generation
The model recomputes every previous token on every step, so generating N tokens costs O(N²) token computations in total. That redundant work leads to massive traffic jams.
Throughput: 2.1 tokens/sec
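The quadratic growth can be checked with simple counting. The sketch below tallies how many token positions pass through the model during generation, with and without a cache. The function name `positions_processed` and the example sizes are illustrative assumptions, and the count ignores the attention computation itself, which still reads the whole cache on each step.

```python
def positions_processed(prompt_len, new_tokens, use_cache):
    """Count token positions pushed through the model while generating."""
    total = 0
    context = prompt_len
    for step in range(new_tokens):
        if use_cache and step > 0:
            total += 1        # only the newest token needs a forward pass
        else:
            total += context  # the whole context is processed (or prefilled)
        context += 1          # the generated token joins the context
    return total

no_cache = positions_processed(8, 100, use_cache=False)
with_cache = positions_processed(8, 100, use_cache=True)
print(no_cache, with_cache)  # prints: 5750 107
```

Without a cache the total is the sum of every context length along the way, which grows with the square of the output length. With a cache the total is one prefill of the prompt plus one position per later token.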
Diagram legend: Past Context Layer · Generation Layer · Validation Layer (Speculative)
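The draft-and-validate loop can be sketched with two toy "models". Everything here is a simplifying assumption: `target_next` and `draft_next` are hypothetical deterministic functions standing in for a large and a small network, and acceptance is the greedy variant (keep drafted tokens while they match the target's own choice). Production systems typically use a probabilistic acceptance rule instead.

```python
def target_next(tokens):
    """Hypothetical large model: deterministic next-token rule."""
    return (tokens[-1] * 2 + len(tokens)) % 10

def draft_next(tokens):
    """Hypothetical small model: same rule, but wrong at some positions."""
    guess = (tokens[-1] * 2 + len(tokens)) % 10
    return (guess + 1) % 10 if len(tokens) % 4 == 0 else guess

def greedy_decode(prompt, new_tokens):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(new_tokens):
        tokens.append(target_next(tokens))
    return tokens

def speculative_decode(prompt, new_tokens, k=4):
    tokens = list(prompt)
    goal = len(prompt) + new_tokens
    target_passes = 0
    while len(tokens) < goal:
        # Draft phase: the small model proposes k tokens ahead.
        draft = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # Verify phase: counted as one target pass, since a real model
        # scores all drafted positions in a single batched forward pass.
        target_passes += 1
        accepted = []
        for guess in draft:
            truth = target_next(tokens + accepted)
            if guess != truth:
                accepted.append(truth)  # replace the first wrong guess
                break
            accepted.append(guess)
        else:
            # Every guess matched, so the target contributes a bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal], target_passes

spec_tokens, passes = speculative_decode([1, 2, 3], 20)
baseline = greedy_decode([1, 2, 3], 20)
```

In this greedy variant the output matches plain greedy decoding with the target model exactly, while the target is consulted in fewer passes. How much is saved depends on how often the draft agrees with the target.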
§ PAPERS

Further reading.

The canonical references for this module. External links open in a new tab.

§ NEXT

What to read next.

Use the pager below to move sequentially through the curriculum, or jump to any module from the course index. Each track has a "Prereq: ↑ foundation" callout so you can backfill anything that wasn't clear.