What this lesson covers.
This module is one of 42 in the curriculum. Below is the canonical interactive lesson (tabs, cards, and diagrams from the source repo), rendered inside the course shell. An audio narration runs alongside it: the sticky player at the top of the page plays the full Module 12 clip.
The lesson itself.
Layer 3 — The Attention Mechanism
The magic that lets AI "read" context like a human
Here is how an AI's attention mechanism works. When the model processes a sentence, it doesn't just read left to right. Every single word acts like a detective, looking at every other word in the text and assigning each one an "attention score" that says how important a clue it is.
1. Query (What am I looking for?)
The word "bank" says: "I need a clue to tell me what kind of bank I am."
2. Key (What do I contain?)
The word "river" says: "I am related to water and nature."
3. Value (What is my actual meaning?)
The better the Query matches the Key, the higher the attention score, and the more of the "river" word's Value gets blended into the "bank" word. After this step, the numbers that represent "bank" carry some of the concept of water!
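The three roles above can be sketched in a few lines of Python. This is a minimal illustration of scaled dot-product attention for one query word, not the code behind the interactive lesson. The three-word sentence, the two-number vectors (first number roughly "water/nature", second roughly "finance"), and the function names `softmax` and `attend` are all assumptions made up for this example. Real models learn their vectors during training and use hundreds of dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Query-vs-Key match: one score per word in the sentence.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the Values, weighted by how well each Key matched the Query.
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed

# Hypothetical toy vectors: dimension 0 ~ "water/nature", dimension 1 ~ "finance".
tokens = ["the", "river", "bank"]
keys = [[0.0, 0.0], [3.0, 0.0], [1.0, 1.0]]
values = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
bank_query = [2.0, 0.0]  # "bank" is looking for water-related clues

weights, bank_in_context = attend(bank_query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token:>5}: {weight:.2f}")
print("bank after attention:", [round(x, 2) for x in bank_in_context])
```

With these made-up numbers, "river" receives by far the largest weight, so the blended vector for "bank" leans toward the "water" dimension. Note that the match is never all-or-nothing: every word contributes a little, in proportion to its score.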
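Attention lets the AI look at everything all at once. The scores for every pair of words are computed in the same step: Word 1 scores Word 100 at the same moment that Word 100 scores Word 1. (Text-generating models hide later words from earlier ones, but the scores are still computed together during training.) Because no word has to wait for another word to finish, the work spreads naturally across graphics cards (GPUs), which is a big part of why modern LLMs can be trained on millions of books in a practical amount of time.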
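One way to see why this can run in parallel: each word's row of attention weights depends only on that word's own Query and the shared list of Keys, never on another word's result. The sketch below, using hypothetical two-number vectors for a four-word sentence and an assumed helper named `attention_weights`, computes the rows in forward order and again in reverse order and checks that nothing changes. Plain Python still runs the rows one after another; the point is only that nothing forces it to.

```python
import math

def attention_weights(queries, keys):
    """Return the full table of attention weights: one row per query word."""
    scale = math.sqrt(len(keys[0]))
    table = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        table.append([e / total for e in exps])
    return table

# Hypothetical 2-D query/key vectors for a four-word sentence.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
keys = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0], [2.0, 0.0]]

forward = attention_weights(queries, keys)
# Compute the rows in reverse order, then put them back in sentence order.
backward = attention_weights(queries[::-1], keys)[::-1]

# No row needs another row's result, so the order of computation is irrelevant
# and hardware such as a GPU is free to compute all rows at the same time.
print(forward == backward)
```

Models that read one word at a time cannot do this, because step 100 has to wait for step 99. Here, the whole table is a single batch of independent dot products.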
Try it: attention heatmap.
Type a sentence, then click any token to make it the query. Three attention heads and their average are computed simultaneously.
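For readers without the widget in front of them, here is a rough sketch of the kind of numbers such a heatmap displays. It is not the demo's implementation. The embedding function `embed`, the vector size, the random projection matrices, and the example sentence are all stand-ins invented for this sketch. Because the weights are random and untrained, the pattern is illustrative only and will not show meaningful language understanding.

```python
import math
import random

DIM = 8        # embedding size (arbitrary choice for this sketch)
NUM_HEADS = 3  # matches the three heads mentioned in the demo

def embed(token):
    """Hypothetical stand-in for learned embeddings: a fixed pseudo-random vector per token."""
    rng = random.Random(token)  # seeding with the token keeps the vector repeatable
    return [rng.uniform(-1, 1) for _ in range(DIM)]

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def project(vector, matrix):
    """Multiply a vector by a matrix (one output value per matrix row)."""
    return [sum(v * w for v, w in zip(vector, row)) for row in matrix]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_weights(embeddings, w_query, w_key, query_index):
    """Attention weights from one query token to every token, for one head."""
    query = project(embeddings[query_index], w_query)
    keys = [project(e, w_key) for e in embeddings]
    scale = math.sqrt(len(query))
    return softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])

tokens = "the boat drifted toward the river bank".split()
embeddings = [embed(t) for t in tokens]
query_index = tokens.index("bank")  # the "clicked" token

rng = random.Random(0)  # untrained, random projections: illustrative only
head_dim = DIM // 2
rows = []
for _ in range(NUM_HEADS):
    # Each head gets its own Query and Key projections, so each head
    # scores the same sentence differently.
    w_query = random_matrix(rng, head_dim, DIM)
    w_key = random_matrix(rng, head_dim, DIM)
    rows.append(head_weights(embeddings, w_query, w_key, query_index))

# The fourth row of the heatmap: the per-token mean across the three heads.
average = [sum(column) / NUM_HEADS for column in zip(*rows)]

for label, row in zip(["head 1", "head 2", "head 3", "average"], rows + [average]):
    print(label.ljust(8), " ".join(f"{t}={w:.2f}" for t, w in zip(tokens, row)))
```

Each printed row is one strip of the heatmap for the chosen query token. The heads disagree because each has its own projections. In a trained model, that is what lets different heads specialize in different kinds of clues.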