Demo · Module 10 · Interactive

Drag the temperature. Watch the model change its mind.

Logits → softmax(logits/T) → probabilities
T → 0: greedy · T → ∞: uniform
8 candidate next-tokens
Prompt: "The cat sat on the"
Temperature slider (shown at T = 1.00, "balanced"): 0 (greedy) · 0.3 · 0.7 (creative) · 1.0 (neutral) · 1.5 · 2.0 · 3.0 (chaos)
[Chart: the model's next-token probability distribution over the 8 candidates]
What temperature actually does. Given a vector of logits z from the model's final layer, softmax with temperature T computes p_i = exp(z_i/T) / sum_j exp(z_j/T). When T is small, dividing by T stretches the gaps between logits, so the largest logit dominates exponentially: the model becomes nearly greedy and almost always picks "mat". At T = 1.0 you get the model's unmodified distribution. When T is large, the scaled logits are squeezed toward each other and the distribution flattens: every candidate becomes nearly equally likely, which is what produces chaotic, low-quality output. The temperature knob lives inside the softmax, not after it: the logits are divided by T before exponentiation.

Real-world use, as rules of thumb: 0.0 for code completion (the formula is undefined at T = 0, so implementations treat that setting as greedy argmax), 0.7-1.0 for chat, 1.0+ for creative writing, and never above ~1.5 in production.
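The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the function name softmax_with_temperature, the candidate list, and the logit values are all made up for the example, and T = 0 is handled as a special case by assumption, since the formula itself divides by zero there.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # exp(logit / 0) is undefined, so treat T = 0 as greedy argmax.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    # Temperature is applied inside the softmax: scale first, then exponentiate.
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability;
    # the shift cancels out in the ratio.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates and logits (illustrative only, not the demo's values).
candidates = ["mat", "floor", "sofa", "bed", "roof", "table", "moon", "keyboard"]
logits = [4.0, 2.5, 2.0, 1.5, 1.0, 0.5, -1.0, -2.0]

for t in (0.0, 0.3, 0.7, 1.0, 1.5, 3.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t:<4} P(mat)={probs[0]:.3f}  min P={min(probs):.4f}")
```

Running it with these made-up logits should show the probability of "mat" falling and the smallest probability rising as T increases, which is the flattening the chart illustrates.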