Demo · Module 18 · Interactive
How big does ΔW need to be? Drag the rank slider.
LoRA Rank
r = 8 · balanced, the sweet spot for most tasks
Slider stops: 1 (minimal) · 4 · 8 (sweet spot) · 16 · 32 · 64 · 128 · 256 (full)
Matrix decomposition
ΔW (full fine-tune, d × d) ≈ A (d × r) · B (r × d)
Trainable params = d · r · 2 per attention layer × 96 attention layers
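A toy sketch of the decomposition in plain Python. It assumes a square d × d matrix, as the diagram does, and uses tiny sizes (d = 4, r = 1) so the result can be checked by eye. The helpers `matmul` and `lora_delta` are hypothetical, written for illustration, and are not part of any LoRA library.

```python
def matmul(X, Y):
    """Multiply an m x k matrix X by a k x n matrix Y (lists of lists)."""
    k = len(Y)
    assert all(len(row) == k for row in X), "inner dimensions must match"
    n = len(Y[0])
    return [[sum(row[t] * Y[t][j] for t in range(k)) for j in range(n)] for row in X]


def lora_delta(A, B):
    """Build the d x d update dW = A . B from A (d x r) and B (r x d)."""
    return matmul(A, B)


d, r = 4, 1
A = [[1], [2], [3], [4]]    # d x r, trainable
B = [[1, 0, 2, 1]]          # r x d, trainable
delta_W = lora_delta(A, B)  # d x d, computed from A and B, not trained directly

trainable = d * r * 2       # entries in A plus entries in B
full = d * d                # entries in dW if it were trained directly

for row in delta_W:
    print(row)              # every row is a multiple of B's single row
print("trainable:", trainable, "vs full:", full)
```

Eight trainable numbers produce all sixteen entries of ΔW, and every row of ΔW is a multiple of B's single row. That is what rank 1 means.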
Comparison · full fine-tune vs LoRA
Model preset
Pick a real-world model to fine-tune. The preset sets d and the number of attention layers.
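The comparison panel's arithmetic can be sketched as follows. Like the demo, it assumes one d × d matrix per attention layer. `compare` and `Comparison` are hypothetical names. The Llama 3 8B values (d = 4096, 32 layers) come from the explanation on this page, and the ranks are the slider stops.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    full_per_layer: int  # d x d entries updated by a full fine-tune
    lora_per_layer: int  # d x r entries in A plus r x d entries in B
    full_total: int
    lora_total: int

    @property
    def reduction(self) -> float:
        return self.full_per_layer / self.lora_per_layer


def compare(d: int, n_layers: int, r: int) -> Comparison:
    """Count trainable params for one d x d matrix per attention layer."""
    if not 1 <= r <= d:
        raise ValueError("rank must be between 1 and d")
    full = d * d
    lora = d * r * 2
    return Comparison(full, lora, full * n_layers, lora * n_layers)


# Llama 3 8B values from the text: d = 4096, 32 attention layers, r = 8.
c = compare(d=4096, n_layers=32, r=8)
print(c.full_per_layer, c.lora_per_layer, c.reduction)  # → 16777216 65536 256.0

# Sweep the ranks offered by the slider.
for r in (1, 4, 8, 16, 32, 64, 128, 256):
    print(r, compare(4096, 32, r).reduction)
```

The reduction works out to d / (2r), so it halves each time r doubles. This sketch, like the demo, ignores that real configurations often adapt several projections in each layer.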
What this demo shows. When you fine-tune a pretrained model, LoRA does not update every parameter of a weight matrix W of shape d × d. Instead, it assumes the change ΔW is low-rank, meaning it can be expressed as A · B, where A is d × r, B is r × d, and r is much smaller than d. You train only A and B; the original W stays frozen. For Llama 3 8B, with d = 4096 and 32 layers, a full fine-tune trains d² = 4096² ≈ 16.8M params per attention layer, while LoRA with r = 8 trains d · r · 2 = 65,536 params per attention layer. That is a 256× reduction (d / 2r = 4096 / 16).

Why does this work? Empirically, the weight change during fine-tuning has very low intrinsic rank: most fine-tuning tasks need only a small subspace of updates. QLoRA goes further. It quantizes the frozen base model to 4-bit while the LoRA adapters stay in higher precision (typically 16-bit), cutting the memory needed for the frozen weights by roughly another 4× compared with a 16-bit base.
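A rough sketch of the QLoRA memory claim that counts weight storage only. These are assumptions, not figures from this page: a round 8 × 10⁹ parameters for the 8B base model, a 16-bit base for plain LoRA, and 16-bit adapters in both cases. Activations, gradients, optimizer states, and the small overhead of quantization constants are ignored. `weight_gb` is a hypothetical helper.

```python
def weight_gb(n_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: a round 8e9 frozen parameters for an "8B" model.
BASE_PARAMS = 8_000_000_000
# Adapter size from the text: 65,536 params per attention layer x 32 layers at r = 8.
ADAPTER_PARAMS = 65_536 * 32

base_16bit = weight_gb(BASE_PARAMS, 16)   # frozen base kept in 16-bit (plain LoRA)
base_4bit = weight_gb(BASE_PARAMS, 4)     # frozen base quantized to 4-bit (QLoRA)
adapters = weight_gb(ADAPTER_PARAMS, 16)  # assumed 16-bit adapters in both setups

print(f"LoRA  base: {base_16bit:.1f} GB + adapters: {adapters:.4f} GB")
print(f"QLoRA base: {base_4bit:.1f} GB + adapters: {adapters:.4f} GB")
print(f"base-weight reduction: {base_16bit / base_4bit:.0f}x")
```

Under these assumptions the frozen weights drop from 16 GB to 4 GB, while the adapters stay around 4 MB either way. The 4× figure therefore describes the base weights rather than total training VRAM.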