§ 1
What this lesson covers.
This module is one of 42 in the curriculum. Below is the canonical interactive lesson — tabs, cards, and diagrams from the source repo, rendered inside the course shell. There is no audio narration for this module - it ships as text + interactive lesson only.
If you prefer to read first and play with the demos after, the interactive lesson sits below this section.
Interactive lesson · ported from Gemini track
SINGLE-SHOT VS FEW-SHOT
How much reference audio you actually need
Single-shot (3-15 seconds of reference): works for a casual sound-alike. It captures broad voice color but loses prosodic patterns, so it suits first-pass concept work.
Few-shot (30 seconds to 5 minutes): captures prosody, breathing rhythm, and characteristic emphasis.
Studio-grade (30+ minutes across multiple emotions): gives you a voice that stays convincing across long-form narration. ElevenLabs' Professional Voice Cloning is in this tier.
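The tiers above can be sketched as a small lookup. This is an illustrative helper, not part of any provider's API: the function name `reference_tier` and the `emotions_covered` parameter are assumptions. The lesson leaves some durations unclassified (for example 15-30 seconds, or 30+ minutes in a single emotion), so the sketch returns `None` for those rather than guessing.

```python
from typing import Optional

# Tier boundaries in seconds, taken from the ranges quoted in the lesson.
TIERS = [
    ("single-shot", 3, 15),
    ("few-shot", 30, 5 * 60),
]
STUDIO_MIN_SECONDS = 30 * 60


def reference_tier(seconds: float, emotions_covered: int = 1) -> Optional[str]:
    """Return the tier name for a reference recording, or None when the
    lesson does not classify that combination of duration and emotions."""
    # Studio-grade needs both the duration and more than one emotion.
    if seconds >= STUDIO_MIN_SECONDS and emotions_covered > 1:
        return "studio-grade"
    for name, low, high in TIERS:
        if low <= seconds <= high:
            return name
    return None


if __name__ == "__main__":
    for secs, emotions in [(10, 1), (120, 1), (20, 1), (45 * 60, 3)]:
        print(secs, emotions, reference_tier(secs, emotions))
```

Returning `None` for the gaps keeps the helper honest about what the lesson actually specifies; a real pipeline would need its own policy for those cases.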
WHERE ARTIFACTS COME FROM
The audible signatures of cloned voices
Cloned voices fail in specific ways: (1) consonant smearing at fast speech rates, (2) unnatural breath and pause patterns (the model doesn't know when to breathe), (3) pitch-register drift on emphasized words, and (4) emotion bleeding (the model adds drama that wasn't in the reference). Each artifact has a fix, usually more or cleaner reference audio.
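Artifact (2) is the easiest to screen for mechanically. Here is a rough heuristic sketch, assuming you already have per-frame RMS levels for a generated clip. The function names, the silence threshold, and the minimum pause length are all illustrative assumptions, and a production check would more likely use a proper voice activity detector. It finds pauses and reports the longest stretch of speech with no pause, which you can compare against the same measurement on the reference recording.

```python
from typing import List, Tuple


def find_pauses(frame_rms: List[float], frame_seconds: float,
                silence_threshold: float = 0.01,
                min_pause_seconds: float = 0.2) -> List[Tuple[float, float]]:
    """Return (start, end) times of runs of frames quieter than the threshold."""
    pauses = []
    run_start = None
    # The infinite sentinel closes a quiet run that reaches the end of the clip.
    for i, level in enumerate(frame_rms + [float("inf")]):
        if level < silence_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start * frame_seconds, i * frame_seconds
            if end - start >= min_pause_seconds:
                pauses.append((start, end))
            run_start = None
    return pauses


def longest_speech_run(frame_rms: List[float], frame_seconds: float,
                       **kwargs) -> float:
    """Longest stretch, in seconds, with no detected pause."""
    edges = [0.0]
    for start, end in find_pauses(frame_rms, frame_seconds, **kwargs):
        edges.extend([start, end])
    edges.append(len(frame_rms) * frame_seconds)
    # Speech segments are the even-indexed gaps: (edges[0], edges[1]), (edges[2], edges[3]), ...
    return max(edges[i + 1] - edges[i] for i in range(0, len(edges), 2))
```

A clone that runs far longer without a pause than the reference speaker ever does is a candidate for the "doesn't know when to breathe" artifact. The numbers only flag clips for listening; they do not replace it.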
ETHICAL GUARDRAILS
The non-negotiable patterns
Consent first: the person whose voice you're cloning must explicitly agree, in writing, to each project.
Watermarking: every generated audio file should carry an inaudible watermark that remains detectable after release.
Refusal lists: no cloning of public figures (politicians, journalists, celebrities) without legal review.
Audit logs: every clone request is logged with the requester and purpose.
These aren't optional in 2026.
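Three of these guardrails (consent, refusal lists, audit logs) can be enforced as a gate in front of any clone request. The following is a minimal sketch under stated assumptions: `CloneRequest`, `REFUSAL_LIST`, and `review_request` are hypothetical names, the refusal list entries are placeholders, and the audit log is an in-memory list standing in for durable storage. Watermarking is left out because it depends on the audio tooling you use.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set


@dataclass
class CloneRequest:
    requester: str
    purpose: str
    speaker: str
    project: str
    consented_projects: Set[str]  # projects the speaker agreed to in writing
    legal_review_passed: bool = False


# Hypothetical refusal list; a real one would be maintained outside the code.
REFUSAL_LIST = {"example politician", "example journalist"}


def review_request(req: CloneRequest, audit_log: List[str]) -> bool:
    """Apply the consent and refusal-list checks, logging every request."""
    if req.project not in req.consented_projects:
        approved, reason = False, "no written consent for this project"
    elif req.speaker.lower() in REFUSAL_LIST and not req.legal_review_passed:
        approved, reason = False, "public figure without legal review"
    else:
        approved, reason = True, "ok"
    # Log approvals and refusals alike, with requester and purpose.
    audit_log.append(json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "requester": req.requester,
        "purpose": req.purpose,
        "speaker": req.speaker,
        "project": req.project,
        "approved": approved,
        "reason": reason,
    }))
    return approved
```

Consent is checked per project rather than per speaker, matching the "for each project" requirement, and refused requests are logged too so the audit trail covers every attempt.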
PRACTICAL WORKFLOW
From reference recording to deployable voice
(1) Record 5-10 minutes of clean reference audio in a quiet room.
(2) Process the recording to remove background noise and normalize loudness.
(3) Train the voice, or upload the audio to your TTS provider.
(4) Validate on 20+ test sentences across emotional ranges.
(5) Generate watermarked production audio.
(6) Keep the source recording in cold storage so the voice can be re-trained if the underlying TTS model is upgraded.
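The loudness normalization in step (2) can be illustrated with a simple RMS-based sketch. Note the substitution: this uses plain RMS level as a stand-in for perceptual loudness, whereas dedicated tools typically measure loudness in LUFS. The function names and the -20 dBFS target are assumptions for illustration, and samples are assumed to be floats in the range [-1.0, 1.0].

```python
import math
from typing import List


def rms_dbfs(samples: List[float]) -> float:
    """RMS level in dB relative to full scale (1.0); -inf for silence."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def normalize_rms(samples: List[float], target_dbfs: float = -20.0) -> List[float]:
    """Scale samples toward target_dbfs, capping the gain to avoid clipping."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silent or empty input: nothing to scale
    gain = 10 ** ((target_dbfs - current) / 20)
    peak = max(abs(s) for s in samples)
    # Cap the gain so the loudest sample stays within [-1.0, 1.0].
    gain = min(gain, 1.0 / peak)
    return [s * gain for s in samples]
```

Because of the clipping cap, a recording with large peaks may land below the target level. Under this sketch that is the safer failure mode, since clipped reference audio would add distortion for the model to learn.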