Module f-voice-clone · Voice · 8 min

Voice cloning,
ethically.

Single-shot vs few-shot cloning, where artifacts come from, ethical guardrails.

Reading time: 8 min · Audio: none · Prerequisites: f-tts · Source: Track A · Gemini
§ 1

What this lesson covers.

This module is one of 42 in the curriculum. Below is the canonical interactive lesson (tabs, cards, and diagrams from the source repo), rendered inside the course shell. There is no audio narration for this module; it ships as text plus the interactive lesson only.

If you prefer to read first and play with the demos after, the interactive lesson sits below this section.

§ 2

The lesson itself.

Interactive lesson · ported from Gemini track
Voice · Customization

Voice Cloning

Single-shot vs few-shot · artifacts · ethical guardrails

SINGLE-SHOT VS FEW-SHOT

How much reference audio you actually need

Single-shot (3-15 seconds of reference): works for a casual sound-alike. It captures broad voice color but loses prosodic patterns, which makes it good for first-pass concept work.

Few-shot (30 seconds to 5 minutes): captures prosody, breathing rhythm, and characteristic emphasis.

Studio-grade (30+ minutes across multiple emotions): gives you a voice that stays convincing across long-form narration. ElevenLabs' Professional Voice Cloning is in this tier.
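
As a rough illustration of the tiers above, the following sketch maps the total duration of a reference recording to a tier. The names `ReferenceTier` and `classify_reference` are hypothetical, and the handling of the gaps the lesson leaves open (15-30 seconds, 5-30 minutes) is an assumption: a recording in a gap is rounded down to the lower tier. Duration is only a necessary condition; studio-grade also needs coverage across multiple emotions, which this sketch does not check.

```python
from enum import Enum


class ReferenceTier(Enum):
    TOO_SHORT = "too short"
    SINGLE_SHOT = "single-shot"
    FEW_SHOT = "few-shot"
    STUDIO_GRADE = "studio-grade"


def classify_reference(duration_seconds: float) -> ReferenceTier:
    """Map total reference-audio duration to the lesson's tiers.

    Assumption: durations that fall in a gap between tiers
    (15-30 s, 5-30 min) are assigned to the lower tier.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if duration_seconds < 3:
        return ReferenceTier.TOO_SHORT      # below the 3 s single-shot floor
    if duration_seconds < 30:
        return ReferenceTier.SINGLE_SHOT    # 3-15 s, plus the 15-30 s gap
    if duration_seconds < 30 * 60:
        return ReferenceTier.FEW_SHOT       # 30 s-5 min, plus the 5-30 min gap
    return ReferenceTier.STUDIO_GRADE       # 30+ minutes


if __name__ == "__main__":
    for seconds in (10, 120, 45 * 60):
        print(seconds, classify_reference(seconds).value)
```

A check like this is useful mainly as an early warning: it tells you which quality level to expect before you spend time on training or upload.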
WHERE ARTIFACTS COME FROM

The audible signatures of cloned voices

Cloned voices fail in specific ways:

(1) Consonant smearing at fast speech rates.
(2) Unnatural breath and pause patterns (the model doesn't know when to breathe).
(3) Drift in pitch register on emphasized words.
(4) Emotion bleeding (the model adds drama that wasn't in the reference).

Each artifact has a fix, usually more or cleaner reference audio.
ETHICAL GUARDRAILS

The non-negotiable patterns

Consent first: the person whose voice you're cloning must explicitly agree, in writing, for each project.

Watermarking: every generated audio file should carry an inaudible watermark that can be detected after release.

Refusal lists: no cloning of public figures (politicians, journalists, celebrities) without legal review.

Audit logs: every clone request is logged with the requester and purpose.

These aren't optional in 2026.
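
Three of the guardrails above (per-project consent, refusal lists, audit logs) can be expressed as a gate that runs before any clone job starts. The sketch below is one possible shape, not a reference implementation: `CloneRequest`, `authorize_clone`, and the in-memory sets and list are hypothetical stand-ins for whatever consent records and log storage you actually use. Watermarking is out of scope here because it happens at generation time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CloneRequest:
    requester: str  # who is asking for the clone
    speaker: str    # whose voice would be cloned
    project: str    # consent is tracked per project
    purpose: str    # recorded in the audit log


def authorize_clone(request, consents, refusal_list, legal_reviewed, audit_log):
    """Return (allowed, reason) and log the request either way.

    consents and legal_reviewed are sets of (speaker, project) pairs;
    refusal_list is a set of speaker names. All are hypothetical stores.
    """
    key = (request.speaker, request.project)
    if request.speaker in refusal_list and key not in legal_reviewed:
        allowed, reason = False, "public figure: legal review required"
    elif key not in consents:
        allowed, reason = False, "no written consent on file for this project"
    else:
        allowed, reason = True, "ok"

    # Every request is logged, including the refused ones.
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "requester": request.requester,
        "speaker": request.speaker,
        "project": request.project,
        "purpose": request.purpose,
        "allowed": allowed,
        "reason": reason,
    })
    return allowed, reason
```

Logging refusals as well as approvals is a deliberate choice in this sketch: a refused request is often the more interesting record when someone later asks who tried to clone a given voice.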
PRACTICAL WORKFLOW

From reference recording to deployable voice

(1) Record 5-10 minutes of clean reference in a quiet room.
(2) Process the recording to remove background noise and normalize loudness.
(3) Train or upload to your TTS provider.
(4) Validate on 20+ test sentences across emotional ranges.
(5) Generate watermarked production audio.
(6) Keep the source recording in cold storage so the voice can be re-trained if the underlying TTS model is upgraded.
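
The loudness normalization in step (2) can be sketched in a few lines. This is a simplified illustration that uses plain RMS level rather than a perceptual loudness measure; production pipelines more commonly normalize to a LUFS target as defined in ITU-R BS.1770. The function names, the -20 dBFS target, and the 0.99 peak ceiling are assumptions chosen for the example, and samples are assumed to be floats in the range [-1.0, 1.0].

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return -math.inf if rms == 0 else 20 * math.log10(rms)


def normalize_rms(samples, target_dbfs=-20.0, peak_ceiling=0.99):
    """Scale samples toward target_dbfs without letting peaks clip.

    target_dbfs and peak_ceiling are illustrative defaults, not
    values taken from the lesson.
    """
    level = rms_dbfs(samples)
    if level == -math.inf:
        return list(samples)  # pure silence: nothing to scale

    gain = 10 ** ((target_dbfs - level) / 20)

    # If the requested gain would push the loudest sample past the
    # ceiling, reduce the gain instead of clipping.
    peak = max(abs(s) for s in samples)
    gain = min(gain, peak_ceiling / peak)
    return [s * gain for s in samples]


if __name__ == "__main__":
    # A quiet 440 Hz tone at 16 kHz, used only as test input.
    tone = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    print(round(rms_dbfs(tone), 2), round(rms_dbfs(normalize_rms(tone)), 2))
```

Applying one scalar gain to the whole file keeps the speaker's natural dynamics intact, which matters because the cloning model learns emphasis patterns from those level differences.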