Module f-ltx · Video · 8 min

LTX architecture
and shot composition.

What's inside a modern video diffusion model: shot composition and motion control.

Reading time: 8 min · Audio: none · Prerequisites: f-video-diff · Source: Track A · Gemini
§ 1

What this lesson covers.

This module is one of 42 in the curriculum. Below is the canonical interactive lesson (tabs, cards, and diagrams from the source repo), rendered inside the course shell. There is no audio narration for this module; it ships as text plus interactive lesson only.

If you prefer to read first and play with the demos after, the interactive lesson sits below this section.

§ 2

The lesson itself.

Interactive lesson · ported from Gemini track
Video · Architecture

LTX Architecture & Shot Composition

Lightricks LTX-Video · what's inside a modern video diffusion model · motion control

LTX-VIDEO

Lightricks' open-weight video diffusion model (2024-11)

LTX-Video 0.9 was among the first open-weight video models to run on consumer GPUs (24 GB VRAM). LTX 2.3 (2025) extended this with longer-form generation and better motion. The architecture is a DiT-style (diffusion transformer) model trained jointly on text-to-video and image-to-video. It generates 5-second clips at 768×512 and 24 fps in under 30 seconds on an RTX 4090.
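
To make the clip size above concrete, here is a rough back-of-the-envelope sketch of how many latent tokens the transformer would attend over for one 5-second 768×512 clip. It assumes a video VAE that compresses 32×32 pixels spatially and 8 frames temporally into one token, and a frame count of the form 8k + 1. Both figures are assumptions recalled from the public LTX-Video materials and not stated in this lesson; the names `ClipSpec`, `frame_count`, and `latent_tokens` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipSpec:
    """Clip settings taken from the lesson text: 5 s at 768x512, 24 fps."""
    width: int = 768
    height: int = 512
    seconds: float = 5.0
    fps: int = 24


def frame_count(spec: ClipSpec, temporal_stride: int = 8) -> int:
    """Snap the raw frame count to the nearest k * stride + 1.

    ASSUMPTION: the model wants frame counts of the form 8k + 1.
    """
    raw = round(spec.seconds * spec.fps)
    k = max(1, round((raw - 1) / temporal_stride))
    return k * temporal_stride + 1


def latent_tokens(spec: ClipSpec, spatial_stride: int = 32,
                  temporal_stride: int = 8) -> int:
    """Estimate the token count after VAE compression.

    ASSUMPTION: one token covers 32x32 pixels across 8 frames,
    with the first frame encoded on its own.
    """
    if spec.width % spatial_stride or spec.height % spatial_stride:
        raise ValueError("width and height must be multiples of the spatial stride")
    frames = frame_count(spec, temporal_stride)
    latent_frames = (frames - 1) // temporal_stride + 1
    return (spec.width // spatial_stride) * (spec.height // spatial_stride) * latent_frames


spec = ClipSpec()
print(frame_count(spec))    # 121 frames for a 5 s clip at 24 fps
print(latent_tokens(spec))  # 24 * 16 * 16 = 6144 tokens
```

Under these assumptions the whole clip is only a few thousand tokens, which is one plausible reason a model of this kind fits on a single 24 GB card.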
SHOT COMPOSITION

Pacing is a prompt input, not a render setting

Modern video models reward thinking in shots rather than seconds. A 5-second clip can be one slow pan, three cuts, or a single dynamic action — you control this through prompt language and motion-guidance inputs. Macalinao Studio practice: write the shot in script form ("opens on wide establishing shot, slow push-in over 3 seconds, then cut to close-up") before generating.
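
One way to keep the "script first" habit honest is to write the shots as data and render the prompt from them. The sketch below is illustrative only: `Shot` and `shot_script` are hypothetical helpers, not part of any LTX-Video tooling, and the wording template simply mirrors the example sentence above. It also checks that the shot durations add up to the clip length.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    framing: str    # e.g. "wide establishing shot"
    camera: str     # e.g. "slow push-in"
    seconds: float  # how long the shot should last


def shot_script(shots: list[Shot], clip_seconds: float = 5.0) -> str:
    """Render a shot list as one script-style prompt fragment.

    Raises ValueError if the shot durations do not fill the clip exactly.
    """
    total = sum(s.seconds for s in shots)
    if abs(total - clip_seconds) > 1e-6:
        raise ValueError(f"shots total {total:g}s, clip is {clip_seconds:g}s")
    parts = []
    for i, s in enumerate(shots):
        lead = "opens on" if i == 0 else "then cut to"
        parts.append(f"{lead} {s.framing}, {s.camera} over {s.seconds:g} seconds")
    return ", ".join(parts)


shots = [
    Shot("wide establishing shot", "slow push-in", 3),
    Shot("close-up", "static camera", 2),
]
print(shot_script(shots))
```

Whether a given model actually honors per-shot durations in the prompt is something to verify by generating; the helper only guarantees that the script itself is internally consistent.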
MOTION CONTROL

Camera motion specifiers and trajectory conditioning

LTX-Video and Sora both accept motion specifiers: "static camera", "slow dolly forward", "orbit left", "crash zoom". Some systems also accept trajectory inputs, such as a Bézier curve through 3D space that the virtual camera follows. This is the bridge between AI generation and traditional film grammar.
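
As a minimal sketch of what a trajectory input could look like, the code below samples a cubic Bézier curve into one camera position per frame. The four control points, the function names, and the idea of passing a per-frame position list to a generator are all assumptions for illustration; the lesson does not specify how any particular system encodes trajectories.

```python
Point = tuple[float, float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at t in [0, 1] using Bernstein weights."""
    u = 1.0 - t
    w = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    return tuple(
        w[0] * a + w[1] * b + w[2] * c + w[3] * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def camera_path(control_points: tuple[Point, Point, Point, Point],
                num_frames: int) -> list[Point]:
    """Return one camera position per frame, evenly spaced in t."""
    if num_frames < 2:
        raise ValueError("need at least two frames to define a path")
    return [
        cubic_bezier(*control_points, t=i / (num_frames - 1))
        for i in range(num_frames)
    ]


# Hypothetical dolly-and-rise move: start low and far back, end higher and close.
controls = (
    (0.0, 1.5, -4.0),
    (1.0, 1.5, -3.0),
    (2.0, 1.8, -1.0),
    (2.0, 2.0, 0.0),
)
path = camera_path(controls, num_frames=121)
print(path[0], path[-1])  # first and last positions match the end control points
```

Even spacing in t is not even spacing in distance, so the camera speeds up and slows down along the curve; a production tool would likely reparameterize by arc length for constant-speed moves.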
PRODUCTION REALITY

What you can ship today vs what's a year away

Ship today: 5-15 second clips with one subject, predictable camera moves, stylized aesthetics. Not yet: minute-plus narrative with multiple characters, faces holding consistent across cuts, dialogue lip-sync. The studio strategy: design around the strengths (short stylized shots, voiceover instead of dialogue), wait for the gaps to close.
