02 / Live examples

Real renders from the actual endpoint. Prompts are shown beneath each clip so you can copy the pattern.

See Seedance 2.0 in action.

01 · 5s
Chef dialogue

A chef in a professional kitchen at dusk ties an apron in front of a pass window and looks up to say: 'Tonight, the kitchen opens at seven.'

02 · 5s
Product turntable

A matte black wireless headphone slowly rotates on a concrete pedestal under one warm key light.

03 · 5s
Lakeside dawn

A calm coastal lake at dawn, camera slow push-in from behind a lighthouse toward the still water. Golden sunrise breaks across the horizon.

04 · 5s
Server room dialogue

Two engineers in a server room look at a monitor. First says: 'That timing is off by 40 ms.' Second answers: 'Then we ship the rollback.'

05 · 5s
Falcon at sunset

Wide slow-motion shot of a falcon launching from a cliff at sunset, backlit by gold, feathers rim-lit, one long distant drumbeat over low wind.

06 · 5s
Vintage synth

A vintage analog synthesizer on a polished wooden desk, camera slow dolly past the glowing knobs and patch cables. Soft backlight, warm bokeh, tubes warming up.

05 / FAQ

Frequently asked.

01 · How much does Seedance 2.0 cost on fal.ai?

The Standard tier at `bytedance/seedance-2.0/text-to-video` bills at roughly $0.3034 per second of 720p output, with native audio included at no extra cost. A 5-second 720p render lands at $1.52. The Fast tier at `bytedance/seedance-2.0/fast/text-to-video` drops to $0.2419 per second on the same schema, about 20 percent cheaper for iteration. Reference-to-video with video refs applies a 0.6x duration multiplier, effectively $0.1814 per second. The math follows the token formula (height x width x duration x 24) divided by 1024, at $0.014 per 1k tokens on Standard and $0.0112 per 1k on Fast. Validate at fal.ai/pricing.
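The token formula above can be sketched as a small helper. This is a minimal sketch of the formula as stated in this FAQ; the 1280x720 frame size for "720p" is an assumption, and the listed per-second rates may round slightly differently than the raw formula.

```python
# Hedged sketch of the documented token pricing formula; the 1280x720 frame
# size for 720p output is an assumption, not confirmed by the pricing page.
def seedance_price(height: int, width: int, seconds: float,
                   per_1k_tokens: float = 0.014,
                   duration_multiplier: float = 1.0) -> float:
    """Price = (h * w * s * 24) / 1024 tokens, billed per 1k tokens.

    duration_multiplier models the 0.6x billing on video-referenced renders.
    """
    tokens = (height * width * seconds * duration_multiplier * 24) / 1024
    return (tokens / 1000) * per_1k_tokens

# 5 s of 720p on the Standard tier ($0.014 per 1k tokens):
print(round(seedance_price(720, 1280, 5), 2))            # 1.51

# Fast tier with video refs (0.6x multiplier, $0.0112 per 1k tokens):
print(round(seedance_price(720, 1280, 5, 0.0112, 0.6), 2))  # 0.73
```

The raw formula lands at the image-to-video figure; the text-to-video rate in the table is a few tenths of a cent per second higher.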

02 · What is the max resolution?

Seedance 2.0 caps at 720p. You pick between 480p and 720p on `bytedance/seedance-2.0/text-to-video`, with no 1080p or 4K option. The legacy Seedance 1.5 Pro endpoint still reaches 1080p if you need it, and Veo 3.1 pushes to 4K for broadcast work. For social delivery, creator content, and preview reviews, 720p with native audio is where most teams land; the cinematic grade and token budget go further at 720p than stretched 1080p. Upscale downstream with a dedicated upscaler if your target needs more pixel density.

03 · How many reference images, videos, and audios can I pass?

Up to 12 files total split across three channels: 9 images, 3 videos, and 3 audios. That is the full multimodal surface on `bytedance/seedance-2.0/reference-to-video`. Images hold character, wardrobe, and composition anchors. Video refs drive camera move and motion rhythm. Audio refs drive room tone, ambience, and voice character. Duration caps still apply to the output: 4 to 15 seconds per single shot. Reference-to-video with video refs uses a 0.6x duration multiplier, so you pay for 60 percent of rendered time when video conditioning is active.

04 · Can I generate videos with images of real people?

The face filter that blocks generation of real individuals without identity verification sits at the ByteDance model layer, not at fal.ai. No API provider has a bypass. Operators who need portraits of real people route through identity verification and licensed talent pipelines, which is the standard across every major commercial video provider. AI-generated portraits (faces that do not match a real person) remain the documented path on `bytedance/seedance-2.0/reference-to-video` when you want character continuity without licensing overhead. Brand campaigns with cleared talent use the verified-identity intake.

05 · Seedance 2.0 vs Kling 3.0 Pro: which should I pick?

Pick `bytedance/seedance-2.0/text-to-video` when your brief uses multiple reference channels (image plus video plus audio), when you need 15 seconds of single-shot duration, and when you want native audio in a single pass. Pick Kling 3.0 Pro when you need 1080p output, when motion smoothness is the top priority, and when you are already in a Kling storyboarding flow. Seedance leads the head-to-head on I2V Arena Elo (1346 vs 1282) and on the multimodal surface. Kling leads on resolution ceiling and per-second price at 1080p.

06 · What is the Fast tier?

`bytedance/seedance-2.0/fast/text-to-video` and `bytedance/seedance-2.0/fast/reference-to-video` are quicker-turnaround variants at $0.2419 per second (about 20 percent cheaper than the $0.3034 standard tier). Same input schema, same 12-file multimodal surface, same 720p ceiling, same duration caps. Use Fast for iteration passes where you want 20 versions of a single shot before committing to the final render. Native audio remains included at no extra cost. Token formula shifts from $0.014 per 1k to $0.0112 per 1k, which is where the Fast savings come from.

07 · How do I call it from Python?

Install `fal-client`, set `FAL_KEY` in your environment, and subscribe to `bytedance/seedance-2.0/text-to-video`. The input dictionary mirrors the TypeScript SDK: prompt, duration (4 to 15), resolution (480p or 720p), aspect_ratio, generate_audio, and seed. Use `fal_client.subscribe` for synchronous waits or `fal_client.submit` for async jobs with webhooks. The queue returns logs you can stream with `with_logs=True`. Full schema and code shape live on the endpoint page under fal.ai/models. The Fast tier swaps the endpoint path to `bytedance/seedance-2.0/fast/text-to-video` without any other code change.
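The flow above can be sketched in a few lines. This is a minimal sketch assuming `fal-client` is installed and `FAL_KEY` is exported; the `build_input` helper is a hypothetical convenience, not part of the SDK, and the guard keeps the network call from firing when no key is configured.

```python
# Hedged sketch of the Python call described above. build_input is a
# hypothetical helper; only fal_client.subscribe is the real SDK surface.
import os
from typing import Optional

ENDPOINT = "bytedance/seedance-2.0/text-to-video"

def build_input(prompt: str, duration: int = 5, resolution: str = "720p",
                aspect_ratio: str = "16:9", generate_audio: bool = True,
                seed: Optional[int] = None) -> dict:
    """Assemble the input dict per the schema sketched in this FAQ."""
    if not 4 <= duration <= 15:
        raise ValueError("duration must be 4 to 15 seconds")
    if resolution not in ("480p", "720p"):
        raise ValueError("resolution must be 480p or 720p")
    payload = {
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
    if seed is not None:
        payload["seed"] = seed
    return payload

if os.environ.get("FAL_KEY"):  # only hit the API when a key is configured
    import fal_client
    result = fal_client.subscribe(
        ENDPOINT,
        arguments=build_input(
            "A chef ties an apron and says: 'Tonight, the kitchen opens at seven.'",
            seed=42,
        ),
        with_logs=True,
    )
    print(result["video"]["url"])
```

Swapping `ENDPOINT` to the Fast path is the only change needed for iteration renders.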

08 · What happens when a render fails?

The fal async queue behind `bytedance/seedance-2.0/text-to-video` returns structured errors with the rule id or timeout reason. Soft retries: drop the duration from 15 seconds to 8, switch the seed, or loosen the prompt if a content rule fired. Hard retries: route the same input to the Fast tier for a lighter compute path, or fall back to Seedance 1.5 Pro if you need 1080p. The queue logs surface in your dashboard, and webhooks report completion state to your server. Transient queue failures retry once automatically; committed failures surface a clear error code.
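The retry ladder above can be sketched as a pure function that maps a failed attempt to the next configuration to try. This is a sketch under assumptions: the soft-then-hard ordering and the payload tweaks mirror this FAQ, but the exact escalation policy is your call.

```python
# Hedged sketch of the retry ladder described above; endpoint names come from
# this page, the escalation order is an assumption, not fal.ai policy.
from typing import Optional, Tuple

def next_attempt(endpoint: str, payload: dict,
                 attempt: int) -> Optional[Tuple[str, dict]]:
    p = dict(payload)
    if attempt == 1:
        # Soft retry: drop duration toward 8 s and reroll the seed.
        p["duration"] = min(p.get("duration", 15), 8)
        p["seed"] = p.get("seed", 0) + 1
        return endpoint, p
    if attempt == 2:
        # Hard retry: same input routed to the Fast tier's lighter compute path.
        return endpoint.replace("/seedance-2.0/", "/seedance-2.0/fast/"), p
    return None  # give up and surface the queue's error code

ep = "bytedance/seedance-2.0/text-to-video"
print(next_attempt(ep, {"duration": 15, "seed": 42}, 2)[0])
# bytedance/seedance-2.0/fast/text-to-video
```

A real pipeline would wire this into the webhook handler that receives the structured error.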

09 · Why run Seedance 2.0 on fal.ai?

Eight reasons. One, fal.ai is ByteDance's chosen enterprise partner for Seedance 2.0 with day-one access to all six endpoints. Two, a single FAL_KEY speaks to 600+ models, so your pipeline does not fragment across providers. Three, serverless scale with no cold starts plus an async queue that supports webhooks for fan-out. Four, the Fast tier at $0.2419 per second for iteration budgets on `bytedance/seedance-2.0/fast/text-to-video`. Five, regional points of presence for lower latency. Six, one `@fal-ai/client` SDK in TypeScript, Python, and Swift. Seven, free signup credits to kick the tires. Eight, Slack and Discord access to the fal team when a pipeline question needs a human.

10 · What formats and aspect ratios does Seedance 2.0 support?

`bytedance/seedance-2.0/text-to-video` accepts aspect ratios 21:9, 16:9, 4:3, 1:1, 3:4, and 9:16, plus an auto mode that picks from the prompt. Supported durations are 4, 5, 8, 10, 12, and 15 seconds. Resolution is 480p or 720p. Output is MP4 with H.264 video and AAC audio when generate_audio is on. Seed input is supported for reproducible renders. For vertical social deliverables, 9:16 at 720p is the common pick; for cinematic widescreen, 21:9 is available natively without cropping downstream. Native audio is delivered in the same encode as the video track.
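A quick pre-flight check against the options listed above keeps bad requests out of the queue. This is a minimal sketch: the allowed values are taken from this FAQ, and the endpoint's actual schema may accept or reject inputs differently.

```python
# Hedged sketch: validates render options against the choices listed in this
# FAQ; the real endpoint schema is authoritative, not this set.
ASPECT_RATIOS = {"21:9", "16:9", "4:3", "1:1", "3:4", "9:16", "auto"}
DURATIONS = {4, 5, 8, 10, 12, 15}
RESOLUTIONS = {"480p", "720p"}

def validate_options(aspect_ratio: str, duration: int, resolution: str) -> bool:
    return (aspect_ratio in ASPECT_RATIOS
            and duration in DURATIONS
            and resolution in RESOLUTIONS)

print(validate_options("9:16", 5, "720p"))   # True: vertical social pick
print(validate_options("16:9", 7, "1080p"))  # False: 7 s and 1080p unsupported
```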

01 / Overview

Seedance 2.0 at a glance.

01

Seedance 2.0 is ByteDance Seed's flagship video model, announced February 12, 2026 and live on fal.ai as its chosen enterprise partner since April 15, 2026. The signature edge is a 12-file multimodal reference surface: up to 9 images, 3 videos, and 3 audios passed into a single call, combined with native audio generation in the same forward pass. You brief a shot with a character still, a camera-move reference clip, and a room-tone sample, and Seedance 2.0 returns the take with audio baked in. No separate TTS stitch, no lip sync pass, no identity matcher layered on top. On the Artificial Analysis Arena, Seedance 2.0 sits at rank two on both legs of the leaderboard with Elo 1270 on text-to-video and 1346 on image-to-video. HappyHorse 1.0 currently leads the Arena; Seedance 2.0 is the strongest of the major commercial endpoints that is generally available without partner allowlist.

02

The honest caveats matter. Max output resolution is 720p. You do not get 1080p or 4K out of Seedance 2.0, and if your delivery target is broadcast, you will upscale downstream or pick a different model. The model-layer face filter blocks generation of real people without identity verification. This is enforced at the ByteDance model layer, not at fal.ai, so no API provider has a bypass. Operators that need portraits of real individuals route through licensed talent and verified identity pipelines; AI-generated portraits remain the documented path across all commercial video providers. Where Seedance 2.0 wins: 15 second maximum single-shot duration beats Veo 3.1's 8 seconds, the 12-file multimodal reference surface is unique across the cohort, and native audio is included at no extra charge while Veo 3.1 charges $0.40 per second at 1080p.

03

Against the field, the picks sort cleanly. Against Kling 3.0 Pro: pick Seedance for longer duration, multimodal ref control, and joint audio. Against Veo 3.1: pick Seedance when $0.30 per second beats $0.40 per second and 15 seconds beats 8 seconds, concede to Veo when you need 4K or the broadcast color pipeline. Against Grok Imagine v1.0: pick Seedance for cinematic narrative work and multi-reference briefs, concede to Grok on raw iteration speed and $0.07 per second pricing. The Fast tier at $0.2419 per second gives you the same schema on iteration budgets, and six endpoints (text-to-video, image-to-video, reference-to-video plus Fast variants of each) cover every entry point a production pipeline needs.

01 / Who it's for
  • 01 · Indie film directors briefing cinematic narrative shots with multi-reference control
  • 02 · Agency teams producing short-form content that needs native audio in one pass
  • 03 · Ad studios working with licensed talent portraits and brand-approved reference imagery
  • 04 · Research groups benchmarking hosted video generation against the Arena leaderboard
  • 05 · Creative pipelines that need the 12-file multimodal surface for character continuity
02 / When to pick
  • 01 · Your brief uses more than one reference channel (image plus video plus audio)
  • 02 · You need 15 second single-shot duration where Veo 3.1 tops out at 8 seconds
  • 03 · You want identical character and motion across multiple shots using shared refs
  • 04 · Your budget prefers $0.30 per second for cinematic work over $0.40 per second
  • 05 · You need native audio baked into the render without a separate synthesis pass
03 / Infrastructure

fal.ai is ByteDance's chosen enterprise partner for Seedance 2.0, giving you day-one access to all six endpoints, the full 12-file multimodal reference surface, and the Fast tier at $0.2419 per second for iteration. One fal.subscribe call, one billing dashboard, one SDK for every other model you need to route to next.

02 / Integration

Call Seedance 2.0 in under 20 lines.

01 · example.ts · TYPESCRIPT
import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

// Seedance 2.0 text-to-video on fal.ai
const result = await fal.subscribe("bytedance/seedance-2.0/text-to-video", {
  input: {
    prompt: "A chef in a professional kitchen at dusk looks to camera and says: 'Tonight, the kitchen opens at seven.' Warm key light, shallow depth of field, soft extractor hum, 24fps cinematic grade.",
    duration: 5, // 4 to 15 seconds
    resolution: "720p", // 480p or 720p
    aspect_ratio: "16:9",
    generate_audio: true, // native audio on by default
    seed: 42,
  },
  logs: true,
  onQueueUpdate: (update) => {
    if (update.status === "IN_PROGRESS") {
      update.logs?.map((log) => log.message).forEach(console.log);
    }
  },
});

console.log(result.data.video.url);
Expected output
{ video: { url: "https://v3.fal.media/files/..." }, seed: 42 }
Full API reference
03 / Pricing

What Seedance 2.0 costs on fal.ai.

01 · bytedance/seedance-2.0/text-to-video
$0.3034 per second (720p)

5s 720p audio on

$1.52
02 · bytedance/seedance-2.0/image-to-video
$0.3024 per second (720p)

5s 720p from still

$1.51
03 · bytedance/seedance-2.0/reference-to-video
$0.3024 per second (image refs)

5s 720p, 4 image refs

$1.51
04 · bytedance/seedance-2.0/reference-to-video
$0.1814 per second (video refs, 0.6x)

5s 720p, 2 video refs

$0.91
05 · bytedance/seedance-2.0/fast/text-to-video
$0.2419 per second (720p)

5s 720p audio on

$1.21
06 · bytedance/seedance-2.0/fast/reference-to-video
$0.1451 per second (video refs, 0.6x)

5s 720p, 2 video refs

$0.73

Pricing via token formula (h x w x duration x 24) / 1024 at $0.014/1k on standard, $0.0112/1k on Fast. Native audio included at no extra cost.

Official pricing page
04 / Comparison

Seedance 2.0 vs the field.

01 · PRIMARY · bytedance/seedance-2.0/text-to-video
Seedance 2.0
Res
720p
Dur
15s
Price
$0.30/s
Elo
1270 T2V / 1346 I2V

12-file multimodal refs, native audio, cinematic narrative

02 · fal-ai/kling-video/v3/pro/text-to-video
Kling 3.0 Pro
Res
1080p
Dur
10s
Price
$0.17 to $0.20/s
Elo
1247 T2V / 1282 I2V

Motion smoothness, storyboarding

03 · fal-ai/veo3.1
Veo 3.1
Res
4K
Dur
8s
Price
$0.40/s
Elo
1209 T2V / 1243 I2V

Broadcast-grade color, cinematic finish

04 · xai/grok-imagine-video/text-to-video
Grok Imagine v1.0
Res
720p
Dur
15s
Price
$0.07/s
Elo
1232 T2V / 1325 I2V

Fastest and cheapest iteration

Seedance 2.0 leads the Kling/Veo/Grok cohort on duration, multimodal ref control, and I2V Arena Elo. Pick it when your brief needs more than one reference channel and you want audio baked in.
