
From still to video, same person

Reference-to-video routes a character anchor into the video provider so the face survives the motion. Plus: which video model fits which kind of clip.

Video is the format that breaks identity for most AI tools: a beautiful still, then a moving clip that looks like a different person. SocialAF’s video router solves that by sending the character’s reference images straight into the video model’s reference-to-video endpoint when one exists, and by using the closest equivalent fallback when it doesn’t.

Every clip in the “In motion” section on the landing page is routed this way. The examples below show the actual prompt + model used for each.

Step 01

Switch to Video mode + pick the character

Same Generate page; switch the top-right tab from Image to Video. Pick the character you already built. The reference array is already attached; the studio detects that char-refs are present and routes through the reference-to-video pipeline when you submit.
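The routing rule the studio applies is simple to state. A minimal sketch of that logic, assuming slug naming conventions from the examples on this page (the function and the `R2V_MODELS` set are illustrative, not SocialAF’s actual API):

```python
# Illustrative sketch of the routing rule described above.
# The families with a dedicated r2v sibling are taken from this guide;
# everything else here is an assumption for illustration.
R2V_MODELS = {"kling", "wan", "seedance", "happy-horse"}

def pick_route(model: str, char_refs: list[str]) -> str:
    """Return the pipeline slug a submission would be routed to."""
    if char_refs:
        if model in R2V_MODELS:
            return f"{model}-reference-to-video"  # refs go straight to the r2v endpoint
        return f"{model}-image-to-video"          # closest fallback when no r2v exists
    return f"{model}-text-to-video"               # no refs attached: plain text-to-video
```

So `pick_route("wan", ["ref-01.png"])` yields `"wan-reference-to-video"`, the slug shown on every example card below.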

Step 02

Write a scene prompt that describes motion + frame

Video prompts are different from image prompts. Add motion vocabulary: how the camera moves, how the subject moves, what beats happen across the clip. Don’t over-script. Five seconds is two beats max.

Notice the example below: one subject move (rowing), one camera move (dolly-in), one light cue (warm window). That’s the right density for a 5s clip.

wan-reference-to-video · 16:9 · Slow dolly, gym mirror

Prompt

the same woman performing a single dumbbell row, slow camera dolly-in, gym mirror catching warm window light, sweat detail, cinematic 35mm

Step 03

Pick the right video model

The catalog has six families with different personalities. Quick guide:

  • Kling 3 Pro. Default for cinematic clips with native audio support. Its r2v sibling routes refs through kling-reference-to-video.
  • Veo 3.1 Standard. Google’s Veo. Top-tier prompt adherence and motion realism. More credits per second; pick it when the shot needs to be hero-level.
  • Wan 2.7. Best quality-per-credit ratio. Solid generalist for social-media-length output. Every example on this page was rendered with it.
  • Seedance 2 High. ByteDance’s 1080p tier. Good with stylized motion, dance, sports.
  • Happy Horse 1.0. Newest; fast and cheap, with a dedicated r2v variant. Use it for drafting before paying for Veo.
  • Grok Imagine Video. Fixed price per generation. No reference-to-video variant; the UI gates it when refs are attached, so the studio steers you elsewhere automatically.
wan-reference-to-video · 16:9 · Cliff jump, slow-mo

Prompt

the same man jumping from a coastal cliff into turquoise ocean, slow motion drop, splash on entry, photoreal adventure sports

Step 04

Set duration to match the platform

3-5 seconds for vertical social (TikTok, Reels). 8-10 for editorial b-roll. 15 for a longer hero shot. Per-second models charge proportionally; per-generation models like Grok ignore the duration field and run at their fixed length.
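The two billing shapes can be sketched in a few lines. The credit rates and the model tables here are made-up numbers purely to show the difference; only the per-second vs. per-generation split comes from the guide:

```python
# Hypothetical credit rates -- illustration only, not real pricing.
PER_SECOND = {"wan": 4, "kling": 8, "veo": 14}   # credits per rendered second
PER_GENERATION = {"grok": 20}                    # flat credits per clip

def estimate_credits(model: str, seconds: int) -> int:
    """Per-generation models ignore the duration field; per-second models scale with it."""
    if model in PER_GENERATION:
        return PER_GENERATION[model]
    return PER_SECOND[model] * seconds
```

With these placeholder rates, a 5s draft on Wan would cost 20 credits and the 15s keeper 60, while Grok costs the same 20 regardless of the duration you set.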

If you’re drafting, render at 5s. Once the framing works, regenerate at 10-15s for the keeper.

wan-reference-to-video · 16:9 · Alley stride, slow motion

Prompt

the same woman walking through a sunlit narrow stone alley in a cream blazer and black trousers, slow dolly-in, photoreal fashion editorial

Step 05

Optional: feed a start frame from your gallery

When a character has refs AND you supply a start image, the studio prefers the reference-to-video route (refs are a stronger identity signal than a single frame). When you don’t supply refs, the start frame becomes the first rendered frame via the model’s image-to-video slug.

Use the start-frame option when you have a still you love and want to animate it specifically. Otherwise let the refs drive.

wan-reference-to-video · 16:9 · Mid-leap, slow-mo stage

Prompt

the same man performing a high leap on a black stage with a magenta spotlight, slow-motion, dust particles in beam, photoreal stage cinematography

Step 06

Submit, wait, scan, iterate

Video generation is slower than image generation. Plan on 60-180 seconds for most providers, longer for Veo or Kling Pro at 1080p. The studio polls until the clip lands and uploads it to your gallery.
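The poll-until-done pattern looks roughly like this. A minimal sketch: the `get_status` callable, the job dict shape, and the field names are all assumptions, not SocialAF’s real API:

```python
import time

# Hypothetical polling loop; job dict fields ("status", "clip_url", "error")
# are illustrative, not a real API contract.
def wait_for_clip(get_status, job_id: str, timeout: float = 300, interval: float = 5) -> str:
    """Poll until the job reports done, returning the clip URL."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_status(job_id)
        if job["status"] == "done":
            return job["clip_url"]
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "generation failed"))
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {timeout}s")
```

The 60-180 second window above is why the timeout defaults well past it: Veo or Kling Pro at 1080p can run long without having failed.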

Same diagnosis loop as with images: face drift means swapping to a provider with a dedicated r2v sibling (Kling, Wan, Seedance, Happy Horse). Motion drift means tightening the motion vocabulary in the prompt or changing the model.

wan-reference-to-video · 16:9 · Wood-fire pizza pull

Prompt

the same man pulling a pizza from a wood-fired oven with a long peel, flames visible behind, slight slow motion on the pull, photoreal restaurant cinematography

Video carries the face. Now the next layer: making the clip actually speak in your character’s voice.

Now go build.

The whole pipeline is in your dashboard. Start with a character, ship every format from there.