Video Generation
Generate and edit videos with AI-powered models
Video generation models for marketing, motion, and prototyping
Video models turn a prompt — or a still frame, or a short reference clip — into a moving picture. The category is the youngest and most volatile in the catalog: every quarter brings a new flagship that resets the quality bar. Reach for one when you need motion content faster than a human editor can produce it.
51 models available
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Google Veo 3
Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.
Google Veo 3.1
Latest Veo with image-to-video and context-aware audio
Kling v3
Cinematic video up to 15s with multi-shot and native audio
Kling v3 Omni
Most versatile: multi-reference images, video editing, native audio
OpenAI Sora 2
OpenAI's second-generation Sora video model. Realistic motion, improved physics, audio support.
Runway Gen 4.5
Top-ranked for motion quality and visual fidelity
Sora
OpenAI video generation model. Create realistic and imaginative videos from text prompts up to 20 seconds.
AnimateDiff
Plug-and-play motion module that animates personalized Stable Diffusion models without further training. 16-frame clips at 512x512.
AnimateDiff Evolved
Community fork of AnimateDiff with improved motion modules, beta scheduler control and ControlNet integration for richer animation control.
AnimateDiff Lightning
ByteDance distillation of AnimateDiff. 4-step sampling for over 10x faster inference at comparable quality to multi-step base model.
Champ Human Animation
Champ controllable human image animation. Uses 3D parametric guidance (SMPL) for realistic full-body motion transfer from a single reference image.
CogVideoX-5B (open)
Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.
DreamGaussian 4D
4D Gaussian-splatting generator extending DreamGaussian to video. Image-conditioned dynamic 3D scenes with view-consistent motion.
DynamiCrafter
Tencent DynamiCrafter. Animates still images into short videos preserving texture and structure, with strong open-domain coverage.
EchoMimic
Ant Group EchoMimic. Lifelike audio-driven portrait animation with editable landmark conditioning for fine-grained motion control.
FILM Frame Interpolation
Google FILM frame interpolation. Synthesizes high-quality intermediate frames between near-duplicate inputs, designed for large motion gaps.
Google Veo 3 Fast
Faster, cheaper Veo 3 with audio
Google Veo 3.1 Fast
Faster Veo 3.1 with image-to-video and audio
Grok Imagine Video
xAI video with native audio and lip-sync, up to 15s
Hailuo / MiniMax Video-01
MiniMax's Hailuo video-01. 6s 1280x720 clips with strong cinematic motion and physical realism.
Hailuo 2.3
MiniMax model for realistic human motion and VFX
HunyuanVideo
Tencent's 13B open-weights video diffusion transformer. SOTA among open video models at release.
HunyuanVideo
Tencent's open-source video generation model. Strong visual quality with diverse style support.
Kling 1.6 Pro
Kuaishou's Kling 1.6 Pro. Premium cinematic motion and physics realism, ~$0.07/sec.
LivePortrait
Kuaishou LivePortrait. Efficient portrait animation driven by reference videos with stitching, retargeting and motion-control parameters.
LTX-Video (Lightricks)
Lightricks' 2B DiT video model. Realtime generation on consumer GPUs (~6s @ H100, 24fps).
Luma Dream Machine v1.6
Luma's Dream Machine 1.6. 720p text/image-to-video with strong motion and camera control.
Luma Ray Flash 2
Fast, affordable video with I2V support
MagicAnimate
ByteDance MagicAnimate. Temporally consistent human-image animation driven by a DensePose motion sequence with strong identity preservation.
Minimax Video
MiniMax's video generation model. Fast, high-quality video output with text-to-video capabilities.
Mochi 1
Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.
MOFA-Video
Motion-Field-Adapter video generator. Controllable image animation from trajectories, keypoints or audio with a strong identity preservation prior.
MuseTalk
Tencent MuseTalk real-time lip-sync model. Audio-driven mouth-region editing in latent space at 30+ fps on a single GPU.
Pika 2.0 (Official)
Pika Labs' 2.0 release. Cinematic text/image-to-video with scene composition controls.
PixVerse v5.6
Physics-accurate video generation up to 1080p
Real-CUGAN
Real-CUGAN anime-focused upscaler. 2x/3x/4x super-resolution tuned for animation, line-art, and illustrated content.
RIFE Frame Interpolation
Real-Time Intermediate Flow Estimation. Doubles or quadruples FPS of an existing video via learned optical-flow-based frame interpolation.
Runway Gen-3 Alpha Turbo
Runway's faster, cheaper Gen-3 variant. Image-to-video at 5 credits/sec (~$0.05/sec).
SadTalker
Stylized audio-driven talking-head generator. Synthesizes 3D motion coefficients from audio to animate a single portrait image with natural head movements.
Seedance Lite
Budget ByteDance video, fast and cheap
Seedance Pro
ByteDance video with T2V and I2V, up to 1080p
StreamingT2V
Picsart StreamingT2V. Generates long, consistent videos by chaining short autoregressive clips with motion and appearance memory.
SwinIR Video
SwinIR transformer-based super-resolution and denoising applied per-frame to video. Handles classic, real-world and lightweight upscaling.
ToonCrafter
Tencent ToonCrafter generative cartoon interpolation model. Synthesizes smooth in-between frames between two cartoon keyframes.
V-Express
Tencent V-Express. Audio-driven portrait animation with progressive training, weak-condition learning, and expressive lip sync.
VideoCrafter
Tencent VideoCrafter latent video diffusion. Text-to-video and image-to-video generation up to 2s at 1024x576 with strong motion fidelity.
Wan 2.1 (Alibaba)
Alibaba's Wan 2.1 open-weights video diffusion model. 14B diffusion transformer, supports T2V and I2V.
Wan 2.2 Image-to-Video
Ultra-cheap I2V. Upload an image and animate it.
Wan 2.2 Text-to-Video
Ultra-cheap T2V for pennies
Wav2Lip
Lip-sync model that re-syncs a target video's lip movement to an arbitrary audio track. Robust to identity and language with a lip-sync discriminator loss.
Top video generation picks
Hand-picked across four common criteria — resolved against the live catalog so the picks track price and performance changes.
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
CogVideoX-5B (open)
Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Runway Gen 4.5
Top-ranked for motion quality and visual fidelity
Pricing in video is per-second of output rather than per-token or per-call. A flagship five-second clip costs anywhere from twenty cents (Kling 1.6, Hunyuan Video) to about a euro (Veo 3, Runway Gen-3 Alpha). Sound-on tiers cost more than silent tiers. Resolution multipliers stack on top of duration: 720p is the standard, 1080p costs roughly 2× more, and 4K is rare and expensive.
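As a rough illustration of the per-second pricing arithmetic, the sketch below multiplies a per-second rate by clip length and a resolution multiplier. The function name, the example rate, and the multiplier table are assumptions made for illustration; only the roughly 2× step from 720p to 1080p comes from the text, and real rates should be read from the live catalog.

```python
# Illustrative sketch only: the names and rates here are assumptions,
# not Railwail's API or price list.
RESOLUTION_MULTIPLIER = {
    "720p": 1.0,   # treated as the baseline tier
    "1080p": 2.0,  # "roughly 2x more" than 720p, per the text
}


def estimate_clip_cost(rate_per_second: float, seconds: float,
                       resolution: str = "720p") -> float:
    """Estimate the cost of one clip, rounded to two decimal places."""
    if rate_per_second < 0 or seconds <= 0:
        raise ValueError("rate must be non-negative and seconds positive")
    multiplier = RESOLUTION_MULTIPLIER[resolution]
    return round(rate_per_second * seconds * multiplier, 2)


# Hypothetical rate of 0.04 per second of output.
print(estimate_clip_cost(0.04, 5))           # five-second clip at 720p
print(estimate_clip_cost(0.04, 5, "1080p"))  # same clip at 1080p
```

Audio tiers are left out because the text gives no multiplier for them; they could be added as another factor once a real figure is known.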
The trade-off here is duration versus coherence. Most commercial models cap output at five to ten seconds because longer clips drift — characters change clothes, backgrounds morph, and physics breaks down. For longer narratives, generate a sequence of shorter shots and stitch them in post. Image-to-video (start frame + motion prompt) typically produces more stable results than pure text-to-video, especially for characters and product shots.
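One possible way to stitch short shots in post is ffmpeg's concat demuxer, which reads a text file listing the clips in order. The sketch below only builds that list and the command line; it does not run ffmpeg. The file names and helper names are hypothetical, and stream copy (`-c copy`) assumes every shot shares the same codec, resolution, and frame rate, which may not hold across different models.

```python
# Minimal sketch, assuming the shots are already downloaded as local files.
# Helper names and file names are hypothetical.

def build_concat_list(shot_paths):
    """Return the text of an ffmpeg concat-demuxer list, one shot per line."""
    lines = []
    for path in shot_paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_file, output_file):
    """Return the argument list for joining the shots without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output_file)]


shots = ["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"]
print(build_concat_list(shots), end="")
print(" ".join(build_ffmpeg_command("shots.txt", "final.mp4")))
```

If the shots differ in encoding settings, re-encoding instead of stream copy is the usual fallback.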
Watch out for the five-second wall: many models still max out around five seconds of continuous output, and quality degrades sharply when you push for more. The catalog does list longer caps (8s for Veo 3, 15s for Kling v3 and Grok Imagine Video, 20s for Sora), but check the limit of the model you pick. If your script needs ten seconds on a five-second model, plan for two shots. Also watch out for sound: many models ship silent and you have to overlay audio separately. In this catalog, native audio is listed for Veo 3, Veo 3.1, Kling v3, Sora 2, and Grok Imagine Video.
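The ten-second example can be generalized with a small shot planner. The sketch below splits a target duration into the fewest equal-length shots that fit under a per-clip cap. The function name and the default five-second cap are assumptions for illustration; the cap should match whatever the chosen model actually allows.

```python
import math


def plan_shots(total_seconds: float, max_shot_seconds: float = 5.0) -> list[float]:
    """Split a target duration into the fewest equal shots within the cap."""
    if total_seconds <= 0 or max_shot_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_shot_seconds)
    shot_length = round(total_seconds / count, 2)
    return [shot_length] * count


print(plan_shots(10))      # ten seconds with a five-second cap
print(plan_shots(12))      # twelve seconds with a five-second cap
print(plan_shots(12, 8))   # twelve seconds with an eight-second cap
```

Equal-length shots are a simplification; in practice shot boundaries usually follow the script's natural cuts.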
Top picks above cover the flagship realism leader, the cheapest workhorse, the longest-clip model, and the fastest preview option in the category.
Popular use cases
Common patterns built with video generation on Railwail.
Related comparisons
Side-by-side reviews of the most-compared models in this category.