Video Generation
Generate and edit videos with AI-powered models
Video generation models for marketing, motion, and prototyping
Video models turn a prompt, a still frame, or a short reference clip into moving images. It is the youngest and most volatile category in the catalog: every quarter a new flagship arrives and resets the quality bar. Reach for one when you need moving content faster than a human editor can produce it.
51 models available
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Google Veo 3
Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.
Google Veo 3.1
Latest Veo with image-to-video and context-aware audio
Kling v3
Cinematic video up to 15s with multi-shot and native audio
Kling v3 Omni
Most versatile: multi-reference images, video editing, native audio
OpenAI Sora 2
OpenAI's second-generation Sora video model. Realistic motion, improved physics, audio support.
Runway Gen 4.5
Top-ranked for motion quality and visual fidelity
Sora
OpenAI video generation model. Create realistic and imaginative videos from text prompts up to 20 seconds.
AnimateDiff
Plug-and-play motion module that animates personalized Stable Diffusion models without further training. 16-frame clips at 512x512.
AnimateDiff Evolved
Community fork of AnimateDiff with improved motion modules, beta scheduler control and ControlNet integration for richer animation control.
AnimateDiff Lightning
ByteDance distillation of AnimateDiff. 4-step sampling for over 10x faster inference at quality comparable to the multi-step base model.
Champ Human Animation
Champ controllable human image animation. Uses 3D parametric guidance (SMPL) for realistic full-body motion transfer from a single reference image.
CogVideoX-5B (open)
Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.
DreamGaussian 4D
4D Gaussian-splatting generator extending DreamGaussian to video. Image-conditioned dynamic 3D scenes with view-consistent motion.
DynamiCrafter
Tencent DynamiCrafter. Animates still images into short videos preserving texture and structure, with strong open-domain coverage.
EchoMimic
Ant Group EchoMimic. Lifelike audio-driven portrait animation with editable landmark conditioning for fine-grained motion control.
FILM Frame Interpolation
Google FILM frame interpolation. Synthesizes high-quality intermediate frames between near-duplicate inputs, designed for large motion gaps.
Google Veo 3 Fast
Faster, cheaper Veo 3 with audio
Google Veo 3.1 Fast
Faster Veo 3.1 with image-to-video and audio
Grok Imagine Video
xAI video with native audio and lip-sync, up to 15s
Hailuo / MiniMax Video-01
MiniMax's Hailuo video-01. 6s 1280x720 clips with strong cinematic motion and physical realism.
Hailuo 2.3
MiniMax model for realistic human motion and VFX
HunyuanVideo
Tencent's 13B open-weights video diffusion transformer. SOTA among open video models at release.
HunyuanVideo
Tencent's open-source video generation model. Strong visual quality with diverse style support.
Kling 1.6 Pro
Kuaishou's Kling 1.6 Pro. Premium cinematic motion and physics realism, ~$0.07/sec.
LivePortrait
Kuaishou LivePortrait. Efficient portrait animation driven by reference videos with stitching, retargeting and motion-control parameters.
LTX-Video (Lightricks)
Lightricks' 2B DiT video model. Realtime generation on consumer GPUs (~6s @ H100, 24fps).
Luma Dream Machine v1.6
Luma's Dream Machine 1.6. 720p text/image-to-video with strong motion and camera control.
Luma Ray Flash 2
Fast, affordable video with I2V support
MagicAnimate
ByteDance MagicAnimate. Temporally consistent human-image animation driven by a DensePose motion sequence with strong identity preservation.
Minimax Video
MiniMax's video generation model. Fast, high-quality video output with text-to-video capabilities.
Mochi 1
Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.
MOFA-Video
Motion-Field-Adapter video generator. Controllable image animation from trajectories, keypoints or audio with a strong identity preservation prior.
MuseTalk
Tencent MuseTalk real-time lip-sync model. Audio-driven mouth-region editing in latent space at 30+ fps on a single GPU.
Pika 2.0 (Official)
Pika Labs' 2.0 release. Cinematic text/image-to-video with scene composition controls.
PixVerse v5.6
Physics-accurate video generation up to 1080p
Real-CUGAN
Real-CUGAN anime-focused upscaler. 2x/3x/4x super-resolution tuned for animation, line-art, and illustrated content.
RIFE Frame Interpolation
Real-Time Intermediate Flow Estimation. Doubles or quadruples FPS of an existing video via learned optical-flow-based frame interpolation.
Runway Gen-3 Alpha Turbo
Runway's faster, cheaper Gen-3 variant. Image-to-video at 5 credits/sec (~$0.05/sec).
SadTalker
Stylized audio-driven talking-head generator. Synthesizes 3D motion coefficients from audio to animate a single portrait image with natural head movements.
Seedance Lite
Budget ByteDance video, fast and cheap
Seedance Pro
ByteDance video with T2V and I2V, up to 1080p
StreamingT2V
Picsart StreamingT2V. Generates long, consistent videos by chaining short autoregressive clips with motion and appearance memory.
SwinIR Video
SwinIR transformer-based super-resolution and denoising applied per-frame to video. Handles classic, real-world and lightweight upscaling.
ToonCrafter
Tencent ToonCrafter generative cartoon interpolation model. Synthesizes smooth in-between frames between two cartoon keyframes.
V-Express
Tencent V-Express. Audio-driven portrait animation with progressive training, weak-condition learning, and expressive lip sync.
VideoCrafter
Tencent VideoCrafter latent video diffusion. Text-to-video and image-to-video generation up to 2s at 1024x576 with strong motion fidelity.
Wan 2.1 (Alibaba)
Alibaba's Wan 2.1 open-weights video diffusion model. 14B parameters, supports T2V and I2V.
Wan 2.2 Image-to-Video
Ultra-cheap I2V. Upload an image and animate it.
Wan 2.2 Text-to-Video
Ultra-cheap T2V for pennies
Wav2Lip
Lip-sync model that re-syncs a target video's lip movement to an arbitrary audio track. Robust to identity and language with a lip-sync discriminator loss.
Top video generation picks
Hand-picked across four common criteria — resolved against the live catalog so the picks track price and performance changes.
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
CogVideoX-5B (open)
Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Runway Gen 4.5
Top-ranked for motion quality and visual fidelity
Video pricing is per second of output, not per token or per call. A five-second flagship clip costs between twenty cents (Kling 1.6, HunyuanVideo) and about one euro (Veo 3, Runway Gen-3 Alpha). Tiers with sound cost more than silent ones. Resolution multipliers stack on top of duration: 720p is the default, 1080p costs about 2× more, and 4K is rare and expensive.
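The per-second pricing described above can be sketched as a small estimator. This is only an illustration: the function name `estimate_clip_cost` and the multiplier table are assumptions made for this example, the 2× factor for 1080p is the rough figure quoted in the text, and the 0.07 per-second rate is the approximate Kling 1.6 Pro price listed in the catalog. Real rates vary per model and change over time.

```python
# Hypothetical multipliers: the text only says 720p is the default and
# 1080p costs roughly 2x. 4K is left out because no figure is given.
RESOLUTION_MULTIPLIER = {"720p": 1.0, "1080p": 2.0}


def estimate_clip_cost(rate_per_second: float, seconds: float,
                       resolution: str = "720p") -> float:
    """Estimate the cost of one clip billed per second of output."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if resolution not in RESOLUTION_MULTIPLIER:
        raise ValueError(f"no multiplier known for {resolution!r}")
    # Duration and resolution multiply: a longer clip at a higher
    # resolution pays for both.
    return rate_per_second * seconds * RESOLUTION_MULTIPLIER[resolution]


if __name__ == "__main__":
    # 0.07 per second is the approximate Kling 1.6 Pro rate from the catalog.
    print(f"5s at 720p:  {estimate_clip_cost(0.07, 5):.2f}")
    print(f"5s at 1080p: {estimate_clip_cost(0.07, 5, '1080p'):.2f}")
```

Because the two factors multiply, doubling both the duration and the resolution tier roughly quadruples the cost, which is worth checking before committing to a batch of renders.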
The trade-off here is duration versus coherence. Most commercial models cap output at five to ten seconds because longer clips drift: characters change clothes, backgrounds morph, and physics breaks down. For longer narratives, generate a sequence of shorter shots and stitch them together in post-production. Image-to-video (a starting frame plus a motion prompt) usually produces more stable results than pure text-to-video, especially for characters and product shots.
Watch out for the five-second wall: practically every model on the market today tops out at around five seconds of continuous output, and quality drops sharply when pushed beyond that. If the script calls for ten seconds, plan two shots. Watch out for sound as well: most models output silent video, so audio has to be layered on separately. For now, only Veo 3 and a few research previews deliver built-in audio.
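The advice to plan several short shots instead of one long clip can be sketched as a simple planning helper. This is a minimal illustration under stated assumptions: `plan_shots` is a hypothetical name, the five-second default mirrors the limit described in the text rather than any specific model's cap, and splitting into equal-length shots is just one reasonable choice.

```python
import math


def plan_shots(total_seconds: float, max_clip_seconds: float = 5.0) -> list[float]:
    """Split a target duration into equal shots no longer than max_clip_seconds."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Use the smallest number of shots that keeps each one within the cap.
    shot_count = math.ceil(total_seconds / max_clip_seconds)
    return [total_seconds / shot_count] * shot_count


if __name__ == "__main__":
    print(plan_shots(10))  # a ten-second script becomes two five-second shots
    print(plan_shots(12))  # twelve seconds becomes three four-second shots
```

Each planned shot would then be generated separately, ideally from its own starting frame, and the results joined in post-production.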
The top picks above cover the flagship realism leader, the cheapest workhorse, the longest-clip model, and the fastest preview option in the category.
Popular use cases
Common patterns built with video generation on Railwail.
Related comparisons
Side-by-side reviews of the most-compared models in this category.