Video Generation
Generate and edit videos with AI-powered models
Video generation models for marketing, motion, and prototypes
Video models turn a prompt, a still frame, or a short reference clip into a moving image. This is the youngest and most volatile category in the catalog: every quarter a new flagship raises the quality bar. Reach for one of these models when you need moving content faster than a human editor can produce it.
51 models available
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Google Veo 3
Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.
Google Veo 3.1
Latest Veo with image-to-video and context-aware audio
Kling v3
Cinematic video up to 15s with multi-shot and native audio
Kling v3 Omni
Most versatile: multi-reference images, video editing, native audio
OpenAI Sora 2
OpenAI's second-generation Sora video model. Realistic motion, improved physics, audio support.
Runway Gen 4.5
Top-ranked for motion quality and visual fidelity
Sora
OpenAI video generation model. Create realistic and imaginative videos from text prompts up to 20 seconds.
AnimateDiff
Plug-and-play motion module that animates personalized Stable Diffusion models without further training. 16-frame clips at 512x512.
AnimateDiff Evolved
Community fork of AnimateDiff with improved motion modules, beta scheduler control and ControlNet integration for richer animation control.
AnimateDiff Lightning
ByteDance distillation of AnimateDiff. 4-step sampling for over 10x faster inference at comparable quality to multi-step base model.
Champ Human Animation
Champ controllable human image animation. Uses 3D parametric guidance (SMPL) for realistic full-body motion transfer from a single reference image.
CogVideoX-5B (open)
Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.
DreamGaussian 4D
4D Gaussian-splatting generator extending DreamGaussian to video. Image-conditioned dynamic 3D scenes with view-consistent motion.
DynamiCrafter
Tencent DynamiCrafter. Animates still images into short videos preserving texture and structure, with strong open-domain coverage.
EchoMimic
Ant Group EchoMimic. Lifelike audio-driven portrait animation with editable landmark conditioning for fine-grained motion control.
FILM Frame Interpolation
Google FILM frame interpolation. Synthesizes high-quality intermediate frames between near-duplicate inputs, designed for large motion gaps.
Google Veo 3 Fast
Faster, cheaper Veo 3 with audio
Google Veo 3.1 Fast
Faster Veo 3.1 with image-to-video and audio
Grok Imagine Video
xAI video with native audio and lip-sync, up to 15s
Hailuo / MiniMax Video-01
MiniMax's Hailuo video-01. 6s 1280x720 clips with strong cinematic motion and physical realism.
Hailuo 2.3
MiniMax model for realistic human motion and VFX
HunyuanVideo
Tencent's 13B open-weights video diffusion transformer. SOTA among open video models at release.
HunyuanVideo
Tencent's open-source video generation model. Strong visual quality with diverse style support.
Kling 1.6 Pro
Kuaishou's Kling 1.6 Pro. Premium cinematic motion and physics realism, ~$0.07/sec.
LivePortrait
Kuaishou LivePortrait. Efficient portrait animation driven by reference videos with stitching, retargeting and motion-control parameters.
LTX-Video (Lightricks)
Lightricks' 2B DiT video model. Realtime generation on consumer GPUs (~6s @ H100, 24fps).
Luma Dream Machine v1.6
Luma's Dream Machine 1.6. 720p text/image-to-video with strong motion and camera control.
Luma Ray Flash 2
Fast, affordable video with I2V support
MagicAnimate
ByteDance MagicAnimate. Temporally consistent human-image animation driven by a DensePose motion sequence with strong identity preservation.
Minimax Video
MiniMax's video generation model. Fast, high-quality video output with text-to-video capabilities.
Mochi 1
Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.
MOFA-Video
Motion-Field-Adapter video generator. Controllable image animation from trajectories, keypoints or audio with a strong identity preservation prior.
MuseTalk
Tencent MuseTalk real-time lip-sync model. Audio-driven mouth-region editing in latent space at 30+ fps on a single GPU.
Pika 2.0 (Official)
Pika Labs' 2.0 release. Cinematic text/image-to-video with scene composition controls.
PixVerse v5.6
Physics-accurate video generation up to 1080p
Real-CUGAN
Real-CUGAN anime-focused upscaler. 2x/3x/4x super-resolution tuned for animation, line-art, and illustrated content.
RIFE Frame Interpolation
Real-Time Intermediate Flow Estimation. Doubles or quadruples FPS of an existing video via learned optical-flow-based frame interpolation.
Runway Gen-3 Alpha Turbo
Runway's faster, cheaper Gen-3 variant. Image-to-video at 5 credits/sec (~$0.05/sec).
SadTalker
Stylized audio-driven talking-head generator. Synthesizes 3D motion coefficients from audio to animate a single portrait image with natural head movements.
Seedance Lite
Budget ByteDance video, fast and cheap
Seedance Pro
ByteDance video with T2V and I2V, up to 1080p
StreamingT2V
Picsart StreamingT2V. Generates long, consistent videos by chaining short autoregressive clips with motion and appearance memory.
SwinIR Video
SwinIR transformer-based super-resolution and denoising applied per-frame to video. Handles classic, real-world and lightweight upscaling.
ToonCrafter
Tencent ToonCrafter generative cartoon interpolation model. Synthesizes smooth in-between frames between two cartoon keyframes.
V-Express
Tencent V-Express. Audio-driven portrait animation with progressive training, weak-condition learning, and expressive lip sync.
VideoCrafter
Tencent VideoCrafter latent video diffusion. Text-to-video and image-to-video generation up to 2s at 1024x576 with strong motion fidelity.
Wan 2.1 (Alibaba)
Alibaba's Wan 2.1 open-weights video diffusion model. 14B diffusion transformer, supports T2V and I2V.
Wan 2.2 Image-to-Video
Ultra-cheap I2V. Upload image and animate it.
Wan 2.2 Text-to-Video
Ultra-cheap T2V for pennies
Wav2Lip
Lip-sync model that re-syncs a target video's lip movement to an arbitrary audio track. Robust to identity and language with a lip-sync discriminator loss.
Top video generation picks
Hand-picked across four common criteria — resolved against the live catalog so the picks track price and performance changes.
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
CogVideoX-5B (open)
Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Runway Gen 4.5
Top-ranked for motion quality and visual fidelity
Video pricing is per second of output, not per token or per call. A five-second flagship clip costs from about twenty cents (Kling 1.6, Hunyuan Video) to roughly one euro (Veo 3, Runway Gen-3 Alpha). Tiers with audio cost more than silent tiers. Resolution multipliers stack on top of duration: 720p is the standard, 1080p costs roughly 2× more, and 4K is rare and expensive.
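As a rough illustration of per-second pricing, the sketch below multiplies a per-second rate by clip duration and a resolution multiplier. The function name, the multiplier table, and the optional audio multiplier are hypothetical; only the "1080p costs roughly 2×" figure and the ~$0.07/sec example rate (listed for Kling 1.6 Pro) come from this page, and applying that multiplier to that model is an assumption. Actual prices should be taken from the live catalog.

```python
# Sketch only: the multiplier table and the audio multiplier are illustrative
# assumptions, not values taken from a real price list.
RESOLUTION_MULTIPLIER = {
    "720p": 1.0,   # treated as the baseline tier
    "1080p": 2.0,  # "roughly 2x" per the pricing note on this page
}


def estimate_clip_cost(price_per_second, seconds, resolution="720p",
                       audio_multiplier=1.0):
    """Estimate the cost of one clip billed per second of output.

    audio_multiplier is a hypothetical knob for audio tiers; the page only
    says they cost more, not by how much, so it defaults to 1.0.
    """
    if price_per_second < 0 or seconds <= 0:
        raise ValueError("price must be non-negative and duration positive")
    multiplier = RESOLUTION_MULTIPLIER[resolution]
    return price_per_second * seconds * multiplier * audio_multiplier


if __name__ == "__main__":
    rate = 0.07  # example per-second rate (~$0.07/sec as listed for Kling 1.6 Pro)
    print(round(estimate_clip_cost(rate, 5), 2))           # 0.35
    print(round(estimate_clip_cost(rate, 5, "1080p"), 2))  # 0.7
```

Keeping the multipliers in a table makes it easy to swap in real figures once they are known, and an unknown resolution fails loudly with a `KeyError` rather than being silently priced at the baseline.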
The trade-off here is duration versus coherence. Most commercial models cap output at five to ten seconds because longer clips drift: characters change clothes, backgrounds morph, and physics breaks. For longer narratives, generate a sequence of shorter shots and stitch them together in post. Image-to-video (a starting frame plus a motion prompt) usually produces more stable results than pure text-to-video, especially for characters and product shots.
Watch out for the five-second wall: many models on the market still stop at around five seconds of continuous output, and quality degrades sharply beyond that, although some catalog entries advertise longer clips (Kling v3 and Grok Imagine Video list up to 15s, Sora up to 20s). If the script calls for ten seconds on a five-second model, plan two shots. Watch the audio too: many models output silent video, so you have to layer audio separately. The catalog entries that list native audio include Veo 3, Veo 3.1, Kling v3, Sora 2, and Grok Imagine Video.
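The advice to split a longer script into shorter shots can be sketched as a small planning helper. The function name and the default five-second cap are assumptions for illustration; the real cap depends on the model chosen. The helper splits the total duration into the fewest equal-length shots that each fit under the cap.

```python
import math


def plan_shots(total_seconds, max_clip_seconds=5.0):
    """Split a target duration into the fewest equal shots under the cap.

    max_clip_seconds is a hypothetical per-model limit; 5.0 mirrors the
    "five-second wall" described on this page.
    """
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    shot_count = math.ceil(total_seconds / max_clip_seconds)
    # Equal-length shots avoid one very short leftover clip at the end.
    return [total_seconds / shot_count] * shot_count


if __name__ == "__main__":
    print(plan_shots(10))  # [5.0, 5.0]
    print(plan_shots(12))  # [4.0, 4.0, 4.0]
```

Each returned duration would then become one generation request, with the resulting clips stitched together in post.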
The top picks above cover the flagship realism leader, the cheapest workhorse, the model with the longest clip, and the fastest preview option in the category.