Video Generation
Generate and edit videos with AI-powered models
Video generation models for marketing, motion design, and prototyping
Video models turn a prompt, a still image, or a short reference clip into moving images. This is the youngest and most volatile category in the catalog: every quarter brings a new flagship that raises the bar. Use it when you need motion content faster than a human editor can produce it.
51 models available
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Google Veo 3
Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.
Google Veo 3.1
Latest Veo with image-to-video and context-aware audio
Kling v3
Cinematic video up to 15s with multi-shot and native audio
Kling v3 Omni
Most versatile: multi-reference images, video editing, native audio
OpenAI Sora 2
OpenAI's second-generation Sora video model. Realistic motion, improved physics, audio support.
Runway Gen 4.5
Top-ranked for motion quality and visual fidelity
Sora
OpenAI video generation model. Create realistic and imaginative videos from text prompts up to 20 seconds.
AnimateDiff
Plug-and-play motion module that animates personalized Stable Diffusion models without further training. 16-frame clips at 512x512.
AnimateDiff Evolved
Community fork of AnimateDiff with improved motion modules, beta scheduler control and ControlNet integration for richer animation control.
AnimateDiff Lightning
ByteDance distillation of AnimateDiff. 4-step sampling for over 10x faster inference at comparable quality to multi-step base model.
Champ Human Animation
Champ controllable human image animation. Uses 3D parametric guidance (SMPL) for realistic full-body motion transfer from a single reference image.
CogVideoX-5B (open)
Zhipu/Tsinghua's 5B open text-to-video model. 720x480 @ 8fps, 6s clips, image-to-video variant available.
DreamGaussian 4D
4D Gaussian-splatting generator extending DreamGaussian to video. Image-conditioned dynamic 3D scenes with view-consistent motion.
DynamiCrafter
Tencent DynamiCrafter. Animates still images into short videos preserving texture and structure, with strong open-domain coverage.
EchoMimic
Ant Group EchoMimic. Lifelike audio-driven portrait animation with editable landmark conditioning for fine-grained motion control.
FILM Frame Interpolation
Google FILM frame interpolation. Synthesizes high-quality intermediate frames between near-duplicate inputs, designed for large motion gaps.
Google Veo 3 Fast
Faster, cheaper Veo 3 with audio
Google Veo 3.1 Fast
Faster Veo 3.1 with image-to-video and audio
Grok Imagine Video
xAI video with native audio and lip-sync, up to 15s
Hailuo / MiniMax Video-01
MiniMax's Hailuo video-01. 6s 1280x720 clips with strong cinematic motion and physical realism.
Hailuo 2.3
MiniMax model for realistic human motion and VFX
HunyuanVideo
Tencent's 13B open-weights video diffusion transformer. SOTA among open video models at release.
HunyuanVideo
Tencent's open-source video generation model. Strong visual quality with diverse style support.
Kling 1.6 Pro
Kuaishou's Kling 1.6 Pro. Premium cinematic motion and physics realism, ~$0.07/sec.
LivePortrait
Kuaishou LivePortrait. Efficient portrait animation driven by reference videos with stitching, retargeting and motion-control parameters.
LTX-Video (Lightricks)
Lightricks' 2B DiT video model. Realtime generation on consumer GPUs (~6s @ H100, 24fps).
Luma Dream Machine v1.6
Luma's Dream Machine 1.6. 720p text/image-to-video with strong motion and camera control.
Luma Ray Flash 2
Fast, affordable video with I2V support
MagicAnimate
ByteDance MagicAnimate. Temporally consistent human-image animation driven by a DensePose motion sequence with strong identity preservation.
Minimax Video
MiniMax's video generation model. Fast, high-quality video output with text-to-video capabilities.
Mochi 1
Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.
MOFA-Video
Motion-Field-Adapter video generator. Controllable image animation from trajectories, keypoints or audio with a strong identity preservation prior.
MuseTalk
Tencent MuseTalk real-time lip-sync model. Audio-driven mouth-region editing in latent space at 30+ fps on a single GPU.
Pika 2.0 (Official)
Pika Labs' 2.0 release. Cinematic text/image-to-video with scene composition controls.
PixVerse v5.6
Physics-accurate video generation up to 1080p
Real-CUGAN
Real-CUGAN anime-focused upscaler. 2x/3x/4x super-resolution tuned for animation, line-art, and illustrated content.
RIFE Frame Interpolation
Real-Time Intermediate Flow Estimation. Doubles or quadruples FPS of an existing video via learned optical-flow-based frame interpolation.
Runway Gen-3 Alpha Turbo
Runway's faster, cheaper Gen-3 variant. Image-to-video at 5 credits/sec (~$0.05/sec).
SadTalker
Stylized audio-driven talking-head generator. Synthesizes 3D motion coefficients from audio to animate a single portrait image with natural head movements.
Seedance Lite
Budget ByteDance video, fast and cheap
Seedance Pro
ByteDance video with T2V and I2V, up to 1080p
StreamingT2V
Picsart StreamingT2V. Generates long, consistent videos by chaining short autoregressive clips with motion and appearance memory.
SwinIR Video
SwinIR transformer-based super-resolution and denoising applied per-frame to video. Handles classic, real-world and lightweight upscaling.
ToonCrafter
Tencent ToonCrafter generative cartoon interpolation model. Synthesizes smooth in-between frames between two cartoon keyframes.
V-Express
Tencent V-Express. Audio-driven portrait animation with progressive training, weak-condition learning, and expressive lip sync.
VideoCrafter
Tencent VideoCrafter latent video diffusion. Text-to-video and image-to-video generation up to 2s at 1024x576 with strong motion fidelity.
Wan 2.1 (Alibaba)
Alibaba's Wan 2.1 open-weights video diffusion model. 14B diffusion transformer, supports T2V and I2V.
Wan 2.2 Image-to-Video
Ultra-cheap I2V. Upload an image and animate it.
Wan 2.2 Text-to-Video
Ultra-cheap T2V for pennies
Wav2Lip
Lip-sync model that re-syncs a target video's lip movement to an arbitrary audio track. Robust to identity and language with a lip-sync discriminator loss.
Top video generation picks
Hand-picked across four common criteria — resolved against the live catalog so the picks track price and performance changes.
Google Veo 2
CogVideoX-5B (open)
Google Veo 2
Runway Gen 4.5
Video pricing is per second of output, not per token or per call. A five-second flagship clip costs between about €0.20 (Kling 1.6, Hunyuan Video) and roughly €1 (Veo 3, Runway Gen-3 Alpha). Tiers with audio cost more than silent tiers. Resolution multipliers stack on top of duration: 720p is the standard, 1080p costs roughly 2× as much, and 4K is rare and expensive.
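Because billing is per output second, a clip estimate is plain multiplication. The sketch below assumes cost = per-second rate × duration × resolution multiplier × an optional audio multiplier. The two rates are the USD figures quoted in the Kling 1.6 Pro and Runway Gen-3 Alpha Turbo entries above, the 2× factor for 1080p is the rule of thumb from this paragraph, and the dictionary keys and the `audio_multiplier` parameter are hypothetical illustrations, not part of any Railwail API.

```python
# Approximate per-second rates (USD) quoted in the catalog entries above.
# Hypothetical keys; treat the values as snapshots, since live prices change.
RATES_PER_SEC = {
    "kling-1.6-pro": 0.07,
    "runway-gen3-alpha-turbo": 0.05,
}

# Multipliers relative to 720p, following the "1080p costs about 2x" rule of thumb.
# 4K is left out because its pricing varies too much to generalize.
RESOLUTION_MULTIPLIER = {"720p": 1.0, "1080p": 2.0}


def clip_cost(rate_per_sec: float, seconds: float, resolution: str = "720p",
              audio_multiplier: float = 1.0) -> float:
    """Estimate the cost of one clip billed per second of output.

    audio_multiplier is a hypothetical surcharge factor for tiers with sound;
    leave it at 1.0 for silent output.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rate_per_sec * seconds * RESOLUTION_MULTIPLIER[resolution] * audio_multiplier


# A 5-second 720p clip on Kling 1.6 Pro, and the same length at 1080p on Gen-3 Turbo.
kling_5s = clip_cost(RATES_PER_SEC["kling-1.6-pro"], 5)
runway_5s_1080p = clip_cost(RATES_PER_SEC["runway-gen3-alpha-turbo"], 5, "1080p")
```

Keeping the multipliers in tables rather than hard-coding them makes it easy to update the estimate when a provider changes its resolution or audio pricing.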
The trade-off here is duration versus coherence. Most commercial models cap output at five or ten seconds because clips drift beyond that: characters change outfits, backgrounds warp, and physics breaks down. For longer narratives, generate a sequence of short shots and assemble them in post-production. Image-to-video (a starting frame plus a motion prompt) typically produces more stable results than pure text-to-video, especially for characters and product shots.
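Assembling short shots in post can be as simple as a lossless join. The sketch below assumes the `ffmpeg` CLI is installed and uses its concat demuxer (`-f concat -safe 0 -i list.txt -c copy`). Stream copy only works when every clip shares the same codec, resolution, and frame rate, which is likely (but not guaranteed) when all shots come from the same model with the same settings; otherwise re-encode instead of passing `-c copy`. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def concat_list(clips: list[str]) -> str:
    """Build the text file read by ffmpeg's concat demuxer, one 'file' line per clip."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal quote is written as '\'' (close, escaped quote, reopen).
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output: str) -> list[str]:
    """ffmpeg argv that joins the listed clips without re-encoding."""
    # -safe 0 lets the list contain absolute paths.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
            "-c", "copy", output]


def stitch(clips: list[str], output: str, list_path: str = "shots.txt") -> None:
    """Write the list file and run ffmpeg; raises CalledProcessError on failure."""
    absolute = [str(Path(c).resolve()) for c in clips]  # avoid relative-path ambiguity
    Path(list_path).write_text(concat_list(absolute))
    subprocess.run(concat_command(list_path, output), check=True)
```

Usage, assuming three generated shots on disk: `stitch(["shot1.mp4", "shot2.mp4", "shot3.mp4"], "sequence.mp4")`.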
Watch out for the five-second wall: many models on the market cap continuous output at around five seconds (a few, such as Kling v3 and Sora, advertise 15 to 20 seconds), and quality drops sharply when you push past the limit. If your script calls for ten seconds, plan two shots. Watch out for sound as well: many models deliver silent video, so you need to layer the audio separately. Native audio is limited to a subset of the catalog, such as Veo 3 and 3.1, Kling v3, Sora 2, and Grok Imagine Video.
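Both rules of thumb can be scripted. The sketch below splits a target runtime into the fewest near-equal shots that stay under a per-shot cap (five seconds by default, matching the wall described above), and builds an `ffmpeg` command that lays a separate audio track under a silent clip. It assumes the `ffmpeg` CLI is available; the function names and the default cap are illustrative choices.

```python
import math


def plan_shots(total_seconds: float, max_shot_seconds: float = 5.0) -> list[float]:
    """Split a runtime into the fewest equal-length shots no longer than the cap."""
    if total_seconds <= 0 or max_shot_seconds <= 0:
        raise ValueError("durations must be positive")
    n = math.ceil(total_seconds / max_shot_seconds)
    return [total_seconds / n] * n


def mux_audio_command(video: str, audio: str, output: str) -> list[str]:
    """ffmpeg argv that adds an external audio track to a silent video."""
    return ["ffmpeg", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0",  # video from input 0, audio from input 1
            "-c:v", "copy",                     # keep the generated frames untouched
            "-c:a", "aac",                      # encode audio to a widely supported codec
            "-shortest",                        # stop at the end of the shorter stream
            output]


ten_second_script = plan_shots(10)  # two 5-second shots
twelve_second_script = plan_shots(12)  # three 4-second shots
```

Splitting into equal shots rather than "5 s + remainder" avoids a very short final shot, which tends to look like a cut mistake.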
The top picks above cover the flagship realism leader, the cheapest workhorse, the longest-clip model, and the fastest preview option in the category.
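Resolving picks against the catalog amounts to taking a minimum or maximum over whichever field a criterion depends on. The sketch below covers only the two criteria the descriptions on this page quantify (clip length and per-second price), using figures copied from the entries above; realism and speed would need scores the page does not publish. The `CatalogEntry` type and `pick` helper are hypothetical, not how Railwail actually computes its picks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CatalogEntry:
    name: str
    max_seconds: Optional[float] = None      # longest clip, if the entry states one
    usd_per_second: Optional[float] = None   # per-second price, if the entry states one


def pick(entries: list[CatalogEntry], attr: str,
         best: Callable = max) -> Optional[CatalogEntry]:
    """Return the entry with the best known value of `attr`, ignoring unknowns."""
    known = [e for e in entries if getattr(e, attr) is not None]
    if not known:
        return None
    return best(known, key=lambda e: getattr(e, attr))


# Figures copied from the catalog descriptions above; treat them as snapshots.
CATALOG = [
    CatalogEntry("Google Veo 3", max_seconds=8),
    CatalogEntry("Kling v3", max_seconds=15),
    CatalogEntry("Sora", max_seconds=20),
    CatalogEntry("Kling 1.6 Pro", usd_per_second=0.07),
    CatalogEntry("Runway Gen-3 Alpha Turbo", usd_per_second=0.05),
]

longest = pick(CATALOG, "max_seconds")            # longest-clip criterion
cheapest = pick(CATALOG, "usd_per_second", min)   # cheapest-per-second criterion
```

Skipping entries with unknown values keeps a missing price or duration from silently winning or crashing the comparison; rerunning `pick` on fresh data is what lets the picks follow price changes.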
Popular use cases
Common patterns built with video generation on Railwail.
Start Building with AI
Access all models through a single API. Get free credits when you sign up — no credit card required.