Wan 2.2 Image-to-Video
Ultra-cheap I2V. Upload an image and animate it.
Wan 2.2 Image-to-Video is a video generation AI model from Replicate, priced at €0.000 per 1M input tokens with an unknown context window.
Image References
Examples
See what Wan 2.2 Image-to-Video can generate
Animate
"Subject turns head and smiles"
Pricing
API Integration
Use our OpenAI-compatible API to integrate Wan 2.2 Image-to-Video into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("wan-i2v", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("wan-i2v", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("wan-i2v", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Alibaba (Tongyi Wanxiang Lab)'s Wan 2.2 Image-to-Video
Alibaba's Tongyi Lab runs the Qwen LLM family and the Wanxiang generative-media family (image and video). After Wan 2.0 (mid-2024) and Wan 2.1 (early 2025), the team released Wan 2.2 in 2025 as the next-generation open-weight video model. Wan 2.2 ships as two purpose-tuned variants, Image-to-Video (Wan-I2V) and Text-to-Video (Wan-T2V), alongside an audio/A2V variant. Wan 2.2 Image-to-Video is positioned for animating a user-supplied first frame with strong identity preservation and rich motion, and it remains under the permissive Wan-series licence, which allows broad commercial and research use.
Wan 2.2 Image-to-Video is a Diffusion Transformer operating on a 3D causal Wan-VAE latent. The model is initialised from a Wan 2.2 base and fine-tuned for image-to-video with dedicated conditioning adapters that inject features from the user-provided first frame at multiple resolutions of the DiT, ensuring strong identity preservation and continuity. Text conditioning uses a Qwen-family multilingual encoder with strong Chinese-English capability. Native generation is 5 seconds at 720p / 24 fps (with 1080p in extended modes). Wan 2.2's MoE-style scaling and improved 3D RoPE enable better motion physics and reduced identity drift relative to Wan 2.1. The training recipe includes a curriculum mixing image-to-image, image-to-short-video and image-to-long-video data with synthetic dense captions. The release is open-weight on Hugging Face and GitHub under the Wan-series permissive licence.
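To make the native output size concrete, the quoted figures (5 seconds, 720p, 24 fps) can be turned into a frame and pixel count. This is a minimal arithmetic sketch, not part of any Wan or railwail API: the names NATIVE_CLIP and clipBudget are hypothetical, and it assumes "720p" means a 1280x720 (16:9) frame.

```javascript
// Clip budget for one native generation, using the figures quoted above.
// Assumption: "720p" means a 1280x720 (16:9) frame; other aspect ratios differ.
const NATIVE_CLIP = { seconds: 5, fps: 24, width: 1280, height: 720 };

// Hypothetical helper: derives frame and pixel counts from a clip description.
function clipBudget({ seconds, fps, width, height }) {
  const frames = seconds * fps;
  const pixelsPerFrame = width * height;
  return { frames, pixelsPerFrame, totalPixels: frames * pixelsPerFrame };
}

const budget = clipBudget(NATIVE_CLIP);
console.log(budget.frames);         // 120
console.log(budget.pixelsPerFrame); // 921600
console.log(budget.totalPixels);    // 110592000
```

These are decoded-output numbers only; the model itself works on a compressed Wan-VAE latent, whose exact shape is not stated here.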
- Parameters: 14 billion (I2V flagship); smaller variants available
- Context: unknown
- Open-weight image-to-video model with strong identity preservation
- 5-second 720p / 24 fps generation natively, 1080p in extended modes
- Bilingual Chinese/English prompts via Qwen-based text encoder
- Identity-preserving conditioning on a user first frame
- Permissive Wan-series licence for research and commercial use
- Active community ecosystem on Hugging Face / ComfyUI
- Compatible with LoRA fine-tunes and reference adapters
- Reproducible training recipe documented in technical reports
- Best for: open-source image animation pipelines, e-commerce product motion, character animation.
Training data: Curated multi-million-clip multilingual video corpus with paired first-frame conditioning data and dense bilingual captions; specifics documented in the Wan 2.2 technical materials.
License: Open weights under an Apache-style permissive licence (Wan-series release).
Known limitations
- Native duration 5 seconds
- Identity can drift on long extensions
- No native audio
- High VRAM requirements for the 14B flagship
- Closed leaders (Veo 3, Sora 2, Kling v3) still ahead on absolute fidelity
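The VRAM limitation can be sized roughly from the parameter count alone. The sketch below is a back-of-the-envelope estimate, not a measured requirement: it assumes 14 billion parameters (the flagship figure listed above) and common storage precisions, and it counts weights only. Activations, the text encoder, and the VAE are excluded, so real usage will be higher. The names BYTES_PER_PARAM and weightGigabytes are hypothetical.

```javascript
// Rough weight-only memory estimate for a model of a given parameter count.
// Assumption: bytes per parameter for common storage precisions.
const BYTES_PER_PARAM = { fp32: 4, bf16: 2, int8: 1 };

// Hypothetical helper: returns weight size in decimal gigabytes (1 GB = 1e9 bytes).
function weightGigabytes(paramCount, precision) {
  const bytes = BYTES_PER_PARAM[precision];
  if (bytes === undefined) throw new Error(`Unknown precision: ${precision}`);
  return (paramCount * bytes) / 1e9;
}

for (const precision of Object.keys(BYTES_PER_PARAM)) {
  console.log(precision, weightGigabytes(14e9, precision), "GB");
}
// fp32 56 GB
// bf16 28 GB
// int8 14 GB
```

Even at 16-bit precision the weights alone exceed the memory of most consumer GPUs, which is why quantised or smaller variants are the usual route for local use.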
Frequently asked questions
Related Models
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Google Veo 3
Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.
Google Veo 3.1
Latest Veo with image-to-video and context-aware audio
Kling v3
Cinematic video up to 15s with multi-shot and native audio
Start using Wan 2.2 Image-to-Video today
Get started with free credits. No credit card required. Access Wan 2.2 Image-to-Video and 100+ other models through a single API.