Wan 2.2 Image-to-Video

Replicate
Video Generation

Ultra-cheap image-to-video (I2V): upload an image and animate it.

Video generation runs asynchronously: each request is queued as a job that you can track in your history. A job typically finishes in 30 s to 2 min.
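Because jobs are queued rather than returned inline, a client usually polls until the job reaches a terminal state. Here is a minimal sketch of that loop with exponential backoff and a timeout sized to the "30 s to 2 min" range above. The `Job` shape, the status values and the `fetchJob` callback are hypothetical placeholders, not the documented railwail API; swap in whatever status call your client actually exposes.

```typescript
// Hypothetical job states; the real status values are not documented on this page.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  outputUrl?: string;
}

// Poll `fetchJob` until the job succeeds or fails, doubling the wait between
// polls (capped at maxDelayMs) and giving up once timeoutMs would be exceeded.
async function waitForJob(
  fetchJob: (id: string) => Promise<Job>,
  id: string,
  { initialDelayMs = 2000, maxDelayMs = 15000, timeoutMs = 180000 } = {},
): Promise<Job> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (true) {
    const job = await fetchJob(id);
    if (job.status === "succeeded" || job.status === "failed") return job;
    if (Date.now() + delay > deadline) {
      throw new Error(`Job ${id} did not finish within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs);
  }
}

// Demo fetcher (hypothetical): reports "running" for the first `runningPolls`
// calls, then "succeeded" with a placeholder output URL.
function makeFakeFetch(runningPolls: number): (id: string) => Promise<Job> {
  let calls = 0;
  return async (id) => {
    calls += 1;
    return calls <= runningPolls
      ? { id, status: "running" }
      : { id, status: "succeeded", outputUrl: "https://example.com/out.mp4" };
  };
}
```

Backoff keeps the number of status requests low for slow jobs while still picking up fast ones quickly.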
TL;DR · Last updated March 25, 2026

Wan 2.2 Image-to-Video is a video generation AI model available via Replicate, priced at €0.10 per generation. It is billed per generated clip rather than per token, and no context window is published for it.

Examples

See what Wan 2.2 Image-to-Video can generate

Animate · 0:05 · "Subject turns head and smiles"

Pricing

Price per Generation
Per generation: €0.10
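With flat per-generation pricing, budgeting is simple arithmetic. This sketch assumes every generation costs exactly the listed €0.10 with no extra charges, and works in integer cents so results do not pick up floating-point drift (in IEEE-754 doubles, 0.1 * 3 is not exactly 0.3).

```typescript
// Flat per-generation price listed on this page (EUR).
const PRICE_PER_GENERATION_EUR = 0.1;

// Total cost of `generations` clips, computed in integer cents.
function estimateCostEur(generations: number, pricePerGenerationEur = PRICE_PER_GENERATION_EUR): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  const centsPerGeneration = Math.round(pricePerGenerationEur * 100);
  return (centsPerGeneration * generations) / 100;
}

// Number of complete generations a budget covers.
function generationsForBudget(budgetEur: number, pricePerGenerationEur = PRICE_PER_GENERATION_EUR): number {
  const budgetCents = Math.round(budgetEur * 100);
  const centsPerGeneration = Math.round(pricePerGenerationEur * 100);
  return Math.floor(budgetCents / centsPerGeneration);
}

console.log(estimateCostEur(250)); // 25
console.log(generationsForBudget(5)); // 50
```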

API Integration

Use our OpenAI-compatible API to integrate Wan 2.2 Image-to-Video into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("wan-i2v", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("wan-i2v", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("wan-i2v", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications
Price: €0.10 per generation
Avg. latency: 30.0 s
Est. duration: 30 s
Provider: Replicate
Category: Video Generation
Supported formats: mp4
Tags: budget, i2v, fast

Deep dive: Wan 2.2 Image-to-Video by Alibaba (Tongyi Wanxiang Lab)

About Alibaba (Tongyi Wanxiang Lab)
Founded 1999 · Hangzhou, China

Alibaba's Tongyi Lab runs the Qwen LLM family and the Wanxiang generative-media family (image and video). After Wan 2.0 (mid-2024) and Wan 2.1 (early 2025), the team released Wan 2.2 in 2025 as its next-generation open-weight video model. Wan 2.2 ships purpose-tuned Image-to-Video (Wan-I2V) and Text-to-Video (Wan-T2V) variants, alongside an audio-driven (A2V) variant. Wan 2.2 Image-to-Video is designed to animate a user-supplied first frame with strong identity preservation and rich motion, and it is released under the permissive Wan-series licence, which allows broad commercial and research use.

Architecture
Diffusion Transformer (DiT) with image-conditioning adapters and 3D causal VAE

Wan 2.2 Image-to-Video is a Diffusion Transformer operating on a 3D causal Wan-VAE latent. The model is initialised from a Wan 2.2 base and fine-tuned for image-to-video with dedicated conditioning adapters that inject features from the user-provided first frame at multiple resolutions of the DiT, which keeps the output's identity and continuity consistent with that frame. Text conditioning uses a umT5 multilingual text encoder with strong Chinese and English capability. Native generation is 5 seconds at 720p / 24 fps, with 1080p in extended modes. Relative to Wan 2.1, Wan 2.2's MoE-style scaling and improved 3D RoPE give better motion physics and less identity drift.

The training recipe uses a curriculum that mixes image-to-image, image-to-short-video and image-to-long-video data with synthetic dense captions. The weights are released on Hugging Face and GitHub under the permissive Wan-series licence.
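The 3D causal VAE and the 5-second, 720p / 24 fps native mode together set how long a token sequence the DiT must attend over. The arithmetic below is a sketch under assumed values typical of the Wan 2.1 design (4× temporal and 8× spatial VAE compression, 1×2×2 DiT patches, clips of 1 + 4k frames); none of these factors are stated on this page or confirmed for this model.

```typescript
// Assumed compression factors (hypothetical defaults, not confirmed for this model):
// 4x temporal and 8x spatial in the causal VAE, then 1x2x2 patches in the DiT.
interface LatentShape {
  pixelFrames: number;
  latentFrames: number;
  latentHeight: number;
  latentWidth: number;
  tokens: number;
}

function latentShape(
  durationS: number,
  fps: number,
  heightPx: number,
  widthPx: number,
  temporalStride = 4,
  spatialStride = 8,
  patchT = 1,
  patchHW = 2,
): LatentShape {
  // A causal VAE encodes the first frame on its own, then each group of
  // `temporalStride` frames into one latent frame, so clips use 1 + k*stride frames.
  const pixelFrames = Math.round(durationS * fps) + 1;
  const latentFrames = 1 + Math.ceil((pixelFrames - 1) / temporalStride);
  const latentHeight = Math.ceil(heightPx / spatialStride);
  const latentWidth = Math.ceil(widthPx / spatialStride);
  // The DiT sequence length is the number of non-overlapping 3D patches.
  const tokens =
    Math.ceil(latentFrames / patchT) *
    Math.ceil(latentHeight / patchHW) *
    Math.ceil(latentWidth / patchHW);
  return { pixelFrames, latentFrames, latentHeight, latentWidth, tokens };
}

// 5 s at 24 fps, 1280x720 (the native mode described above).
const shape = latentShape(5, 24, 720, 1280);
console.log(shape.pixelFrames, shape.latentFrames, shape.tokens); // 121 31 111600
```

Under these assumptions a single 5-second 720p clip is already about 110k tokens, which is why full 3D attention over the latent is the dominant cost.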

Parameters: 14 billion (I2V flagship); smaller variants available
Context: unknown
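The 14-billion figure gives a quick lower bound on memory: holding the weights alone takes parameters × bytes per parameter. This sketch ignores activations, the VAE and text encoder, and framework overhead, so real requirements are higher; it only illustrates why the flagship is listed as VRAM-hungry.

```typescript
// Bytes per parameter for common storage dtypes.
const BYTES_PER_PARAM = { fp32: 4, bf16: 2, fp16: 2, fp8: 1 } as const;
type DType = keyof typeof BYTES_PER_PARAM;

// Memory needed just to store the weights, in GiB (2^30 bytes).
function weightMemoryGiB(params: number, dtype: DType): number {
  return (params * BYTES_PER_PARAM[dtype]) / 1024 ** 3;
}

for (const dtype of ["fp32", "bf16", "fp8"] as const) {
  console.log(dtype, weightMemoryGiB(14e9, dtype).toFixed(1), "GiB");
}
// fp32 52.2 GiB
// bf16 26.1 GiB
// fp8 13.0 GiB
```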
What it can do
  • Open-weight image-to-video model with strong identity preservation
  • 5-second 720p / 24 fps generation natively, 1080p in extended modes
  • Bilingual Chinese/English prompts via a umT5 multilingual text encoder
  • Identity-preserving conditioning on a user first frame
  • Permissive Wan-series licence for research and commercial use
  • Active community ecosystem on Hugging Face / ComfyUI
  • Compatible with LoRA fine-tunes and reference adapters
  • Reproducible training recipe documented in technical reports
  • Best for: open-source image animation pipelines, e-commerce product motion, character animation.
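The list mentions LoRA fine-tunes. LoRA (low-rank adaptation) trains two small matrices B and A per adapted layer and adds their scaled product to the frozen weight, W' = W + (alpha / r) · B·A; merging that update is how a LoRA gets "baked into" a checkpoint. The sketch uses toy plain-array matrices; which Wan 2.2 layers community LoRAs target, and at what rank, is not specified here.

```typescript
type Matrix = number[][];

// Naive dense matrix product: (n x k) @ (k x m) -> (n x m).
function matmul(a: Matrix, b: Matrix): Matrix {
  const n = a.length;
  const k = b.length;
  const m = b[0].length;
  if (a[0].length !== k) throw new Error("matmul: inner dimensions differ");
  const out: Matrix = Array.from({ length: n }, () => new Array<number>(m).fill(0));
  for (let i = 0; i < n; i++) {
    for (let p = 0; p < k; p++) {
      for (let j = 0; j < m; j++) {
        out[i][j] += a[i][p] * b[p][j];
      }
    }
  }
  return out;
}

// Fold a rank-r LoRA update into a frozen weight matrix:
//   W' = W + (alpha / r) * (B @ A), with B: dOut x r and A: r x dIn.
function mergeLora(w: Matrix, b: Matrix, a: Matrix, alpha: number): Matrix {
  const r = a.length;
  if (b[0].length !== r) throw new Error("mergeLora: B columns must equal A rows");
  const delta = matmul(b, a);
  if (delta.length !== w.length || delta[0].length !== w[0].length) {
    throw new Error("mergeLora: update shape does not match W");
  }
  const scale = alpha / r;
  return w.map((row, i) => row.map((x, j) => x + scale * delta[i][j]));
}

// Toy 2x2 identity weight with a rank-1 adapter.
const merged = mergeLora([[1, 0], [0, 1]], [[1], [0]], [[0, 2]], 1);
console.log(merged); // [ [ 1, 2 ], [ 0, 1 ] ]
```

Because B·A has rank at most r, an adapter stores only r·(dOut + dIn) numbers per layer instead of a full dOut × dIn update.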
Training & License

Curated multi-million-clip multilingual video corpus with paired first-frame conditioning data and dense bilingual captions; specifics documented in the Wan 2.2 technical materials.

License: Open weights under an Apache-style permissive licence (Wan-series release).

Known limitations
  • Native duration 5 seconds
  • Identity can drift on long extensions
  • No native audio
  • High VRAM requirements for the 14B flagship
  • Closed leaders (Veo 3, Sora 2, Kling v3) still ahead on absolute fidelity
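Since native clips last 5 seconds, longer videos are usually built by chaining: the last frame of one segment becomes the first frame of the next. This also shows why identity can drift on long extensions, because each segment sees only a single frame of the previous one and small errors carry forward. The `GenerateClip` callback is hypothetical, and the sketch assumes each returned clip starts with its conditioning frame, so that duplicate is dropped when stitching.

```typescript
// Hypothetical shapes: a frame is an opaque handle; a clip is a list of frames.
type Frame = string;
type Clip = Frame[];
type GenerateClip = (firstFrame: Frame, prompt: string) => Promise<Clip>;

// Build a longer video from several I2V segments. Each segment is conditioned
// on the last frame of the previous one; later segments drop their first
// frame because (by assumption) it repeats the previous segment's last frame.
async function generateLong(
  generate: GenerateClip,
  startFrame: Frame,
  prompts: string[],
): Promise<Clip> {
  const out: Clip = [];
  let first = startFrame;
  for (let i = 0; i < prompts.length; i++) {
    const clip = await generate(first, prompts[i]);
    if (clip.length === 0) throw new Error(`segment ${i} returned no frames`);
    out.push(...(i === 0 ? clip : clip.slice(1)));
    first = clip[clip.length - 1];
  }
  return out;
}

// Demo generator (hypothetical): echoes the conditioning frame, then two new frames.
const fakeGenerate: GenerateClip = async (first, prompt) => [first, `${prompt}-a`, `${prompt}-b`];
```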

Start using Wan 2.2 Image-to-Video today

Get started with free credits. No credit card required. Access Wan 2.2 Image-to-Video and 100+ other models through a single API.