What is Wan 2.2 Image-to-Video?

Wan 2.2 Image-to-Video is a video generation AI model developed by Alibaba (Tongyi Wanxiang Lab) and served via Replicate. It is an ultra-cheap image-to-video (I2V) model: upload an image and animate it. Access it through Railwail's unified, OpenAI-compatible API at €0.10 per generation.

How much does Wan 2.2 Image-to-Video cost via Railwail?

Per-call: €0.10. No monthly minimum, no subscription. Start with €5 free credits.
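As a quick illustration of the per-call pricing, the arithmetic below estimates how many generations a credit balance covers. It is only a sketch: `generationsCovered` is a hypothetical helper, not part of any Railwail SDK, and it assumes the full €5 credit can be spent on this model at the flat €0.10 rate, which this page does not explicitly confirm.

```typescript
// Hypothetical helper for illustration only; not a Railwail API.
// Amounts are integer cents to avoid floating-point rounding on euro values.
function generationsCovered(creditCents: number, pricePerCallCents: number): number {
  if (pricePerCallCents <= 0) {
    throw new Error("price per call must be positive");
  }
  // Partial generations cannot be bought, so round down.
  return Math.floor(creditCents / pricePerCallCents);
}

// €5.00 of free credits at €0.10 per call (both figures from this page).
const freeGenerations = generationsCovered(500, 10);
console.log(freeGenerations); // 50
```

Under those assumptions the free credits would cover 50 generations.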

What is the context window of Wan 2.2 Image-to-Video?

Railwail does not list a context window for Wan 2.2 Image-to-Video. As a video generation model, it takes a source image and a prompt rather than a long token context.

How fast is Wan 2.2 Image-to-Video?

Average response latency: 30.0s (p50 across recent Railwail traffic). See live p50/p95 metrics on /rankings.
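For readers unfamiliar with the p50/p95 notation, the sketch below computes percentiles with the nearest-rank method. How Railwail actually computes its metrics is not stated on this page, so the method, the `percentile` function name, and the sample latencies are all assumptions made for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
// Illustrative only; the page does not say which method Railwail uses.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in (0, 100]");
  }
  const sorted = [...samples].sort((a, b) => a - b); // copy, then numeric sort
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Made-up latencies in seconds, not real Railwail traffic.
const latenciesSeconds = [28, 30, 31, 45, 90];
console.log(percentile(latenciesSeconds, 50)); // 31
console.log(percentile(latenciesSeconds, 95)); // 90
```

The gap between the two values shows why a p50 figure alone can understate how long the slowest jobs take.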

Is Wan 2.2 Image-to-Video better than Google Veo 2?

It depends on your use case. Wan 2.2 Image-to-Video (Replicate) and Google Veo 2 (Google DeepMind) are both strong choices in video generation. Compare them side-by-side at /compare/wan-i2v-vs-google-veo-2.

Wan 2.2 Image-to-Video

Name: Wan 2.2 Image-to-Video
Brand: Replicate
SKU: wan-i2v
Price: 0.1 EUR
Availability: InStock

New

Replicate

Video Generation

Ultra-cheap I2V. Upload image and animate it.

Queue video with Wan 2.2 Image-to-Video

Video generation runs asynchronously — we'll queue a job and you can track it in your history.

Generates as an async job — typically 30 s to 2 min.
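Because generation is queued rather than returned immediately, a client typically polls until the job finishes. The sketch below shows one way to do that. The `JobStatus` shape, the `waitForJob` name, and the injected `fetchStatus` callback are hypothetical: this page does not document Railwail's job-status endpoint, so the real call has to be supplied by the caller. The 3-minute default timeout is likewise an assumption chosen to sit above the "30 s to 2 min" range quoted above.

```typescript
// Hypothetical job-status shape; the real Railwail response format is not
// documented on this page.
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "succeeded"; videoUrl: string }
  | { state: "failed"; error: string };

// Polls fetchStatus until the job succeeds, fails, or the timeout elapses.
// fetchStatus is supplied by the caller and wraps whatever status call exists.
async function waitForJob(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs = 5_000,
  timeoutMs = 180_000,
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await fetchStatus();
    if (status.state === "succeeded") {
      return status.videoUrl;
    }
    if (status.state === "failed") {
      throw new Error(status.error);
    }
    // Stop if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error("timed out waiting for job");
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing the status call in as a function keeps the loop independent of any particular endpoint and makes it easy to exercise with a stub.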

TL;DR · Last updated March 25, 2026

Wan 2.2 Image-to-Video is a video generation AI model served via Replicate, priced at €0.10 per generation. No context window is listed for it.

Try Wan 2.2 Image-to-Video

Inputs: Prompt, Image References, Image to Animate (upload), Duration, Aspect Ratio

Examples

See what Wan 2.2 Image-to-Video can generate

0:05

Animate

"Subject turns head and smiles"

Pricing

Price per Generation

Per generation: €0.10

API Integration

Use our OpenAI-compatible API to integrate Wan 2.2 Image-to-Video into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("wan-i2v", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("wan-i2v", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("wan-i2v", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Price: €0.10
Avg. latency: 30.0s
Est. duration: 30s
Developer: Replicate

Deep dive — Alibaba (Tongyi Wanxiang Lab)'s Wan 2.2 Image-to-Video

About Alibaba (Tongyi Wanxiang Lab)

Founded 1999 · Hangzhou, China

Alibaba's Tongyi Lab runs the Qwen LLM family and the Wanxiang generative-media family (image and video). After Wan 2.0 (mid-2024) and Wan 2.1 (early 2025), the team released Wan 2.2 in 2025 as the next-generation open-weight video model. Wan 2.2 ships as two purpose-tuned variants, Image-to-Video (Wan-I2V) and Text-to-Video (Wan-T2V), alongside an audio/A2V variant. Wan 2.2 Image-to-Video is positioned for animating a user-supplied first frame with strong identity preservation and rich motion, and it remains under the permissive Wan-series licence, which allows broad commercial and research use.

Architecture

Diffusion Transformer (DiT) with image-conditioning adapters and 3D causal VAE

Wan 2.2 Image-to-Video is a Diffusion Transformer operating on a 3D causal Wan-VAE latent. The model is initialised from a Wan 2.2 base and fine-tuned for image-to-video with dedicated conditioning adapters that inject features from the user-provided first frame at multiple resolutions of the DiT, ensuring strong identity preservation and continuity. Text conditioning uses a Qwen-family multilingual encoder with strong Chinese-English capability. Native generation is 5 seconds at 720p / 24 fps (with 1080p in extended modes). Wan 2.2's MoE-style scaling and improved 3D RoPE enable better motion physics and reduced identity drift relative to Wan 2.1. The training recipe includes a curriculum mixing image-to-image, image-to-short-video and image-to-long-video data with synthetic dense captions. The release is open-weight on Hugging Face and GitHub under the Wan-series permissive licence.

Parameters: 14 billion (I2V flagship); smaller variants available
Context: unknown
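To give a rough sense of scale for the 14 billion parameter figure, the sketch below estimates the memory needed to hold the weights alone. It takes the parameter count above at face value and assumes every parameter is stored densely at the given precision. Activations, the VAE, and the text encoder are not counted, so real requirements will be higher. The function name `weightMemoryGiB` is hypothetical.

```typescript
// Memory for model weights only: parameters x bytes per parameter,
// converted to GiB (1 GiB = 1024^3 bytes). Illustrative estimate only.
function weightMemoryGiB(parameterCount: number, bytesPerParameter: number): number {
  return (parameterCount * bytesPerParameter) / 1024 ** 3;
}

const flagshipParameters = 14e9; // "14 billion (I2V flagship)" from this page

// 16-bit weights use 2 bytes per parameter; 8-bit weights use 1.
console.log(weightMemoryGiB(flagshipParameters, 2).toFixed(1)); // "26.1"
console.log(weightMemoryGiB(flagshipParameters, 1).toFixed(1)); // "13.0"
```

Even before activations are counted, 16-bit weights at this size exceed the memory of a typical 24 GB consumer GPU.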

What it can do

Open-weight image-to-video model with strong identity preservation
5-second 720p / 24 fps generation natively, 1080p in extended modes
Bilingual Chinese/English prompts via Qwen-based text encoder
Identity-preserving conditioning on a user first frame
Permissive Wan-series licence for research and commercial use
Active community ecosystem on Hugging Face / ComfyUI
Compatible with LoRA fine-tunes and reference adapters
Reproducible training recipe documented in technical reports
Best for: open-source image animation pipelines, e-commerce product motion, character animation.

Training & License

Curated multi-million-clip multilingual video corpus with paired first-frame conditioning data and dense bilingual captions; specifics documented in the Wan 2.2 technical materials.

License: Open weights under an Apache-style permissive licence (Wan-series release).

Known limitations

Native duration 5 seconds
Identity can drift on long extensions
No native audio
High VRAM requirements for the 14B flagship
Closed leaders (Veo 3, Sora 2, Kling v3) still ahead on absolute fidelity

Related Models

Google Veo 2

Google DeepMind

Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.

€5.00

Google Veo 3

Google DeepMind

Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.

€0.75

Google Veo 3 (Replicate)

Google DeepMind

Google's Veo 3 served via Replicate. Text-to-video with native synchronized audio generation. High-fidelity motion and scene coherence in short clips.

€8.00

Google Veo 3.1

Google DeepMind

Latest Veo with image-to-video and context-aware audio

€6.00

Start using Wan 2.2 Image-to-Video today

Get started with free credits. No credit card required. Access Wan 2.2 Image-to-Video and 100+ other models through a single API.
