Wan 2.2 Text-to-Video

Replicate
Video Generation

Ultra-cheap T2V for pennies

Video generation runs asynchronously: each request is queued as a job, typically finishing in 30 s to 2 min, and you can track it in your history.
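If you fetch results programmatically rather than watching your history, a capped exponential backoff keeps the number of status checks small over a 30 s to 2 min wait. The sketch below is provider-agnostic: `getJob`, the `JobStatus` values and the `outputUrl` field are hypothetical placeholders, not part of the railwail SDK or Replicate's API.

```typescript
// Hypothetical job states; check your provider's documentation for the real values.
type JobStatus = "queued" | "running" | "succeeded" | "failed";

interface JobState {
  status: JobStatus;
  outputUrl?: string; // hypothetical field holding the finished mp4
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the hypothetical `getJob` callback until the job finishes,
// doubling the wait between checks up to `maxDelayMs`.
async function waitForJob(
  getJob: () => Promise<JobState>,
  { initialDelayMs = 2000, maxDelayMs = 15000, timeoutMs = 180000 } = {},
): Promise<JobState> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (true) {
    const job = await getJob();
    if (job.status === "succeeded") return job;
    if (job.status === "failed") throw new Error("Video job failed");
    // Give up if the next wait would pass the deadline.
    if (Date.now() + delay > deadline) throw new Error("Timed out waiting for video job");
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped
  }
}
```

The default three-minute timeout leaves headroom over the typical two-minute upper bound; shorten the initial delay if most of your jobs finish near the 30 s end.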
TL;DR · Last updated March 25, 2026

Wan 2.2 Text-to-Video is a text-to-video generation model from Alibaba's Tongyi Wanxiang Lab, available through Replicate and priced at €0.10 per generation.

Examples

See what Wan 2.2 Text-to-Video can generate

"Cat playing with yarn on wooden floor" (5-second clip)

Pricing

Price per generation
Per generation: €0.10
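Because billing is a flat price per clip, budgeting is simple multiplication. The helper below is a minimal sketch that assumes the €0.10 flat price listed above and no other fees; it works in integer cents to avoid floating-point drift.

```typescript
// Flat price per generation listed above, in euros.
const PRICE_PER_GENERATION_EUR = 0.1;

// Estimated cost of `generations` clips. The math is done in whole cents
// (assuming the price is a whole number of cents) because 0.1 * 3
// evaluates to 0.30000000000000004 in JavaScript.
function estimateCostEur(generations: number, pricePerGenerationEur = PRICE_PER_GENERATION_EUR): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  const centsPerGeneration = Math.round(pricePerGenerationEur * 100);
  return (generations * centsPerGeneration) / 100;
}

console.log(estimateCostEur(50)); // 5
```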

API Integration

Use our OpenAI-compatible API to integrate Wan 2.2 Text-to-Video into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Pass a text prompt describing the scene to generate.
// Video generation runs as an async job (typically 30 s to 2 min).
// Top-level await requires an ES module (e.g. "type": "module" in package.json).
const video = await rw.run("wan-t2v", "Cat playing with yarn on wooden floor");

// The exact shape of the result for video models is not documented here;
// inspect it to locate the generated mp4.
console.log(video);
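To queue several prompts without sending them all at once, you can wrap the call in a small concurrency limiter. In this sketch `submit` is a placeholder for something like `rw.run("wan-t2v", prompt)`; the helper name `runWithLimit` and the default limit of 2 are illustrative choices, not SDK features.

```typescript
// Runs `submit` for every prompt with at most `limit` calls in flight and
// returns the results in the same order as `prompts`.
// `submit` is a placeholder, e.g. (p) => rw.run("wan-t2v", p).
async function runWithLimit<T>(
  prompts: string[],
  submit: (prompt: string) => Promise<T>,
  limit = 2,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<T>(prompts.length);
  let next = 0;

  // Each worker claims the next unclaimed index until none remain.
  // JavaScript runs this code on one thread, so `next++` cannot race.
  async function worker(): Promise<void> {
    while (next < prompts.length) {
      const i = next++;
      results[i] = await submit(prompts[i]);
    }
  }

  const workerCount = Math.min(limit, prompts.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Writing results by index keeps the output aligned with the input even when later prompts finish first.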
Specifications
Price: €0.10 per generation
Avg. latency: 30.0 s
Est. duration: 30 s
Provider: Replicate
Developer: Alibaba (Tongyi Wanxiang Lab)
Category: Video Generation
Supported formats: mp4
Tags: budget, fast

Deep dive — Alibaba (Tongyi Wanxiang Lab)'s Wan 2.2 Text-to-Video

About Alibaba (Tongyi Wanxiang Lab)
Founded 1999 · Hangzhou, China

Alibaba's Tongyi Lab in Hangzhou runs the Qwen LLM family and the Wanxiang generative-media family. After Wan 2.0 (mid-2024) and Wan 2.1 (early 2025), the team released Wan 2.2 in 2025 as the next-generation open-weight video model. Wan 2.2 ships as purpose-tuned variants for Text-to-Video, Image-to-Video and Audio/A2V. Wan 2.2 Text-to-Video is the flagship pure-text-conditioned variant and replaces Wan 2.1 T2V-14B as the principal open-weight text-to-video reference for the Chinese research community. The Wan team consistently rank near the top of open-model VBench leaderboards and ship reproducible training code under a permissive Wan-series licence.

Architecture
Diffusion Transformer (DiT) with 3D causal VAE; MoE-style scaling

Wan 2.2 Text-to-Video is a Diffusion Transformer operating on a 3D causal Wan-VAE latent. Wan 2.2 introduces architectural refinements over Wan 2.1: improved 3D Rotary Position Embeddings, larger attention windows, and (in the A14B flagship) a two-expert Mixture-of-Experts design. Rather than routing individual tokens, it switches experts by denoising stage: a high-noise expert handles the early steps and sets the overall layout, and a low-noise expert handles the later steps and refines detail. Text conditioning uses a multilingual umT5 encoder with strong Chinese-English capability. The denoiser is trained with Flow Matching on a curated multi-million-clip multilingual video corpus with dense synthetic bilingual captions. Native generation is 5 seconds at 720p / 24 fps (with 1080p extensions). The training recipe and weights are published on Hugging Face and GitHub under the Wan-series permissive licence, intended to enable broad commercial and research use.
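Flow Matching trains the denoiser to predict a velocity that carries noise toward data; generation then integrates that velocity from pure noise back to a sample. The sketch below shows only the sampling side, with a fixed-step Euler integrator on a toy 3-element vector and the straight-line (rectified-flow) path x_t = (1 − t)·x0 + t·ε. The `oracle` velocity knows the target and stands in for the trained DiT; the path convention and step count are assumptions, not Wan's published sampler settings.

```typescript
type Vec = number[];

// v(x, t): in a trained model this is the DiT's velocity prediction,
// conditioned on the text prompt. Here it is a toy stand-in.
type VelocityFn = (x: Vec, t: number) => Vec;

// Integrates dx/dt = v(x, t) from t = 1 (pure noise) to t = 0 (data)
// with fixed-size Euler steps.
function sampleEuler(noise: Vec, velocity: VelocityFn, steps = 50): Vec {
  let x = noise.slice();
  const dt = 1 / steps;
  for (let i = 0; i < steps; i++) {
    const t = (steps - i) / steps; // runs 1, 1 - dt, ..., dt
    const v = velocity(x, t);
    x = x.map((xi, k) => xi - dt * v[k]); // move from t to t - dt
  }
  return x;
}

// Oracle for the straight path x_t = (1 - t) * x0 + t * eps:
// dx/dt = eps - x0, which can be rewritten as (x_t - x0) / t.
const target: Vec = [0.5, -1.0, 2.0];
const oracle: VelocityFn = (x, t) => x.map((xi, k) => (xi - target[k]) / t);

const sample = sampleEuler([0.3, 0.9, -1.2], oracle, 10);
console.log(sample); // lands on `target` up to floating-point rounding
```

With an exact velocity on a straight path, Euler integration recovers the target in only 10 steps; a learned velocity is approximate, so in practice the step count becomes a speed/quality trade-off.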

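Parameters: 27 billion total in the A14B flagship (two experts of about 14 billion each, with about 14 billion active per denoising step); a smaller 5B variant is also available
Context: unknown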
What it can do
  • Open-weight text-to-video flagship at 14B parameters (smaller variants available)
  • 5-second 720p / 24 fps generation natively, 1080p extensions
  • Bilingual Chinese/English prompts via a multilingual umT5 text encoder
  • Two-expert MoE (high-noise and low-noise experts) and improved 3D RoPE in the flagship variant
  • Permissive Wan-series licence for research and commercial use
  • Top-tier results on VBench among open-weight models
  • Active community ecosystem (LoRAs, fine-tunes, ComfyUI nodes)
  • Reproducible training recipe and code
  • Best for: open-source video pipelines, research, on-prem creative tooling, branded fine-tunes.
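The cost of a 5-second 720p clip is easier to see in latent tokens. This sketch assumes the compression used by the Wan 2.1 VAE and DiT: 4× in time with the first frame encoded on its own, 8× in each spatial dimension, then 1×2×2 patches. Those factors are an assumption here, not something this page states, and other Wan 2.2 variants may compress differently.

```typescript
// Compression factors ASSUMED from the Wan 2.1 VAE/DiT design.
const TEMPORAL_STRIDE = 4;
const SPATIAL_STRIDE = 8;
const PATCH = { t: 1, h: 2, w: 2 };

interface TokenEstimate {
  latentFrames: number;
  latentHeight: number;
  latentWidth: number;
  tokens: number;
}

function estimateTokens(frames: number, height: number, width: number): TokenEstimate {
  // A causal VAE encodes the first frame alone, then groups of 4 frames,
  // so the frame count must be 4k + 1.
  if (frames < 1 || (frames - 1) % TEMPORAL_STRIDE !== 0) {
    throw new RangeError(`frame count must be 4k + 1, got ${frames}`);
  }
  if (height % (SPATIAL_STRIDE * PATCH.h) !== 0 || width % (SPATIAL_STRIDE * PATCH.w) !== 0) {
    throw new RangeError("height and width must be multiples of 16");
  }
  const latentFrames = 1 + (frames - 1) / TEMPORAL_STRIDE;
  const latentHeight = height / SPATIAL_STRIDE;
  const latentWidth = width / SPATIAL_STRIDE;
  const tokens = (latentFrames / PATCH.t) * (latentHeight / PATCH.h) * (latentWidth / PATCH.w);
  return { latentFrames, latentHeight, latentWidth, tokens };
}

// 5 s at 24 fps is 120 frames, which is not 4k + 1, so use 121 frames at 1280x720.
console.log(estimateTokens(121, 720, 1280).tokens); // 111600
```

Under these assumptions a single 720p clip is over a hundred thousand tokens for the transformer to attend over, which is one reason the flagship is demanding to run locally.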
Training & License

Curated multi-million-clip multilingual video corpus filtered for aesthetics, motion and caption quality, with dense bilingual captions; specifics documented in Wan technical materials.

License: Open weights under the Apache 2.0 licence (Wan-series release).

Known limitations
  • Native duration 5 seconds
  • No native audio
  • High VRAM requirements for the 14B flagship
  • Closed leaders (Veo 3, Sora 2, Kling v3) still ahead on absolute fidelity
  • Resolution capped at 720p natively (1080p only in extended modes)
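A back-of-envelope figure shows why the 14B flagship needs a lot of VRAM. The sketch counts only the memory for one set of 14 billion weights at a given precision; activations, the text encoder and the VAE add to it, and the precision labels are generic rather than a statement about how the released checkpoints are stored.

```typescript
// Bytes per parameter for common storage precisions.
const BYTES_PER_PARAM = { fp32: 4, bf16: 2, fp8: 1 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Memory needed just to hold the weights, in GiB (2^30 bytes).
function weightMemoryGiB(params: number, precision: Precision): number {
  return (params * BYTES_PER_PARAM[precision]) / 2 ** 30;
}

for (const p of Object.keys(BYTES_PER_PARAM) as Precision[]) {
  console.log(p, weightMemoryGiB(14e9, p).toFixed(1), "GiB");
}
```

At bf16 this comes to roughly 26 GiB for the weights alone, before any activations.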

Start using Wan 2.2 Text-to-Video today

Get started with free credits. No credit card required. Access Wan 2.2 Text-to-Video and 100+ other models through a single API.