Wan 2.1 (Alibaba)

Replicate
Video Generation

Alibaba's Wan 2.1 open-weight video diffusion model. Available as 1.3B and 14B Diffusion Transformer variants; supports text-to-video (T2V) and image-to-video (I2V).

Video generation runs asynchronously — we'll queue a job and you can track it in your history.
Jobs typically complete in 30 s to 2 min.
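Because generation is queued rather than returned immediately, client code usually polls for the result. The sketch below shows a generic poll-with-backoff loop in TypeScript. It assumes a caller-supplied `fetchStatus` function and a `JobStatus` shape that are both hypothetical, not the railwail or Replicate API; wire `fetchStatus` to whatever status call your SDK actually exposes.

```typescript
// Hypothetical job status shape, for illustration only (not a real SDK type).
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "succeeded"; output: string }
  | { state: "failed"; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll an async job until it succeeds or fails, doubling the wait between
// checks up to `maxDelayMs`, and giving up after `timeoutMs`.
async function waitForJob(
  fetchStatus: () => Promise<JobStatus>,
  { initialDelayMs = 2_000, maxDelayMs = 15_000, timeoutMs = 180_000 } = {},
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    const status = await fetchStatus();
    if (status.state === "succeeded") return status.output;
    if (status.state === "failed") throw new Error(`Job failed: ${status.error}`);
    // Still queued or running: wait, then back off exponentially.
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
  throw new Error(`Job did not finish within ${timeoutMs} ms`);
}
```

The 3-minute default timeout leaves headroom over the typical 30 s to 2 min run time; exponential backoff keeps status requests infrequent during longer jobs.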
TL;DR · Last updated May 16, 2026

Wan 2.1 (Alibaba) is a video generation AI model available through Replicate. It is billed per generation rather than per input token, so token pricing and context-window figures do not apply.


Pricing

Price per generation: Free (final pricing to be determined)

API Integration

Use our OpenAI-compatible API to integrate Wan 2.1 (Alibaba) into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

// Create a client with your API key.
const rw = railwail("YOUR_API_KEY");

// Wan 2.1 is a video model: the prompt describes the clip to generate,
// not a chat message. The request runs as an async job (typically 30 s to 2 min).
// Top-level await requires an ES module (e.g. "type": "module" in package.json).
const video = await rw.run(
  "wan-2-1-alibaba",
  "A red fox trotting through fresh snow at sunrise, slow cinematic camera pan",
);

// Assumption: the shape of the result for video models (for example a URL
// to the rendered clip) is not documented on this page, so inspect it first.
// Chat-style options such as temperature and max_tokens do not apply here.
console.log(video);
Specifications
Developer
Alibaba (Tongyi Wanxiang Lab), hosted on Replicate
Category
Video Generation
Supported Formats
text
image
Tags
alibaba
wan
text-to-video
image-to-video
open-weights
pricing-tbd

Deep dive: Wan 2.1 from Alibaba (Tongyi Wanxiang Lab)

About Alibaba (Tongyi Wanxiang Lab)
Founded 1999 · Hangzhou, China

Alibaba was founded in 1999 by Jack Ma and 17 co-founders in Hangzhou and is one of China's largest cloud and e-commerce conglomerates. Its Tongyi Lab (Tongyi Qianwen / Wanxiang) leads Alibaba's foundation-model research and has shipped the Qwen large-language-model family, the Wanxiang image generator and the Wan video-generation family. Wan 2.0 launched in mid-2024, followed by Wan 2.1 in early 2025: an open-weight diffusion-transformer family covering text-to-video, image-to-video and first-/last-frame conditioning, released under a permissive licence intended to encourage broad community adoption. Alibaba positioned Wan 2.1 as the strongest open-weight Chinese video model at launch, with public weights and reproducible training recipes.

Architecture
Diffusion Transformer (DiT) family at 1.3B and 14B parameters with 3D causal VAE

Wan 2.1 is a family of open-weight Diffusion Transformer (DiT) video models released by Alibaba's Tongyi Wanxiang Lab. The family includes a 1.3B-parameter text-to-video model that runs on consumer GPUs and 14B-parameter text-to-video and image-to-video models aimed at high-fidelity creative work. All variants use a 3D causal Variational Autoencoder (Wan-VAE) to compress video into a spatio-temporal latent grid, then denoise that latent with a DiT trained using Flow Matching. Text conditioning uses a multilingual umT5 text encoder that handles both Chinese and English prompts. The Wan 2.1 release on GitHub and Hugging Face also includes weights and inference code for first-frame and last-frame conditioning. Native generation is about 5 seconds (81 frames at 16 fps) at 832x480 for the 1.3B model and up to 1280x720 for the 14B model. The team reports state-of-the-art results on VBench among open models.
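The paragraph above says the DiT is trained with Flow Matching. As a minimal sketch, assuming the common linear (rectified-flow) path rather than Wan's exact time schedule or loss weighting, the training pair and one sampler step look like this. The function names are illustrative and not part of any Wan or railwail API.

```typescript
// Sketch of a flow-matching (rectified-flow style) training pair and sampler step.
// Assumption: linear path x_t = (1 - t) * x0 + t * noise, t in [0, 1],
// with velocity target v = noise - x0. Wan's real schedule may differ.

type Vec = number[];

// Build the noisy input x_t and the velocity the model should learn to predict.
function flowMatchingPair(x0: Vec, noise: Vec, t: number): { xt: Vec; target: Vec } {
  if (x0.length !== noise.length) throw new Error("x0 and noise must have the same length");
  if (t < 0 || t > 1) throw new Error("t must lie in [0, 1]");
  const xt = x0.map((x, i) => (1 - t) * x + t * noise[i]);
  const target = x0.map((x, i) => noise[i] - x);
  return { xt, target };
}

// Mean squared error between predicted and target velocity (the training loss).
function mseLoss(pred: Vec, target: Vec): number {
  let sum = 0;
  for (let i = 0; i < pred.length; i++) sum += (pred[i] - target[i]) ** 2;
  return sum / pred.length;
}

// One Euler step of the sampler: move from time t to t - dt against the velocity.
function eulerStep(xt: Vec, velocity: Vec, dt: number): Vec {
  return xt.map((x, i) => x - dt * velocity[i]);
}

// Demo with a "perfect" model that always predicts the true velocity:
// starting from pure noise at t = 1, four steps of dt = 0.25 return to x0.
const demoX0: Vec = [1, 2];
const demoNoise: Vec = [3, -2];
const trueVelocity = flowMatchingPair(demoX0, demoNoise, 1).target;
let sample: Vec = [...demoNoise];
for (let step = 0; step < 4; step++) sample = eulerStep(sample, trueVelocity, 0.25);
console.log(sample);
```

In a real model the vectors are whole video latents and the velocity comes from the DiT, conditioned on the text embedding and timestep; because the learned velocity is not exactly constant, practical samplers take more steps than this toy example.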

Parameters
1.3 billion (T2V-1.3B) and 14 billion (T2V-14B / I2V-14B)
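Parameter count gives a quick lower bound on GPU memory. The sketch below assumes weights are held in bf16/fp16 (2 bytes per parameter) and counts the DiT weights only; activations, the VAE, the text encoder and framework overhead all come on top, and offloading or quantization can change the picture.

```typescript
// Rough lower bound on GPU memory for model weights alone.
// Assumption: 2 bytes per parameter (bf16/fp16) unless told otherwise.
const BYTES_PER_GIB = 1024 ** 3;

function weightMemoryGiB(params: number, bytesPerParam = 2): number {
  return (params * bytesPerParam) / BYTES_PER_GIB;
}

const variants: Record<string, number> = {
  "T2V-1.3B": 1.3e9,
  "T2V-14B / I2V-14B": 14e9,
};

for (const [name, params] of Object.entries(variants)) {
  console.log(`${name}: ~${weightMemoryGiB(params).toFixed(1)} GiB of weights in bf16`);
}
```

At roughly 2.4 GiB versus 26 GiB of bf16 weights, the arithmetic is consistent with the 1.3B model fitting on consumer cards while the 14B model is listed under high VRAM requirements.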
Context
Not applicable (video generation model)
What it can do
  • Open-weight DiT family at 1.3B (consumer GPU) and 14B (high-fidelity) sizes
  • Text-to-video, image-to-video, first-frame and last-frame conditioning
  • Bilingual Chinese/English prompts via a multilingual umT5 text encoder
  • Permissive licence aimed at broad community adoption
  • Strong performance on VBench among open-weight models
  • Runs locally on consumer GPUs (the 1.3B variant needs roughly 8 GB of VRAM, per the official Wan 2.1 README)
  • Active community ecosystem (LoRAs, ComfyUI nodes, fine-tunes)
  • Reproducible training recipe documented in the technical report
  • Best for: open-source video pipelines, research, on-prem creative tools, custom fine-tunes.
Training & License

Curated multi-million-clip multilingual video corpus filtered for aesthetics, motion and caption quality, with dense bilingual captions; specifics documented in the Wan 2.1 technical report.

License: Open weights under the Apache 2.0 licence (Wan 2.1 release), suitable for research and commercial use.

Known limitations
  • Native duration 5 seconds
  • Resolution capped at 720p for the 14B base model and 832x480 for the 1.3B model
  • No native audio
  • High VRAM requirements for 14B variant
  • Quality below closed leaders (Veo 3, Sora 2, Kling v3) at similar duration
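One way to see why native clips stop at about 5 seconds and 720p is to count the tokens the DiT has to attend over. This sketch assumes values from the public Wan 2.1 release config rather than from this page: the causal VAE keeps the first frame and compresses the rest 4x in time, compresses height and width 8x, and the DiT patchifies latents with a 1x2x2 patch.

```typescript
// Sketch only. Assumed values (from the public Wan 2.1 config, not this page):
//   temporal VAE stride 4 (first frame kept), spatial stride 8, DiT patch 1x2x2.
interface LatentGrid {
  frames: number; // latent frames
  height: number; // latent height
  width: number; // latent width
  tokens: number; // sequence length seen by the DiT after patchifying
}

function latentGrid(
  numFrames: number,
  height: number,
  width: number,
  temporalStride = 4,
  spatialStride = 8,
  patch: readonly [number, number, number] = [1, 2, 2],
): LatentGrid {
  // A causal VAE encodes the first frame on its own, then each group of
  // `temporalStride` frames into one latent frame.
  if ((numFrames - 1) % temporalStride !== 0) {
    throw new Error(`numFrames must equal 1 + k * ${temporalStride}`);
  }
  const frames = 1 + (numFrames - 1) / temporalStride;
  const h = height / spatialStride;
  const w = width / spatialStride;
  if (![frames / patch[0], h / patch[1], w / patch[2]].every(Number.isInteger)) {
    throw new Error("size not divisible by VAE stride and patch size");
  }
  const tokens = (frames / patch[0]) * (h / patch[1]) * (w / patch[2]);
  return { frames, height: h, width: w, tokens };
}

const cases: Array<[string, number, number, number]> = [
  ["480p, 81 frames", 81, 480, 832],
  ["720p, 81 frames", 81, 720, 1280],
  ["480p, 161 frames", 161, 480, 832],
];
for (const [label, f, h, w] of cases) {
  const g = latentGrid(f, h, w);
  console.log(`${label}: latent ${g.frames}x${g.height}x${g.width}, ${g.tokens} DiT tokens`);
}
```

Under these assumptions a 5-second 480p clip is 32,760 tokens, moving to 720p multiplies that by about 2.3x, and doubling the clip length roughly doubles it. Since full self-attention cost grows with the square of sequence length, longer or larger clips become expensive quickly, which is consistent with the duration and resolution limits listed above.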

