How much does Wan 2.1 (Alibaba) cost via Railwail?

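Wan 2.1 (Alibaba) is billed per generation, and the Pricing section below currently lists it as free. There is no monthly minimum and no subscription, and you start with €5 in free credits.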

What is the context window of Wan 2.1 (Alibaba)?

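Railwail does not list a context window for Wan 2.1 (Alibaba). It is a video generation model that takes a text prompt (and optionally an image) per job, so the token context window used to describe language models does not apply in the usual way.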

How fast is Wan 2.1 (Alibaba)?

Generation runs as an asynchronous job and typically takes 30 s to 2 min, depending on load and the requested clip. We measure p50/p95 latency in real time on /rankings.
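To make the p50/p95 figures concrete: p50 is the median job latency, and p95 is the latency that 95% of jobs finish at or under. The sketch below uses the nearest-rank method on hypothetical sample latencies; /rankings may compute percentiles differently (for example with interpolation), so treat this as an illustration, not the site's implementation.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. Requires 0 < p <= 100.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  if (!(p > 0 && p <= 100)) {
    throw new RangeError("p must be in (0, 100]");
  }
  // Sort a copy so the caller's array is not mutated.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical end-to-end job latencies in milliseconds (illustrative only).
const latenciesMs = [42_000, 55_000, 61_000, 38_000, 90_000, 47_000, 120_000, 52_000, 58_000, 66_000];
console.log("p50:", percentile(latenciesMs, 50));
console.log("p95:", percentile(latenciesMs, 95));
```

With only ten samples, p95 lands on the single slowest job, which is why tail percentiles need many measurements before they are stable.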

Is Wan 2.1 (Alibaba) better than Google Veo 2?

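It depends on your use case. Wan 2.1 (Alibaba), served via Replicate, and Google Veo 2, from Google DeepMind, are both strong choices for video generation. Wan 2.1 has open weights you can self-host, while Veo 2 is a closed model available only as a hosted service. Compare them side by side at /compare/wan-2-1-alibaba-vs-google-veo-2.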

Does Wan 2.1 (Alibaba) support image input (vision)?

Yes. Wan 2.1 (Alibaba) accepts an image alongside the text prompt for image-to-video generation. Send images via the standard OpenAI-compatible `messages` array with `image_url` content blocks. Supported input types: text and image.
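A minimal sketch of that message shape, assuming the endpoint follows the OpenAI chat-completions convention of a `content` array that mixes `text` and `image_url` parts. The type names and the `buildImageToVideoMessages` helper are hypothetical; check the Railwail API reference for the exact fields it accepts for this model.

```typescript
// Content parts in the OpenAI chat-completions style: plain text, or an
// image referenced by URL (an https URL or a base64 data: URL).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImagePart;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | ContentPart[];
}

// Hypothetical helper: pairs a starting image with a motion prompt.
function buildImageToVideoMessages(imageUrl: string, prompt: string): ChatMessage[] {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

const messages = buildImageToVideoMessages(
  "https://example.com/first-frame.png",
  "Slow dolly-in while the leaves sway in the wind",
);
console.log(JSON.stringify(messages, null, 2));
```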

Wan 2.1 (Alibaba)

Name: Wan 2.1 (Alibaba)
Brand: Replicate
SKU: wan-2-1-alibaba

Replicate

Video Generation

Alibaba's Wan 2.1 open-weight video diffusion model: a dense Diffusion Transformer at 1.3B and 14B parameters, supporting text-to-video (T2V) and image-to-video (I2V).

Video generation runs asynchronously: we queue a job, which typically completes in 30 s to 2 min, and you can track it in your history.
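Because generation is asynchronous, a client typically submits a job and then polls its status until it finishes. The sketch below shows a polling loop with a fixed interval and an overall deadline sized to the 30 s to 2 min range above. The `JobStatus` shape, the state names, and the injected `fetchStatus` function are hypothetical stand-ins, not Railwail's documented API.

```typescript
// Hypothetical job states; the real API may use different names.
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "succeeded"; videoUrl: string }
  | { state: "failed"; error: string };

interface PollOptions {
  intervalMs: number; // delay between status checks
  timeoutMs: number; // give up once the accumulated delay reaches this
  sleep?: (ms: number) => Promise<void>; // injectable for testing
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls fetchStatus(jobId) until the job succeeds, fails, or the deadline passes.
async function waitForVideo(
  jobId: string,
  fetchStatus: (id: string) => Promise<JobStatus>,
  { intervalMs, timeoutMs, sleep = defaultSleep }: PollOptions,
): Promise<string> {
  let waitedMs = 0;
  while (true) {
    const status = await fetchStatus(jobId);
    if (status.state === "succeeded") return status.videoUrl;
    if (status.state === "failed") throw new Error(`job ${jobId} failed: ${status.error}`);
    if (waitedMs >= timeoutMs) throw new Error(`job ${jobId} timed out after ${waitedMs} ms`);
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}

// Usage with a fake status source that succeeds on the third check.
let checks = 0;
const fakeFetch = async (_id: string): Promise<JobStatus> =>
  ++checks < 3 ? { state: "running" } : { state: "succeeded", videoUrl: "https://example.com/out.mp4" };

waitForVideo("job-123", fakeFetch, { intervalMs: 5_000, timeoutMs: 150_000, sleep: async () => {} })
  .then((url) => console.log("video ready:", url));
```

Injecting `sleep` keeps the loop testable without real delays; a production client might also add backoff between checks.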

TL;DR (last updated June 24, 2026)

Wan 2.1 (Alibaba) is an open-weight video generation model from Alibaba's Tongyi Wanxiang Lab, served via Replicate. It is billed per generation rather than per token, and no context window is listed.


Pricing

Price per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Wan 2.1 (Alibaba) into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: pass a text prompt describing the clip
const reply = await rw.run("wan-2-1-alibaba", "A red fox trotting through fresh snow at sunrise, cinematic wide shot");
console.log(reply);

// With message history
const reply2 = await rw.run("wan-2-1-alibaba", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "A paper boat drifting down a rain-soaked city street, slow motion" },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("wan-2-1-alibaba", [
  { role: "user", content: "Time-lapse of clouds rolling over a mountain ridge" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Developer

Alibaba (Tongyi Wanxiang Lab); served via Replicate

Deep dive: Wan 2.1 from Alibaba (Tongyi Wanxiang Lab)

About Alibaba (Tongyi Wanxiang Lab)

Founded 1999 · Hangzhou, China

Alibaba was founded in 1999 by Jack Ma and 17 co-founders in Hangzhou and is one of China's two largest cloud and e-commerce conglomerates. Its Tongyi Lab (Tongyi Qianwen / Wanxiang) runs Alibaba's foundation-model research and shipped the Qwen large-language-model family, the Wanxiang image generator and the Wan video-generation family. Wan 2.0 launched in mid-2024 and Wan 2.1 in early 2025 as an open-weight diffusion-transformer family covering text-to-video, image-to-video and first-/last-frame conditioning, with a permissive licence designed to enable broad community adoption. Alibaba positioned Wan 2.1 as the strongest open-weight Chinese video model at its launch, with public weights and reproducible training recipes.


Architecture

Diffusion Transformer (DiT) family at 1.3B and 14B parameters with 3D causal VAE

Wan 2.1 is a family of open-weight Diffusion Transformer (DiT) video models released by Alibaba's Tongyi Wanxiang Lab. The family includes a 1.3B-parameter text-to-video model that runs on consumer GPUs and 14B-parameter text-to-video and image-to-video models aimed at high-fidelity creative work. All variants use a 3D causal Variational Autoencoder (Wan-VAE) that compresses video into a spatio-temporal latent grid; a DiT trained with Flow Matching then denoises that latent. Text conditioning uses a multilingual umT5 text encoder that handles both Chinese and English prompts. The Wan 2.1 release on GitHub and Hugging Face also includes weights and inference code for first-frame and last-frame conditioning. Native generation is about 5 seconds at 832x480 (1.3B) or up to 720p (14B). The team reports state-of-the-art results on VBench among open models.
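To make the latent-grid idea concrete, the sketch below computes the latent shape and the number of DiT tokens for a clip. It assumes figures from the Wan 2.1 technical report and reference code that should be verified there: Wan-VAE compresses time by 4x (encoding the first frame on its own, so frame counts take the form 4n + 1, such as the 81-frame default) and height and width by 8x each, and the DiT patchifies the latent with a 1x2x2 patch. The `latentInfo` function is illustrative.

```typescript
// Assumed Wan 2.1 compression factors (from the technical report; verify).
const TEMPORAL_STRIDE = 4; // 4x temporal compression in Wan-VAE
const SPATIAL_STRIDE = 8; // 8x compression on height and on width
const PATCH = { t: 1, h: 2, w: 2 }; // DiT patch size over the latent grid

interface LatentInfo {
  latentFrames: number;
  latentHeight: number;
  latentWidth: number;
  tokens: number; // sequence length the DiT attends over
}

function latentInfo(frames: number, height: number, width: number): LatentInfo {
  // Assumed Wan-VAE behaviour: the first frame is encoded alone, then
  // groups of 4 frames, so the frame count must have the form 4n + 1.
  if (frames < 1 || (frames - 1) % TEMPORAL_STRIDE !== 0) {
    throw new RangeError(`frames must be 4n + 1, got ${frames}`);
  }
  if (height % (SPATIAL_STRIDE * PATCH.h) !== 0 || width % (SPATIAL_STRIDE * PATCH.w) !== 0) {
    throw new RangeError("height and width must be multiples of 16");
  }
  const latentFrames = 1 + (frames - 1) / TEMPORAL_STRIDE;
  const latentHeight = height / SPATIAL_STRIDE;
  const latentWidth = width / SPATIAL_STRIDE;
  const tokens = (latentFrames / PATCH.t) * (latentHeight / PATCH.h) * (latentWidth / PATCH.w);
  return { latentFrames, latentHeight, latentWidth, tokens };
}

// Example: an 81-frame clip at 832x480, the 1.3B model's native size.
// Expected values: 21 latent frames, a 60x104 latent grid, 32760 tokens.
console.log(latentInfo(81, 480, 832));
```

Under the same assumptions, a 720p (1280x720) clip of the same length yields 75,600 tokens, about 2.3 times the 480p count, which helps explain why 720p generation is much more expensive.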

Parameters: 1.3 billion (T2V-1.3B) and 14 billion (T2V-14B / I2V-14B)
Context: not listed (video model, billed per generation)
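The Flow Matching objective mentioned in the architecture description can be illustrated on a toy vector. This sketch assumes the rectified-flow convention, with x0 as noise at t = 0 and x1 as data at t = 1 (implementations differ on which end is noise): a training sample is the straight-line mix x_t = t·x1 + (1 - t)·x0, and the network learns to predict the velocity x1 - x0. At sampling time, Euler steps integrate the predicted velocity from noise to data. A "perfect" predictor stands in for the DiT here; this is a teaching device, not Wan's sampler.

```typescript
type Vec = number[];

// Straight-line interpolation between noise x0 (t = 0) and data x1 (t = 1).
function interpolate(x0: Vec, x1: Vec, t: number): Vec {
  return x0.map((n, i) => t * x1[i] + (1 - t) * n);
}

// Training target: the constant velocity along that line, dx_t/dt = x1 - x0.
function velocityTarget(x0: Vec, x1: Vec): Vec {
  return x1.map((d, i) => d - x0[i]);
}

// Mean squared error between a predicted and a target velocity.
function mse(a: Vec, b: Vec): number {
  return a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0) / a.length;
}

// Euler integration from t = 0 to t = 1 using a velocity predictor v(x, t).
function eulerSample(x0: Vec, v: (x: Vec, t: number) => Vec, steps: number): Vec {
  let x = [...x0];
  const dt = 1 / steps;
  for (let k = 0; k < steps; k++) {
    const vel = v(x, k * dt);
    x = x.map((xi, i) => xi + dt * vel[i]);
  }
  return x;
}

// Toy example: a 3-value "latent" and a fixed noise draw.
const data: Vec = [0.5, -1.0, 2.0];
const noise: Vec = [1.2, 0.3, -0.7];
const target = velocityTarget(noise, data);

// A noisy training input halfway along the path.
console.log(interpolate(noise, data, 0.5));
// A perfect predictor returns the true velocity, so its loss is zero.
console.log(mse(target, target)); // 0
// Integrating the exact velocity recovers the data, up to float rounding.
console.log(eulerSample(noise, () => target, 20));
```

Because the path is a straight line, the target velocity is constant in t; a learned DiT only approximates it, which is why real samplers need several steps.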

What it can do

Open-weight DiT family at 1.3B (consumer GPU) and 14B (high-fidelity) sizes
Text-to-video, image-to-video, first-frame and last-frame conditioning
Bilingual Chinese/English prompts via a multilingual umT5 text encoder
Apache 2.0 licence aimed at broad community adoption
Strong performance on VBench among open-weight models
Runs locally on consumer GPUs (1.3B variant; the official README cites about 8 GB of VRAM)
Active community ecosystem (LoRAs, ComfyUI nodes, fine-tunes)
Reproducible training recipe documented in technical report
Best for: open-source video pipelines, research, on-prem creative tools, custom fine-tunes.
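One way a client could model the conditioning modes listed above is a discriminated union, so each request carries exactly the inputs its mode needs. The field names (`mode`, `imageUrl`, `firstFrameUrl`, `lastFrameUrl`) and the `checkRequest` helper are hypothetical, not part of any documented Railwail or Wan API. The rule that the 1.3B variant is text-to-video only follows the parameter list above; applying it to first/last-frame mode as well is an assumption.

```typescript
// Hypothetical request shapes for the three conditioning modes.
type Variant = "1.3B" | "14B";

type WanRequest =
  | { mode: "t2v"; variant: Variant; prompt: string }
  | { mode: "i2v"; variant: Variant; prompt: string; imageUrl: string }
  | { mode: "flf2v"; variant: Variant; prompt: string; firstFrameUrl: string; lastFrameUrl: string };

// Returns a list of problems; an empty list means the request looks valid.
function checkRequest(req: WanRequest): string[] {
  const problems: string[] = [];
  if (req.prompt.trim() === "") problems.push("prompt is empty");
  // Assumption: the 1.3B release is text-to-video only.
  if (req.variant === "1.3B" && req.mode !== "t2v") {
    problems.push(`${req.mode} needs the 14B variant`);
  }
  switch (req.mode) {
    case "t2v":
      break;
    case "i2v":
      if (!req.imageUrl) problems.push("imageUrl is required for i2v");
      break;
    case "flf2v":
      if (!req.firstFrameUrl || !req.lastFrameUrl) {
        problems.push("flf2v needs both firstFrameUrl and lastFrameUrl");
      }
      break;
  }
  return problems;
}

console.log(checkRequest({ mode: "t2v", variant: "1.3B", prompt: "A paper boat drifting down a rainy street" })); // []
console.log(checkRequest({ mode: "i2v", variant: "1.3B", prompt: "Pan left", imageUrl: "https://example.com/a.png" }));
```

The `switch` on `mode` lets the TypeScript compiler check that each branch only reads fields its variant actually has.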

Training & License

Curated multi-million-clip multilingual video corpus filtered for aesthetics, motion and caption quality, with dense bilingual captions; specifics documented in the Wan 2.1 technical report.

License: Open weights under the Apache 2.0 licence (Wan 2.1 release), suitable for research and commercial use.

Known limitations

Native duration 5 seconds
Resolution capped at 720p in 14B base model, 832x480 in 1.3B
No native audio
High VRAM requirements for 14B variant
Quality trails closed leaders (Veo 3, Sora 2, Kling v3) at similar durations
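A client-side pre-check can flag requests that exceed these limits before a job is queued. The sketch hard-codes the caps listed above (5 s native duration; 832x480 for 1.3B; 720p, taken here as 1280x720, for 14B). The `ClipSpec` shape and `limitWarnings` helper are hypothetical, and this page does not say whether the hosted endpoint rejects, clamps, or accepts out-of-range values.

```typescript
// Caps taken from the "Known limitations" list; treat them as assumptions.
const MAX_NATIVE_SECONDS = 5;
const MAX_RESOLUTION: Record<"1.3B" | "14B", { width: number; height: number }> = {
  "1.3B": { width: 832, height: 480 },
  "14B": { width: 1280, height: 720 },
};

interface ClipSpec {
  variant: "1.3B" | "14B";
  seconds: number;
  width: number;
  height: number;
}

// Returns warnings for settings beyond what the base models generate natively.
function limitWarnings(spec: ClipSpec): string[] {
  const warnings: string[] = [];
  if (spec.seconds > MAX_NATIVE_SECONDS) {
    warnings.push(`${spec.seconds} s exceeds the ${MAX_NATIVE_SECONDS} s native duration`);
  }
  const cap = MAX_RESOLUTION[spec.variant];
  // Compare long and short sides so portrait and landscape are treated alike.
  const long = Math.max(spec.width, spec.height);
  const short = Math.min(spec.width, spec.height);
  if (long > Math.max(cap.width, cap.height) || short > Math.min(cap.width, cap.height)) {
    warnings.push(`${spec.width}x${spec.height} exceeds the ${cap.width}x${cap.height} cap for ${spec.variant}`);
  }
  return warnings;
}

// An 8 s 720p request on the 1.3B variant trips both checks.
console.log(limitWarnings({ variant: "1.3B", seconds: 8, width: 1280, height: 720 }));
```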

Research papers

Wan: Open and Advanced Large-Scale Video Generative Models (2025)

Frequently asked questions

Related Models


Google Veo 2

Google DeepMind

Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.

€5.00

Google Veo 3

Google DeepMind

Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.

€0.75

Google Veo 3 (Replicate)

Google DeepMind

Google's Veo 3 served via Replicate. Text-to-video with native synchronized audio generation. High-fidelity motion and scene coherence in short clips.

€8.00

Google Veo 3.1

Google DeepMind

Latest Veo with image-to-video and context-aware audio

€6.00

Start using Wan 2.1 (Alibaba) today

Get started with free credits. No credit card required. Access Wan 2.1 (Alibaba) and 100+ other models through a single API.

