HunyuanVideo

Tencent
Video Generation

Tencent's open-source video generation model. Strong visual quality with diverse style support.

Queue video with HunyuanVideo
Video generation runs asynchronously — we'll queue a job and you can track it in your history.
Generates as an async job — typically 30 s to 2 min.
TL;DR · Last updated March 4, 2026

HunyuanVideo is a video generation AI model from Tencent, priced at €2.00 per generation. No per-token price or context window is listed for it.

Examples

See what HunyuanVideo can generate

0:08

Underwater Scene

"Camera gliding through a vibrant coral reef teeming with tropical fish, sunlight filtering through the crystal-clear water creating dancing light patterns on the ocean floor, documentary style"

0:06

Fantasy Animation

"A tiny glowing fairy emerging from an opening flower bud in an enchanted forest, sparkles trailing behind her wings as she takes flight, magical atmosphere with bioluminescent plants"

Pricing

Price per Generation
Per generation: €2.00
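
As a quick illustration of per-generation pricing, the sketch below estimates the spend for a batch of clips and how many clips fit in a budget. It assumes the flat €2.00 price listed above with no discounts or credits applied; the function names are hypothetical and not part of any SDK.

```typescript
// Assumption: a flat price per generation, taken from the pricing listed above.
// Kept in cents so the arithmetic stays in integers.
const PRICE_PER_GENERATION_CENTS = 200; // €2.00

// Total cost in euros for a given number of generations.
function estimateCostEur(generations: number): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  return (generations * PRICE_PER_GENERATION_CENTS) / 100;
}

// Number of whole generations that a budget (in euros) covers.
function generationsWithinBudget(budgetEur: number): number {
  if (!(budgetEur >= 0)) {
    throw new RangeError("budget must be a non-negative number");
  }
  const budgetCents = Math.round(budgetEur * 100);
  return Math.floor(budgetCents / PRICE_PER_GENERATION_CENTS);
}
```

For example, `estimateCostEur(3)` evaluates to 6 and `generationsWithinBudget(5)` to 2.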

API Integration

Use our OpenAI-compatible API to integrate HunyuanVideo into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

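// Simple: pass the prompt as a string. For a video model the prompt describes the clip.
// The shape of the value returned for a video job is not documented on this page,
// so it is only logged here.
const reply = await rw.run(
  "hunyuan-video",
  "Camera gliding through a vibrant coral reef, sunlight filtering through clear water, documentary style"
);
console.log(reply);

// With message history.
// This form and the next use the chat-completion message shape; this page does not
// say how a video model interprets roles, temperature or max_tokens.
const reply2 = await rw.run("hunyuan-video", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("hunyuan-video", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);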
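
Because video generation runs as an async job (typically 30 s to 2 min), a client usually has to wait for the job to finish. The sketch below shows one generic way to do that by polling. It is not the railwail SDK's API: the `JobStatus` shape, the `waitForJob` helper and the `getStatus` callback are all hypothetical, and the callback stands in for whatever status call the real API provides.

```typescript
// Hypothetical job status shape; the real API's fields are not documented here.
type JobStatus =
  | { state: "queued" | "running" }
  | { state: "succeeded"; videoUrl: string }
  | { state: "failed"; error: string };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `getStatus` until the job reaches a terminal state or the timeout passes.
async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 5_000,
  timeoutMs = 5 * 60_000,
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getStatus();
    if (status.state === "succeeded" || status.state === "failed") {
      return status;
    }
    // Give up if the next poll would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error("Timed out waiting for the video job to finish");
    }
    await sleep(intervalMs);
  }
}
```

Given the 30 s to 2 min range quoted above, a 5 s interval with a 5 min timeout is a plausible starting point, but both values are guesses to tune against real job times.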
Specifications
Price: €2.00
Avg. latency: 120.0 s
Est. duration: 2 min
Developer: Tencent
Category: Video Generation
Supported Formats: mp4
Tags: open-source

Deep dive — Tencent's HunyuanVideo

About Tencent
Founded 1998 · Shenzhen, China

Tencent Holdings is one of the world's largest technology and entertainment conglomerates, founded in 1998 in Shenzhen by Ma Huateng (Pony Ma) and four co-founders. Tencent AI Lab, founded in 2016, and the Hunyuan team (the company's foundation-model group) developed the Hunyuan family of text, image, 3D and video models. HunyuanVideo was released in December 2024, with the 13B foundation model open-sourced. It was the largest publicly released open-weight video diffusion model at the time and quickly became a popular base for community fine-tunes and extensions (LoRAs, ControlNets, audio-driven variants). The weights are hosted on Hugging Face and the code on GitHub, under a custom licence that allows research and limited commercial use.

Architecture
Diffusion Transformer with dual-stream design (separate text/video streams)

HunyuanVideo is a 13B-parameter Diffusion Transformer (DiT) that operates in the latent space of a 3D causal VAE. Its denoiser uses a dual-stream to single-stream design inspired by FLUX.1: in the dual-stream blocks, text and video tokens are processed separately; in the single-stream blocks, they are concatenated for joint self-attention. Text conditioning combines CLIP text features with a multimodal large language model (MLLM) used as the text encoder, which gives stronger prompt understanding, especially on long captions. The model is trained with Flow Matching and uses 3D Rotary Position Embeddings, following a progressive curriculum from images to short videos to longer videos at 720p / 24 fps. The accompanying paper describes a curated multi-billion-clip dataset with hierarchical filtering and dense bilingual captions. The open release is a text-to-video model that generates roughly 5-second clips at 720p; community follow-ups added image-to-video, audio-driven and longer-duration variants.
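
To make the Flow Matching idea concrete, here is a toy sketch on plain number arrays. It assumes the common linear-path formulation (noise at t = 0, data at t = 1, constant velocity target); whether HunyuanVideo uses exactly this path and time convention is an assumption, and all function names are illustrative. In the real model, a 13B DiT predicts the velocity from the noisy latent, the timestep and the text conditioning.

```typescript
// Point on the straight-line path from noise x0 (t = 0) to data x1 (t = 1).
function interpolate(x0: number[], x1: number[], t: number): number[] {
  return x0.map((v, i) => (1 - t) * v + t * x1[i]);
}

// Regression target for the velocity network on that path: d(x_t)/dt = x1 - x0.
function velocityTarget(x0: number[], x1: number[]): number[] {
  return x1.map((v, i) => v - x0[i]);
}

// Sampling: integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps.
// `velocity` stands in for the trained network.
function eulerSample(
  x0: number[],
  velocity: (x: number[], t: number) => number[],
  steps: number,
): number[] {
  let x = x0.slice();
  const dt = 1 / steps;
  for (let k = 0; k < steps; k++) {
    const v = velocity(x, k * dt);
    x = x.map((xi, i) => xi + dt * v[i]);
  }
  return x;
}
```

With a perfect velocity predictor on this linear path, Euler integration from the noise sample lands on the data sample, which is why straight paths allow sampling in relatively few steps.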

Parameters: 13 billion
Context: unknown
What it can do
  • 13B open-weight text-to-video model (one of the largest released openly)
  • Generates ~5-second clips at 720p / 24 fps natively
  • Bilingual Chinese/English prompt handling via MLLM text encoder
  • Dual-stream DiT architecture inspired by FLUX.1
  • Strong prompt adherence on long, dense captions
  • Active community ecosystem (image-to-video, LoRAs, audio-sync)
  • Needs roughly 60-80 GB of GPU memory in FP16, or about 40 GB with optimisations
  • Permissive licence for non-commercial research and limited commercial use
  • Best for: open-source video pipelines, research, on-prem creative tooling.
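
The memory figures in the list can be put in context with a back-of-the-envelope estimate of the space taken by the weights alone. The sketch below uses only the 13B parameter count and standard bytes-per-parameter values; the helper name is illustrative, and activations, text encoders and the VAE are deliberately left out.

```typescript
// Bytes needed to store one parameter at each precision.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, int8: 1 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes).
function weightMemoryGB(paramCount: number, precision: Precision): number {
  return (paramCount * BYTES_PER_PARAM[precision]) / 1e9;
}

const HUNYUAN_VIDEO_PARAMS = 13e9; // 13 billion, as stated above

const fp16WeightsGB = weightMemoryGB(HUNYUAN_VIDEO_PARAMS, "fp16"); // 26
const int8WeightsGB = weightMemoryGB(HUNYUAN_VIDEO_PARAMS, "int8"); // 13
```

In FP16 the DiT weights come to about 26 GB, so much of the 60-80 GB figure presumably goes to activations over long video token sequences plus the text encoders and VAE. That breakdown is an inference from the arithmetic, not a measured number.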
Training & License

Training data: a curated multi-billion-clip video corpus with hierarchical aesthetic, motion and caption-quality filtering, plus dense bilingual captions produced by an in-house MLLM captioner.

License: Tencent Hunyuan Community Licence. Weights are free for research and limited commercial use with attribution; restrictions apply above usage thresholds and in certain jurisdictions.

Known limitations
  • Native duration ~5 seconds
  • 720p only (higher resolutions require upscalers)
  • High VRAM requirements vs smaller open models
  • No native audio
  • Licence has thresholds for very large commercial deployers
