How much does HunyuanVideo cost via Railwail?

Per-call: €5.00. No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of HunyuanVideo?

No context window is listed for HunyuanVideo. It is a text-to-video model driven by a prompt, so a token context window may not apply in the usual sense.

How fast is HunyuanVideo?

Median response latency: 120.0 s (p50 across recent Railwail traffic). See live p50/p95 metrics on /rankings.
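
A p50 figure is a median rather than a mean, and it can sit well below the tail latency reported as p95. The sketch below computes both with the nearest-rank method over a made-up sample. The latency values are illustrative, not Railwail measurements, and the method behind the figures on /rankings is not stated here.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rawRank = Math.ceil((p * sorted.length) / 100);
  const rank = Math.min(sorted.length, Math.max(1, rawRank));
  return sorted[rank - 1];
}

// Hypothetical per-job latencies in seconds (not real traffic data).
const latenciesS = [45, 60, 90, 110, 120, 120, 125, 130, 150, 300];
const meanS = latenciesS.reduce((sum, x) => sum + x, 0) / latenciesS.length;

console.log(percentile(latenciesS, 50)); // 120
console.log(percentile(latenciesS, 95)); // 300
console.log(meanS);                      // 125
```

One slow job moves the mean and the p95 but leaves the median untouched, which is why the two figures are reported separately.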

Is HunyuanVideo better than Google Veo 2?

It depends on your use case. HunyuanVideo (Tencent) and Google Veo 2 (Google DeepMind) are both strong choices in video generation. Compare them side-by-side at /compare/hunyuan-video-vs-google-veo-2.

HunyuanVideo

Popular

Tencent

Video Generation

Tencent's HunyuanVideo, a 13B open-weights text-to-video diffusion transformer. Produces high-motion, photorealistic clips with smooth temporal consistency and was one of the first open models to rival closed systems on motion quality.

Queue video with HunyuanVideo

Video generation runs asynchronously: we queue a job that you can track in your history. A job typically takes 30 s to 2 min.
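
Because a job typically settles somewhere between 30 s and 2 min, a client usually waits before the first status check and then polls at a fixed interval. The following is a minimal sketch under stated assumptions: the job states, `pollSchedule`, `waitForJob` and the `checkStatus` callback are all hypothetical names, and the actual Railwail job-status call is not shown on this page.

```typescript
// Hypothetical job states; the real job schema is not documented here.
type JobState = "queued" | "running" | "succeeded" | "failed";

// Times (seconds after submission) at which to check on a job:
// start at firstCheckS, then every intervalS, up to and including deadlineS.
function pollSchedule(firstCheckS: number, deadlineS: number, intervalS: number): number[] {
  const times: number[] = [];
  for (let t = firstCheckS; t <= deadlineS; t += intervalS) {
    times.push(t);
  }
  return times;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls checkStatus at the scheduled times until the job settles.
// checkStatus is supplied by the caller, for example a wrapper around
// whatever status lookup your client exposes.
async function waitForJob(
  checkStatus: () => Promise<JobState>,
  scheduleS: number[],
): Promise<JobState> {
  let elapsedS = 0;
  for (const t of scheduleS) {
    await sleep((t - elapsedS) * 1000);
    elapsedS = t;
    const state = await checkStatus();
    if (state === "succeeded" || state === "failed") return state;
  }
  throw new Error(`job still pending after ${elapsedS}s`);
}

// First check at 30 s, then every 15 s up to the 2 min mark.
console.log(pollSchedule(30, 120, 15)); // [30, 45, 60, 75, 90, 105, 120]
```

Passing the status lookup in as a callback keeps the sketch independent of any particular client method.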

TL;DR · Last updated June 24, 2026

HunyuanVideo is a video generation AI model from Tencent, priced at €5.00 per generation. No context window is listed for it.

About this model

HunyuanVideo from Tencent is a large open-weights text-to-video model built on a DiT architecture with a unified full-attention design. It is known for realistic physics, large-scale motion and good text alignment, and its weights are publicly released, making it a popular base for fine-tuning and LoRA training.

Examples

See what HunyuanVideo can generate

0:08

Underwater Scene

"Camera gliding through a vibrant coral reef teeming with tropical fish, sunlight filtering through the crystal-clear water creating dancing light patterns on the ocean floor, documentary style"

0:06

Fantasy Animation

"A tiny glowing fairy emerging from an opening flower bud in an enchanted forest, sparkles trailing behind her wings as she takes flight, magical atmosphere with bioluminescent plants"

Pricing

Price per Generation

Per generation: €5.00
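
With a flat per-generation price, budgeting is simple arithmetic. The sketch below uses the listed €5.00 price and works in integer cents to avoid floating-point rounding. The function names are illustrative and not part of any Railwail client.

```typescript
// Listed price per generation, in euro cents to keep the arithmetic in integers.
const PRICE_PER_GENERATION_CENTS = 500;

// Total cost of a number of generations at the flat per-call price.
function totalCostCents(generations: number): number {
  return generations * PRICE_PER_GENERATION_CENTS;
}

// Whole generations that a given credit balance covers.
function generationsCovered(balanceCents: number): number {
  return Math.floor(balanceCents / PRICE_PER_GENERATION_CENTS);
}

const formatEur = (cents: number): string => `€${(cents / 100).toFixed(2)}`;

console.log(formatEur(totalCostCents(12))); // €60.00
console.log(generationsCovered(500));       // 1 (a €5 credit balance)
console.log(generationsCovered(2300));      // 4
```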

API Integration

Use our OpenAI-compatible API to integrate HunyuanVideo into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: pass the prompt as a string
const reply = await rw.run("hunyuan-video", "Camera gliding through a vibrant coral reef teeming with tropical fish, documentary style");
console.log(reply);

// With message history
const reply2 = await rw.run("hunyuan-video", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("hunyuan-video", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Price: €5.00
Avg. latency: 120.0 s
Est. duration: 2 min
Developer: Tencent

Deep dive — Tencent's HunyuanVideo

About Tencent

Founded 1998 · Shenzhen, China

Tencent Holdings is one of the largest technology and entertainment conglomerates in the world, founded in 1998 by Ma Huateng (Pony Ma) and four co-founders in Shenzhen. Tencent's AI Lab, founded in 2016, and the Hunyuan team (the company's foundation-model group) developed the Hunyuan family of text, image, 3D and video models. HunyuanVideo was released in December 2024, with the 13B foundation model open-sourced. It was the largest publicly released open-weight video diffusion model at the time and rapidly became a popular base for community fine-tunes (LoRAs, ControlNets, audio-driven extensions). The model is hosted on Hugging Face and GitHub under a custom licence that allows research and limited commercial use.

Architecture

Diffusion Transformer with a dual-stream to single-stream design (separate text and video streams, then joint attention)

HunyuanVideo is a 13B-parameter Diffusion Transformer (DiT) that operates in the latent space of a 3D causal VAE. Its denoiser uses a dual-stream to single-stream design inspired by FLUX.1: dedicated text and video streams first process their modalities separately, and the resulting tokens are then concatenated for joint self-attention. Text conditioning combines a CLIP-like vision-language encoder with a multimodal large language model (MLLM) used as a text encoder, for stronger prompt understanding, especially on long captions. The model is trained with Flow Matching and uses 3D Rotary Position Embeddings, following a progressive curriculum from images to short videos to long videos at 720p / 24 fps. The accompanying paper describes a curated multi-billion-clip dataset with hierarchical filtering and dense bilingual captions. HunyuanVideo supports text-to-video natively, and the open release includes a 5-second / 720p model; community follow-ups added image-to-video, audio-driven and longer-duration variants.
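
To make the Flow Matching objective mentioned above more concrete, here is a toy sketch of its common linear-path form: a noise sample and a data latent are blended at time t, and the network is trained to predict the constant velocity between them. The exact parameterisation HunyuanVideo uses is not given on this page, so treat the formula, the function names and the four-element vectors as illustrative assumptions.

```typescript
// Linear flow-matching path (assumed form, not taken from this page):
//   x_t = (1 - t) * x0 + t * x1, with x0 a noise sample and x1 a data latent.
function interpolate(x0: number[], x1: number[], t: number): number[] {
  if (x0.length !== x1.length) throw new Error("length mismatch");
  return x0.map((a, i) => (1 - t) * a + t * x1[i]);
}

// Along that straight path the velocity dx_t/dt is constant: x1 - x0.
function targetVelocity(x0: number[], x1: number[]): number[] {
  if (x0.length !== x1.length) throw new Error("length mismatch");
  return x1.map((b, i) => b - x0[i]);
}

// Mean squared error between a predicted velocity and the target.
function velocityLoss(predicted: number[], target: number[]): number {
  if (predicted.length !== target.length) throw new Error("length mismatch");
  const total = predicted.reduce((sum, p, i) => {
    const diff = p - target[i];
    return sum + diff * diff;
  }, 0);
  return total / predicted.length;
}

const noiseSample = [0, 2, -1, 4]; // stand-in for Gaussian noise
const dataLatent = [1, 0, 1, 0];   // stand-in for a video latent
const blended = interpolate(noiseSample, dataLatent, 0.25);
const velocity = targetVelocity(noiseSample, dataLatent);

console.log(blended);  // [0.25, 1.5, -0.5, 3]
console.log(velocity); // [1, -2, 2, -4]
console.log(velocityLoss([1, -2, 2, -2], velocity)); // 1
```

Following the target velocity from the blended point for the remaining 0.75 of the time interval lands exactly on the data latent, which is the property a sampler exploits when it integrates the learned velocity field.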

Parameters: 13 billion
Context: unknown

What it can do

13B open-weight text-to-video model (one of the largest released openly)
Generates ~5-second clips at 720p / 24 fps natively
Bilingual Chinese/English prompt handling via MLLM text encoder
Dual-stream DiT architecture inspired by FLUX.1
Strong prompt adherence on long, dense captions
Active community ecosystem (image-to-video, LoRAs, audio-sync)
Needs 60 to 80 GB of GPU memory in FP16, or about 40 GB with optimisations
Permissive licence for non-commercial research and limited commercial use
Best for: open-source video pipelines, research, on-prem creative tooling.
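
Two of the figures in the list above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below computes the memory taken by the weights alone and the frame count of a nominal 5-second clip. The helper names are illustrative. The gap between the weight footprint and the quoted 60 to 80 GB presumably covers activations, the text encoders and the VAE, but that breakdown is an assumption, not something stated on this page.

```typescript
// Bytes needed to hold the weights alone at a given precision.
function weightBytes(parameters: number, bytesPerParameter: number): number {
  return parameters * bytesPerParameter;
}

// Frames in a clip of a given length at a given frame rate.
function clipFrames(seconds: number, fps: number): number {
  return Math.round(seconds * fps);
}

// 13 billion parameters at 2 bytes each (FP16).
const fp16WeightBytes = weightBytes(13e9, 2);

console.log(fp16WeightBytes / 1e9); // 26 (decimal GB, weights only)
console.log(clipFrames(5, 24));     // 120
```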

Training & License

Multi-billion-clip curated video corpus with hierarchical aesthetic, motion and caption-quality filtering, plus dense bilingual captions produced by an in-house MLLM captioner.

License: Tencent Hunyuan Community Licence. Weights are free for research and limited commercial use with attribution; restrictions apply above usage thresholds and in certain jurisdictions.

Known limitations

Native duration ~5 seconds
720p only (higher resolutions require upscalers)
High VRAM requirements vs smaller open models
No native audio
Licence has thresholds for very large commercial deployers

Research papers

HunyuanVideo: A Systematic Framework For Large Video Generative Models (2024) →

Frequently asked questions

Related Models

Google Veo 2

Google DeepMind

Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.

€5.00

Google Veo 3

Google DeepMind

Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.

€0.75

Google Veo 3 (Replicate)

Google DeepMind

Google's Veo 3 served via Replicate. Text-to-video with native synchronized audio generation. High-fidelity motion and scene coherence in short clips.

€8.00

Google Veo 3.1

Google DeepMind

Latest Veo with image-to-video and context-aware audio

€6.00

Start using HunyuanVideo today

Get started with free credits. No credit card required. Access HunyuanVideo and 100+ other models through a single API.
