Mochi 1

Genmo
Video Generation

Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.

Queue video with Mochi 1
Video generation runs asynchronously: we'll queue a job and you can track it in your history.
Generates as an async job, typically in 30 s to 2 min.
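
Because the job is asynchronous, client code usually polls for completion. The snippet below is a minimal sketch of that pattern, not the platform's documented API: the `VideoJob` shape, its status values, and the `fetchJob` callback are all hypothetical names introduced for illustration. The default interval and timeout are chosen loosely around the 30 s to 2 min figure above.

```typescript
// Hypothetical job shape: field names and status values are assumptions,
// not a documented schema.
type JobStatus = "queued" | "processing" | "succeeded" | "failed";

interface VideoJob {
  status: JobStatus;
  videoUrl?: string;
  error?: string;
}

interface PollOptions {
  intervalMs?: number; // delay between status checks
  timeoutMs?: number;  // give up after this long
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls fetchJob repeatedly until the job succeeds, fails, or the timeout passes.
async function pollUntilDone(
  fetchJob: () => Promise<VideoJob>,
  { intervalMs = 5000, timeoutMs = 300000 }: PollOptions = {},
): Promise<VideoJob> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const job = await fetchJob();
    if (job.status === "succeeded") return job;
    if (job.status === "failed") {
      throw new Error(job.error ?? "Video job failed");
    }
    // Stop if the next check would land after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error("Timed out waiting for video job");
    }
    await sleep(intervalMs);
  }
}
```

In practice `fetchJob` would wrap whatever status call the platform exposes; injecting it as a callback keeps the polling loop independent of any particular endpoint.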
TL;DR · Last updated May 16, 2026

Mochi 1 is a video generation AI model from Genmo, priced at €0.000 per 1M input tokens with an unknown context window.


Pricing

Price per Generation
Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Mochi 1 into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: just pass a string
const reply = await rw.run("mochi-1-genmo", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("mochi-1-genmo", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("mochi-1-genmo", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Specifications
Developer: Genmo
Category: Video Generation
Supported Formats: text
Tags: genmo, mochi, text-to-video, open-weights, apache-2, pricing-tbd

Deep dive: Genmo's Mochi 1

About Genmo
Founded 2023 · San Francisco, USA

Genmo was founded in 2023 by Paras Jain (CEO) and Ajay Jain (CTO), both of whom hold PhDs from UC Berkeley's BAIR lab, with a focus on open-source generative video. The company released the early Replay product and the smaller GEN-1 video model before launching Mochi 1 in October 2024 as a 10B-parameter open-weight text-to-video model under the Apache 2.0 licence. At the time, it was the largest open-source video model with a fully permissive licence. Genmo positions Mochi 1 as a research foundation for the community to fine-tune, extend and study, in deliberate contrast to closed competitors. The company has raised over $30M from investors including NEA and the Y Combinator network.

Architecture
Asymmetric Diffusion Transformer (AsymmDiT) at 10B parameters with custom 3D VAE

Mochi 1 is a 10B-parameter Asymmetric Diffusion Transformer (AsymmDiT) operating on the latent space of a high-compression 3D causal Variational Autoencoder. The 'asymmetric' design gives the video stream substantially more parameters than the text stream while the two streams share self-attention, on the hypothesis that visual modeling is the bottleneck for video generation. Position information is encoded with 3D RoPE; the model is trained with Rectified Flow Matching at full resolution and high motion intensity. Text conditioning uses a T5-XXL encoder. Mochi 1 natively generates clips of about 5 seconds (5.4 s) at 480p / 30 fps, with a 720p HD variant in preview at launch. Training uses a curated, filtered video corpus with dense captions produced by an in-house captioner. The team explicitly report ablations on resolution scheduling, motion intensity filtering and caption quality.
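
To make the clip figures concrete, the sketch below estimates how many frames a clip contains and how many latent tokens the transformer would attend over after VAE compression. Only the 5.4 s duration, 30 fps and 480p height come from this page. The 848 px width, the 8x spatial and 6x temporal compression factors, and the 2x2 patching are assumptions chosen for illustration, so treat the resulting token count as an order-of-magnitude estimate rather than a published figure.

```typescript
// Illustrative assumptions only: none of these values are stated on this page.
interface GridAssumptions {
  width: number;          // frame width in pixels
  height: number;         // frame height in pixels (480p)
  spatialFactor: number;  // VAE downsampling per spatial axis
  temporalFactor: number; // VAE downsampling along time
  patchSize: number;      // transformer patch edge, in latent pixels
}

const ASSUMED: GridAssumptions = {
  width: 848,
  height: 480,
  spatialFactor: 8,
  temporalFactor: 6,
  patchSize: 2,
};

// Number of frames in a clip of the given duration and frame rate.
function frameCount(durationSeconds: number, fps: number): number {
  return Math.round(durationSeconds * fps);
}

// Size of the token grid after VAE compression and patching.
function latentTokens(frames: number, a: GridAssumptions = ASSUMED) {
  const latentFrames = Math.ceil(frames / a.temporalFactor);
  const rows = Math.ceil(a.height / (a.spatialFactor * a.patchSize));
  const cols = Math.ceil(a.width / (a.spatialFactor * a.patchSize));
  return { latentFrames, rows, cols, tokens: latentFrames * rows * cols };
}

const clipFrames = frameCount(5.4, 30);
console.log(clipFrames, latentTokens(clipFrames));
```

Under these assumptions a 5.4 s clip is a sequence of tens of thousands of tokens, which is why a high-compression VAE matters for keeping attention cost manageable.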

Parameters: 10 billion
Context: unknown
What it can do
  • 10B open-weight text-to-video model under Apache 2.0 (most permissive in class)
  • Asymmetric DiT design biased toward visual capacity
  • 5-second 480p / 30 fps clips natively, 720p variant in preview
  • Strong motion fidelity and prompt adherence relative to model size
  • Permissive licence enables fully commercial fine-tunes and derivatives
  • Runs on multi-GPU setups (4x H100) or quantised on a single 80 GB H100
  • Active community ecosystem: image-to-video, LoRAs, ComfyUI nodes
  • Public benchmark contributions and reproducible training recipe
  • Best for: open research, commercial fine-tunes, custom pipelines.
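
The hardware figures in the list above follow partly from simple arithmetic on the parameter count. The sketch below estimates the memory needed just to hold 10 billion weights at different precisions. It is a rough lower bound under stated assumptions: it ignores the T5-XXL text encoder, the VAE, activations and attention buffers, all of which add to real usage, and the precision names are generic rather than taken from any Mochi 1 release.

```typescript
// Bytes needed to store one parameter at each precision.
const BYTES_PER_PARAM = { fp32: 4, bf16: 2, fp8: 1 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).
function weightMemoryGB(paramCount: number, precision: Precision): number {
  return (paramCount * BYTES_PER_PARAM[precision]) / 1e9;
}

const MOCHI_PARAMS = 10e9; // "10 billion" from the specifications

for (const precision of Object.keys(BYTES_PER_PARAM) as Precision[]) {
  console.log(`${precision}: ${weightMemoryGB(MOCHI_PARAMS, precision)} GB`);
}
```

At bf16 the weights alone come to about 20 GB, so fitting on a single 80 GB card depends mostly on the long token sequence and activation memory rather than on the weights, which is consistent with quantisation or multi-GPU sharding being suggested.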
Training & License

Curated filtered video corpus with dense captions from an in-house captioner; specific token / clip counts shared in the Genmo technical blog.

License: Apache 2.0 (fully permissive open-source licence on weights, code and inference).

Known limitations
  • Native resolution 480p in base model (720p in preview at launch)
  • 5-second clip duration
  • No audio
  • High inference cost relative to smaller open models
  • Quality below closed leaders (Veo 3, Sora 2, Kling v3) at similar duration

Start using Mochi 1 today

Get started with free credits. No credit card required. Access Mochi 1 and 100+ other models through a single API.