Mochi 1
Genmo's 10B open-weights text-to-video model. AsymmDiT architecture, 5.4s @ 480p.
Mochi 1 is a video generation AI model from Genmo, priced at €0.000 per 1M input tokens, with an unknown context window.
API Integration
Use our OpenAI-compatible API to integrate Mochi 1 into your application.
npm install railwail
// ES module (e.g. an .mjs file): required for `import` and top-level `await`
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("mochi-1-genmo", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("mochi-1-genmo", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("mochi-1-genmo", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Genmo's Mochi 1
Genmo was founded in 2023 by Paras Jain (CEO) and Ajay Jain (CTO), both PhDs from UC Berkeley's BAIR lab, with a focus on open-source generative video. The company released the early Replay product and the smaller GEN-1 video model before launching Mochi 1 in October 2024 as a 10B-parameter open-weight text-to-video model under the Apache 2.0 licence; at the time, it was the largest open-source video model with a fully permissive licence. Genmo positions Mochi 1 as a research foundation for the community to fine-tune, extend and study, in deliberate contrast to closed competitors. The company has raised over $30M from investors including NEA and the Y Combinator network.
Mochi 1 is a 10B-parameter Asymmetric Diffusion Transformer (AsymmDiT) operating in the latent space of a high-compression 3D causal Variational Autoencoder (VAE). The "asymmetric" design gives the video stream far more parameters than the text stream, while both streams share a joint self-attention. This rests on the hypothesis that visual modeling is the bottleneck for video generation. Position information uses 3D RoPE. The model is trained with Rectified Flow Matching at full resolution and high motion intensity. Text conditioning uses a T5-XXL encoder. Mochi 1 natively generates 5-second clips at 480p / 30 fps, with a 720p HD variant in preview at launch. Training uses a curated, filtered video corpus with dense captions produced by an in-house captioner. The team explicitly reports ablations on resolution scheduling, motion intensity filtering and caption quality.
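The asymmetric joint-attention idea can be illustrated with a toy sketch. It makes several assumptions. The widths are tiny and made up (`dVis = 8`, `dTxt = 4`, `dAttn = 6`). The weights are random, from a hypothetical `makeMatrix` helper. There is a single head, and no MLPs, normalisation or RoPE. This is not Genmo's implementation.

Each stream has its own non-square Q/K/V projection into a shared attention width. The two token sets then attend jointly, and the outputs are split and projected back to each stream's own width. As a result, the narrower text stream carries fewer projection parameters than the visual stream.

```javascript
// Toy single-head sketch of asymmetric joint attention (illustrative, not Genmo's code).

// a: n x k, b: k x m  ->  n x m
function matmul(a, b) {
  return a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0)));
}

function transpose(a) {
  return a[0].map((_, j) => a.map((row) => row[j]));
}

// Numerically stable softmax over one row of scores.
function softmax(row) {
  const m = Math.max(...row);
  const e = row.map((v) => Math.exp(v - m));
  const z = e.reduce((sum, v) => sum + v, 0);
  return e.map((v) => v / z);
}

// Scaled dot-product attention.
function attention(q, k, v) {
  const scale = 1 / Math.sqrt(q[0].length);
  const weights = matmul(q, transpose(k)).map((r) => softmax(r.map((s) => s * scale)));
  return matmul(weights, v);
}

// Hypothetical helper: deterministic pseudo-random matrix (Park-Miller LCG), values in (-0.5, 0.5).
function makeMatrix(rows, cols, seed) {
  let s = seed;
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => {
      s = (s * 16807) % 2147483647;
      return s / 2147483647 - 0.5;
    }));
}

// Each stream projects into the shared width dAttn with its own weights,
// all tokens attend jointly, then outputs are split and projected back.
function asymmetricJointAttention(vis, txt, p) {
  const q = [...matmul(vis, p.wqVis), ...matmul(txt, p.wqTxt)];
  const k = [...matmul(vis, p.wkVis), ...matmul(txt, p.wkTxt)];
  const v = [...matmul(vis, p.wvVis), ...matmul(txt, p.wvTxt)];
  const out = attention(q, k, v);
  return {
    vis: matmul(out.slice(0, vis.length), p.woVis),
    txt: matmul(out.slice(vis.length), p.woTxt),
  };
}

// Made-up sizes: the visual stream is wider than the text stream.
const dVis = 8, dTxt = 4, dAttn = 6;
const p = {
  wqVis: makeMatrix(dVis, dAttn, 1), wkVis: makeMatrix(dVis, dAttn, 2),
  wvVis: makeMatrix(dVis, dAttn, 3), woVis: makeMatrix(dAttn, dVis, 4),
  wqTxt: makeMatrix(dTxt, dAttn, 5), wkTxt: makeMatrix(dTxt, dAttn, 6),
  wvTxt: makeMatrix(dTxt, dAttn, 7), woTxt: makeMatrix(dAttn, dTxt, 8),
};
const vis = makeMatrix(5, dVis, 11); // 5 visual tokens
const txt = makeMatrix(3, dTxt, 12); // 3 text tokens
const y = asymmetricJointAttention(vis, txt, p);
console.log(y.vis.length, y.vis[0].length); // 5 8
console.log(y.txt.length, y.txt[0].length); // 3 4
```

Only the projection matrices differ in shape between the streams. The attention step itself is shared by all tokens.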
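Rectified flow matching trains the network to predict a velocity along a straight path between data and noise. The sketch below uses one common convention: data at `t = 0`, noise at `t = 1`, and target velocity `eps - x0`. This page does not give Mochi 1's exact time parameterisation, timestep sampling or loss weighting, so treat these choices as assumptions. All function names are hypothetical, and an "oracle" velocity stands in for the trained transformer.

```javascript
// Generic rectified-flow sketch (one common convention; an ASSUMPTION, not Mochi's exact recipe).
// Path: x_t = (1 - t) * x0 + t * eps, with data x0 at t = 0 and noise eps at t = 1.
function interpolate(x0, eps, t) {
  return x0.map((x, i) => (1 - t) * x + t * eps[i]);
}

// Velocity the network is trained to predict: d x_t / dt = eps - x0 (constant along the path).
function velocityTarget(x0, eps) {
  return x0.map((x, i) => eps[i] - x);
}

// Per-sample training loss: mean squared error against the target velocity.
function flowMatchingLoss(predV, x0, eps) {
  const target = velocityTarget(x0, eps);
  return predV.reduce((sum, v, i) => sum + (v - target[i]) ** 2, 0) / predV.length;
}

// Sampling: Euler integration of dx/dt = v(x, t) from t = 1 (noise) down to t = 0.
function sampleEuler(velocityFn, eps, steps) {
  const dt = 1 / steps;
  let x = eps.slice();
  for (let k = 0; k < steps; k++) {
    const t = 1 - k * dt;
    const v = velocityFn(x, t);
    x = x.map((xi, i) => xi - dt * v[i]);
  }
  return x;
}

// Usage with a hypothetical "oracle" velocity in place of a trained model.
const x0 = [0.5, -1.0, 2.0];   // stand-in for a clean latent
const eps = [0.1, 0.3, -0.2];  // stand-in for Gaussian noise
const xt = interpolate(x0, eps, 0.25);
const oracle = () => velocityTarget(x0, eps);
console.log(flowMatchingLoss(oracle(), x0, eps)); // 0
const recovered = sampleEuler(oracle, eps, 10);
console.log(recovered); // approximately [0.5, -1, 2]
```

Because the path is a straight line, a perfect velocity prediction recovers the data in any number of Euler steps. A learned, imperfect velocity is what makes step count matter in practice.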
- Parameters: 10 billion
- Context: unknown
- 10B open-weight text-to-video model under Apache 2.0 (most permissive in class)
- Asymmetric DiT design biased toward visual capacity
- 5-second 480p / 30 fps clips natively, 720p variant in preview
- Strong motion fidelity and prompt adherence relative to model size
- Permissive licence enables fully commercial fine-tunes and derivatives
- Runs on multi-GPU setups (4x H100), or quantised on a single 80 GB H100
- Active community ecosystem: image-to-video, LoRAs, ComfyUI nodes
- Public benchmark contributions and reproducible training recipe
- Best for: open research, commercial fine-tunes, custom pipelines.
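The hardware bullet can be sanity-checked with back-of-envelope arithmetic for the transformer weights alone. The sketch assumes exactly 10 billion parameters and the listed bytes per parameter. It deliberately leaves out activations for long video token sequences, the T5-XXL text encoder, the VAE and framework overhead, so real memory use is higher than these figures.

```javascript
// Back-of-envelope memory for the transformer WEIGHTS ONLY.
// Not included: activations, the T5-XXL text encoder, the VAE, framework overhead.
const PARAMS = 10e9; // 10 billion parameters (from the spec list above)

// Bytes per parameter for a few common storage formats.
const BYTES_PER_PARAM = { fp32: 4, bf16: 2, fp8: 1, int4: 0.5 };

function weightGiB(params, bytesPerParam) {
  return (params * bytesPerParam) / 1024 ** 3;
}

for (const [dtype, bytes] of Object.entries(BYTES_PER_PARAM)) {
  console.log(`${dtype}: ${weightGiB(PARAMS, bytes).toFixed(1)} GiB`);
}
// fp32: 37.3 GiB
// bf16: 18.6 GiB
// fp8: 9.3 GiB
// int4: 4.7 GiB
```

Even at bf16 the weights fit comfortably on one 80 GB card. That suggests most of the pressure behind the multi-GPU recommendation comes from the parts this sketch leaves out.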
Training data: a curated, filtered video corpus with dense captions from an in-house captioner; specific token and clip counts are given in the Genmo technical blog.
License: Apache 2.0 (fully permissive open-source licence on weights, code and inference).
Known limitations
- Native resolution 480p in base model (720p in preview at launch)
- 5-second clip duration
- No audio
- High inference cost relative to smaller open models
- Quality below closed leaders (Veo 3, Sora 2, Kling v3) at similar duration
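The inference-cost and clip-length limitations follow partly from how many tokens the transformer attends over. The sketch below counts latent tokens for one clip. Every number in `config` is an assumption not stated on this page:

- 848 x 480 output
- 163 frames, about 5.4 s at 30 fps
- 8x8 spatial and 6x causal temporal VAE compression
- a 2x2 patch size

With full attention over all tokens, cost grows roughly with the square of the token count. Doubling the clip length therefore nearly quadruples the attention work.

```javascript
// All values are ASSUMPTIONS for illustration, not taken from this page.
const config = {
  width: 848,        // output width in pixels
  height: 480,       // output height in pixels
  frames: 163,       // ~5.4 s at 30 fps
  spatialDown: 8,    // VAE spatial downsampling per axis
  temporalDown: 6,   // VAE causal temporal downsampling
  patch: 2,          // transformer patch size per spatial axis
};

// Latent grid after a causal 3D VAE: the first frame is encoded on its own,
// and each following group of `temporalDown` frames becomes one latent frame.
function latentShape({ width, height, frames, spatialDown, temporalDown }) {
  if ((frames - 1) % temporalDown !== 0) {
    throw new Error("frames - 1 must be divisible by temporalDown");
  }
  if (width % spatialDown !== 0 || height % spatialDown !== 0) {
    throw new Error("width and height must be divisible by spatialDown");
  }
  return {
    t: 1 + (frames - 1) / temporalDown,
    h: height / spatialDown,
    w: width / spatialDown,
  };
}

// Number of visual tokens after 2D patchification of each latent frame.
function tokenCount(cfg) {
  const { t, h, w } = latentShape(cfg);
  if (h % cfg.patch !== 0 || w % cfg.patch !== 0) {
    throw new Error("latent height and width must be divisible by patch");
  }
  return t * (h / cfg.patch) * (w / cfg.patch);
}

const base = tokenCount(config);                       // 28 * 30 * 53
const longer = tokenCount({ ...config, frames: 325 }); // ~10.8 s at 30 fps
console.log(base);                                     // 44520
console.log(longer);                                   // 87450
// Full attention compares every token with every other token: cost ~ N^2.
console.log(((longer / base) ** 2).toFixed(2));        // 3.86
```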
Related Models
Google Veo 2
Google's state-of-the-art video generation model. Simulates real-world physics with various visual styles.
Google Veo 3
Google's Veo 3. High-fidelity text-to-video with native audio generation, up to 8s clips.
Google Veo 3.1
Latest Veo with image-to-video and context-aware audio
Kling v3
Cinematic video up to 15s with multi-shot and native audio
Start using Mochi 1 today
Get started with free credits. No credit card required. Access Mochi 1 and 100+ other models through a single API.