MusicGen
Meta's music generation model. Generate up to 1 minute of music from text descriptions.
MusicGen is an audio and music AI model from Meta, priced at €0.000 per 1M input tokens, with an unknown context window.
Examples
See what MusicGen can generate
Lo-Fi Chill
0:30
"Lo-fi hip hop beat with warm vinyl crackle, mellow jazz piano chords, soft brushed drums, and a gentle bass line, perfect for studying and relaxation, 85 BPM"
Epic Orchestral
0:20
"Cinematic orchestral piece building from a solo cello melody to a full symphonic crescendo with timpani, brass fanfare, and soaring strings, dramatic and emotional, suitable for a movie trailer"
API Integration
Use our OpenAI-compatible API to integrate MusicGen into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple — just pass a string
const reply = await rw.run("musicgen", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("musicgen", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("musicgen", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Meta AI (FAIR)'s MusicGen
Meta AI is the research division of Meta Platforms, originally founded as Facebook AI Research (FAIR) in 2013 by Yann LeCun. FAIR has produced many influential open-source releases, including PyTorch, RoBERTa, Wav2Vec 2.0, the LLaMA family, and the audio research stack AudioCraft. MusicGen was published by FAIR's audio team in June 2023 (Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez) as part of AudioCraft, a unified library that also includes EnCodec and AudioGen. AudioCraft's code is released under the MIT licence and its pretrained weights under the non-commercial CC-BY-NC 4.0 licence. It remains one of the most popular reference implementations for controllable music generation in academia.
MusicGen is a single-stage auto-regressive Transformer that generates music by predicting tokens from a 32 kHz EnCodec residual vector quantiser, which emits frames at 50 Hz with four codebooks per frame. The model interleaves the codebook streams using a delay pattern so that a single Transformer can model all four codebooks jointly, avoiding the cascaded models used in prior work such as MusicLM. Conditioning is provided by a frozen T5 text encoder for text prompts and, optionally, by a chromagram-based encoder for melody control, which lets users hum or upload a reference melody. The model was trained on 20,000 hours of licensed, exclusively instrumental music (10K high-quality internal Meta tracks plus the ShutterStock and Pond5 licensed catalogues). Output is 32 kHz stereo or mono with a maximum length of 30 seconds in the public checkpoints; longer pieces are produced by sliding-window continuation. AudioCraft also ships training and fine-tuning scripts.
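The delay pattern mentioned above can be illustrated with a small sketch. It assumes the simplest form of the idea, where codebook k is shifted right by k time steps so that all four streams can be predicted in one pass. The function names and the PAD placeholder are hypothetical and are not taken from AudioCraft.

```javascript
// Illustrative sketch of a codebook delay pattern (not AudioCraft's code).
// PAD is a hypothetical placeholder for positions that hold no token yet.
const PAD = -1;

// codes: array of K arrays, each with T tokens (one stream per codebook).
function applyDelayPattern(codes, pad = PAD) {
  const K = codes.length;
  const T = codes[0].length;
  const steps = T + K - 1; // the last codebook finishes K - 1 steps late
  return codes.map((stream, k) => {
    const row = new Array(steps).fill(pad);
    for (let t = 0; t < T; t++) {
      row[t + k] = stream[t]; // codebook k is shifted right by k steps
    }
    return row;
  });
}

// Inverse operation: drop the padding and realign the streams.
function undoDelayPattern(delayed) {
  const K = delayed.length;
  const T = delayed[0].length - (K - 1);
  return delayed.map((row, k) => row.slice(k, k + T));
}

const codes = [
  [11, 12, 13],
  [21, 22, 23],
  [31, 32, 33],
  [41, 42, 43],
];
const delayed = applyDelayPattern(codes);
for (const row of delayed) console.log(row.join(" "));
// 11 12 13 -1 -1 -1
// -1 21 22 23 -1 -1
// -1 -1 31 32 33 -1
// -1 -1 -1 41 42 43
```

Each column of the delayed grid is what a single decoding step would emit: one token per codebook, with the coarser codebooks running ahead of the finer ones.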
- Parameters
- 300M (small), 1.5B (medium), 3.3B (large), 1.5B (melody)
- Context
- 30 seconds of audio per generation window (about 1,500 token frames at 50 Hz)
- Text-to-music generation across many genres and moods
- Melody conditioning via chroma features (hum or upload a tune)
- Single-stage generation with no cascade of models (simpler than MusicLM)
- Up to 30-second clips at 32 kHz, sliding window for longer pieces
- Multiple sizes from 300M to 3.3B parameters, suiting hardware from laptops to data-centre GPUs
- Openly available weights (non-commercial licence) with a reproducible training recipe
- Best for: research on controllable music generation, indie composers, game soundtrack prototyping
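For a sense of scale, the figures quoted on this page (50 Hz frame rate, four codebooks, 32 kHz output) translate into token and sample counts as follows. This is back-of-the-envelope arithmetic only. The function name is hypothetical, and the decoding-step count assumes a one-step-per-codebook delay pattern.

```javascript
// Constants taken from the model description on this page.
const FRAME_RATE_HZ = 50;      // EnCodec frames per second
const NUM_CODEBOOKS = 4;       // residual codebooks per frame
const SAMPLE_RATE_HZ = 32000;  // output audio sample rate

// Hypothetical helper: sizes involved in generating `seconds` of audio.
function tokenBudget(seconds) {
  const frames = Math.round(seconds * FRAME_RATE_HZ);
  return {
    frames,                                     // time steps per codebook
    tokens: frames * NUM_CODEBOOKS,             // total discrete tokens
    decodingSteps: frames + NUM_CODEBOOKS - 1,  // assumes a 1-step delay per codebook
    samplesPerChannel: Math.round(seconds * SAMPLE_RATE_HZ),
  };
}

console.log(tokenBudget(30));
// { frames: 1500, tokens: 6000, decodingSteps: 1503, samplesPerChannel: 960000 }
```

A 30-second clip therefore corresponds to roughly 1,500 autoregressive steps, not 30 tokens, which is why generation time scales with clip length.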
Training data: 20,000 hours of licensed instrumental music, comprising 10K tracks from Meta's internal music dataset plus the ShutterStock and Pond5 commercial libraries. All vocals were removed, so the corpus is instrumental-only.
License: Code under MIT; pretrained weights under CC-BY-NC 4.0 (non-commercial). Commercial use requires separate licensing.
Known limitations
- Instrumental only, no vocal generation
- Non-commercial licence restricts production use
- 30-second clip length without continuation
- Difficulty with very specific instrument arrangements or named artists
- No fine-grained per-bar control
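The 30-second limit is usually worked around with sliding-window continuation: each new window is prompted with the tail of the previous one and only the new audio is appended. The sketch below plans such windows. The function name and the 20-second stride are assumptions chosen for illustration, not values taken from MusicGen or the API on this page.

```javascript
// Hypothetical planner for sliding-window continuation.
// windowSec: maximum clip length the model can generate at once.
// strideSec: how far each window advances (assumed value, tune as needed).
function planWindows(totalSec, windowSec = 30, strideSec = 20) {
  if (strideSec <= 0 || strideSec >= windowSec) {
    throw new RangeError("strideSec must be between 0 and windowSec");
  }
  const windows = [];
  let start = 0;
  while (true) {
    const end = Math.min(start + windowSec, totalSec);
    // Every window after the first reuses the previous tail as its prompt.
    const contextSec = start === 0 ? 0 : windowSec - strideSec;
    windows.push({ start, end, contextSec, newSec: end - start - contextSec });
    if (end >= totalSec) break;
    start += strideSec;
  }
  return windows;
}

console.log(planWindows(60));
// [
//   { start: 0,  end: 30, contextSec: 0,  newSec: 30 },
//   { start: 20, end: 50, contextSec: 10, newSec: 20 },
//   { start: 40, end: 60, contextSec: 10, newSec: 10 }
// ]
```

A smaller stride gives each window more context and tends to keep the piece more coherent, at the cost of more generation passes.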
Start using MusicGen today
Get started with free credits. No credit card required. Access MusicGen and 100+ other models through a single API.