Bark
Suno's text-to-audio model. Generates realistic speech, music, and sound effects.
Bark is an audio & music AI model from Suno, priced at €0.000 per 1M input tokens with an unknown context window.
Examples
See what Bark can generate
Nature Ambience
0:15
"Peaceful forest soundscape with gentle bird songs, a babbling brook in the distance, rustling leaves in a light breeze, and occasional soft wind chimes, ambient and calming"
Retro Synth
0:20
"1980s synthwave track with pulsing arpeggiated synthesizers, heavy reverb snare drums, driving bass line, and shimmering pad layers, nostalgic and energetic, 120 BPM"
API Integration
Use our OpenAI-compatible API to integrate Bark into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("bark", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("bark", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("bark", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Suno AI's Bark
Suno AI was founded in 2022 by Mikey Shulman (CEO), Georg Kucsko, Martin Camacho and Keenan Freyberg, several of whom previously worked on machine learning at Kensho Technologies. The company specialises in generative audio for music and speech and is best known for the Suno music-generation app and the open-source Bark text-to-audio model, released in April 2023. Bark was open-sourced on GitHub under the MIT licence and quickly became one of the most popular community TTS models. Suno raised a $125M Series B in May 2024, led by Lightspeed Venture Partners with participation from Founder Collective, Matrix and Nat Friedman, at a reported valuation of about $500M. The company has faced music-industry litigation over the provenance of its training data, including a June 2024 copyright suit brought by major record labels and coordinated by the RIAA.
Bark is a fully generative text-to-audio model that produces highly realistic multilingual speech, music, background noise and non-verbal sounds (laughter, sighs, crying). It is implemented as a cascade of three GPT-style Transformers: a causal semantic model that converts text (tokenised with a multilingual BERT tokenizer) into semantic audio tokens, a causal coarse acoustic model that predicts the first two codebooks of Meta's EnCodec neural audio codec at 24 kHz, and a non-causal fine acoustic model that predicts the remaining six codebooks. EnCodec then decodes the eight-codebook representation back into a 24 kHz waveform. Unlike conventional TTS systems, Bark has no phoneme front-end; it learns prosody, accent and non-speech events directly from data. Output is capped at roughly 13 seconds per generation, so longer audio is produced by chaining segments, optionally conditioned on a voice (history) prompt. Suno released only the inference weights; the training corpus has not been disclosed.
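The cascade described above can be sketched at the level of array shapes. This is a minimal, hypothetical JavaScript illustration, not Bark's implementation: the stage functions are stubs that emit random token ids, and the constants (about 49.9 semantic tokens per second, 75 EnCodec frames per second at 24 kHz, a 10,000-token semantic vocabulary and 1,024 codes per codebook) are assumptions based on our reading of the open-source Bark code.

```javascript
// Hypothetical, shape-level sketch of Bark's three-stage cascade.
// All constants below are assumptions, not values confirmed by this page.
const SEMANTIC_RATE_HZ = 49.9; // semantic tokens per second of audio
const FRAME_RATE_HZ = 75;      // EnCodec frames per second at 24 kHz
const SAMPLE_RATE = 24000;     // EnCodec output sample rate
const N_COARSE = 2;            // codebooks predicted by the coarse model
const N_FINE = 8;              // total codebooks after the fine model
const SEMANTIC_VOCAB = 10000;
const CODEBOOK_SIZE = 1024;

// Random integer in [0, n): stands in for a sampled token.
const randToken = (n) => Math.floor(Math.random() * n);

// Stage 1 (stub): text -> semantic tokens covering `seconds` of audio.
function textToSemantic(text, seconds) {
  const n = Math.round(seconds * SEMANTIC_RATE_HZ);
  return Array.from({ length: n }, () => randToken(SEMANTIC_VOCAB));
}

// Stage 2 (stub): semantic tokens -> first 2 EnCodec codebooks (N_COARSE x frames).
function semanticToCoarse(semantic) {
  const frames = Math.round((semantic.length / SEMANTIC_RATE_HZ) * FRAME_RATE_HZ);
  return Array.from({ length: N_COARSE }, () =>
    Array.from({ length: frames }, () => randToken(CODEBOOK_SIZE)));
}

// Stage 3 (stub): keep the 2 coarse codebooks and fill in the remaining 6.
function coarseToFine(coarse) {
  const frames = coarse[0].length;
  const extra = Array.from({ length: N_FINE - N_COARSE }, () =>
    Array.from({ length: frames }, () => randToken(CODEBOOK_SIZE)));
  return [...coarse, ...extra];
}

// Decoder (stub): 8 x frames codes -> frames * 320 samples of silence.
function decode(fine) {
  const samples = fine[0].length * (SAMPLE_RATE / FRAME_RATE_HZ);
  return new Float32Array(samples);
}

const semantic = textToSemantic("Hello [laughs]", 13);
const coarse = semanticToCoarse(semantic);
const fine = coarseToFine(coarse);
const wave = decode(fine);
console.log(semantic.length, coarse.length, coarse[0].length, fine.length, wave.length);
// 649 2 975 8 312000
```

Splitting the codebooks this way means the autoregressive coarse stage only has to model 2 of the 8 code streams; the non-causal fine stage fills in the rest for all frames at once.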
- Parameters: ~1.5B across three stacked Transformer models (semantic, coarse, fine)
- Context: ~13 seconds of audio per generation
- Multilingual text-to-speech with automatic language detection and code-switching; official voice presets cover 13 languages
- Non-verbal vocalisations: [laughs], [sighs], [music], [gasps], [clears throat]
- Background music and sound effect generation alongside speech
- 100+ voice presets and prompt-based voice conditioning via short audio history prompts (the official release restricts these to Suno-provided synthetic voices)
- Singing and humming generation
- Open weights under MIT licence, runs on a single consumer GPU
- 24 kHz mono output via EnCodec decoder
- Best for: indie game audio, podcast prototyping, creative TTS, on-device research
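Non-verbal events are requested inline in the prompt text. The helper below is a hypothetical sketch that finds bracketed cues in a prompt and flags any that are not among the five listed above; the function name `findUnknownCues` and the assumption that only those five tags are reliable are illustrative, and Bark itself does not validate cues.

```javascript
// Hypothetical helper: the tag set mirrors the non-verbal cues listed above.
const KNOWN_CUES = new Set(["laughs", "sighs", "music", "gasps", "clears throat"]);

// Return every [bracketed] cue in the prompt that is not in KNOWN_CUES
// (comparison is case-insensitive; the original spelling is returned).
function findUnknownCues(prompt) {
  const unknown = [];
  for (const match of prompt.matchAll(/\[([^\]]+)\]/g)) {
    const cue = match[1].trim().toLowerCase();
    if (!KNOWN_CUES.has(cue)) unknown.push(match[1]);
  }
  return unknown;
}

const prompt = "Well [clears throat] that was unexpected [laughs]. [Applause]";
console.log(findUnknownCues(prompt)); // [ 'Applause' ]
```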
Suno has not officially disclosed the training corpus. Community analysis suggests a large mix of public-domain audiobooks, podcasts, web-scraped speech and music, reportedly amounting to tens of thousands of hours of audio.
License: MIT licence for code, with weights released for research and commercial use; generated voices are not to be used for impersonation. Suno's own commercial music product is separate and not Bark-based.
Known limitations
- Generation is non-deterministic and prompt-sensitive
- Maximum 13 s per chunk requires stitching for long audio
- Cannot follow strict prosody or SSML
- Voice cloning quality varies and is not commercial-grade
- Slow on CPU; needs GPU for real-time
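Because each generation is capped at roughly 13 seconds, long scripts have to be split before synthesis and the resulting clips stitched together. The function below is a minimal sketch of the splitting step only. It assumes a speaking rate of about 2.5 words per second (an illustrative figure, not a Bark parameter) and groups whole sentences so that each chunk's estimated duration stays within the budget; the name `chunkForBark` is hypothetical.

```javascript
// Hypothetical chunker: groups sentences so each chunk's estimated speaking
// time stays within maxSeconds. WORDS_PER_SECOND is an assumption.
const WORDS_PER_SECOND = 2.5;

function chunkForBark(text, maxSeconds = 13) {
  const maxWords = Math.floor(maxSeconds * WORDS_PER_SECOND); // 32 words at the defaults
  // Split after ., ! or ? followed by whitespace; the punctuation stays attached.
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = [];
  let count = 0;
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).length;
    // Flush the current chunk if adding this sentence would exceed the budget.
    // A single sentence longer than maxWords still becomes its own (oversized) chunk.
    if (count > 0 && count + words > maxWords) {
      chunks.push(current.join(" "));
      current = [];
      count = 0;
    }
    current.push(sentence);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

console.log(chunkForBark("One two three. Four five. Six seven eight nine.", 2));
// [ 'One two three. Four five.', 'Six seven eight nine.' ]
```

Splitting at sentence boundaries rather than at a fixed word count avoids cutting a clause in half, which tends to produce unnatural prosody at the join. Each chunk would then be generated with the same voice preset so the speaker stays consistent across segments.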
Start using Bark today
Get started with free credits. No credit card required. Access Bark and 100+ other models through a single API.