Bark is an audio & music AI model developed by Suno. It is a text-to-audio model that generates realistic speech, music, and sound effects. Access it through Railwail's unified, OpenAI-compatible API at €0.50 per generation.

How much does Bark cost via Railwail?

Per-call: €0.50. No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of Bark?

Railwail does not list a context window for Bark. As a text-to-audio model, its practical limit is output length: roughly 13 seconds of audio per generation.

How fast is Bark?

Average response latency: 15.0s (p50 across recent Railwail traffic). See live p50/p95 metrics on /rankings.

Is Bark better than MusicGen?

It depends on your use case. Bark (Suno) and MusicGen (Meta) are both strong choices in audio & music. Compare them side-by-side at /compare/bark-vs-musicgen.

Bark

Suno

Audio & Music

Suno's text-to-audio model. Generates realistic speech, music, and sound effects.

Queue audio with Bark

Music and sound effects run asynchronously: Railwail queues a job that you can track in your history. Jobs typically finish in 30 s to 2 min.
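Because generation runs as a queued job, client code usually has to poll until the job settles. The sketch below shows one way to do that. The `JobStatus` shape and the `check` callback are hypothetical placeholders, not part of any documented Railwail API; substitute whatever status lookup the service actually provides.

```typescript
// Hypothetical job shape for illustration; not a documented Railwail schema.
type JobStatus = {
  state: "queued" | "running" | "done" | "failed";
  url?: string;
};

// Calls `check` repeatedly until the job is done or failed,
// and throws if the next wait would run past the timeout.
async function waitForJob(
  check: () => Promise<JobStatus>,
  intervalMs = 5000,
  timeoutMs = 180000, // the 2 min typical upper bound plus some headroom
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const status = await check();
    if (status.state === "done" || status.state === "failed") {
      return status;
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error("Timed out waiting for audio job");
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A fixed interval keeps the sketch short; a production client might back off between polls instead.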

TL;DR · Last updated March 4, 2026

Bark is an audio & music AI model from Suno, priced at €0.50 per generation. No context window is listed for it.

Examples

See what Bark can generate

Nature Ambience

0:15

"Peaceful forest soundscape with gentle bird songs, a babbling brook in the distance, rustling leaves in a light breeze, and occasional soft wind chimes, ambient and calming"

Retro Synth

0:20

"1980s synthwave track with pulsing arpeggiated synthesizers, heavy reverb snare drums, driving bass line, and shimmering pad layers, nostalgic and energetic, 120 BPM"

Pricing

Price per Generation

Per generation: €0.50
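As a quick budgeting aid, the arithmetic behind per-generation pricing can be sketched as follows. It uses the €0.50 price and the €5 of free credits quoted on this page and works in integer cents to avoid floating-point rounding. The function names are illustrative only.

```typescript
const PRICE_PER_GENERATION_CENTS = 50; // €0.50 per generation, as listed on this page
const FREE_CREDIT_CENTS = 500; // €5 of starting credits, as listed on this page

// Total cost in euros for a given number of generations.
function costInEuros(generations: number): number {
  return (generations * PRICE_PER_GENERATION_CENTS) / 100;
}

// Number of whole generations a credit balance (in cents) pays for.
function generationsCoveredBy(creditCents: number): number {
  return Math.floor(creditCents / PRICE_PER_GENERATION_CENTS);
}

console.log(costInEuros(12)); // 6
console.log(generationsCoveredBy(FREE_CREDIT_CENTS)); // 10
```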

API Integration

Use our OpenAI-compatible API to integrate Bark into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: pass the text to synthesize as a string
const reply = await rw.run("bark", "Hello, welcome to the show. [laughs] Let's get started.");
console.log(reply);

// With message history
const reply2 = await rw.run("bark", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("bark", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Price: €0.50
Avg. latency: 15.0s
Est. duration: 15s
Developer: Suno

Deep dive — Suno AI's Bark

About Suno AI

Founded 2022 · Cambridge, Massachusetts, USA

Suno AI was founded in 2022 by Mikey Shulman (CEO), Georg Kucsko, Martin Camacho and Keenan Freyberg, several of whom previously worked on machine learning at Kensho Technologies. The company specialises in generative audio for music and speech, and is best known for the Suno music generation app and the open-source Bark text-to-audio model released in April 2023. Bark was open-sourced under MIT licence on GitHub and quickly became one of the most popular community TTS models. Suno raised a Series B of $125M in May 2024 led by Lightspeed Venture Partners with participation from Founder Collective, Matrix and Nat Friedman, reaching a valuation in the high hundreds of millions. The company is a frequent target of music-industry lawsuits (RIAA filed in June 2024) over training data provenance.

Architecture

Transformer-based text-to-audio with neural codec (EnCodec) tokens

Bark is a fully generative text-to-audio model that produces highly realistic multilingual speech, music, background noise and non-verbal sounds (laughter, sighs, crying). It is implemented as a cascade of three GPT-style decoder-only Transformers: a semantic model that converts tokenised text to semantic audio tokens, a coarse acoustic model that predicts the first two codebooks of Meta's EnCodec neural audio codec at 24 kHz, and a fine acoustic model that predicts the remaining six codebooks. EnCodec then decodes the eight-codebook representation back to a 24 kHz waveform. Unlike conventional TTS, Bark has no phoneme front-end and learns prosody, accent and non-speech events directly from data. Output is capped at roughly 13 seconds per generation, and longer audio is produced by chaining segments with optional voice-clone prompts. Suno released only the inference weights; the training corpus was not disclosed.
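To make the codebook split concrete, the sketch below estimates how many acoustic tokens the coarse and fine stages produce for a clip. The two-plus-six codebook split and the 13-second cap come from the description above. The 75 frames per second rate and the 1024-entry codebooks are assumptions taken from EnCodec's published 24 kHz configuration, not from this page.

```typescript
const FRAME_RATE_HZ = 75; // assumption: EnCodec 24 kHz frames per second
const CODEBOOK_SIZE = 1024; // assumption: entries per EnCodec codebook
const COARSE_CODEBOOKS = 2; // predicted by the coarse acoustic model
const TOTAL_CODEBOOKS = 8; // coarse plus fine codebooks

// Token counts per stage for a clip of the given length.
function acousticTokenBudget(seconds: number) {
  const frames = Math.round(seconds * FRAME_RATE_HZ);
  return {
    frames,
    coarseTokens: frames * COARSE_CODEBOOKS,
    fineTokens: frames * (TOTAL_CODEBOOKS - COARSE_CODEBOOKS),
    totalTokens: frames * TOTAL_CODEBOOKS,
  };
}

// Bitrate of the eight-codebook stream in kbit/s.
function bitrateKbps(): number {
  return (FRAME_RATE_HZ * TOTAL_CODEBOOKS * Math.log2(CODEBOOK_SIZE)) / 1000;
}

console.log(acousticTokenBudget(13));
console.log(bitrateKbps());
```

Under these assumptions a 13-second clip is 975 codec frames, so the coarse model emits 1,950 tokens and the fine model fills in the remaining 5,850.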

Parameters: ~1.5B across three stacked Transformer models (text, coarse, fine)
Context: no token context window listed; output is capped at roughly 13 seconds per generation

What it can do

Multilingual text-to-speech with code-switching; Suno's README lists 13 supported languages
Non-verbal vocalisations: [laughs], [sighs], [music], [gasps], [clears throat]
Background music and sound effect generation alongside speech
Voice presets and prompt-based voice cloning from short audio history prompts
Singing and humming generation
Open weights under MIT licence, runs on a single consumer GPU
24 kHz mono output via EnCodec decoder
Best for: indie game audio, podcast prototyping, creative TTS, on-device research

Training & License

Suno has not officially disclosed the training corpus. Community analysis suggests a large mix of public-domain audiobooks, podcasts, web-scraped speech and music. The model was trained on tens of thousands of hours of audio.

License: MIT licence for code; weights released for research and commercial use, but generated voices marked as not for impersonation. Suno's own commercial music product is separate and not Bark-based.

Known limitations

Generation is non-deterministic and prompt-sensitive
Maximum 13 s per chunk requires stitching for long audio
Cannot follow strict prosody or SSML
Voice cloning quality varies and is not commercial-grade
Slow on CPU; needs GPU for real-time
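The 13-second cap means long scripts have to be split before generation and the resulting clips stitched afterwards. A minimal sketch of the splitting step is shown here. It packs whole sentences into chunks whose estimated spoken length stays under the cap. The speaking rate of 2.5 words per second is an assumption for illustration, and `chunkForBark` is a hypothetical helper, not part of Bark or Railwail.

```typescript
const MAX_CHUNK_SECONDS = 13; // per-generation cap noted above
const WORDS_PER_SECOND = 2.5; // assumption: rough speaking rate

// Rough spoken duration of a piece of text.
function estimateSeconds(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return words / WORDS_PER_SECOND;
}

// Greedily packs sentences into chunks under the duration cap.
// A single sentence longer than the cap is kept whole as its own chunk.
function chunkForBark(text: string, maxSeconds = MAX_CHUNK_SECONDS): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current && estimateSeconds(candidate) > maxSeconds) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk would then be sent as a separate generation. Splitting on sentence boundaries keeps prosody breaks at natural pauses, which tends to make the joins less audible.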

Related Models

MusicGen

MAGNeT

Community

MAGNeT is Meta's masked, non-autoregressive audio generator. Instead of predicting tokens left to right it fills masked audio tokens in parallel over a few decoding steps, so generation is faster than autoregressive MusicGen at similar quality. This Replicate packaging exposes the text-to-music and text-to-sound variants.

€2.00

Stable Audio Open 1.0

Replicate

Stability AI's Stable Audio Open generates short audio from text prompts, tuned for sound effects, drum loops, instrument riffs and production elements rather than full songs. Open weights, latent diffusion over a 44.1kHz audio autoencoder, with a configurable seconds_total up to about 47 seconds.

€2.00

Udio V1.5

Replicate

AI music generation with studio-quality output. Generate full songs with vocals, instruments, and production.

€2.00

Start using Bark today

Get started with free credits. No credit card required. Access Bark and 100+ other models through a single API.
