How much does ElevenLabs Multilingual V2 cost via Railwail?

Per-call: €1.00. No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of ElevenLabs Multilingual V2?

Context windows apply to language models. For this text-to-speech model the relevant limit is input length: up to about 5,000 characters per request.

How fast is ElevenLabs Multilingual V2?

Average response latency: 3.0s (p50 across recent Railwail traffic). See live p50/p95 metrics on /rankings.
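
For readers unfamiliar with the p50/p95 notation, here is a minimal sketch of how such percentiles can be computed from a list of latency samples using the nearest-rank method. The sample values and the `percentile` helper are illustrative assumptions; how Railwail actually aggregates its metrics is not stated on this page.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
// The helper name and the sample values are hypothetical, for illustration only.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in (0, 100]");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

const latenciesMs = [2100, 2600, 2900, 3000, 3100, 3300, 3800, 4200, 5200, 9000];
const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
console.log(`p50=${p50}ms p95=${p95}ms`);
```

The p50 describes a typical request, while the p95 shows how slow the slowest requests get, which matters more when sizing timeouts.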

Is ElevenLabs Multilingual V2 better than AudioLDM 2?

It depends on your use case. ElevenLabs Multilingual V2 (ElevenLabs) and AudioLDM 2 (AudioLDM) are both strong choices in text-to-speech. Compare them side-by-side at /compare/elevenlabs-multilingual-v2-vs-audioldm-2.

ElevenLabs Multilingual V2

ElevenLabs

Text-to-Speech

ElevenLabs' most natural-sounding TTS model. Supports 29 languages with emotional range.

TL;DR · Last updated March 4, 2026

ElevenLabs Multilingual V2 is a text-to-speech AI model from ElevenLabs, priced at €1.00 per generation, with a limit of about 5,000 characters per request.

Examples

See what ElevenLabs Multilingual V2 can generate

Narration

Input text:

"Welcome to the future of artificial intelligence. In this episode, we explore how large language models are reshaping industries from healthcare to creative arts, and what it means for the next decade of human progress."

Podcast Intro

Input text:

"Hey everyone, welcome back to another episode of Tech Unfiltered! I'm your host, and today we have an incredible guest who just shipped one of the most downloaded apps of the year. Grab your coffee, because this conversation is going to be a wild ride."

Pricing

Price per Generation

Per generation: €1.00
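
As a rough budgeting aid, the sketch below turns the flat per-generation price into a cost estimate and shows how many generations the €5 free credits would cover. It assumes every API call is billed as exactly one generation regardless of text length; the function names are hypothetical.

```typescript
// Flat per-generation pricing as listed on this page.
const PRICE_PER_GENERATION_EUR = 1.0;
const FREE_CREDITS_EUR = 5.0;

// Convert to integer cents so repeated arithmetic does not drift.
const toCents = (eur: number): number => Math.round(eur * 100);

// Total cost in cents for a number of generations (assumption: one call = one generation).
function costCents(generations: number): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new Error("generations must be a non-negative integer");
  }
  return generations * toCents(PRICE_PER_GENERATION_EUR);
}

// How many whole generations a credit balance covers.
function generationsCoveredBy(creditsEur: number): number {
  return Math.floor(toCents(creditsEur) / toCents(PRICE_PER_GENERATION_EUR));
}

console.log(`12 generations cost €${(costCents(12) / 100).toFixed(2)}`);
console.log(`Free credits cover ${generationsCoveredBy(FREE_CREDITS_EUR)} generations`);
```

Because the price is per call rather than per character, batching text into as few requests as the length limit allows keeps the cost down.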

API Integration

Use our OpenAI-compatible API to integrate ElevenLabs Multilingual V2 into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

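// Simple: pass the text to speak as a string.
// The shape of the return value for a text-to-speech model is not documented
// on this page, so inspect it before relying on it.
const reply = await rw.run("elevenlabs-multilingual-v2", "Hello! What can you do?");
console.log(reply);

// The two chat-style forms below follow the generic client example.
// They are written for chat models; whether a text-to-speech model accepts
// message arrays, temperature, or max_tokens is not confirmed here.

// With message history
const reply2 = await rw.run("elevenlabs-multilingual-v2", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("elevenlabs-multilingual-v2", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);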

Specifications

Price: €1.00
Avg. latency: 3.0s
Est. duration: not listed
Developer: ElevenLabs

Deep dive: ElevenLabs Multilingual V2

About ElevenLabs

Founded 2022 · London, UK / New York, USA

ElevenLabs was founded in 2022 by Piotr Dabkowski and Mati Staniszewski, two Polish friends with backgrounds at Google and Palantir respectively. The product mission was to fix the poor dubbing experience for non-English films by producing AI voices with realistic intonation and emotion across many languages. The company is headquartered in London and New York with engineering centres in Warsaw and the Bay Area, and has raised over $280M across rounds led by Andreessen Horowitz and ICONIQ at valuations rising from $100M (Series A, 2023) to $3.3B (Series C, 2025). Multilingual V2, released in August 2023, became the default model for production audiobook and dubbing workflows at companies like Storytel, TheSoul Publishing and many indie publishers, and remained the flagship until v3 was previewed in 2025.

Architecture

Proprietary autoregressive Transformer TTS with neural codec

ElevenLabs Multilingual V2 is a hosted autoregressive Transformer text-to-speech model that predicts neural-codec audio tokens conditioned on text and a speaker embedding. The speaker embedding comes from one of three sources: a stock voice from the curated Voice Library, an Instant Voice Clone built from about 1 minute of reference audio, or a Professional Voice Clone fine-tuned on about 30 minutes of clean recording. The model supports 29 languages and handles code-switching between them within a single paragraph. Output is MP3 or PCM at 24 kHz or 44.1 kHz, delivered through a hosted API and the ElevenLabs Studio editor. The model is the production workhorse for ElevenLabs' Dubbing Studio (automatic translation plus voice matching) and the AI Audiobook product. ElevenLabs has not published a technical paper, but the architecture resembles published neural-codec language-model TTS systems such as VALL-E.

Parameters: Undisclosed
Input limit: ~5,000 characters per request
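
To make the "text plus voice" conditioning concrete, here is a minimal sketch that builds a request body in the shape used by ElevenLabs' own public text-to-speech endpoint, where the voice is chosen by a voice ID in the URL path and the body carries `text`, `model_id`, and `voice_settings`. This is an assumption about calling ElevenLabs directly; Railwail's routing may expect a different shape. The `buildTtsBody` helper is hypothetical, and no network call is made.

```typescript
// Body shape for ElevenLabs' direct text-to-speech endpoint
// (POST /v1/text-to-speech/{voice_id}). Assumption: Railwail may differ.
interface VoiceSettings {
  stability: number;        // 0..1, higher is more consistent delivery
  similarity_boost: number; // 0..1, higher stays closer to the reference voice
}

interface TtsRequestBody {
  text: string;
  model_id: string;
  voice_settings: VoiceSettings;
}

// Keep slider values inside the 0..1 range.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Hypothetical helper: validates input and assembles the JSON body.
function buildTtsBody(text: string, stability: number, similarity: number): TtsRequestBody {
  if (text.trim().length === 0) {
    throw new Error("text must not be empty");
  }
  return {
    text,
    model_id: "eleven_multilingual_v2",
    voice_settings: {
      stability: clamp01(stability),
      similarity_boost: clamp01(similarity),
    },
  };
}

console.log(JSON.stringify(buildTtsBody("Bonjour, and welcome back.", 0.5, 1.2), null, 2));
```

Clamping on the client is a design choice here: out-of-range slider values are corrected silently rather than sent to the server and rejected.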

What it can do

29 languages with seamless code-switching
Voice Library with thousands of community and stock voices
Instant Voice Clone (~1 min audio) and Professional Voice Clone (~30 min)
Stability and similarity sliders for per-request prosody control
Studio editor for multi-paragraph long-form projects
Dubbing Studio with automatic translation and voice matching
Up to ~5,000 characters per request
Best for: production audiobooks, multilingual dubbing, content localisation
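
Since a single request is capped at roughly 5,000 characters, long-form text has to be split before synthesis. The sketch below is one way to do that, packing whole sentences into chunks and hard-splitting only when a single sentence exceeds the limit. The `splitForTts` name, the exact 5,000 figure, and the sentence-boundary regex are assumptions for illustration, not part of any Railwail or ElevenLabs API.

```typescript
// Approximate per-request limit quoted in the capability list above.
const MAX_CHARS = 5000;

// Hypothetical helper: split text into chunks of at most maxChars characters,
// preferring sentence boundaries so prosody is not cut mid-sentence.
function splitForTts(text: string, maxChars: number = MAX_CHARS): string[] {
  if (maxChars <= 0) {
    throw new Error("maxChars must be positive");
  }
  // Each match is a sentence ending in . ! or ? plus trailing whitespace,
  // or the final remainder with no terminal punctuation.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  // Push the accumulated chunk if it has any visible content.
  const flush = (): void => {
    const trimmed = current.trim();
    if (trimmed.length > 0) {
      chunks.push(trimmed);
    }
    current = "";
  };

  for (const sentence of sentences) {
    let rest = sentence;
    // A single sentence longer than the limit is cut at the limit.
    while (rest.length > maxChars) {
      flush();
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    // Start a new chunk when this sentence would overflow the current one.
    if (current.length + rest.length > maxChars) {
      flush();
    }
    current += rest;
  }
  flush();
  return chunks;
}

const demo = splitForTts("One. Two. Three.", 10);
console.log(demo);
```

Each chunk would then be sent as its own request and the resulting audio concatenated in order. With per-generation billing, fewer and fuller chunks cost less than many short ones.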

Training & License

Not disclosed. Mix of licensed professional voice talent, public-domain audiobooks and opt-in user voice contributions.

License: Proprietary commercial SaaS. Commercial use permitted on paid plans; customer voice clones remain customer property.

Known limitations

Higher per-character price than OpenAI TTS-1 or Cartesia Sonic
Latency in the 300-600 ms range; slower than Sonic for real-time use
Limited SSML support
No on-premises deployment
29-language list smaller than v3's 70+

Related Models

AudioLDM 2

AudioLDM

Latent-diffusion model for general-purpose text-to-audio. Generates speech, music, and sound effects with a unified prior.

€0.01

Cartesia Sonic

Custom

Cartesia's ultra-low-latency TTS (~90ms TTFB). State-space model with voice cloning support.

Free

Chatterbox

Replicate

Resemble AI's open Chatterbox TTS. Zero-shot voice cloning from a short audio prompt with an exaggeration control for emotion intensity, plus CFG weight to balance pacing and fidelity.

€2.00

Edge TTS