ElevenLabs Multilingual V2

Popular
ElevenLabs
Text-to-Speech

ElevenLabs' most natural-sounding TTS model. Supports 29 languages with emotional range.

TL;DR · Last updated March 4, 2026

ElevenLabs Multilingual V2 is a text-to-speech AI model from ElevenLabs, priced at €1.00 per generation, with a limit of about 5,000 characters per request.

Examples

See what ElevenLabs Multilingual V2 can generate

Narration

Input text:

"Welcome to the future of artificial intelligence. In this episode, we explore how large language models are reshaping industries from healthcare to creative arts, and what it means for the next decade of human progress."

Podcast Intro

Input text:

"Hey everyone, welcome back to another episode of Tech Unfiltered! I'm your host, and today we have an incredible guest who just shipped one of the most downloaded apps of the year. Grab your coffee, because this conversation is going to be a wild ride."

Pricing

Price per Generation
Per generation: €1.00
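
The flat per-generation price above makes budgeting a simple multiplication. The sketch below is only an illustration: the helper name `estimateCostEur` and the constant are hypothetical, and it assumes every generation is billed at the listed €1.00 with no volume discounts or free credits applied.

```typescript
// Assumption: flat price per generation, as listed on this page.
const PRICE_PER_GENERATION_EUR = 1.0;

// Hypothetical helper: total cost in euros for a given number of generations.
function estimateCostEur(
  generations: number,
  pricePerGeneration: number = PRICE_PER_GENERATION_EUR,
): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  return generations * pricePerGeneration;
}
```

For example, `estimateCostEur(12)` gives the cost of twelve generations at the listed price.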

API Integration

Use our OpenAI-compatible API to integrate ElevenLabs Multilingual V2 into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("elevenlabs-multilingual-v2", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("elevenlabs-multilingual-v2", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("elevenlabs-multilingual-v2", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications
Price: €1.00
Avg. latency: 3.0s
Est. duration: 3s
Developer: ElevenLabs
Category: Text-to-Speech
Supported Formats: mp3
Tags: natural, multilingual, popular

Deep dive: ElevenLabs Multilingual V2

About ElevenLabs
Founded 2022 · London, UK / New York, USA

ElevenLabs was founded in 2022 by Piotr Dabkowski and Mati Staniszewski, two Polish friends with backgrounds at Google and Palantir respectively. The product mission was to fix the poor dubbing experience for non-English films by producing AI voices with realistic intonation and emotion across many languages. The company is headquartered in London and New York with engineering centres in Warsaw and the Bay Area, and has raised over $280M across rounds led by Andreessen Horowitz and ICONIQ at valuations rising from $100M (Series A, 2023) to $3.3B (Series C, 2025). Multilingual V2, released in August 2023, became the default model for production audiobook and dubbing workflows at companies like Storytel, TheSoul Publishing and many indie publishers, and remained the flagship until v3 was previewed in 2025.

Visit ElevenLabs →
Architecture
Proprietary autoregressive Transformer TTS with neural codec

ElevenLabs Multilingual V2 is a hosted autoregressive Transformer text-to-speech model that predicts neural-codec audio tokens conditioned on text and a speaker embedding. The speaker embedding comes from a stock voice (the curated 'Voice Library'), an Instant Voice Clone (about 1 minute of reference audio), or a Professional Voice Clone fine-tuned from about 30 minutes of clean recording. Multilingual V2 supports 29 languages and handles code-switching between them within a single paragraph. Output is MP3 or PCM at 24 or 44.1 kHz, delivered through a hosted API and the ElevenLabs Studio editor. The model is the production workhorse for ElevenLabs' Dubbing Studio (automatic translation plus voice matching) and the AI Audiobook product. ElevenLabs has not published a technical paper, so this description is inferred; the system appears to resemble published neural-codec language-model TTS systems such as VALL-E.
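
To make the term "autoregressive" concrete, here is a toy sketch of the generation loop such a model would run: each new audio token is predicted from the text, the speaker embedding, and all audio tokens produced so far. This is not ElevenLabs' implementation, which is undisclosed. Every name here (`StepFn`, `generateAudioTokens`, `toyStep`) is hypothetical, and the "model" is a stand-in function rather than a neural network.

```typescript
// Hypothetical signature for one prediction step of a codec language model.
type StepFn = (text: number[], speaker: number[], audioSoFar: number[]) => number;

function generateAudioTokens(
  textTokens: number[],
  speakerEmbedding: number[],
  nextToken: StepFn,
  endOfAudio: number,
  maxTokens: number,
): number[] {
  const audio: number[] = [];
  while (audio.length < maxTokens) {
    // Each step sees the text, the speaker embedding, and everything generated so far.
    const token = nextToken(textTokens, speakerEmbedding, audio);
    if (token === endOfAudio) break;
    audio.push(token);
  }
  return audio;
}

// Stand-in "model": emits one audio token per text token, then signals the end.
const END = -1;
const toyStep: StepFn = (text, _speaker, audioSoFar) =>
  audioSoFar.length < text.length ? text[audioSoFar.length] + 100 : END;

const tokens = generateAudioTokens([1, 2, 3], [0.1, 0.2], toyStep, END, 10);
// tokens is [101, 102, 103]
```

In a real system the token sequence would then be passed to a neural codec decoder to produce a waveform; that stage is omitted here.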

Parameters: Undisclosed
Context: about 5,000 characters per request

What it can do
  • 29 languages with seamless code-switching
  • Voice Library with thousands of community and stock voices
  • Instant Voice Clone (~1 min audio) and Professional Voice Clone (~30 min)
  • Stability and similarity sliders for per-request prosody control
  • Studio editor for multi-paragraph long-form projects
  • Dubbing Studio with automatic translation and voice matching
  • Up to ~5,000 characters per request
  • Best for: production audiobooks, multilingual dubbing, content localisation
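
Because of the roughly 5,000-character request limit listed above, long-form text has to be split before it is sent. The following is a minimal sketch of one way to do that, breaking at sentence boundaries where possible. The function name `splitForTts` and the `MAX_CHARS` constant are hypothetical, and the sentence detection is a simple punctuation heuristic rather than a full multilingual segmenter.

```typescript
// Assumption: the ~5,000-character per-request limit listed on this page.
const MAX_CHARS = 5000;

function splitForTts(text: string, maxChars: number = MAX_CHARS): string[] {
  // Sentences end at runs of . ! or ?; a trailing fragment without punctuation is kept too.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    // A single sentence longer than the limit is hard-split.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current += rest;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Each returned chunk can then be sent as its own request. Note that, at the per-generation price listed on this page, each chunk would presumably count as a separate generation.
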
Training & License

Not disclosed. Mix of licensed professional voice talent, public-domain audiobooks and opt-in user voice contributions.

License: Proprietary commercial SaaS. Commercial use permitted on paid plans; customer voice clones remain customer property.

Known limitations
  • Higher per-character price than OpenAI TTS-1 or Cartesia Sonic
  • Latency in the 300-600 ms range; slower than Sonic for real-time use
  • Limited SSML support
  • No on-premises deployment
  • 29-language list smaller than v3's 70+

Start using ElevenLabs Multilingual V2 today

Get started with free credits. No credit card required. Access ElevenLabs Multilingual V2 and 100+ other models through a single API.