ElevenLabs v3 (alpha)

ElevenLabs
Text-to-Speech

ElevenLabs' v3 alpha text-to-speech model. It is the company's most expressive voice model, with audio tags and laughter, at the cost of higher latency.

TL;DR · Last updated May 16, 2026

ElevenLabs v3 (alpha) is a text-to-speech AI model from ElevenLabs, priced at €0.300 per 1M input tokens with an unknown context window.

Pricing

Price per Generation
Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate ElevenLabs v3 (alpha) into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("eleven-v3", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("eleven-v3", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("eleven-v3", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Specifications
Developer: ElevenLabs
Category: Text-to-Speech
Supported Formats: text
Tags: elevenlabs, tts, expressive, alpha, per-character

Deep dive: ElevenLabs v3 (alpha)

About ElevenLabs
Founded 2022 · London, UK / New York, USA

ElevenLabs was founded in 2022 by Piotr Dabkowski (CTO, ex-Google ML engineer) and Mati Staniszewski (CEO, ex-Palantir), two Polish high-school friends frustrated with the poor quality of Polish TV dubbing. The company set out to build voice AI that captures intonation and emotion across languages. It is headquartered in London and New York, with engineering hubs in Warsaw and the Bay Area. ElevenLabs raised a $19M Series A in June 2023 led by Andreessen Horowitz, an $80M Series B in January 2024 also led by a16z at a $1.1B valuation, and a $180M Series C in January 2025 at a $3.3B valuation co-led by a16z and ICONIQ. ElevenLabs v3 (alpha) was previewed in 2025 as the next-generation flagship model with expressive emotion tags, longer context and more languages. It succeeds the Multilingual V2 family, which became the de facto standard for AI dubbing.

Architecture
Proprietary autoregressive Transformer TTS with neural codec and emotion/prosody conditioning

ElevenLabs v3 (alpha) is the company's 2025 flagship text-to-speech model and the first ElevenLabs system to expose explicit emotion and event tags inside the text input ([whispers], [laughs], [angry], [sighs]). It is a proprietary Transformer-based autoregressive model that predicts neural-codec audio tokens conditioned on a text prompt and a speaker embedding. The embedding comes either from a few seconds of reference audio (Instant Voice Clone) or from a fully fine-tuned voice (Professional Voice Clone, which requires ~30 minutes of clean audio). v3 expands language coverage from 29 (v2) to 70+ languages, lengthens the input window to roughly 10,000 characters per request, and adds a dialogue mode for multi-speaker scenes. ElevenLabs has not published a technical paper; its product blog posts describe internal improvements in speaker disentanglement, code-switching and emotional range. v3 is offered through the same hosted API and Studio UI as Multilingual V2, but at higher latency and a higher price.
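The bracketed tags described above are plain text inside the prompt, so they can be prepared and checked before a request is sent. The following is a minimal sketch, not part of any ElevenLabs or railwail SDK: the helper names `withTag`, `extractTags` and `unknownTags` are hypothetical, and `KNOWN_TAGS` holds only the tags named on this page, which may not be the full set the model accepts.

```typescript
// Tags named on this page. Assumption: the model may accept others as well.
const KNOWN_TAGS: string[] = ["whispers", "laughs", "angry", "sighs", "crying"];

// Hypothetical helper: prefix a piece of text with one audio tag.
function withTag(tag: string, text: string): string {
  return "[" + tag + "] " + text;
}

// Hypothetical helper: list every bracketed tag in the input, in order.
function extractTags(input: string): string[] {
  const tags: string[] = [];
  const pattern = /\[([a-z][a-z ]*)\]/gi;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    tags.push(match[1].toLowerCase());
  }
  return tags;
}

// Tags that appear in the text but are not listed in KNOWN_TAGS.
function unknownTags(input: string): string[] {
  return extractTags(input).filter((t) => KNOWN_TAGS.indexOf(t) === -1);
}

const taggedLine = withTag("whispers", "Someone is listening. [laughs] Just kidding.");
console.log(taggedLine);
console.log(extractTags(taggedLine));
console.log(unknownTags("[shouts] Over here!"));
```

A check like `unknownTags` could be run over a script before generation to flag tags that were mistyped or that this page does not mention.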

Parameters: Undisclosed
Context: ~10,000 characters per request
What it can do
  • Expressive emotion and event tags ([laughs], [whispers], [angry], [crying])
  • 70+ languages with high-quality code-switching
  • Multi-speaker dialogue mode for podcast and audiobook generation
  • Instant Voice Clone from ~1 minute of audio and Professional Voice Clone from ~30 minutes
  • Long-form input up to ~10,000 characters per request
  • Studio editor for multi-paragraph projects with per-line speaker control
  • Best for: audiobooks, dubbing, narrative podcasts, character voices for games
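Because long-form input is capped at roughly 10,000 characters per request, longer scripts need to be split before they are sent. The sketch below shows one way to do that at sentence boundaries. It makes several assumptions: `chunkText` and `splitSentences` are hypothetical helper names, the limit is the approximate figure from this page, and length is counted with JavaScript's `.length` (UTF-16 code units), which may not match how the service counts characters.

```typescript
// Approximate per-request limit taken from this page. Treat it as an assumption.
const MAX_CHARS = 10000;

// Split text after sentence-ending punctuation that is followed by whitespace.
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  const boundary = /[.!?]+\s+/g;
  let start = 0;
  let m: RegExpExecArray | null;
  while ((m = boundary.exec(text)) !== null) {
    const end = m.index + m[0].length;
    const sentence = text.slice(start, end).trim();
    if (sentence.length > 0) sentences.push(sentence);
    start = end;
  }
  const tail = text.slice(start).trim();
  if (tail.length > 0) sentences.push(tail);
  return sentences;
}

// Group sentences into chunks of at most `limit` characters.
// Whitespace between sentences is normalized to a single space.
function chunkText(text: string, limit: number = MAX_CHARS): string[] {
  if (limit < 1) throw new Error("limit must be at least 1");
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    let rest = sentence;
    // A single sentence longer than the limit is hard-split.
    while (rest.length > limit) {
      if (current.length > 0) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, limit));
      rest = rest.slice(limit);
    }
    if (rest.length === 0) continue;
    const candidate = current.length > 0 ? current + " " + rest : rest;
    if (candidate.length <= limit) {
      current = candidate;
    } else {
      chunks.push(current);
      current = rest;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const scriptChunks = chunkText("First sentence. Second sentence! Third?", 20);
console.log(scriptChunks);
```

Each chunk can then be submitted as its own request. Splitting at sentence boundaries rather than at a fixed offset keeps a hard cut from landing mid-word, except when a single sentence exceeds the limit.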
Training & License

Training data: Not disclosed in detail. According to its public statements, ElevenLabs licenses professional voice talent, uses public-domain audiobooks and crowd-sourced opt-in voice contributions, and excludes commercial recordings.

License: Proprietary commercial SaaS. Commercial use of generated audio is permitted on paid plans; voice clones remain customer property.

Known limitations
  • Higher latency than v2 Turbo or Cartesia Sonic
  • Tag interpretation occasionally inconsistent in alpha
  • Refuses to reproduce the likeness of named public figures without verified consent
  • Closed weights, no on-premise deployment
  • Pricing per character is among the highest in the market
