ElevenLabs Scribe v1

ElevenLabs
Speech-to-Text

ElevenLabs' STT. 99 languages, word-level timestamps, speaker diarization, audio-event tagging.

TL;DR · Last updated May 16, 2026

ElevenLabs Scribe v1 is a speech-to-text AI model from ElevenLabs, priced at €0.004 per generation.

Try ElevenLabs Scribe v1

Supported upload formats: MP3, WAV, M4A, FLAC (max 25 MB)

Pricing

Price per Generation
Per generation: €0.004
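As a rough budgeting aid, the per-generation price can be turned into a cost estimate. The sketch below assumes that one "generation" corresponds to one transcription request, which this page does not spell out; the function name is hypothetical.

```typescript
// Price taken from the pricing line above: €0.004 per generation.
// Stored as integer micro-euros so repeated multiplication does not
// accumulate floating-point error.
const PRICE_PER_GENERATION_MICRO_EUR = 4_000;

// Hypothetical helper: estimated cost in euros for a number of generations.
// Assumption: one generation equals one transcription request.
function estimateCostEur(generations: number): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  return (generations * PRICE_PER_GENERATION_MICRO_EUR) / 1_000_000;
}

// 1,000 generations at €0.004 each
console.log(`€${estimateCostEur(1_000).toFixed(2)}`);
```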

API Integration

Use our OpenAI-compatible API to integrate ElevenLabs Scribe v1 into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: just pass a string
const reply = await rw.run("scribe-v1", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("scribe-v1", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("scribe-v1", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
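
The snippet above sends chat-style text, whereas a speech-to-text model takes an audio file. OpenAI-compatible APIs conventionally accept audio as a multipart POST to an `/audio/transcriptions` route with `file` and `model` fields. Whether this platform exposes that route for `scribe-v1` is an assumption, as are the placeholder base URL and the helper name, so treat the following as a sketch to check against the provider's documentation rather than a confirmed integration.

```typescript
// Assumption: placeholder base URL; substitute the real one from the provider's docs.
const BASE_URL = "https://api.example.com/v1";

interface TranscriptionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: FormData };
}

// Hypothetical helper: builds (but does not send) a multipart transcription
// request in the OpenAI-compatible style.
function buildTranscriptionRequest(
  apiKey: string,
  audio: Blob,
  filename: string,
  baseUrl: string = BASE_URL,
): TranscriptionRequest {
  const form = new FormData();
  form.append("file", audio, filename); // the audio payload
  form.append("model", "scribe-v1");    // model id as used in the snippet above
  return {
    url: `${baseUrl}/audio/transcriptions`,
    init: {
      method: "POST",
      // Content-Type is deliberately omitted: fetch sets the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Usage (performs a network call, so it is left commented out):
// const { url, init } = buildTranscriptionRequest("YOUR_API_KEY", audioBlob, "meeting.mp3");
// const transcript = await (await fetch(url, init)).json();
```

Separating request construction from sending keeps the sketch testable without network access.
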
Specifications
Price: €0.004
Developer: ElevenLabs
Category: Speech-to-Text
Supported Formats: audio
Tags: elevenlabs, scribe, stt, transcription, diarization, per-minute

Deep dive: ElevenLabs Scribe v1

About ElevenLabs
Founded 2022 · London, UK / New York, USA

ElevenLabs was founded in 2022 by Piotr Dabkowski and Mati Staniszewski, two Polish technologists who had worked at Google and Palantir. The company became famous for high-quality multilingual text-to-speech and AI dubbing, and in February 2025 expanded into the inverse problem with Scribe v1, the company's first dedicated automatic speech recognition model. Scribe was developed in part to power ElevenLabs' Dubbing Studio (transcribe source audio, translate, then re-synthesise in the target language), and is offered as a standalone API to enterprise customers who want a single vendor for the full STT-to-TTS pipeline. ElevenLabs has raised over $280M to date, with a Series C in January 2025 at a $3.3B valuation.

Architecture
Proprietary encoder-decoder speech-to-text Transformer

ElevenLabs Scribe v1 is a hosted automatic speech recognition model launched in February 2025. ElevenLabs has not published a technical report, but the launch blog describes a Transformer encoder-decoder ASR architecture trained on a large multilingual speech corpus covering 99 languages, with particular emphasis on accuracy in long-tail languages where Whisper Large v3 underperforms. In the company's own published FLEURS and Common Voice evaluations, Scribe outperformed Whisper Large v3 and Deepgram Nova-2 across many languages, and ranked first overall in a head-to-head benchmark on Hindi, Mandarin, German and Italian.

The model supports speaker diarisation for up to 32 speakers, word-level timestamps with sub-100 ms precision, character-level confidence scores, and detection and tagging of non-speech audio events ([applause], [laughter], [music]). Maximum file size is 1 GB and maximum audio length is 2 hours per request.
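
The request limits quoted above (1 GB, 2 hours, 32 speakers) can be checked client-side before uploading. This is a minimal sketch with hypothetical names; it assumes 1 GB means 10^9 bytes, which the source does not specify, and that the caller already knows the file's duration.

```typescript
// Limits as stated in the text above.
const MAX_FILE_BYTES = 1_000_000_000;     // assumption: 1 GB read as 10^9 bytes
const MAX_DURATION_SECONDS = 2 * 60 * 60; // 2 hours
const MAX_SPEAKERS = 32;

// Hypothetical description of a job the caller is about to submit.
interface TranscriptionJob {
  fileBytes: number;
  durationSeconds: number;
  expectedSpeakers?: number;
}

// Returns a list of human-readable problems; an empty list means the job
// fits within the documented limits.
function checkLimits(job: TranscriptionJob): string[] {
  const problems: string[] = [];
  if (job.fileBytes > MAX_FILE_BYTES) {
    problems.push(`file is ${job.fileBytes} bytes, limit is ${MAX_FILE_BYTES}`);
  }
  if (job.durationSeconds > MAX_DURATION_SECONDS) {
    problems.push(`audio is ${job.durationSeconds} s, limit is ${MAX_DURATION_SECONDS} s`);
  }
  if (job.expectedSpeakers !== undefined && job.expectedSpeakers > MAX_SPEAKERS) {
    problems.push(`${job.expectedSpeakers} speakers expected, diarisation supports ${MAX_SPEAKERS}`);
  }
  return problems;
}

console.log(checkLimits({ fileBytes: 2_000_000_000, durationSeconds: 9_000 }));
```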

Parameters: Undisclosed
Context: 7.2K tokens
What it can do
  • 99-language multilingual ASR including many low-resource languages
  • Speaker diarisation up to 32 speakers
  • Word-level timestamps with sub-100 ms precision
  • Non-speech event detection ([applause], [laughter], [music])
  • Character-level confidence scores
  • Up to 2 hours per request, 1 GB file limit
  • Direct integration with ElevenLabs Dubbing Studio (STT to translate to TTS)
  • Best for: dubbing pipelines, multilingual transcription, podcast indexing, media analytics
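
Word-level timestamps combined with diarisation are typically consumed by merging consecutive words from the same speaker into turns. The sketch below illustrates that step on hand-written sample data. The `Word` shape and its field names are assumptions for illustration, not the documented response schema.

```typescript
// Assumed shape of one transcribed token (hypothetical field names).
interface Word {
  text: string;    // a word, or a bracketed audio event such as "[laughter]"
  start: number;   // seconds
  end: number;     // seconds
  speaker: string; // diarisation label
}

interface Turn {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Merge consecutive words with the same speaker label into one turn.
function groupIntoTurns(words: Word[]): Turn[] {
  const turns: Turn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === w.speaker) {
      last.text += " " + w.text;
      last.end = w.end;
    } else {
      turns.push({ speaker: w.speaker, start: w.start, end: w.end, text: w.text });
    }
  }
  return turns;
}

// Hand-written sample, not real model output.
const sampleWords: Word[] = [
  { text: "Hello", start: 0.0, end: 0.4, speaker: "speaker_0" },
  { text: "there.", start: 0.45, end: 0.8, speaker: "speaker_0" },
  { text: "[laughter]", start: 0.9, end: 1.5, speaker: "speaker_1" },
  { text: "Hi!", start: 1.6, end: 1.9, speaker: "speaker_1" },
];

for (const t of groupIntoTurns(sampleWords)) {
  console.log(`${t.speaker} [${t.start}s to ${t.end}s]: ${t.text}`);
}
```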
Training & License

Not disclosed. ElevenLabs reports training on a 'large multilingual corpus' with curation for long-tail languages; data is described as a mix of licensed and crowd-sourced opt-in audio.

License: Proprietary commercial API. Commercial use permitted on paid plans.

Known limitations
  • No streaming mode at launch (file-based only)
  • Hard cap of 2 hours per request
  • Pricing per minute higher than Deepgram Nova-3 for English
  • Closed weights, hosted only
  • Diarisation accuracy degrades in noisy cross-talk
