How much does ElevenLabs Scribe v1 cost via Railwail?

Per-call: €0.004. No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of ElevenLabs Scribe v1?

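The specifications for ElevenLabs Scribe v1 list a context of 7.2K tokens. For a speech-to-text model, the audio limits matter more in practice: uploads on this page accept files up to 25MB.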

How fast is ElevenLabs Scribe v1?

Latency depends on prompt length and load — typically 200ms to 2s for short prompts. We measure p50/p95 in real-time on /rankings.

Is ElevenLabs Scribe v1 better than Whisper Large V3?

It depends on your use case. ElevenLabs Scribe v1 (ElevenLabs) and Whisper Large V3 (OpenAI) are both strong choices in speech-to-text. Compare them side-by-side at /compare/scribe-v1-vs-whisper-large-v3.

Does ElevenLabs Scribe v1 support audio input?

Yes. ElevenLabs Scribe v1 processes audio input. Supported formats: MP3, WAV, M4A, and FLAC. Use the standard Railwail API endpoint with audio content blocks.

ElevenLabs Scribe v1

Name: ElevenLabs Scribe v1
Brand: ElevenLabs
SKU: scribe-v1
Price: 0.0037 EUR
Availability: InStock

ElevenLabs

Speech-to-Text

ElevenLabs' STT. 99 languages, word-level timestamps, speaker diarization, audio-event tagging.

Transcribe with ElevenLabs Scribe v1

Upload an audio file and get a written transcript.

Drop or pick an audio file (MP3, WAV, M4A, FLAC).

TL;DR · Last updated May 16, 2026

ElevenLabs Scribe v1 is a speech-to-text AI model from ElevenLabs, priced at €0.004 per generation on Railwail.

Try ElevenLabs Scribe v1

Accepted upload formats: MP3, WAV, M4A, FLAC (max 25MB).
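The upload limits listed here (MP3, WAV, M4A, FLAC, max 25MB) can be checked on the client before a file is sent. The following is a minimal sketch, not part of the railwail package: the function name checkAudioUpload is hypothetical, and reading "25MB" as 25 * 1024 * 1024 bytes is an assumption, since the page does not say which unit convention applies.

```typescript
// Formats and size cap as listed on this page. Treating "25MB" as
// 25 * 1024 * 1024 bytes is an assumption; the page does not specify.
const ALLOWED_EXTENSIONS: readonly string[] = ["mp3", "wav", "m4a", "flac"];
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

type UploadCheck = { ok: true } | { ok: false; reason: string };

/** Hypothetical pre-flight check; not part of the railwail package. */
function checkAudioUpload(fileName: string, sizeBytes: number): UploadCheck {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();

  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: `unsupported format: "${ext}"` };
  }
  if (sizeBytes <= 0) {
    return { ok: false, reason: "file is empty" };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "file exceeds the 25MB limit" };
  }
  return { ok: true };
}

console.log(checkAudioUpload("interview.MP3", 3_500_000));
console.log(checkAudioUpload("notes.txt", 1_024));
```

A check like this only inspects the file name and size. It does not verify that the bytes are valid audio, so the API may still reject a file that passes.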

Pricing

Price per Generation

Per generation: €0.004
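As a rough worked example of the per-generation pricing, the sketch below estimates how many transcription calls the €5 free credits cover and what a batch of calls would cost. It assumes the displayed €0.004 is the exact billed amount; the product metadata on this page lists 0.0037 EUR, so the displayed figure may be rounded. The function names are illustrative only.

```typescript
// Amounts are held in micro-euros (millionths of a euro) so the
// arithmetic stays in integers and avoids floating-point rounding.
const MICRO_EUR_PER_EUR = 1_000_000;

// From the pricing on this page: €0.004 per generation, €5 free credits.
// Assumption: €0.004 is the exact billed price, not a rounded display value.
const PRICE_PER_GENERATION_MICRO_EUR = 4_000;
const FREE_CREDITS_MICRO_EUR = 5 * MICRO_EUR_PER_EUR;

/** Total cost in euros for a given number of transcription calls. */
function costInEur(generations: number): number {
  return (generations * PRICE_PER_GENERATION_MICRO_EUR) / MICRO_EUR_PER_EUR;
}

/** Number of whole generations a balance (in micro-euros) covers. */
function generationsCovered(balanceMicroEur: number): number {
  return Math.floor(balanceMicroEur / PRICE_PER_GENERATION_MICRO_EUR);
}

console.log(generationsCovered(FREE_CREDITS_MICRO_EUR)); // 1250
console.log(costInEur(10_000)); // 40
```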

API Integration

Use our OpenAI-compatible API to integrate ElevenLabs Scribe v1 into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("scribe-v1", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("scribe-v1", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("scribe-v1", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
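The samples above send text prompts, while the FAQ on this page says audio requests use audio content blocks. The sketch below shows one way such a message might be built. The block shape follows the `input_audio` content part from OpenAI's Chat Completions API; whether Railwail accepts this exact shape for scribe-v1, and whether the resulting message can be passed to rw.chat, is an assumption and is not confirmed by this page. The helper name buildAudioMessage is hypothetical.

```typescript
// Assumed content-block shape, modelled on OpenAI's `input_audio` part.
// Railwail's actual audio block format is not documented on this page.
type AudioFormat = "mp3" | "wav";

interface AudioContentBlock {
  type: "input_audio";
  input_audio: { data: string; format: AudioFormat };
}

interface AudioMessage {
  role: "user";
  content: AudioContentBlock[];
}

/** Hypothetical helper: wraps base64-encoded audio in a user message. */
function buildAudioMessage(base64Audio: string, format: AudioFormat): AudioMessage {
  if (base64Audio.length === 0) {
    throw new Error("audio data is empty");
  }
  return {
    role: "user",
    content: [{ type: "input_audio", input_audio: { data: base64Audio, format } }],
  };
}

// Placeholder base64 string standing in for real file contents.
const audioMessage = buildAudioMessage("UklGRg==", "wav");
console.log(JSON.stringify(audioMessage, null, 2));
```

The helper takes audio that is already base64-encoded, so it stays independent of how the file is read (browser File API, Node file system, and so on).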

Specifications

Price

€0.004

Developer

ElevenLabs

Deep dive: ElevenLabs Scribe v1

About ElevenLabs

Founded 2022 · London, UK / New York, USA

ElevenLabs was founded in 2022 by Piotr Dabkowski and Mati Staniszewski, two Polish technologists who had worked at Google and Palantir. The company became famous for high-quality multilingual text-to-speech and AI dubbing, and in February 2025 expanded into the inverse problem with Scribe v1, the company's first dedicated automatic speech recognition model. Scribe was developed in part to power ElevenLabs' Dubbing Studio (transcribe source audio, translate, then re-synthesise in the target language), and is offered as a standalone API to enterprise customers who want a single vendor for the full STT-to-TTS pipeline. ElevenLabs has raised over $280M to date, with a Series C in January 2025 at a $3.3B valuation.

Architecture

Proprietary encoder-decoder speech-to-text Transformer

ElevenLabs Scribe v1 is a hosted automatic speech recognition model launched in February 2025. ElevenLabs has not published a technical report, but the launch blog describes a Transformer encoder-decoder ASR architecture trained on a large multilingual speech corpus covering 99 languages, with particular emphasis on accuracy in long-tail languages where Whisper Large v3 underperforms. Scribe outperformed Whisper Large v3 and Deepgram Nova-2 in the company's published FLEURS and Common Voice evaluations across many language pairs, and ranked first overall in a head-to-head benchmark on Hindi, Mandarin, German and Italian. The model supports speaker diarisation up to 32 speakers, word-level timestamps with sub-100 ms precision, character-level confidence scores, automatic non-speech event detection ([applause], [laughter], [music]) and audio-event classification. Maximum file size is 1 GB and maximum audio length is 2 hours per request.

Parameters: Undisclosed
Context: 7.2K tokens

What it can do

99-language multilingual ASR including many low-resource languages
Speaker diarisation up to 32 speakers
Word-level timestamps with sub-100 ms precision
Non-speech event detection ([applause], [laughter], [music])
Character-level confidence scores
Up to 2 hours per request, 1 GB file limit
Direct integration with ElevenLabs Dubbing Studio (STT to translate to TTS)
Best for: dubbing pipelines, multilingual transcription, podcast indexing, media analytics
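To illustrate how word-level timestamps, speaker diarisation, and non-speech event tags combine in practice, the sketch below merges consecutive items from the same speaker into readable turns. The TranscriptItem shape and its field names are assumptions made for this example; the actual response format returned through Railwail is not shown on this page.

```typescript
// Assumed shape of one transcript item; field names are illustrative
// and may not match the real API response.
interface TranscriptItem {
  text: string;
  type: "word" | "audio_event";
  start: number; // seconds from the beginning of the audio
  end: number;   // seconds from the beginning of the audio
  speakerId: string;
}

interface SpeakerTurn {
  speakerId: string;
  start: number;
  end: number;
  text: string;
}

/** Merges consecutive items from the same speaker into one turn. */
function toSpeakerTurns(items: TranscriptItem[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const item of items) {
    // Render non-speech events in brackets, e.g. [applause].
    const token = item.type === "audio_event" ? `[${item.text}]` : item.text;
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speakerId === item.speakerId) {
      last.text += " " + token;
      last.end = item.end;
    } else {
      turns.push({ speakerId: item.speakerId, start: item.start, end: item.end, text: token });
    }
  }
  return turns;
}

// Hand-written sample data, not real model output.
const sampleItems: TranscriptItem[] = [
  { text: "Welcome", type: "word", start: 0.0, end: 0.4, speakerId: "speaker_0" },
  { text: "back", type: "word", start: 0.5, end: 0.8, speakerId: "speaker_0" },
  { text: "applause", type: "audio_event", start: 0.9, end: 2.0, speakerId: "speaker_0" },
  { text: "Thanks", type: "word", start: 2.1, end: 2.5, speakerId: "speaker_1" },
];

for (const turn of toSpeakerTurns(sampleItems)) {
  console.log(`${turn.speakerId} (${turn.start}s to ${turn.end}s): ${turn.text}`);
}
```

Turns like these are a convenient unit for podcast indexing or subtitle generation, since each one carries a speaker label and a time range.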

Training & License

Not disclosed. ElevenLabs reports training on a 'large multilingual corpus' with curation for long-tail languages; data is described as a mix of licensed and crowd-sourced opt-in audio.

License: Proprietary commercial API. Commercial use permitted on paid plans.

Known limitations