Whisper Large v3 Turbo

Popular
OpenAI
Speech-to-Text

OpenAI's pruned and fine-tuned variant of Whisper Large v3. ~216x realtime, 99 languages, MIT-licensed weights.

TL;DR · Last updated May 16, 2026

Whisper Large v3 Turbo is a speech-to-text AI model from OpenAI, priced at €0.006 per generation. It processes audio in 30-second windows.

Supported upload formats: MP3, WAV, M4A, FLAC (max 25 MB per file).

Pricing

Price per generation: €0.006

API Integration

Use our OpenAI-compatible API to integrate Whisper Large v3 Turbo into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("whisper-large-v3-turbo", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("whisper-large-v3-turbo", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("whisper-large-v3-turbo", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
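
The snippet above uses text prompts, while a speech-to-text model takes an audio file as input. This page does not show the audio call, so the following is only a minimal sketch of how a request to an OpenAI-compatible transcription endpoint is usually assembled: a multipart POST to `/audio/transcriptions` with `file` and `model` fields. The base URL `https://api.example.com/v1` and the helper name `buildTranscriptionRequest` are assumptions for illustration, not documented values; check the provider's documentation for the real ones.

```typescript
// Hypothetical base URL: replace with the value from your provider's docs.
const BASE_URL = "https://api.example.com/v1";

interface TranscriptionRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: FormData };
}

// Builds (but does not send) a multipart request in the shape used by
// OpenAI-compatible /audio/transcriptions endpoints.
function buildTranscriptionRequest(
  apiKey: string,
  audio: Blob,
  filename: string,
  baseUrl: string = BASE_URL,
): TranscriptionRequest {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-large-v3-turbo");

  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, "")}/audio/transcriptions`,
    init: {
      method: "POST",
      // Do not set Content-Type by hand: fetch adds the multipart boundary.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Usage, inside an async function:
//   const { url, init } = buildTranscriptionRequest(key, audioBlob, "call.mp3");
//   const res = await fetch(url, init);
//   const { text } = await res.json(); // OpenAI-style JSON response: { text }
```

Keeping the request construction separate from the network call makes it easy to inspect or unit-test what will be sent before spending credits.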
Specifications
Price: €0.006
Developer: OpenAI
Category: Speech-to-Text
Supported formats: audio
Tags: openai, whisper, stt, transcription, open-weights, multilingual, per-minute

Deep dive — OpenAI's Whisper Large v3 Turbo

About OpenAI
Founded 2015 · San Francisco, California, USA

OpenAI was founded in December 2015 by Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba and John Schulman, and was restructured as the capped-profit OpenAI LP in 2019. Whisper Large v3 Turbo was released in October 2024 as a fast variant of Whisper Large v3: the decoder was pruned from 32 to 4 layers and the model was then fine-tuned, giving approximately 8x faster inference at near-identical accuracy. The release was led by the original Whisper authors (Alec Radford, Jong Wook Kim, Tao Xu) and remained under the MIT licence. Turbo was distributed via GitHub and the Hugging Face Hub, became the default model in the open-source whisper command-line tool, and was quickly adopted as a replacement for Whisper Large v3 in many deployments.

Architecture
Encoder-decoder Transformer with a pruned 4-layer decoder, for speech recognition

Whisper Large v3 Turbo is a pruned and fine-tuned variant of Whisper Large v3. It keeps the same 32-layer audio encoder and 128-mel front-end but shrinks the decoder from 32 to just 4 Transformer layers, taking the total parameter count from 1.55B to 809M. The smaller decoder gives roughly 8x faster inference on long-form audio (and 4-5x faster on short clips) at a WER cost of approximately 0.5-1 percentage points on most benchmarks.

After pruning, the model was fine-tuned on the same multilingual transcription corpus as Large v3 (5 million hours in total, of which 4 million are pseudo-labelled). Translation data was excluded from fine-tuning, so translation-to-English is not supported and capacity is focused on transcription quality. The 30-second sliding window, 99-language coverage and special task tokens are unchanged. Turbo runs in real-time on consumer GPUs (RTX 3060) and at 3-4x real-time on Apple Silicon CPUs via whisper.cpp.
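
The parameter figures above can be sanity-checked with a rough count. The sketch below assumes the published Whisper large width of 1280 and a 4x feed-forward expansion, and it ignores biases, layer norms and embeddings, so it is an approximation rather than an exact accounting.

```typescript
const D_MODEL = 1280; // hidden width of the Whisper large family
const FFN_MULT = 4;   // feed-forward expansion factor

// Approximate weight count of one decoder layer (biases and norms ignored).
function approxDecoderLayerParams(dModel: number, ffnMult: number): number {
  const attention = 4 * dModel * dModel;             // Q, K, V and output projections
  const selfAndCross = 2 * attention;                // self-attention + cross-attention
  const feedForward = 2 * ffnMult * dModel * dModel; // two linear layers
  return selfAndCross + feedForward;
}

const perLayer = approxDecoderLayerParams(D_MODEL, FFN_MULT);
const removed = (32 - 4) * perLayer;     // 28 decoder layers dropped
const estimatedTurbo = 1.55e9 - removed; // start from the Large v3 total

console.log(`per decoder layer: ${(perLayer / 1e6).toFixed(1)}M`);
console.log(`removed:           ${(removed / 1e6).toFixed(0)}M`);
console.log(`estimated Turbo:   ${(estimatedTurbo / 1e6).toFixed(0)}M`);
```

Each decoder layer comes to about 26.2M weights, so dropping 28 of them removes roughly 734M and leaves an estimate of about 816M, within 1% of the quoted 809M.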

Parameters
809M
Context
30-second audio window
What it can do
  • 8x faster long-form transcription than Whisper Large v3
  • 99-language transcription with automatic language detection
  • Word-level timestamps preserved
  • Runs in real-time on a single consumer GPU (RTX 3060 / M2 Pro)
  • Half the memory footprint of Large v3 (809M vs 1.55B)
  • Open weights under MIT licence
  • Drop-in replacement for Large v3 in most pipelines
  • Best for: production ASR on commodity hardware, on-premise transcription, batch processing
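
Timestamps are often used to produce subtitles. As an illustration, the sketch below turns timestamped segments into SubRip (SRT) text. The `Segment` shape (start and end in seconds plus text) is a hypothetical structure chosen for this example; the actual response format depends on the API or library you use.

```typescript
// Hypothetical segment shape: times in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  const s = String(n);
  return s.length >= width ? s : "0".repeat(width - s.length) + s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Render numbered SRT cues separated by blank lines.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}

console.log(segmentsToSrt([
  { start: 0, end: 1.25, text: " Hello and welcome." },
  { start: 1.25, end: 3.5, text: " Let's get started." },
]));
```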
Training & License

Fine-tuned from a pruned Whisper Large v3 on the same 5-million-hour multilingual audio corpus. Translation-to-English data was excluded.

License: MIT licence for code and weights; commercial use permitted.

Known limitations
  • No translation-to-English mode (transcription only)
  • WER 0.5-1 pp worse than Large v3 on average
  • Same 30-second hard window requires chunking
  • Same hallucination behaviour on silent / music-only audio
  • No native diarisation

Start using Whisper Large v3 Turbo today

Get started with free credits. No credit card required. Access Whisper Large v3 Turbo and 100+ other models through a single API.