Whisper Large V3

Popular
OpenAI
Speech-to-Text

OpenAI's Whisper model. State-of-the-art speech recognition supporting 99+ languages.

TL;DR · Last updated March 4, 2026

Whisper Large V3 is a speech-to-text AI model from OpenAI, priced at €0.30 per generation, with a 30-second audio context window.

Supported upload formats: MP3, WAV, M4A, FLAC (max 25MB)
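
The format and size limits above can be checked on the client before an upload is attempted. The following is a minimal sketch, not part of any SDK: `checkUpload` is a hypothetical helper, and reading "25MB" as 25 × 1024 × 1024 bytes is an assumption.

```javascript
// Limits copied from the page; the exact byte value of "25MB" is an assumption.
const ALLOWED_EXTENSIONS = ["mp3", "wav", "m4a", "flac"];
const MAX_BYTES = 25 * 1024 * 1024;

// Hypothetical helper: returns null when the file looks acceptable,
// otherwise a short string describing why it would be rejected.
function checkUpload(filename, sizeBytes) {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return `unsupported format: ${ext || "(none)"}`;
  }
  if (sizeBytes > MAX_BYTES) {
    return `file too large: ${sizeBytes} bytes (limit ${MAX_BYTES})`;
  }
  return null;
}
```

The check only looks at the file extension, so it would not catch a mislabelled file; the service itself remains the final authority on what it accepts.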

Examples

See what Whisper Large V3 can generate

Meeting Notes

Transcription output:

Good morning everyone. Let's start with the sprint review. The authentication module shipped on Friday and we've had zero critical bugs reported so far. Sarah, can you walk us through the performance metrics? We're seeing a forty percent reduction in login latency which is well above our target.

Interview Transcript

Transcription output:

So tell me about your experience with distributed systems. I spent three years at a fintech startup building event-driven microservices using Kafka and Kubernetes. The biggest challenge was handling exactly-once delivery semantics across our payment processing pipeline. We ended up implementing an idempotency layer that reduced duplicate transactions by ninety-nine point nine percent.

Pricing

Price per Generation
Per generation: €0.30
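
As a rough budgeting aid, the per-generation price can be multiplied out for a batch of files. This sketch assumes one uploaded file counts as one generation, which the page does not state explicitly; `estimateCostEur` is a hypothetical helper.

```javascript
// Per-generation price from the pricing table, held in cents to avoid float drift.
const PRICE_PER_GENERATION_CENTS = 30;

// Hypothetical helper: total cost in euros for a number of transcription requests.
function estimateCostEur(generations) {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  return (generations * PRICE_PER_GENERATION_CENTS) / 100;
}

console.log(estimateCostEur(40).toFixed(2)); // "12.00"
```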

API Integration

Use our OpenAI-compatible API to integrate Whisper Large V3 into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

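// Note: these are the SDK's generic text-prompt calls. Whisper Large V3 is a
// speech-to-text model, so treat the prompts below as placeholders.
// Simple: just pass a string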
const reply = await rw.run("whisper-large-v3", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("whisper-large-v3", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("whisper-large-v3", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
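
Because the API is described as OpenAI-compatible, an audio request would plausibly follow the shape of OpenAI's audio transcription endpoint: a multipart POST carrying `file`, `model` and `response_format` fields. The sketch below only builds such a request and does not send it. Whether this platform exposes that endpoint is an assumption, and `BASE_URL` and `buildTranscriptionRequest` are hypothetical names, not documented values.

```javascript
// Assumption: the platform mirrors OpenAI's POST /audio/transcriptions endpoint.
// BASE_URL is a placeholder, not a documented address.
const BASE_URL = "https://api.example.com/v1";

// Builds (but does not send) a multipart transcription request.
// Requires a runtime with global FormData and Blob (browsers, Node 18+).
function buildTranscriptionRequest(audioBytes, filename, apiKey, responseFormat = "json") {
  const form = new FormData();
  form.append("file", new Blob([audioBytes]), filename);
  form.append("model", "whisper-large-v3");
  form.append("response_format", responseFormat); // json, text, srt or vtt
  return {
    url: `${BASE_URL}/audio/transcriptions`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Usage (not executed here):
// const { url, init } = buildTranscriptionRequest(bytes, "meeting.mp3", "YOUR_API_KEY", "srt");
// const res = await fetch(url, init);
```

The Content-Type header is left unset on purpose so that `fetch` can add the multipart boundary itself.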
Specifications

Price: €0.30
Avg. latency: 5.0s
Est. duration: 5s
Developer: OpenAI
Category: Speech-to-Text
Supported Formats: json, text, srt, vtt
Tags: multilingual, popular
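
To illustrate how the srt output format relates to timestamped segments, here is a minimal sketch that renders segments as SubRip text. The segment shape (`start` and `end` in seconds plus `text`) is an assumption for illustration and is not taken from this API's response schema.

```javascript
// Formats seconds as an SRT timestamp, HH:MM:SS,mmm.
function toSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n, width) => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Assumed segment shape: { start, end, text }, with times in seconds.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```

WebVTT is similar in layout but starts with a `WEBVTT` header line and uses a dot instead of a comma before the milliseconds.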

Deep dive — OpenAI's Whisper Large V3

About OpenAI
Founded 2015 · San Francisco, California, USA

OpenAI was founded in December 2015 by a group that included Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba and John Schulman, and was restructured as the capped-profit OpenAI LP in 2019. Whisper was first released as an open-weights speech recognition model in September 2022, in work by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever, and quickly became the de facto open ASR baseline thanks to its zero-shot multilingual quality. Whisper Large v3 was released in November 2023 alongside the GPT-4 Turbo launch as the last full-size open checkpoint in the original Whisper line. OpenAI later released Whisper Large v3 Turbo (October 2024), a faster variant of Large v3 with a much smaller decoder. All Whisper weights remain available under the MIT licence on GitHub and Hugging Face and are widely used in production, including by OpenAI's competitors.

Architecture
Encoder-decoder Transformer for multitask speech recognition

Whisper Large v3 is a 1.55B-parameter encoder-decoder Transformer trained for multitask audio understanding. Input is a 30-second log-mel spectrogram with 128 mel bins (up from 80 in v2) processed by a 32-layer audio encoder; the decoder is a 32-layer Transformer that emits special task tokens for language identification, transcription, translation to English and timestamp prediction. Large v3 was trained on about 5 million hours of audio: roughly 1 million hours of weakly supervised multilingual web audio (up from the 680k hours used for v1/v2) plus about 4 million hours of pseudo-labelled audio generated by Whisper Large v2. The model covers 99 languages, with very strong performance on English (low single-digit WER on LibriSpeech test-clean) and significantly improved low-resource coverage. Audio longer than 30 seconds is processed with a sliding window. The model is released under the MIT licence.
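
The WER (word error rate) quoted above is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal sketch of that calculation follows. Published benchmarks normally apply a fuller text normaliser before scoring; this version only lowercases and splits on whitespace, so its numbers are not directly comparable.

```javascript
// Word error rate: word-level edit distance / number of reference words.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new RangeError("reference must contain at least one word");
  }

  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      curr[j] = Math.min(substitution, prev[j] + 1, curr[j - 1] + 1);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, scoring "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.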

Parameters: 1.55B (Large v3)
Context: 30 seconds of audio
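
The model's context is measured in seconds of audio, and the size of one input window can be worked out with simple arithmetic. In this sketch the 30-second window and 128 mel bins come from this page, while the 16 kHz sample rate and 160-sample (10 ms) hop are assumptions taken from the open-source Whisper reference code.

```javascript
// 30 s and 128 mel bins are from the page; 16 kHz and a 160-sample hop are
// assumptions based on the open-source Whisper reference implementation.
const WINDOW_SECONDS = 30;
const SAMPLE_RATE_HZ = 16000;
const HOP_SAMPLES = 160;
const MEL_BINS = 128;

const samplesPerWindow = WINDOW_SECONDS * SAMPLE_RATE_HZ; // 480000
const framesPerWindow = samplesPerWindow / HOP_SAMPLES;   // 3000

console.log({ samplesPerWindow, framesPerWindow, melShape: [MEL_BINS, framesPerWindow] });
```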
What it can do
  • 99-language transcription with automatic language detection
  • Translation of any supported language directly to English text
  • Word-level and segment-level timestamps
  • 30-second context window with sliding window for long-form audio
  • Open weights under MIT licence, runs on a single 16 GB GPU in fp16
  • Strong robustness to accents, background noise and technical jargon
  • Best for: open-source ASR pipelines, research baselines, on-premise transcription
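
Sliding-window handling of long-form audio can be sketched as two steps: plan windows of at most 30 seconds, then shift each window's timestamps back onto the full recording's timeline. The 5-second overlap and both helper names are hypothetical choices for illustration, and the `{ start, end, text }` segment shape is assumed.

```javascript
// Splits a recording into windows of at most windowSec seconds. The overlap is
// a hypothetical choice to avoid cutting words at window boundaries.
function planChunks(durationSec, windowSec = 30, overlapSec = 5) {
  if (overlapSec < 0 || overlapSec >= windowSec) {
    throw new RangeError("overlapSec must be in [0, windowSec)");
  }
  const chunks = [];
  const step = windowSec - overlapSec;
  for (let start = 0; start < durationSec; start += step) {
    const end = Math.min(start + windowSec, durationSec);
    chunks.push({ start, end });
    if (end === durationSec) break; // final window reached the end of the audio
  }
  return chunks;
}

// Shifts chunk-relative segment times onto the timeline of the full recording.
function toAbsoluteTimes(chunkStartSec, segments) {
  return segments.map((seg) => ({
    ...seg,
    start: seg.start + chunkStartSec,
    end: seg.end + chunkStartSec,
  }));
}
```

With overlapping windows the same words can appear in two adjacent chunks, so a real pipeline would still need to deduplicate text in the overlap region.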
Training & License

About 5 million hours in total: ~1 million hours of weakly supervised multilingual audio scraped from the public web (subtitles aligned with audio; the original v1/v2 set was ~680k hours) plus ~4 million hours of pseudo-labelled audio generated by Whisper Large v2. About 17% of the original 680k-hour supervised set is non-English speech.

License: MIT licence for code and weights; commercial use permitted.

Known limitations
  • 30-second hard window requires chunking for long audio
  • Known to hallucinate transcripts on silent / music-only segments
  • WER on low-resource languages still much higher than English
  • No native diarisation (must be combined with a separate tool such as pyannote)
  • Slow on CPU; a GPU is needed for real-time transcription
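
One common mitigation for hallucinated text on silent segments is to skip chunks whose signal level is very low before sending them for transcription. The sketch below uses a simple RMS energy gate; the 0.01 threshold is an untuned placeholder, and both function names are hypothetical.

```javascript
// Root-mean-square level of PCM samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical gate: the default threshold is an untuned placeholder.
function isProbablySilent(samples, threshold = 0.01) {
  return rms(samples) < threshold;
}
```

An energy gate only catches near-silence. Music-only segments are loud, so filtering those would need a proper voice activity detector.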

Start using Whisper Large V3 today

Get started with free credits. No credit card required. Access Whisper Large V3 and 100+ other models through a single API.