How much does Whisper Large V3 cost via Railwail?

Per-call: €0.30. No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of Whisper Large V3?

Whisper Large V3 is an audio model, so it has no token-based context window. It processes audio in 30-second windows and uses a sliding window for longer recordings.

How fast is Whisper Large V3?

Average response latency: 5.0s (p50 across recent Railwail traffic). See live p50/p95 metrics on /rankings.
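For readers unfamiliar with p50/p95, here is a minimal sketch of how such percentiles can be computed from raw latency samples using the nearest-rank method. The `percentile` helper and the sample values are illustrative assumptions; this is not Railwail's metrics code or data.

```typescript
// Nearest-rank percentile over a list of latency samples (in seconds).
// ASSUMPTION: helper name and sample values are made up for illustration.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Smallest value with at least p% of the samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

const latencies = [4.1, 4.6, 4.8, 5.0, 5.0, 5.2, 5.9, 6.4, 7.3, 12.8];
console.log("p50:", percentile(latencies, 50));
console.log("p95:", percentile(latencies, 95));
```

The p95 value sits well above the p50 here because a single slow request dominates the tail, which is why both figures are worth checking.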

Is Whisper Large V3 better than Incredibly Fast Whisper?

It depends on your use case. Whisper Large V3 (OpenAI) and Incredibly Fast Whisper (Community) are both strong choices in speech-to-text. Compare them side-by-side at /compare/whisper-large-v3-vs-incredibly-fast-whisper.

Whisper Large V3

Popular

OpenAI

Speech-to-Text

OpenAI's Whisper model. State-of-the-art speech recognition supporting 99+ languages.

Transcribe with Whisper Large V3

Upload an audio file and get a written transcript.

Drop or pick an audio file (MP3, WAV, M4A, FLAC).

TL;DR · Last updated March 4, 2026

Whisper Large V3 is a speech-to-text AI model from OpenAI, priced at €0.30 per call on Railwail. It processes audio in 30-second windows.

Accepted upload formats: MP3, WAV, M4A, FLAC (max 25MB).

Examples

See what Whisper Large V3 can generate

Meeting Notes

Transcription output:

Good morning everyone. Let's start with the sprint review. The authentication module shipped on Friday and we've had zero critical bugs reported so far. Sarah, can you walk us through the performance metrics? We're seeing a forty percent reduction in login latency which is well above our target.

Interview Transcript

Transcription output:

So tell me about your experience with distributed systems. I spent three years at a fintech startup building event-driven microservices using Kafka and Kubernetes. The biggest challenge was handling exactly-once delivery semantics across our payment processing pipeline. We ended up implementing an idempotency layer that reduced duplicate transactions by ninety-nine point nine percent.

Pricing

Price per Generation

Per generation: €0.30
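As a quick worked example of the per-generation pricing, the sketch below estimates how many calls the €5 starter credit covers and what a batch would cost. It assumes one uploaded file equals one billed call, which the page does not state explicitly. The helper names are hypothetical.

```typescript
// Work in integer cents to avoid floating-point rounding on euro amounts.
const PRICE_PER_CALL_CENTS = 30; // €0.30 per generation, from the pricing above
const FREE_CREDIT_CENTS = 500;   // €5 starter credit, from the FAQ

// ASSUMPTION: one uploaded file is billed as exactly one call.
function callsCovered(budgetCents: number, priceCents: number): number {
  if (priceCents <= 0) throw new Error("price must be positive");
  return Math.floor(budgetCents / priceCents);
}

function costCents(calls: number, priceCents: number): number {
  return calls * priceCents;
}

const freeCalls = callsCovered(FREE_CREDIT_CENTS, PRICE_PER_CALL_CENTS);
console.log(`Free credit covers ${freeCalls} calls`); // 16 calls
console.log(`100 calls cost €${(costCents(100, PRICE_PER_CALL_CENTS) / 100).toFixed(2)}`); // €30.00
```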

API Integration

Use our OpenAI-compatible API to integrate Whisper Large V3 into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("whisper-large-v3", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("whisper-large-v3", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("whisper-large-v3", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
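Because Whisper Large V3 takes audio rather than chat messages, a transcription call needs to upload a file. The following is a minimal sketch only, assuming the OpenAI-compatible API mirrors OpenAI's audio transcription route (`/audio/transcriptions` with multipart `file`, `model` and optional `language` fields). The base URL is a placeholder and `buildTranscriptionRequest` is a hypothetical helper, not part of the `railwail` package. Check the Railwail documentation for the real endpoint.

```typescript
// ASSUMPTION: placeholder base URL; route and field names mirror OpenAI's
// audio transcription endpoint and may differ on Railwail.
const BASE_URL = "https://api.railwail.example/v1"; // hypothetical

function buildTranscriptionRequest(
  apiKey: string,
  audio: Blob,
  filename: string,
  language?: string,
): Request {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-large-v3");
  if (language) form.append("language", language); // e.g. "en"
  return new Request(`${BASE_URL}/audio/transcriptions`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form, // the runtime adds the multipart boundary header
  });
}

// Usage (not executed here; the response shape is also an assumption):
// const res = await fetch(buildTranscriptionRequest(key, blob, "meeting.mp3", "en"));
// const { text } = await res.json();
```

Building the `Request` separately from sending it keeps the sketch easy to inspect and test without network access.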

Specifications

Price: €0.30
Avg. latency: 5.0s
Est. duration: (not listed)
Developer: OpenAI

Deep dive — OpenAI's Whisper Large V3

About OpenAI

Founded 2015 · San Francisco, California, USA

OpenAI was founded in December 2015 by Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba and John Schulman, and restructured as the capped-profit OpenAI LP in 2019. Whisper was first released as an open-weights speech recognition model in September 2022 by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey and Ilya Sutskever, and quickly became the de facto open ASR baseline thanks to its zero-shot multilingual quality. Whisper Large v3 was released in November 2023 alongside the GPT-4 Turbo launch as the last full-size checkpoint in the original Whisper line. OpenAI later released Whisper Large v3 Turbo (October 2024), a faster variant with a much smaller decoder. All Whisper weights remain available under the MIT licence on GitHub and Hugging Face and are widely used in production, including by other providers.

Architecture

Encoder-decoder Transformer for multitask speech recognition

Whisper Large v3 is a 1.55B-parameter encoder-decoder Transformer trained for multitask audio understanding. The input is a 30-second log-mel spectrogram with 128 mel bins (up from 80 in v2), processed by a 32-layer audio encoder. The 32-layer Transformer decoder emits special task tokens for language identification, transcription, translation to English and timestamp prediction. Large v3 was trained on about 5 million hours of audio: roughly 1 million hours of weakly supervised audio and 4 million hours of pseudo-labelled audio generated by Whisper Large v2. For comparison, the original Whisper models (through Large v2) were trained on 680k hours of weakly supervised multilingual web audio. The model covers 99 languages, with very strong English performance (low single-digit WER on LibriSpeech test-clean) and significantly improved low-resource coverage. Audio longer than 30 seconds is processed with a sliding window. The model is released under the MIT licence.
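The size of the 30-second input can be worked out with simple arithmetic. The sketch below assumes the 16 kHz sample rate, 160-sample hop and stride-2 convolution used by the open-source Whisper reference implementation. These values are not stated on this page and are included for illustration.

```typescript
// ASSUMPTION: 16 kHz audio, 160-sample hop and a stride-2 conv stem,
// as in the open-source Whisper reference implementation.
const SAMPLE_RATE_HZ = 16000; // input audio is resampled to 16 kHz
const HOP_LENGTH = 160;       // samples between spectrogram frames (10 ms)
const WINDOW_SECONDS = 30;    // fixed window length
const MEL_BINS = 128;         // Large v3; earlier checkpoints use 80

const samplesPerWindow = SAMPLE_RATE_HZ * WINDOW_SECONDS; // 480000
const framesPerWindow = samplesPerWindow / HOP_LENGTH;    // 3000
const encoderPositions = framesPerWindow / 2;             // 1500 after the stride-2 conv

console.log({
  samplesPerWindow,
  framesPerWindow,
  encoderPositions,
  spectrogramShape: [MEL_BINS, framesPerWindow], // [128, 3000]
});
```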

Parameters: 1.55B (Large v3)
Context: 30 seconds of audio per window

What it can do

99-language transcription with automatic language detection
Translation of any supported language directly to English text
Word-level and segment-level timestamps
30-second context window with sliding window for long-form audio
Open weights under MIT licence, runs on a single 16 GB GPU in fp16
Strong robustness to accents, background noise and technical jargon
Best for: open-source ASR pipelines, research baselines, on-premise transcription
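Segment-level timestamps are commonly turned into subtitles. The sketch below converts a list of segments into SRT text. The `Segment` shape (start and end in seconds plus text) is a hypothetical structure chosen for illustration and may not match what the API actually returns.

```typescript
// ASSUMPTION: hypothetical segment shape; times are in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as an SRT timestamp, HH:MM:SS,mmm.
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Number each cue from 1 and separate cues with a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}

console.log(toSrt([
  { start: 0, end: 3.5, text: " Good morning everyone." },
  { start: 3.5, end: 6.2, text: " Let's start with the sprint review." },
]));
```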

Training & License

About 5 million hours in total: ~1 million hours of weakly supervised audio plus ~4 million hours of pseudo-labelled audio generated by Whisper Large v2. The original Whisper models were trained on ~680k hours of multilingual audio scraped from the public web (subtitles aligned with audio), of which about 17% was non-English speech.

License: MIT licence for code and weights; commercial use permitted.

Known limitations

30-second hard window requires chunking for long audio
Known to hallucinate transcripts on silent / music-only segments
WER on low-resource languages still much higher than English
No native diarisation (must be combined with pyannote)
Slow on CPU; requires GPU for real-time
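The 30-second window means long recordings have to be split before transcription. The sketch below plans fixed-length chunks with a small overlap so that words cut at a boundary appear whole in a neighbouring chunk. It is a simple illustrative approach, not the reference Whisper long-form algorithm. The 5-second overlap and the `planChunks` helper are assumptions, and merging the overlapping transcripts is left out.

```typescript
interface Chunk {
  startSec: number;
  endSec: number;
}

// ASSUMPTION: fixed windows with a 5-second overlap, chosen for illustration.
function planChunks(durationSec: number, windowSec = 30, overlapSec = 5): Chunk[] {
  if (durationSec <= 0) return [];
  if (overlapSec < 0 || overlapSec >= windowSec) {
    throw new Error("overlap must be in [0, window)");
  }
  const stride = windowSec - overlapSec; // how far each new chunk advances
  const chunks: Chunk[] = [];
  for (let start = 0; ; start += stride) {
    const end = Math.min(start + windowSec, durationSec);
    chunks.push({ startSec: start, endSec: end });
    if (end >= durationSec) break; // the last chunk reaches the end of the audio
  }
  return chunks;
}

// A 70-second recording yields three chunks: 0-30, 25-55 and 50-70.
console.log(planChunks(70));
```

Skipping chunks that contain only silence or music before transcription may also reduce the hallucinated output noted in the list above, though that filtering step is not shown here.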

Related Models

Incredibly Fast Whisper

Community

Whisper Large v3 wrapped with Hugging Face Transformers optimizations (batched inference, flash attention) for very high throughput. Transcribes hours of audio in minutes on a single GPU. Maintained by Vaibhav Srivastav. Good when you need bulk transcription fast.

€1.00

Whisper

OpenAI

OpenAI's Whisper running on Replicate. General-purpose speech recognition trained on 680k hours of multilingual audio. Transcribes and translates 99 languages, robust to accents and background noise, and outputs plain text, segments, or word-level timestamps.

€2.00