How much does Deepgram Nova-3 cost via Railwail?

€0.004 per call (listed as €0.0043), with no monthly minimum and no subscription. Start with €5 in free credits.

What is the context window of Deepgram Nova-3?

The context window of Deepgram Nova-3 is listed as unknown. As a speech-to-text model, its practical input limit is audio length: pre-recorded requests accept up to 4 hours of audio.

How fast is Deepgram Nova-3?

Latency depends on input length and load; short requests typically take 200 ms to 2 s. We measure p50/p95 latency in real time on /rankings.
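The page does not say how /rankings computes p50 and p95. One common definition is the nearest-rank percentile, sketched below with made-up latency samples; Railwail may use a different method (for example, interpolation between samples), and `percentile` is an illustrative helper.

```typescript
// Nearest-rank percentile: sort ascending, then take the sample at
// rank ceil(p / 100 * n), counting ranks from 1.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("need at least one sample");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error("rank out of range");
  return value;
}

// Illustrative request latencies in milliseconds (not real measurements).
const latenciesMs = [120, 340, 250, 900, 410, 180, 600, 220, 310, 1500];
console.log(percentile(latenciesMs, 50)); // 310
console.log(percentile(latenciesMs, 95)); // 1500
```

With only ten samples, p95 is simply the slowest request; percentiles become meaningful only over many calls.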

Is Deepgram Nova-3 better than Incredibly Fast Whisper?

It depends on your use case. Deepgram Nova-3 (Custom) and Incredibly Fast Whisper (Community) are both strong choices in speech-to-text. Compare them side-by-side at /compare/nova-3-vs-incredibly-fast-whisper.

Does Deepgram Nova-3 support audio input?

Yes. Deepgram Nova-3 takes audio input; supported upload formats are MP3, WAV, M4A and FLAC. Use the standard Railwail API endpoint with audio content blocks.

Deepgram Nova-3

Name: Deepgram Nova-3
Brand: Custom
SKU: nova-3
Price: 0.0043 EUR
Availability: InStock

Custom

Speech-to-Text

Deepgram's flagship STT model. First to offer real-time multilingual transcription with self-serve customization.

TL;DR (last updated June 24, 2026)

Deepgram Nova-3 is a speech-to-text AI model from Custom, priced at €0.004 per call, with an unknown context window.

Try Deepgram Nova-3

Upload an MP3, WAV, M4A or FLAC file (max 25MB) and choose a language to get a transcript.

Direct API access coming soon.
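Before uploading, a client can reject files the upload box would refuse. This is a minimal sketch that mirrors the limits shown above (MP3, WAV, M4A or FLAC, at most 25MB). Whether the API enforces the same limits, and whether "25MB" means 25 × 1024 × 1024 bytes, are assumptions; `checkAudioUpload` is a hypothetical helper, not part of the Railwail SDK.

```typescript
// Assumed limits, copied from the upload box; the API may differ.
const ALLOWED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "flac"]);
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // assumption: binary megabytes

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Checks the file extension (case-insensitive) and the size in bytes.
function checkAudioUpload(fileName: string, sizeBytes: number): UploadCheck {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `unsupported format: ${ext || "(none)"}` };
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "file exceeds 25MB" };
  }
  return { ok: true };
}

console.log(checkAudioUpload("meeting.WAV", 4_000_000)); // { ok: true }
console.log(checkAudioUpload("notes.ogg", 1_000)); // { ok: false, reason: 'unsupported format: ogg' }
```

Checking by extension is only a cheap first filter; a mislabelled file would still be rejected server-side.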

Pricing

Price per Generation

Per generation: €0.004
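A quick way to budget is to multiply the call count by the per-call price. The sketch below assumes every call is billed at the listed €0.0043 regardless of audio length (this page does not say whether longer files cost more) and counts money in integer micro-euros to avoid floating-point drift; the function names are illustrative.

```typescript
// Assumption: a flat price per call, independent of audio duration.
const PRICE_PER_CALL_MICRO_EUR = 4_300; // €0.0043, the listed price
const FREE_CREDITS_MICRO_EUR = 5_000_000; // €5 starter credit

// Total cost in euros for a number of calls (integer maths, one final division).
function costEur(calls: number): number {
  if (!Number.isInteger(calls) || calls < 0) {
    throw new RangeError("calls must be a non-negative integer");
  }
  return (calls * PRICE_PER_CALL_MICRO_EUR) / 1_000_000;
}

// How many whole calls a credit balance covers.
function callsCoveredByCredits(creditMicroEur: number = FREE_CREDITS_MICRO_EUR): number {
  return Math.floor(creditMicroEur / PRICE_PER_CALL_MICRO_EUR);
}

console.log(costEur(1_000)); // 4.3
console.log(callsCoveredByCredits()); // 1162
```

At the rounded €0.004 shown in the summary, the same €5 would cover 1,250 calls; the gap comes entirely from the €0.0003 rounding.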

API Integration

Use our OpenAI-compatible API to integrate Deepgram Nova-3 into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

// `import` and top-level `await` require an ES module context.
const rw = railwail("YOUR_API_KEY");

// These are Railwail's generic chat-style calls. Nova-3 itself is a
// speech-to-text model, so real transcription requests send audio.

// Simple: just pass a string
const reply = await rw.run("nova-3", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("nova-3", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("nova-3", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Price

€0.004

Developer

Custom

Deep dive: Deepgram Nova-3

About Deepgram

Founded 2015 · San Francisco, California, USA

Deepgram was founded in 2015 by Scott Stephenson (CEO) and Noah Shutty after they finished PhDs in dark-matter physics at the University of Michigan and built a Hadoop pipeline to search audio archives. The company pivoted to commercial speech recognition in 2018 and became one of the first speech vendors to ship a fully end-to-end deep-learning ASR pipeline rather than the classical Kaldi stack. Deepgram has raised over $86M in venture funding from Tiger Global, Y Combinator, Wing, Madrona and NVIDIA's NVentures, with a 2024 Series C at a reported $1B+ valuation. The Nova model family launched in 2023 (Nova-1), followed by Nova-2 (2024) and Nova-3 (January 2025), positioning Deepgram as the fastest commercial ASR provider with sub-300 ms streaming and self-hostable deployments.

Architecture

End-to-end encoder-decoder speech-to-text with self-supervised pretraining

Deepgram Nova-3 is an end-to-end deep-learning automatic speech recognition model trained on what Deepgram describes as the largest commercial training corpus among production ASR providers, including 50,000+ hours of curated multi-domain audio plus self-supervised pretraining on hundreds of thousands of hours of unlabelled speech. Nova-3 introduced a unified multilingual model that natively handles English, Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Mandarin, Korean and 10+ more, with on-the-fly code-switching inside a single audio file. New capabilities include keyword prompting (custom vocabulary at request time without retraining), Self-Hosted edition for HIPAA/SOC2 deployments, and improved noise robustness on call-centre and phone audio. The streaming variant runs at sub-300 ms latency over WebSocket, while the pre-recorded variant supports up to 4 hours per file. Outputs include diarisation, punctuation, smart formatting, language detection and topic/intent metadata.
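Several of the features above (keyword prompting, diarisation, smart formatting, language handling) are request-time options on Deepgram's own pre-recorded endpoint. The sketch below builds such a request URL. The parameter names (`model`, `smart_format`, `diarize`, `language`, `keyterm`) follow Deepgram's public `/v1/listen` API as we understand it; the request goes to Deepgram directly rather than through Railwail, and `buildListenUrl` is a hypothetical helper. Check the current Deepgram docs before relying on any option.

```typescript
// Hypothetical helper: builds a Deepgram /v1/listen URL for Nova-3.
// Option names follow Deepgram's public API; verify against current docs.
interface NovaListenOptions {
  smartFormat?: boolean; // punctuation plus formatting of numbers, dates, etc.
  diarize?: boolean; // label speakers in the transcript
  language?: string; // e.g. "en"; Deepgram documents "multi" for Nova-3 code-switching
  keyterms?: string[]; // runtime vocabulary ("keyterm prompting"), no retraining
}

function buildListenUrl(opts: NovaListenOptions = {}): string {
  const url = new URL("https://api.deepgram.com/v1/listen");
  url.searchParams.set("model", "nova-3");
  if (opts.smartFormat) url.searchParams.set("smart_format", "true");
  if (opts.diarize) url.searchParams.set("diarize", "true");
  if (opts.language) url.searchParams.set("language", opts.language);
  // keyterm is repeated once per term rather than comma-joined.
  for (const term of opts.keyterms ?? []) {
    url.searchParams.append("keyterm", term);
  }
  return url.toString();
}

console.log(buildListenUrl({ smartFormat: true, diarize: true, keyterms: ["Nova 3"] }));
// https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true&diarize=true&keyterm=Nova+3
```

To transcribe, POST the raw audio bytes to this URL with an `Authorization: Token <DEEPGRAM_API_KEY>` header and an audio `Content-Type`; the transcript comes back as JSON under `results.channels[0].alternatives[0].transcript`.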

Parameters: Undisclosed (multi-billion)
Context: 14.4K tokens

What it can do

Real-time streaming ASR at sub-300 ms latency
Multilingual single model with code-switching across 36+ languages
Keyword prompting for runtime custom vocabulary
Speaker diarisation, punctuation and smart formatting
Phone-call optimised (8 kHz, low-bandwidth, noisy)
Self-Hosted deployment for HIPAA / SOC2 / on-premises
Up to 4 hours of audio per pre-recorded request
Best for: call centres, real-time meeting transcription, voice agents, compliance recording
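Two of the limits above lend themselves to quick arithmetic: how many bytes to send per streaming chunk, and how many pre-recorded requests a long recording needs under the 4-hour cap. The sketch assumes raw 16-bit PCM (the common `linear16` encoding) and uses a 100 ms chunk purely as an illustration; neither choice is prescribed by this page, and the helper names are hypothetical.

```typescript
// Bytes in one chunk of raw 16-bit PCM audio.
function pcmChunkBytes(sampleRateHz: number, chunkMs: number, channels = 1): number {
  const bytesPerSample = 2; // 16 bits per sample
  const samplesPerChannel = Math.round((sampleRateHz * chunkMs) / 1000);
  return samplesPerChannel * bytesPerSample * channels;
}

// Per-request cap for pre-recorded audio, taken from the list above.
const MAX_PRERECORDED_SECONDS = 4 * 60 * 60;

// Number of pre-recorded requests needed if a recording is split at the cap.
function requestsNeeded(totalSeconds: number): number {
  if (totalSeconds <= 0) return 0;
  return Math.ceil(totalSeconds / MAX_PRERECORDED_SECONDS);
}

console.log(pcmChunkBytes(8_000, 100)); // 1600 (8 kHz phone audio, 100 ms)
console.log(pcmChunkBytes(16_000, 100)); // 3200
console.log(requestsNeeded(10 * 60 * 60)); // 3 (a 10-hour recording)
```

Smaller chunks lower the delay before the first partial transcript at the cost of more messages per second.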

Training & License

Self-supervised pretraining on hundreds of thousands of hours of unlabelled audio plus supervised training on 50,000+ hours of curated, labelled multilingual speech. Data is sourced from licensed corpora, customer opt-in audio and publicly available datasets.

License: Proprietary commercial API and Self-Hosted licence. Commercial use is permitted; Self-Hosted requires a separate enterprise agreement.

Known limitations

Word Error Rate worse than Whisper Large v3 on some long-form audiobook tasks
Streaming diarisation can mislabel speakers in fast cross-talk
Self-Hosted edition has high hardware requirements
Code-switching quality varies between language pairs
Closed weights for hosted API

Related Models

Incredibly Fast Whisper

Community

Whisper Large v3 wrapped with Hugging Face Transformers optimizations (batched inference, flash attention) for very high throughput. Transcribes hours of audio in minutes on a single GPU. Maintained by Vaibhav Srivastav. Good when you need bulk transcription fast.

€1.00

Whisper

OpenAI

OpenAI's Whisper running on Replicate. General-purpose speech recognition trained on 680k hours of multilingual audio. Transcribes and translates 99 languages, robust to accents and background noise, and outputs plain text, segments, or word-level timestamps.

€2.00