Deepgram Nova-3
Deepgram's flagship STT. First to offer realtime multilingual transcription with self-serve customization.
Deepgram Nova-3 is a speech-to-text AI model from Custom, priced at €0.000 per 1M input tokens with an unknown context window.
Supported audio uploads: MP3, WAV, M4A, FLAC (max 25MB)
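The upload limits listed here (MP3, WAV, M4A, FLAC, 25MB maximum) can be checked on the client before any audio is sent. The following is a minimal sketch: `checkAudioUpload` is a hypothetical helper, not part of the railwail client, and it assumes that "25MB" means 25 × 1024 × 1024 bytes and that the file extension reflects the real format.

```javascript
// Limits copied from the upload hint on this page; the helper itself is hypothetical.
const ALLOWED_EXTENSIONS = ["mp3", "wav", "m4a", "flac"];
const MAX_BYTES = 25 * 1024 * 1024; // assumption: "25MB" is read as 25 MiB

// Returns { ok: true } or { ok: false, reason } for a candidate upload.
function checkAudioUpload(fileName, sizeInBytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `unsupported format: ${ext || "none"}` };
  }
  if (sizeInBytes > MAX_BYTES) {
    return { ok: false, reason: "file exceeds 25MB" };
  }
  return { ok: true };
}

console.log(checkAudioUpload("call.wav", 1000000));
console.log(checkAudioUpload("notes.ogg", 1000000));
```

Checking the extension only is a convenience; a server would still need to validate the actual container format.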
Pricing
API Integration
Use our OpenAI-compatible API to integrate Deepgram Nova-3 into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("nova-3", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("nova-3", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("nova-3", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Deepgram's Nova-3
Deepgram was founded in 2015 by Scott Stephenson (CEO) and Noah Shutty after they finished PhDs in dark-matter physics at the University of Michigan and built a Hadoop pipeline to search audio archives. The company pivoted to commercial speech recognition in 2018 and became one of the first speech vendors to ship a fully end-to-end deep-learning ASR pipeline rather than the classical Kaldi stack. Deepgram has raised over $86M in venture funding from Tiger Global, Y Combinator, Wing, Madrona and NVIDIA's NVentures, with a 2024 Series C at a reported $1B+ valuation. The Nova model family launched in 2023 (Nova-1), followed by Nova-2 (2024) and Nova-3 (January 2025), positioning Deepgram as the fastest commercial ASR provider with sub-300 ms streaming and self-hostable deployments.
Deepgram Nova-3 is an end-to-end deep-learning automatic speech recognition model trained on what Deepgram describes as the largest commercial training corpus among production ASR providers, including 50,000+ hours of curated multi-domain audio plus self-supervised pretraining on hundreds of thousands of hours of unlabelled speech. Nova-3 introduced a unified multilingual model that natively handles English, Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Mandarin, Korean and 10+ more, with on-the-fly code-switching inside a single audio file. New capabilities include keyword prompting (custom vocabulary at request time without retraining), a Self-Hosted edition for HIPAA/SOC2 deployments, and improved noise robustness on call-centre and phone audio. The streaming variant runs at sub-300 ms latency over WebSocket, while the pre-recorded variant supports up to 4 hours per file. Outputs include diarisation, punctuation, smart formatting, language detection and topic/intent metadata.
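To make the request-time options above concrete, here is a hedged sketch of how a pre-recorded transcription URL might be assembled when calling Deepgram directly rather than through the railwail client. The endpoint `https://api.deepgram.com/v1/listen` and the query parameters `keyterm`, `language=multi`, `diarize`, `smart_format` and `punctuate` are assumptions recalled from Deepgram's public documentation, not taken from this page, and should be verified before use. `buildListenUrl` is a hypothetical helper.

```javascript
// Sketch only: endpoint and parameter names are assumptions, not from this page.
const DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen";

// Builds a transcription URL that enables the features described above.
function buildListenUrl({ keyterms = [], multilingual = false, diarize = false } = {}) {
  const params = new URLSearchParams({
    model: "nova-3",
    smart_format: "true",
    punctuate: "true",
  });
  if (multilingual) params.set("language", "multi"); // assumed code-switching mode
  if (diarize) params.set("diarize", "true");
  for (const term of keyterms) params.append("keyterm", term); // one entry per term
  return `${DEEPGRAM_LISTEN_URL}?${params.toString()}`;
}

const url = buildListenUrl({
  keyterms: ["Railwail", "Nova-3"],
  multilingual: true,
  diarize: true,
});
console.log(url);
```

The sketch performs no network call. Under the same assumptions, the audio bytes would be sent as the body of a POST to this URL with an API key in the `Authorization` header.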
- Parameters: Undisclosed (multi-billion)
- Context: 14.4K tokens
- Real-time streaming ASR at sub-300 ms latency
- Multilingual single model with code-switching across 36+ languages
- Keyword prompting for runtime custom vocabulary
- Speaker diarisation, punctuation and smart formatting
- Phone-call optimised (8 kHz, low-bandwidth, noisy)
- Self-Hosted deployment for HIPAA / SOC2 / on-premises
- Up to 4 hours of audio per pre-recorded request
- Best for: call centres, real-time meeting transcription, voice agents, compliance recording
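For the streaming and phone-audio points in the list above, it can help to see how much data a single streamed message would carry. The sketch below assumes 16-bit linear PCM, mono, and a 100 ms chunk length; none of these values come from this page, and telephony audio is often 8-bit mu-law instead. `bytesPerChunk` and `splitPcm` are hypothetical helpers.

```javascript
// Assumptions (not from this page): 16-bit linear PCM, mono, 100 ms per message.
function bytesPerChunk(sampleRateHz, chunkMs, bytesPerSample = 2, channels = 1) {
  return Math.floor((sampleRateHz * chunkMs) / 1000) * bytesPerSample * channels;
}

// Splits raw PCM into fixed-size chunks; the last chunk may be shorter.
function splitPcm(pcm, chunkBytes) {
  if (chunkBytes <= 0) throw new RangeError("chunkBytes must be positive");
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    chunks.push(pcm.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

const oneSecondOfPhoneAudio = new Uint8Array(8000 * 2); // 1 s of silence at 8 kHz
const chunkBytes = bytesPerChunk(8000, 100);
const chunks = splitPcm(oneSecondOfPhoneAudio, chunkBytes);
console.log(chunkBytes, chunks.length); // 1600 10
```

Shorter chunks reduce the time audio waits before being sent, at the cost of more messages per second. The right balance depends on the transport and is not specified here.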
Self-supervised pretraining on hundreds of thousands of hours of unlabelled audio plus supervised training on 50,000+ hours of curated, labelled multilingual speech. Data is sourced from licensed corpora, customer opt-in audio and publicly available datasets.
License: Proprietary commercial API and Self-Hosted licence. Commercial use is permitted; Self-Hosted requires a separate enterprise agreement.
Known limitations
- Word Error Rate worse than Whisper Large v3 on some long-form audiobook tasks
- Streaming diarisation can mislabel speakers in fast cross-talk
- Self-Hosted edition has high hardware requirements
- Code-switching quality varies between language pairs
- Closed weights for hosted API
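The Word Error Rate named in the first limitation is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch of that calculation follows, using a word-level edit distance. The example sentences are invented for illustration, and the lowercase-and-split-on-whitespace normalisation is an assumption; published benchmarks usually apply more careful text normalisation.

```javascript
// WER = (substitutions + deletions + insertions) / number of reference words,
// computed here as a word-level Levenshtein distance.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) throw new Error("reference must contain at least one word");

  // prev[j] = edit distance between the ref words seen so far and the first j hyp words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// One substituted word out of four reference words.
console.log(wordErrorRate("the quick brown fox", "the quick brown box")); // 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.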
Related Models
Whisper Large V3
OpenAI's Whisper model. State-of-the-art speech recognition supporting 99+ languages.
Whisper Large v3 Turbo
OpenAI's distilled Whisper Large v3. ~216x realtime, 99+ languages, MIT-licensed weights.
ElevenLabs Scribe v1
ElevenLabs' STT. 99 languages, word-level timestamps, speaker diarization, audio-event tagging.
SeamlessM4T v2 Large (Speech)
Meta SeamlessM4T v2 Large speech mode. Speech-to-speech, speech-to-text, and text-to-speech translation across 100+ languages in a single unified model.