Deepgram Nova-3

Custom
Speech-to-Text

Deepgram's flagship STT. First to offer realtime multilingual transcription with self-serve customization.

TL;DR · Last updated May 16, 2026

Deepgram Nova-3 is a speech-to-text AI model from Deepgram, listed here under the developer label Custom and priced at €0.004 per generation.

Try Deepgram Nova-3

Accepted upload formats: MP3, WAV, M4A, FLAC (max 25 MB per file)
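The format and size limits above can be checked on the client before uploading. The following is a minimal sketch, not part of any Railwail or Deepgram SDK: `checkUpload` and `UploadCheck` are hypothetical names, and it assumes "25MB" means 25 × 1024 × 1024 bytes and that the file extension is a good enough proxy for the format.

```typescript
// Hypothetical pre-upload check mirroring the limits listed on this page.
const ALLOWED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "flac"]);
const MAX_BYTES = 25 * 1024 * 1024; // assumption: "25MB" is 25 MiB

interface UploadCheck {
  ok: boolean;
  reason?: string;
}

function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.has(ext)) {
    return { ok: false, reason: `unsupported format: ${ext || "(none)"}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "file exceeds the 25 MB limit" };
  }
  return { ok: true };
}
```

Calling `checkUpload("call.MP3", 1024)` passes, while a `.txt` file or a 26 MiB WAV is rejected with a reason string that can be shown to the user.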

Direct API access coming soon

Pricing

Price per generation: €0.004
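To budget for a batch of transcriptions, the per-generation price can be multiplied out. This is a rough sketch that assumes one uploaded file counts as one generation, which the page does not state explicitly; `estimateCostEur` is a hypothetical helper. The price is kept in thousandths of a euro to avoid floating-point drift.

```typescript
// €0.004 per generation, stored as 4 thousandths of a euro.
const PRICE_MILLI_EUR_PER_GENERATION = 4;

function estimateCostEur(generations: number): number {
  if (!Number.isInteger(generations) || generations < 0) {
    throw new RangeError("generations must be a non-negative integer");
  }
  // Multiply in integer units first, then convert to euros once.
  return (generations * PRICE_MILLI_EUR_PER_GENERATION) / 1000;
}

console.log(estimateCostEur(250)); // 1 (euro)
```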

API Integration

Use our OpenAI-compatible API to integrate Deepgram Nova-3 into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("nova-3", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("nova-3", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("nova-3", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Price: €0.004
Developer: Custom
Category: Speech-to-Text
Supported formats: audio
Tags: deepgram, stt, transcription, realtime, multilingual, per-minute

Deep dive: Deepgram Nova-3

About Deepgram
Founded 2015 · San Francisco, California, USA

Deepgram was founded in 2015 by Scott Stephenson (CEO) and Noah Shutty after they finished PhDs in dark-matter physics at the University of Michigan and built a Hadoop pipeline to search audio archives. The company pivoted to commercial speech recognition in 2018 and became one of the first speech vendors to ship a fully end-to-end deep-learning ASR pipeline rather than a classical Kaldi-based stack. Deepgram has raised over $86M in venture funding from Tiger Global, Y Combinator, Wing, Madrona and NVIDIA's NVentures, with a 2024 Series C at a reported $1B+ valuation. The Nova model family launched in 2023 with Nova-1, followed by Nova-2 (2024) and Nova-3 (January 2025). Deepgram positions itself as the fastest commercial ASR provider, citing sub-300 ms streaming latency and self-hostable deployments.

Architecture
End-to-end encoder-decoder speech-to-text with self-supervised pretraining

Deepgram Nova-3 is an end-to-end deep-learning automatic speech recognition model. Deepgram describes its training corpus as the largest among production ASR providers: 50,000+ hours of curated multi-domain audio, plus self-supervised pretraining on hundreds of thousands of hours of unlabelled speech. Nova-3 introduced a unified multilingual model that natively handles English, Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Mandarin, Korean and 10+ more languages, with on-the-fly code-switching inside a single audio file. New capabilities include keyword prompting (custom vocabulary supplied at request time, without retraining), a Self-Hosted edition for HIPAA/SOC2 deployments, and improved noise robustness on call-centre and phone audio. The streaming variant runs at sub-300 ms latency over WebSocket, while the pre-recorded variant supports up to 4 hours per file. Outputs include diarisation, punctuation, smart formatting, language detection and topic/intent metadata.
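The request-time options described above (keyword prompting, diarisation, smart formatting, multilingual mode) are typically passed as query parameters on a pre-recorded request. The sketch below targets Deepgram's own REST endpoint rather than the Railwail client, and everything in it is an assumption to verify against Deepgram's current API reference: the `/v1/listen` path, the parameter names `keyterm`, `diarize`, `smart_format` and `language=multi`, the `Token` authorisation scheme, and the response path. `buildListenUrl` and `transcribe` are hypothetical helper names.

```typescript
interface ListenOptions {
  keyterms?: string[];    // runtime custom vocabulary (assumed param: keyterm)
  diarize?: boolean;      // speaker labels per word
  smartFormat?: boolean;  // formatted dates, numbers, etc.
  multilingual?: boolean; // code-switching mode (assumed value: language=multi)
}

// Builds the request URL only; no network access, so it is easy to test.
function buildListenUrl(opts: ListenOptions = {}): string {
  const params = new URLSearchParams({ model: "nova-3" });
  if (opts.multilingual) params.set("language", "multi");
  if (opts.diarize) params.set("diarize", "true");
  if (opts.smartFormat) params.set("smart_format", "true");
  for (const term of opts.keyterms ?? []) params.append("keyterm", term);
  return `https://api.deepgram.com/v1/listen?${params.toString()}`;
}

// Sends raw audio and returns the first transcript alternative.
// The response path below is an assumption about the JSON layout.
async function transcribe(apiKey: string, audio: Blob, opts: ListenOptions = {}): Promise<string> {
  const res = await fetch(buildListenUrl(opts), {
    method: "POST",
    headers: { Authorization: `Token ${apiKey}`, "Content-Type": audio.type || "audio/wav" },
    body: audio,
  });
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  const json = await res.json();
  return json?.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}
```

Keeping URL construction separate from the network call means the option mapping can be unit-tested without an API key.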

Parameters: Undisclosed (multi-billion)
Context: 14.4K tokens
What it can do
  • Real-time streaming ASR at sub-300 ms latency
  • Multilingual single model with code-switching across 36+ languages
  • Keyword prompting for runtime custom vocabulary
  • Speaker diarisation, punctuation and smart formatting
  • Phone-call optimised (8 kHz, low-bandwidth, noisy)
  • Self-Hosted deployment for HIPAA / SOC2 / on-premises
  • Up to 4 hours of audio per pre-recorded request
  • Best for: call centres, real-time meeting transcription, voice agents, compliance recording
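Diarised output is usually delivered as a flat list of words, each tagged with a speaker index, and most applications want speaker turns instead. The following sketch shows one way to group them. The `DiarisedWord` shape is a hypothetical, simplified stand-in for word-level diarisation output, not the exact response schema.

```typescript
// Assumed, simplified word-level shape; real responses carry more fields.
interface DiarisedWord {
  word: string;
  speaker: number;
  start: number; // seconds
  end: number;   // seconds
}

interface Turn {
  speaker: number;
  start: number;
  end: number;
  text: string;
}

// Merges consecutive words from the same speaker into a single turn.
function groupIntoTurns(words: DiarisedWord[]): Turn[] {
  const turns: Turn[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      last.text += " " + w.word;
      last.end = w.end;
    } else {
      turns.push({ speaker: w.speaker, start: w.start, end: w.end, text: w.word });
    }
  }
  return turns;
}
```

Because the grouping trusts the speaker labels as given, any mislabelling in fast cross-talk will show up as extra, very short turns.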
Training & License

Self-supervised pretraining on hundreds of thousands of hours of unlabelled audio plus supervised training on 50,000+ hours of curated, labelled multilingual speech. Data is sourced from licensed corpora, customer opt-in audio and publicly available datasets.

License: Proprietary commercial API and Self-Hosted licence. Commercial use is permitted; Self-Hosted requires a separate enterprise agreement.

Known limitations
  • Word Error Rate worse than Whisper Large v3 on some long-form audiobook tasks
  • Streaming diarisation can mislabel speakers in fast cross-talk
  • Self-Hosted edition has high hardware requirements
  • Code-switching quality varies between language pairs
  • Closed weights for hosted API
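Word Error Rate, the metric behind the first limitation above, is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal version for comparing models on your own audio. It normalises only case and whitespace, whereas published benchmarks usually apply heavier text normalisation, so its numbers will not match vendor figures exactly.

```typescript
// WER = (substitutions + deletions + insertions) / reference word count
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.trim().toLowerCase().split(/\s+/).filter(Boolean);
  if (ref.length === 0) {
    throw new RangeError("reference must contain at least one word");
  }
  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words (rolling Levenshtein row).
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

For example, a hypothesis with one substituted word against a six-word reference gives a WER of 1/6.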


Start using Deepgram Nova-3 today

Get started with free credits. No credit card required. Access Deepgram Nova-3 and 100+ other models through a single API.