What is OpenAI TTS-1?

OpenAI TTS-1 is a text-to-speech AI model developed by OpenAI. It offers six built-in voices with natural intonation. Access it through Railwail's unified, OpenAI-compatible API at €0.60 per generation.

How much does OpenAI TTS-1 cost via Railwail?

Per-call: €0.60. No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of OpenAI TTS-1?

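Railwail does not list a context window for OpenAI TTS-1. As a text-to-speech model it takes plain text rather than a token context, with input capped at 4,096 characters per request.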

How fast is OpenAI TTS-1?

Average response latency: 2.0s (p50 across recent Railwail traffic). See live p50/p95 metrics on /rankings.

Is OpenAI TTS-1 better than ElevenLabs Multilingual V2?

It depends on your use case. OpenAI TTS-1 (OpenAI) and ElevenLabs Multilingual V2 (ElevenLabs) are both strong choices in text-to-speech. Compare them side-by-side at /compare/openai-tts-1-vs-elevenlabs-multilingual-v2.

OpenAI TTS-1

OpenAI

Text-to-Speech

OpenAI's text-to-speech model. Six built-in voices with natural intonation.

TL;DR · Last updated March 4, 2026

OpenAI TTS-1 is a text-to-speech AI model from OpenAI, priced at €0.60 per generation, with text input capped at 4,096 characters per request.

Examples

See what OpenAI TTS-1 can generate

Notification Voice

Input text:

"Your order has been confirmed and is being prepared. Estimated delivery time is thirty-five minutes. You'll receive a notification when your driver is on the way."

Tutorial Guide

Input text:

"Step one: open your terminal and navigate to the project directory. Step two: run npm install to download all dependencies. Step three: create a dot env file and add your API keys. Finally, run npm run dev to start the development server."

Pricing

Price per Generation

Per generation: €0.60

API Integration

Use our OpenAI-compatible API to integrate OpenAI TTS-1 into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("openai-tts-1", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("openai-tts-1", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("openai-tts-1", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
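
The snippet above uses chat-style calls. For speech output, OpenAI's own API exposes a dedicated audio speech endpoint that takes `model`, `input`, `voice` and `response_format`. The following is a minimal sketch that assumes Railwail mirrors that endpoint; the base URL and the helper name `buildSpeechRequest` are hypothetical placeholders, not documented Railwail values, so check the Railwail docs before relying on them. The helper only builds the request, which keeps it testable without network access.

```typescript
// ASSUMPTION: placeholder base URL, not a documented Railwail endpoint.
const RAILWAIL_BASE_URL = "https://api.example.invalid/v1";

type TtsVoice = "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";
type TtsFormat = "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";

interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) a request shaped like OpenAI's audio speech call.
// Hypothetical helper: the name and signature are illustrative only.
function buildSpeechRequest(
  apiKey: string,
  input: string,
  voice: TtsVoice = "alloy",
  format: TtsFormat = "mp3",
): TtsRequest {
  // The page states a cap of 4,096 characters per request.
  if (input.length === 0 || input.length > 4096) {
    throw new RangeError("input must be 1 to 4,096 characters");
  }
  return {
    url: RAILWAIL_BASE_URL + "/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: "Bearer " + apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "openai-tts-1",
        input: input,
        voice: voice,
        response_format: format,
      }),
    },
  };
}
```

To send it, pass the result to `fetch(url, init)` and read the response as binary audio (for example with `arrayBuffer()`), assuming the endpoint behaves like OpenAI's.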

Specifications

Price: €0.60
Avg. latency: 2.0s
Est. duration: not listed
Developer: OpenAI

Deep dive: OpenAI TTS-1

About OpenAI

Founded 2015 · San Francisco, California, USA

OpenAI was founded in December 2015 by Sam Altman, Elon Musk, Greg Brockman, Ilya Sutskever, Wojciech Zaremba and John Schulman as a non-profit AI research lab. It restructured into the capped-profit OpenAI LP in 2019 and has since received over $13B from Microsoft plus billions more from Thrive Capital, Sequoia, Khosla and SoftBank, at a 2024 valuation north of $157B and a planned 2025 round reportedly at $500B. The audio research line at OpenAI includes Jukebox (2020), Whisper (2022) and the TTS-1 family released in November 2023 alongside the ChatGPT voice feature. TTS-1 powers ChatGPT Voice Mode, the OpenAI API audio endpoint and the Realtime API used in voice agents.

Architecture

Proprietary Transformer text-to-speech with neural codec

OpenAI TTS-1 is the standard-quality variant of OpenAI's text-to-speech model family, optimised for real-time streaming inside ChatGPT Voice and the Realtime API. It is a Transformer-based model that predicts neural-codec audio tokens conditioned on text and a fixed set of stock voices (alloy, echo, fable, onyx, nova, shimmer) that OpenAI recorded with professional voice talent. There is no voice cloning. The model supports six output formats (MP3, Opus, AAC, FLAC, WAV, PCM) and outputs at 24 kHz. TTS-1 is tuned for low latency and starts streaming audio within a few hundred milliseconds, making it suitable for live conversational agents, while the HD sibling trades latency for fidelity. Text input is capped at 4,096 characters per request. OpenAI has not published a technical paper but the system architecturally resembles published neural-codec TTS such as VALL-E.

Parameters: Undisclosed
Input limit: 4,096 characters per request
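
Because the architecture notes mention raw PCM output at 24 kHz, it can help to see how a client might work with that stream. The sketch below computes playback length and wraps raw samples in a standard 44-byte WAV header. It assumes 16-bit little-endian mono samples, which the page does not state; both function names are hypothetical.

```typescript
const SAMPLE_RATE_HZ = 24000; // stated in the architecture notes
const BYTES_PER_SAMPLE = 2;   // ASSUMPTION: 16-bit samples
const CHANNELS = 1;           // ASSUMPTION: mono

// Playback length of a raw PCM buffer, in seconds.
function pcmDurationSeconds(byteLength: number): number {
  return byteLength / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS);
}

// Prepend a 44-byte RIFF/WAVE header so ordinary players can open raw PCM.
function wrapPcmAsWav(pcm: Uint8Array): Uint8Array {
  const header = new DataView(new ArrayBuffer(44));
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      header.setUint8(offset + i, tag.charCodeAt(i));
    }
  };
  const blockAlign = BYTES_PER_SAMPLE * CHANNELS;

  writeTag(0, "RIFF");
  header.setUint32(4, 36 + pcm.length, true); // file size minus 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  header.setUint32(16, 16, true);             // fmt chunk size
  header.setUint16(20, 1, true);              // format 1 = linear PCM
  header.setUint16(22, CHANNELS, true);
  header.setUint32(24, SAMPLE_RATE_HZ, true);
  header.setUint32(28, SAMPLE_RATE_HZ * blockAlign, true); // byte rate
  header.setUint16(32, blockAlign, true);
  header.setUint16(34, BYTES_PER_SAMPLE * 8, true);        // bits per sample
  writeTag(36, "data");
  header.setUint32(40, pcm.length, true);

  const wav = new Uint8Array(44 + pcm.length);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(pcm, 44);
  return wav;
}
```

Under these assumptions one second of audio occupies 48,000 bytes, so a streaming client can estimate how much speech it has buffered from the byte count alone.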

What it can do

Real-time streaming TTS with sub-second first-byte latency
Six recorded voices: alloy, echo, fable, onyx, nova, shimmer
Multilingual: 50+ languages including English, German, Spanish, French, Italian, Japanese, Mandarin
Six output formats including streaming PCM for low latency
Powers ChatGPT Voice and the OpenAI Realtime API
Inexpensive at $0.015 per 1,000 characters (vs. $0.03 for HD)
Best for: voice agents, IVR, interactive narration, accessibility
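
The per-character rates in the list above translate into a simple estimate. As a worked example, a maximum-length request of 4,096 characters at $0.015 per 1,000 characters comes to 4.096 × 0.015 = $0.06144. The sketch below encodes that arithmetic; the function name is hypothetical, and it covers only the OpenAI list rates quoted here, not Railwail's €0.60 per-generation price.

```typescript
// US dollars per 1,000 input characters, as quoted in the list above.
const USD_PER_1K_CHARS = { standard: 0.015, hd: 0.03 };

// Estimate the list-price cost for a given number of input characters.
// ASSUMPTION: billing is linear in character count with no minimum charge.
function estimateCostUsd(
  charCount: number,
  tier: "standard" | "hd" = "standard",
): number {
  if (charCount < 0) {
    throw new RangeError("charCount must not be negative");
  }
  return (charCount / 1000) * USD_PER_1K_CHARS[tier];
}
```

Calling `estimateCostUsd(text.length)` gives a rough figure; how the provider counts characters such as emoji is not covered by this sketch.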

Training & License

Not disclosed. OpenAI states voices were recorded with paid professional voice actors and the model was trained on a mixture of recorded speech and licensed text-audio pairs.

License: Proprietary commercial API. Generated audio may be used commercially under the OpenAI Usage Policy; the six stock voices cannot be impersonated outside the API.

Known limitations

No custom voice cloning
Limited prosody / emotion control
Quality below TTS-1 HD on slow, emotional narration
Hard cap of 4,096 characters per request
Closed weights, hosted only
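
The 4,096-character cap listed above means longer texts have to be split on the client before synthesis. One possible approach, sketched here with a hypothetical helper, is to cut at the last sentence end inside each window, fall back to a word boundary, and hard-cut only when neither exists.

```typescript
const MAX_TTS_CHARS = 4096; // per-request cap stated above

// Split text into chunks of at most maxChars characters.
function splitForTts(text: string, maxChars: number = MAX_TTS_CHARS): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    // Last sentence-ending mark followed by a space; +1 keeps the mark
    // in the current chunk. Yields 0 when no such mark exists.
    let cut = Math.max(
      head.lastIndexOf(". "),
      head.lastIndexOf("! "),
      head.lastIndexOf("? "),
    ) + 1;
    if (cut <= 0) cut = head.lastIndexOf(" "); // fall back to a word boundary
    if (cut <= 0) cut = maxChars;              // no boundary at all: hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be sent as its own request and the resulting audio segments concatenated in order. Splitting at sentence ends tends to keep intonation natural across segment joins, though that is a design preference rather than anything the API requires.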

Related Models

ElevenLabs Multilingual V2

ElevenLabs

ElevenLabs' most natural-sounding TTS model. Supports 29 languages with emotional range.

€1.00

AudioLDM 2

AudioLDM

Latent-diffusion model for general-purpose text-to-audio. Generates speech, music, and sound effects with a unified prior.

€0.01

Cartesia Sonic

Custom

Cartesia's ultra-low-latency TTS (~90ms TTFB). State-space model with voice cloning support.

Free

Chatterbox

Replicate

Resemble AI's open Chatterbox TTS. Zero-shot voice cloning from a short audio prompt with an exaggeration control for emotion intensity, plus CFG weight to balance pacing and fidelity.

€2.00

Start using OpenAI TTS-1 today

Get started with free credits. No credit card required. Access OpenAI TTS-1 and 100+ other models through a single API.
