How much does Cartesia Sonic cost via Railwail?

Input: €0.030 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
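To make the listed rate concrete, here is a minimal sketch of the arithmetic. It assumes the €0.030 per 1M input tokens figure above is the only charge; the helper names `estimateCostEur` and `tokensForBudget` are hypothetical and not part of the railwail package.

```typescript
// Listed rate from this page: EUR 0.030 per 1,000,000 input tokens.
const EUR_PER_MILLION_TOKENS = 0.03;

// Hypothetical helper: cost in EUR for a given number of input tokens.
function estimateCostEur(inputTokens: number): number {
  if (!Number.isFinite(inputTokens) || inputTokens < 0) {
    throw new RangeError("inputTokens must be a non-negative finite number");
  }
  return (inputTokens / 1e6) * EUR_PER_MILLION_TOKENS;
}

// Hypothetical helper: whole tokens covered by a credit balance in EUR.
function tokensForBudget(budgetEur: number): number {
  if (!Number.isFinite(budgetEur) || budgetEur < 0) {
    throw new RangeError("budgetEur must be a non-negative finite number");
  }
  return Math.floor((budgetEur / EUR_PER_MILLION_TOKENS) * 1e6);
}

// Cost of two million input tokens, and the reach of the 5 EUR free credits.
console.log(estimateCostEur(2e6).toFixed(3));
console.log(tokensForBudget(5));
```

At this rate the €5 of free credits covers on the order of 166 million input tokens, assuming no other fees apply.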

What is the context window of Cartesia Sonic?

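Railwail lists the context window for Cartesia Sonic as unknown. For a text-to-speech model the practical limit is input length per request: the specifications on this page cite 24K tokens, and the known limitations cite a hard cap of around 24,000 input characters.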

How fast is Cartesia Sonic?

Latency depends on prompt length and load, and is typically 200 ms to 2 s for short prompts. We measure p50/p95 in real time on /rankings.
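For readers unfamiliar with p50/p95, the sketch below computes them from a list of latency samples using the nearest-rank method. This is an illustration only: the page does not say how /rankings computes its percentiles, and the `percentile` function and the sample values are made up for the example.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("need at least one sample");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort, copy first
  const rank = Math.ceil((p / 100) * sorted.length);   // 1-based rank
  return sorted[rank - 1];
}

// Made-up request latencies in milliseconds, including one slow outlier.
const latenciesMs = [210, 250, 300, 400, 1900, 230, 260, 280, 320, 350];
console.log("p50:", percentile(latenciesMs, 50));
console.log("p95:", percentile(latenciesMs, 95));
```

The p50 describes the typical request, while the p95 is dominated by slow outliers, which is why the two are usually reported together.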

Is Cartesia Sonic better than ElevenLabs Multilingual V2?

It depends on your use case. Cartesia Sonic (Custom) and ElevenLabs Multilingual V2 (ElevenLabs) are both strong choices in text-to-speech. Compare them side-by-side at /compare/cartesia-sonic-vs-elevenlabs-multilingual-v2.

Cartesia Sonic

Name: Cartesia Sonic
Brand: Custom
SKU: cartesia-sonic
Price: 0.00003 EUR
Availability: InStock

Custom

Text-to-Speech

Cartesia's ultra-low-latency TTS (~90ms TTFB). State-space model with voice cloning support.

TL;DR · Last updated June 24, 2026

Cartesia Sonic is a text-to-speech AI model from Custom, priced at €0.030 per 1M input tokens with an unknown context window.

Direct API access coming soon

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Cartesia Sonic into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("cartesia-sonic", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("cartesia-sonic", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("cartesia-sonic", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Developer

Custom

Deep dive — Cartesia AI's Cartesia Sonic

About Cartesia AI

Founded 2023 · San Francisco, California, USA

Cartesia AI was founded in 2023 by Karan Goel and Albert Gu, the academic team behind the influential state-space model line of research at Stanford and CMU (S4, H3, Mamba, Mamba-2). Co-founders also include Arjun Desai, Brandon Yang and Chris Re (Stanford advisor). The company set out to build voice and multimodal foundation models on a state-space backbone instead of Transformers in order to achieve sub-100 ms latency and stream-friendly inference. Cartesia raised a $27M seed in March 2024 led by Index Ventures with participation from Conviction, A* and Lightspeed, followed by a $64M Series A in October 2024 led by Kleiner Perkins at a reported $325M valuation. Sonic, the company's first product, launched in May 2024 and quickly became one of the lowest-latency commercial TTS systems on the market.

Architecture

State-space model (Mamba family) text-to-speech with neural codec output

Cartesia Sonic is a streaming text-to-speech model built on the structured state-space model (SSM) architecture pioneered by Cartesia's founders (S4, H3, Mamba, Mamba-2). Transformer-based TTS systems pay attention costs that grow quadratically with sequence length; Sonic instead uses linear-time SSMs with selective scan, so the model keeps a small, constant-size recurrent state and can generate audio chunks as text streams in. Cartesia reports a model first-byte latency of around 75-90 ms on its hosted API, which it positions as faster than ElevenLabs Turbo and OpenAI TTS-1. Sonic outputs 24 kHz PCM via a neural codec decoder, accepts streaming text input (so it can start speaking before the LLM finishes its sentence) and offers voice cloning from short reference samples (3-30 seconds). Sonic 2 added improved prosody, multilingual coverage (15+ languages) and a lower word error rate (WER). The system is offered exclusively as a hosted API; weights are not released.

Parameters: Undisclosed
Context: 24K tokens
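The constant-memory, linear-time property described above can be illustrated with a toy scalar state-space recurrence, h[t] = a*h[t-1] + b*x[t] and y[t] = c*h[t]. This is not Sonic's architecture (its details are undisclosed) and it omits selective scan entirely; the class `ToySsm` and its coefficients are invented for the example. It only shows why a recurrent state lets a model process input chunk by chunk and still produce the same output as a single pass.

```typescript
// Toy scalar state-space layer: h[t] = a*h[t-1] + b*x[t], y[t] = c*h[t].
// Illustrative only; real SSM layers use vector states and learned,
// input-dependent parameters.
class ToySsm {
  private h = 0; // the entire memory of past inputs is this one number
  private readonly a: number;
  private readonly b: number;
  private readonly c: number;

  constructor(a: number, b: number, c: number) {
    this.a = a;
    this.b = b;
    this.c = c;
  }

  // Consume one chunk of inputs and emit one output per input.
  // Work is linear in the chunk length and the state never grows.
  step(chunk: number[]): number[] {
    const out: number[] = [];
    for (const x of chunk) {
      this.h = this.a * this.h + this.b * x;
      out.push(this.c * this.h);
    }
    return out;
  }
}

// Processing the whole sequence at once...
const wholePass = new ToySsm(0.5, 1, 2).step([1, 2, 3, 4]);

// ...matches processing it as two chunks that arrive over time.
const streaming = new ToySsm(0.5, 1, 2);
const chunkedPass = [...streaming.step([1, 2]), ...streaming.step([3, 4])];

console.log(wholePass, chunkedPass);
```

Because the state carries everything the layer needs from earlier input, output for the first chunk can be emitted before the second chunk exists, which is the property that makes streaming generation natural for this model family.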

What it can do

Sub-100 ms first-byte latency suitable for real-time voice agents
State-space (Mamba-family) backbone with linear-time inference
Streaming text input and streaming audio output via WebSocket or gRPC
Instant voice cloning from short reference audio
Multilingual: English, Spanish, French, German, Portuguese, Mandarin, Japanese and more
Emotion and pace controls via inline tags
24 kHz PCM, MP3 and Opus output formats
Best for: real-time voice agents, IVR systems, conversational AI, low-latency phone bots
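When handling the raw PCM output listed above, it helps to convert between byte counts and playback time. The sketch below assumes 16-bit mono samples at 24 kHz; the page states only the sample rate, so the sample width and channel count are assumptions, and `pcmDurationSeconds` is a hypothetical helper.

```typescript
// Duration of a raw PCM buffer in seconds.
// Defaults assume 24 kHz, 16-bit (2 bytes per sample), mono. Only the
// 24 kHz rate comes from this page; the other defaults are assumptions.
function pcmDurationSeconds(
  byteLength: number,
  sampleRateHz: number = 24000,
  bytesPerSample: number = 2,
  channels: number = 1,
): number {
  if (byteLength < 0) throw new RangeError("byteLength must be non-negative");
  if (sampleRateHz <= 0 || bytesPerSample <= 0 || channels <= 0) {
    throw new RangeError("format parameters must be positive");
  }
  return byteLength / (sampleRateHz * bytesPerSample * channels);
}

// Under these assumptions one second of audio is 24000 * 2 = 48000 bytes.
console.log(pcmDurationSeconds(48000));
```

If the stream uses a different sample format, such as 32-bit float, pass the matching `bytesPerSample` instead of relying on the default.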

Training & License

Cartesia has not disclosed the training corpus. Public statements describe a 'diverse multilingual speech dataset' with permissioned voice talent for the stock voice library.

License: Proprietary commercial API. Voice clones produced from customer audio remain customer property under the Terms of Service; commercial use is permitted on paid tiers.

Known limitations

Closed weights, hosted-only
Voice clone quality below ElevenLabs Multilingual V2 for nuanced emotional acting
Limited SSML / fine-grained prosody control
Hard cap of around 24,000 input characters per request
Mandarin and Japanese still less polished than English/Spanish
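One way to work within the per-request cap listed above is to split long text at sentence boundaries before sending it. The sketch below is a minimal example with assumptions stated: it treats the cap as 24,000 JavaScript string units (the service may count characters differently), and `splitForTts` is a hypothetical helper, not part of any SDK.

```typescript
// Approximate per-request cap taken from the limitations list above.
const MAX_CHARS = 24000;

// Split text into chunks of at most maxChars, preferring sentence boundaries.
// Concatenating the chunks reproduces the original text exactly.
function splitForTts(text: string, maxChars: number = MAX_CHARS): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  // A sentence is text up to and including ., ! or ? plus trailing whitespace;
  // the second alternative catches a final fragment with no punctuation.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    let rest = sentence;
    // A single sentence longer than the cap is hard-split.
    // Note: slicing by string units can split a surrogate pair.
    while (rest.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    // Start a new chunk when this sentence would overflow the current one.
    if (current.length + rest.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current += rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Small cap used here only to make the behaviour visible.
console.log(splitForTts("One. Two. Three.", 10));
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order, at the cost of a possible prosody break at chunk boundaries.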

Related Models

ElevenLabs Multilingual V2

ElevenLabs

ElevenLabs' most natural-sounding TTS model. Supports 29 languages with emotional range.

€1.00

AudioLDM 2

AudioLDM

Latent-diffusion model for general-purpose text-to-audio. Generates speech, music, and sound effects with a unified prior.

€0.01

Chatterbox

Replicate

Resemble AI's open Chatterbox TTS. Zero-shot voice cloning from a short audio prompt with an exaggeration control for emotion intensity, plus CFG weight to balance pacing and fidelity.

€2.00

Edge TTS

Custom

Microsoft Edge neural voices accessed via the open-source edge-tts wrapper. 400+ voices across 100+ locales, suitable for batch generation.

Free

Start using Cartesia Sonic today

Get started with free credits. No credit card required. Access Cartesia Sonic and 100+ other models through a single API.
