How much does Perplexity Sonar cost via Railwail?

Input: €1.00 per 1M tokens. Output: €1.00 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
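
As a rough illustration of the per-token arithmetic, the sketch below estimates what a single request costs from its token counts. It assumes the €1.00 per 1M input and output rates listed above and ignores free credits. The function name `estimateCostEur` is hypothetical and not part of any Railwail SDK.

```typescript
// Rates from the pricing above, in EUR per 1M tokens (assumed current).
const INPUT_EUR_PER_M = 1.0;
const OUTPUT_EUR_PER_M = 1.0;

// Hypothetical helper: estimated cost in EUR for one request.
function estimateCostEur(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1_000_000) * INPUT_EUR_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_EUR_PER_M
  );
}

// Example: a 2,000-token prompt with a 500-token answer.
const perRequest = estimateCostEur(2_000, 500);
console.log(perRequest.toFixed(4)); // 0.0025
```

Because input and output are billed at the same rate here, the cost depends only on total tokens; if the rates ever diverge, only the two constants need to change.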

What is the context window of Perplexity Sonar?

Perplexity Sonar supports a 127K-token context window, enough for long books, technical manuals, and extended analysis.

How fast is Perplexity Sonar?

Latency depends on prompt length and load, typically 200 ms to 2 s for short prompts. We measure p50/p95 latency in real time on /rankings.
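
To make "p50/p95" concrete, here is a minimal sketch that computes both percentiles from a list of measured latencies using the nearest-rank method. The method Railwail uses on /rankings is not described here, so nearest-rank is an assumption (one common definition among several), and the sample values are made up for illustration.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric sort, copy
  const rank = Math.ceil((p / 100) * sorted.length);   // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical latency samples (ms), e.g. from timing repeated short prompts.
const samples = [210, 340, 450, 280, 1900, 620, 390, 510, 300, 870];
console.log(`p50=${percentile(samples, 50)}ms p95=${percentile(samples, 95)}ms`);
// p50=390ms p95=1900ms
```

With only ten samples, p95 is simply the slowest request, which is why real dashboards collect many more measurements before reporting tail latency.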

Is Perplexity Sonar better than Bio_ClinicalBERT?

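It depends on your use case, and the two are not direct substitutes. Perplexity Sonar (Custom) is a web-grounded chat model, while Bio_ClinicalBERT (huggingface) is a clinical fill-mask encoder that predicts masked tokens and produces embeddings rather than chat answers. Compare them side-by-side at /compare/sonar-vs-bio-clinicalbert.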

Perplexity Sonar

Custom

Text & Chat

Perplexity's fastest and cheapest web-grounded chat model. Live-source citations included.

TL;DR · Last updated June 24, 2026

Perplexity Sonar is a text & chat AI model from Custom, priced at €1.00 per 1M input tokens and €1.00 per 1M output tokens, with a 127K-token context window.

Pricing

Price per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Perplexity Sonar into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("sonar", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("sonar", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("sonar", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
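
The `rw.chat` example reads `res.choices[0].message.content` and `res.usage`. Assuming the response follows the standard OpenAI chat completions shape (with `prompt_tokens`, `completion_tokens` and `total_tokens` in `usage`), the sketch below types that shape and extracts the reply defensively. The interfaces and the `summarize` helper are hypothetical illustrations, not types exported by the `railwail` package.

```typescript
// Assumed OpenAI-style response shape; railwail's actual types may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface ChatCompletion {
  choices: { index: number; message: ChatMessage; finish_reason: string | null }[];
  usage?: Usage;
}

// Hypothetical helper: return the reply text and token counts, throwing a
// descriptive error (instead of a TypeError) when no choices came back.
function summarize(res: ChatCompletion): { text: string; inputTokens: number; outputTokens: number } {
  const first = res.choices[0];
  if (!first) throw new Error("response contained no choices");
  return {
    text: first.message.content,
    inputTokens: res.usage?.prompt_tokens ?? 0,
    outputTokens: res.usage?.completion_tokens ?? 0,
  };
}

// Example with a hand-written response object (not real model output).
const sample: ChatCompletion = {
  choices: [{ index: 0, message: { role: "assistant", content: "Hi there!" }, finish_reason: "stop" }],
  usage: { prompt_tokens: 9, completion_tokens: 3, total_tokens: 12 },
};
console.log(summarize(sample)); // { text: 'Hi there!', inputTokens: 9, outputTokens: 3 }
```

The input and output counts returned here are the figures you would feed into a per-token cost estimate.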

Specifications

Context window

127,000 tokens

Max output

8,192 tokens

Developer

Custom
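
The two limits above interact: a long prompt leaves less room for the answer. The sketch below clamps a requested `max_tokens` so that prompt plus output stays within 127,000 tokens and the output stays within 8,192. It assumes the context window counts prompt and completion together (as most chat APIs do) and that you already know the prompt's token count; Sonar's tokenizer is not specified here. `fitMaxTokens` is a hypothetical helper.

```typescript
const CONTEXT_WINDOW = 127_000; // tokens, from the specifications above
const MAX_OUTPUT = 8_192;       // tokens

// Hypothetical helper: largest max_tokens that keeps prompt + output inside
// the context window and under the output cap. Returns 0 when the prompt
// alone already fills (or overflows) the window.
function fitMaxTokens(promptTokens: number, requested: number): number {
  const room = CONTEXT_WINDOW - promptTokens;
  return Math.max(0, Math.min(requested, MAX_OUTPUT, room));
}

console.log(fitMaxTokens(1_000, 500));     // 500: well within limits
console.log(fitMaxTokens(1_000, 20_000));  // 8192: capped by max output
console.log(fitMaxTokens(125_000, 8_192)); // 2000: capped by remaining context
console.log(fitMaxTokens(130_000, 500));   // 0: prompt too long, trim it first
```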

Deep dive — Perplexity AI's Perplexity Sonar

About Perplexity AI

Founded 2022 · San Francisco, USA

Perplexity AI was founded in August 2022 by Aravind Srinivas (CEO, former OpenAI researcher and DeepMind/Google Brain alumnus), Denis Yarats (CTO, former Meta AI researcher), Andy Konwinski (co-founder of Databricks) and Johnny Ho. The company's product is an 'answer engine' that combines real-time web search with LLM-based reasoning to return cited answers, positioning itself as a search alternative to Google. Perplexity launched its consumer chat product in late 2022 and quickly raised over $500M across multiple rounds from investors including IVP, NEA, NVIDIA, Jeff Bezos and Susan Wojcicki, reaching a $9B valuation in late 2024. The company released its own Sonar model family in 2024-2025, fine-tuned for grounded, citation-bearing answers from live web retrieval. Sonar is positioned as a fast, cost-efficient default model in the Perplexity API and consumer product, with Sonar Pro and Sonar Reasoning as higher-capability variants. Perplexity also ships Perplexity Pages (long-form content), Perplexity Spaces (collaboration) and partnership integrations with Apple Intelligence and several US news organisations.

Architecture

Search-grounded LLM (Llama-derived, tuned for retrieval-augmented answering)

Perplexity Sonar is the default fast, low-cost model in Perplexity's API and consumer product, released as a public API tier in January 2025. Sonar is built on a Llama-3.x-derived base, fine-tuned by Perplexity for retrieval-augmented generation, citation insertion and concise web-grounded answers. The training process is not fully disclosed, but it reportedly follows standard supervised fine-tuning and preference optimisation on Perplexity's curated logs of high-quality, well-cited answers, plus synthetic question-answer pairs grounded in real web pages.

At inference time, every Sonar query is augmented with a live web search step run against Perplexity's in-house search index, which retrieves the most relevant pages, ranks them, and supplies them to the model as context. The model is trained to cite sources inline using numbered references, to decline to answer when the retrieved sources do not support a claim, and to compose answers in a concise, structured format.

Sonar supports a 127K-token context window and the standard OpenAI-compatible chat completions API, making it a drop-in replacement for OpenAI in many search-enabled agent stacks. At launch, Perplexity priced Sonar at $5 per 1,000 searches plus $1 per 1M input tokens and $1 per 1M output tokens, making it one of the cheapest fully grounded options on the market.
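
Perplexity's search index and internal prompt format are not public, but the general retrieval-augmented pattern described here (rank retrieved pages, keep the best few, number them so the model can cite them) can be sketched as follows. The `SearchResult` type, the relevance `score` field, the prompt wording and `buildGroundedPrompt` are all hypothetical illustrations, not Perplexity's implementation.

```typescript
// Hypothetical retrieved page with a relevance score from some search step.
interface SearchResult {
  url: string;
  title: string;
  snippet: string;
  score: number;
}

// Build chat messages that supply the top-k pages as numbered sources and
// ask for inline [n] citations. Illustrative only; not Perplexity's prompt.
function buildGroundedPrompt(question: string, results: SearchResult[], k = 3) {
  const top = [...results].sort((a, b) => b.score - a.score).slice(0, k);
  const sourceBlock = top
    .map((r, i) => `[${i + 1}] ${r.title} (${r.url})\n${r.snippet}`)
    .join("\n\n");
  return {
    sources: top.map((r) => r.url), // index i corresponds to citation [i + 1]
    messages: [
      {
        role: "system" as const,
        content:
          "Answer only from the numbered sources below and cite them inline as [n]. " +
          "If the sources do not support an answer, say so.\n\n" + sourceBlock,
      },
      { role: "user" as const, content: question },
    ],
  };
}

const { sources, messages } = buildGroundedPrompt("What is Sonar?", [
  { url: "https://a.example", title: "A", snippet: "Low relevance.", score: 0.2 },
  { url: "https://b.example", title: "B", snippet: "High relevance.", score: 0.9 },
]);
console.log(sources); // [ 'https://b.example', 'https://a.example' ]
console.log(messages[0].content);
```

Returning the ordered `sources` list alongside the messages is what later lets numbered citations in the answer be mapped back to URLs.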

Parameters: Undisclosed (Llama-3.x-derived base, mid-sized)
Context: 127K tokens

What it can do

Live web search built into every request
Inline citations with numbered source references
Concise, structured answers tuned for search use cases
127K context window
OpenAI-compatible chat completions API
Low latency and low cost (at launch, $5 per 1,000 searches plus $1 per 1M input and output tokens)
Returns sources, images and related questions in the response
Drop-in replacement for retrieval-augmented agents
Refuses to answer when source corpus is insufficient
Available in Perplexity API and consumer product
Best for: real-time research, grounded Q&A, news summarisation, RAG without managing your own retrieval.
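
For applications that display Sonar's numbered citations, a client typically maps each `[n]` marker in the answer back to an ordered list of source URLs. The sketch below does that and reports markers that point past the end of the list. It assumes the answer uses bracketed numbers and that sources arrive as an ordered URL array; the exact response field carrying them is an assumption, and `resolveCitations` is a hypothetical helper.

```typescript
// Hypothetical helper: find [n] markers in an answer and map each one to its
// source URL (1-based). Markers with no matching source are collected
// separately so a citation that points at nothing can be flagged.
function resolveCitations(answer: string, sources: string[]) {
  const cited = new Map<number, string>();
  const dangling: number[] = [];
  for (const match of answer.matchAll(/\[(\d+)\]/g)) {
    const n = Number(match[1]);
    if (cited.has(n) || dangling.includes(n)) continue; // record each marker once
    if (n >= 1 && n <= sources.length) cited.set(n, sources[n - 1]);
    else dangling.push(n);
  }
  return { cited, dangling };
}

const answer = "Sonar is web-grounded [1]. It cites sources [2][1]. See also [5].";
const { cited, dangling } = resolveCitations(answer, ["https://a.example", "https://b.example"]);
console.log(cited.get(2)); // https://b.example
console.log(dangling);     // [ 5 ]
```

This only checks that each marker resolves to a real source; whether that source actually supports the sentence it is attached to still needs a separate review.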

Training & License

Built on a Llama-3.x-derived base, fine-tuned on Perplexity's curated logs of high-quality cited answers plus synthetic retrieval-grounded QA pairs. At inference time, its answers are augmented with results from Perplexity's proprietary live web search index.

License: Proprietary commercial license via the Perplexity API. Weights not released. Standard Perplexity Terms of Service apply.

Known limitations

Quality depends heavily on Perplexity's retrieval ranking
May cite low-quality sources if highly ranked
Closed weights; no on-prem option
Citations occasionally do not support the exact claim
Less capable than Sonar Pro on complex multi-step reasoning

Related Models

Bio_ClinicalBERT

huggingface

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint, it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

Biomedical NER (all entities)

huggingface

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Opus 4.8