How much does DeepSeek R1 cost via Railwail?

Input: €5.50 per 1M tokens. Output: €22.00 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
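
As a rough illustration of the per-token pricing above, the following sketch estimates the cost of a single request. The helper name `estimateCostEur` is hypothetical and not part of any Railwail SDK; only the two prices come from this page.

```javascript
// Prices quoted on this page, in EUR per 1M tokens.
const INPUT_EUR_PER_MILLION = 5.5;
const OUTPUT_EUR_PER_MILLION = 22.0;

// Hypothetical helper: estimate the cost of one request in EUR.
function estimateCostEur(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_EUR_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_EUR_PER_MILLION;
  return inputCost + outputCost;
}

// Example: a 2,000-token prompt with a 1,000-token reply.
console.log(estimateCostEur(2000, 1000).toFixed(4)); // "0.0330"
```

At these rates, output tokens cost four times as much as input tokens, so long answers dominate the bill.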

What is the context window of DeepSeek R1?

DeepSeek R1 supports a 64K-token context window, enough for technical manuals, long reports, and extended analysis.

How fast is DeepSeek R1?

Average response latency: 8.0s (p50 across recent Railwail traffic). See live p50/p95 metrics on /rankings.

Is DeepSeek R1 better than Bio_ClinicalBERT?

It depends on your use case. DeepSeek R1 (DeepSeek) and Bio_ClinicalBERT (huggingface) are both strong choices in text & chat. Compare them side-by-side at /compare/deepseek-r1-vs-bio-clinicalbert.

DeepSeek R1

Name: DeepSeek R1
Brand: DeepSeek
SKU: deepseek-r1
Price: 0.0055 EUR (per 1K input tokens, equivalent to €5.50 per 1M)
Availability: InStock

DeepSeek's reasoning model with chain-of-thought capabilities. Excellent for complex problem-solving.

TL;DR · Last updated March 4, 2026

DeepSeek R1 is a text & chat AI model from DeepSeek, priced at €5.50 per 1M input tokens, with a 64K-token context window.

Examples

See what DeepSeek R1 can generate

Calculus Problem

Find the volume of the solid obtained by rotating y = x² from x=0 to x=2 around the x-axis

Using the disk method, V = π∫(0 to 2) [f(x)]² dx = π∫(0 to 2) x⁴ dx. Evaluating: π[x⁵/5] from 0 to 2 = π(32/5 - 0) = 32π/5. The volume is 32π/5 ≈ 20.11 cubic units. The disk method works here because we're rotating around the x-axis and the function is non-negative on the interval.
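
The result above can be cross-checked numerically. This is a minimal sketch using the midpoint rule; the function name `diskVolume` and the step count are illustrative choices, not part of the model's answer.

```javascript
// Midpoint-rule approximation of V = pi * integral of f(x)^2 dx over [a, b].
function diskVolume(f, a, b, steps = 100000) {
  const dx = (b - a) / steps;
  let sum = 0;
  for (let i = 0; i < steps; i++) {
    const x = a + (i + 0.5) * dx; // midpoint of the i-th slice
    sum += f(x) ** 2 * dx;
  }
  return Math.PI * sum;
}

const numeric = diskVolume((x) => x * x, 0, 2);
const exact = (32 * Math.PI) / 5;
console.log(numeric.toFixed(2), exact.toFixed(2)); // "20.11" "20.11"
```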

Physics Reasoning

A 2kg ball is thrown upward at 15 m/s. Ignoring air resistance, what is the maximum height and total time in the air?

At maximum height, velocity = 0. Using v² = v₀² - 2gh: 0 = 225 - 2(9.8)h, so h = 225/19.6 = 11.48 meters. For time to reach max height: v = v₀ - gt, so 0 = 15 - 9.8t, giving t = 1.53 seconds. Total flight time is double this (symmetry of projectile motion): 3.06 seconds. Note that mass doesn't affect the answer: the 2 kg is irrelevant when air resistance is ignored, as all objects experience the same gravitational acceleration.
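
The same kinematics can be reproduced in a few lines. This sketch assumes g = 9.8 m/s² as in the answer above; `verticalThrow` is a hypothetical helper name.

```javascript
const G = 9.8; // gravitational acceleration in m/s^2, as used in the answer

// Hypothetical helper: peak height and flight time for a vertical throw.
// Mass does not appear anywhere, which matches the note in the answer.
function verticalThrow(v0) {
  const maxHeight = (v0 * v0) / (2 * G); // from v^2 = v0^2 - 2gh with v = 0
  const timeUp = v0 / G;                 // from v = v0 - gt with v = 0
  return { maxHeight, timeUp, totalTime: 2 * timeUp };
}

const result = verticalThrow(15);
console.log(result.maxHeight.toFixed(2)); // "11.48"
console.log(result.timeUp.toFixed(2));    // "1.53"
console.log(result.totalTime.toFixed(2)); // "3.06"
```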

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate DeepSeek R1 into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("deepseek-r1", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("deepseek-r1", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("deepseek-r1", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
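
Because reasoning models can take several seconds per response, callers may want to retry transient failures. The following is a minimal sketch of a generic retry wrapper with exponential backoff; `withRetry` and its options are hypothetical and not part of the `railwail` package, and which errors are actually safe to retry is an assumption you should check against the API's error responses.

```javascript
// Hypothetical helper: retry any async function with exponential backoff.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // Wait baseDelayMs, then 2x, then 4x, ... before the next try.
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // every attempt failed
}

// Usage with the client shown above (assumes `rw` has been created):
// const reply = await withRetry(() => rw.run("deepseek-r1", "Hello!"));
```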

Specifications

Context window: 64,000 tokens
Max output: 8,192 tokens
Avg. latency: 8.0s
Developer: DeepSeek
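
The context and output limits above can be turned into a simple budget check before sending a request. This sketch assumes the prompt and the generated output share the same 64,000-token window, which is common but not confirmed on this page; `maxTokensFor` is a hypothetical helper.

```javascript
// Limits taken from the specifications on this page.
const CONTEXT_WINDOW = 64000;
const MAX_OUTPUT = 8192;

// Hypothetical helper: largest max_tokens value that still fits.
// Assumes prompt and output tokens share one context window.
function maxTokensFor(promptTokens, requested = MAX_OUTPUT) {
  const remaining = CONTEXT_WINDOW - promptTokens;
  if (remaining <= 0) return 0; // prompt alone already fills the window
  return Math.min(requested, MAX_OUTPUT, remaining);
}

console.log(maxTokensFor(1000));  // 8192
console.log(maxTokensFor(60000)); // 4000
console.log(maxTokensFor(70000)); // 0
```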

Deep dive: DeepSeek R1

About DeepSeek

Founded 2023 · Hangzhou, China

DeepSeek (formally DeepSeek AI) was founded in July 2023 by Liang Wenfeng, who is also a co-founder of the quantitative hedge fund High-Flyer (founded 2015). High-Flyer initially funded DeepSeek's research and provided access to thousands of NVIDIA A100 and H800 GPUs accumulated before US export controls tightened.

DeepSeek's research output rapidly became influential: DeepSeek Coder (Nov 2023), DeepSeek LLM 67B (Jan 2024), DeepSeekMath (Feb 2024), which introduced GRPO reinforcement learning, DeepSeek V2 with Multi-head Latent Attention and MoE (May 2024), DeepSeek V3 (Dec 2024) and DeepSeek R1 (Jan 2025), the open-weight reasoning model that matched OpenAI o1 on many benchmarks. R1's release in January 2025 triggered a significant US stock-market re-rating of AI infrastructure spending, in part because the V3 base it builds on reportedly cost under $6M to train.

DeepSeek publishes detailed technical reports and releases open weights (R1 under the MIT license), making it one of the most transparent frontier labs. The company employs roughly 200 researchers, mostly recent graduates from top Chinese universities, and has stated it is not currently raising external venture capital.

Architecture

Sparse Mixture-of-Experts Transformer (reasoning model post-trained with large-scale RL)

DeepSeek R1 was released on 20 January 2025 with weights under the MIT license, a publicly downloadable technical report, and API pricing roughly 1/30th of OpenAI o1. Architecturally, R1 inherits from DeepSeek V3: a Sparse Mixture-of-Experts Transformer with 671B total parameters and 37B active per token, using Multi-head Latent Attention (MLA) and DeepSeekMoE routing.

The breakthrough is the training recipe. DeepSeek R1-Zero was trained from the V3 base via pure large-scale reinforcement learning (GRPO) on verifiable math, code and reasoning tasks, with no supervised fine-tuning at all. The model spontaneously developed long chains-of-thought, reflection and self-verification behaviours, the so-called 'aha moment' phenomenon. R1-Zero suffered from readability issues, so DeepSeek R1 added a cold-start SFT step using a small set of curated long-CoT examples, followed by a multi-stage pipeline alternating between RL, rejection sampling and SFT.

The team also distilled the reasoning behaviour into smaller dense models (DeepSeek-R1-Distill-Qwen-1.5B/7B/14B/32B and DeepSeek-R1-Distill-Llama-8B/70B), demonstrating that reasoning capability can be transferred to compact models. The model supports a 128K context window, although the specifications on this page list 64,000 tokens, and it is widely deployed via vLLM, SGLang, Ollama and HuggingFace Inference.

Parameters: 671B total, 37B active per token
Context: 128K tokens
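
To make "37B active per token" concrete, here is a minimal sketch of generic top-k expert routing, the basic idea behind sparse Mixture-of-Experts layers. It is not DeepSeek's actual DeepSeekMoE router: the function names, the four-expert example and the softmax-over-selected-experts weighting are illustrative assumptions.

```javascript
// Numerically stable softmax over an array of logits.
function softmax(xs) {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Generic top-k routing (illustrative only): keep the k highest-scoring
// experts for a token and normalise their weights to sum to 1.
function routeTopK(routerLogits, k) {
  const ranked = routerLogits
    .map((logit, expert) => ({ expert, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const weights = softmax(ranked.map((r) => r.logit));
  return ranked.map((r, i) => ({ expert: r.expert, weight: weights[i] }));
}

const picks = routeTopK([0.1, 2.0, -1.0, 1.5], 2);
console.log(picks.map((p) => p.expert)); // [ 1, 3 ]

// Share of parameters used per token, from the figures above.
console.log(((37 / 671) * 100).toFixed(1) + "%"); // "5.5%"
```

Only the selected experts run for a given token, which is why compute per token scales with the active parameters rather than the total.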

What it can do

Open weights under MIT license, weights freely downloadable
671B-parameter MoE with 37B active per token
Long chain-of-thought reasoning learned via pure RL (GRPO)
Matches or beats OpenAI o1 on AIME, MATH-500 and Codeforces benchmarks
128K context window
Distilled reasoning variants from 1.5B to 70B (Qwen and Llama bases)
Pricing approximately 1/30th of OpenAI o1 at the DeepSeek API
Strong code generation on LiveCodeBench and HumanEval
Self-verification and reflection emerge from training
Compatible with vLLM, SGLang, Ollama, llama.cpp and HuggingFace
Best for: cost-sensitive reasoning workloads, on-prem deployment, research, reproducible chains-of-thought.

Training & License

Built on the DeepSeek V3 base (14.8T high-quality tokens of multilingual web text, code, books and scientific papers). R1 post-training uses cold-start SFT on a small curated long-CoT dataset, followed by multi-stage GRPO reinforcement learning against verifiable rewards on math, code and reasoning tasks plus rule-based language-consistency rewards.
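
The "group-relative" part of GRPO can be sketched as follows: several answers are sampled for the same prompt, each gets a verifiable reward, and each answer's advantage is its reward normalised against the group's mean and standard deviation. This is a simplified illustration, not DeepSeek's training code; the function name, the use of population standard deviation and the `eps` term are assumptions.

```javascript
// Simplified group-relative advantage: (reward - group mean) / group std.
// eps guards against division by zero when all rewards are equal.
function groupRelativeAdvantages(rewards, eps = 1e-8) {
  const n = rewards.length;
  const mean = rewards.reduce((a, b) => a + b, 0) / n;
  const variance = rewards.reduce((a, r) => a + (r - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  return rewards.map((r) => (r - mean) / (std + eps));
}

// Four sampled answers to one prompt, scored 1 if verifiably correct, else 0.
const advantages = groupRelativeAdvantages([1, 0, 0, 1]);
console.log(advantages.map((a) => a.toFixed(2))); // [ '1.00', '-1.00', '-1.00', '1.00' ]
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, without a separately trained value model.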

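License: MIT license (model weights, code, distilled variants). Commercial use, modification and distillation are permitted; the distilled variants additionally inherit the licenses of their Qwen and Llama base models.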

Known limitations

Sensitive topics (Taiwan, Tiananmen, certain political content) are filtered or refused
Very long CoT can be slow at inference time
Memory footprint of full 671B MoE is large (~670 GB of weights in FP8, ~1.3 TB in BF16)
No multimodal input (text-only model)
Sandbox safety evaluations not published in detail
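
The memory figure in the list above follows from simple arithmetic on the parameter count. This sketch counts weights only (no KV cache, activations or runtime overhead) and uses decimal gigabytes; `weightMemoryGB` is a hypothetical helper.

```javascript
// Weights-only memory estimate in decimal GB.
// 1e9 parameters at 1 byte each is 1 GB, so GB = billions * bytes per parameter.
function weightMemoryGB(paramsBillions, bytesPerParam) {
  return paramsBillions * bytesPerParam;
}

console.log(weightMemoryGB(671, 1)); // 671  (FP8, 1 byte per parameter)
console.log(weightMemoryGB(671, 2)); // 1342 (BF16, 2 bytes per parameter)
console.log(weightMemoryGB(37, 1));  // 37   (parameters active per token, FP8)
```

All 671B parameters must be resident even though only 37B are used per token, so the sparse design reduces compute per token but not the memory needed to host the model.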

Related Models

Bio_ClinicalBERT

huggingface

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

Biomedical NER (all entities)

huggingface

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Opus 4.8