How much does DeepSeek V3.1 cost via Railwail?

Input: €0.270 per 1M tokens. Output: €1.10 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
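As a rough worked example of the rates above, the sketch below estimates the cost of a single request. The helper name `estimateCostEur` and the sample token counts are illustrative and not part of any Railwail SDK; only the two per-million rates come from the pricing quoted here.

```typescript
// Per-million-token rates quoted above, in EUR.
const INPUT_EUR_PER_MILLION = 0.27;
const OUTPUT_EUR_PER_MILLION = 1.1;

// Hypothetical helper: estimates the cost of one request in EUR.
function estimateCostEur(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1_000_000) * INPUT_EUR_PER_MILLION +
    (outputTokens / 1_000_000) * OUTPUT_EUR_PER_MILLION
  );
}

// Example: a 2,000-token prompt with a 500-token reply.
const cost = estimateCostEur(2_000, 500);
console.log(cost.toFixed(5)); // "0.00109"
```

At these rates, output tokens cost about four times as much as input tokens, so long completions dominate the bill faster than long prompts.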

What is the context window of DeepSeek V3.1?

DeepSeek V3.1 supports a 131,072-token (128K) context window, enough for long books, technical manuals, and extended analysis.

How fast is DeepSeek V3.1?

Latency depends on prompt length and server load, typically 200 ms to 2 s for short prompts. We measure p50/p95 in real time on /rankings.
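For readers unfamiliar with p50/p95, the sketch below shows one common way to compute them, the nearest-rank method. It is an illustration only: the method used on /rankings is not stated here, the function name `percentile` is hypothetical, and the latency samples are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Made-up latency samples in milliseconds, for illustration only.
const latenciesMs = [210, 250, 300, 320, 400, 450, 600, 800, 1200, 1900];
console.log(percentile(latenciesMs, 50)); // 400
console.log(percentile(latenciesMs, 95)); // 1900
```

The gap between the two values is the point of reporting both: p50 describes a typical request, while p95 exposes the slow tail that users notice.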

Is DeepSeek V3.1 better than Bio_ClinicalBERT?

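It depends on your use case, because the two models target different tasks. DeepSeek V3.1 (DeepSeek) is a general-purpose text & chat model, while Bio_ClinicalBERT (huggingface) is a BERT encoder served as a fill-mask endpoint for clinical text. Compare them side-by-side at /compare/deepseek-v3-1-vs-bio-clinicalbert.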

DeepSeek V3.1

Name: DeepSeek V3.1
Brand: DeepSeek
SKU: deepseek-v3-1
Price: 0.00027 EUR per 1K input tokens (€0.270 per 1M)
Availability: InStock

DeepSeek

Text & Chat

DeepSeek's refreshed V3.1 release: a 671B-parameter MoE with 37B active per token. Tops open-weights leaderboards on coding and reasoning.

TL;DR · Last updated June 24, 2026

DeepSeek V3.1 is a text & chat AI model from DeepSeek, priced at €0.270 per 1M input tokens and €1.10 per 1M output tokens, with a 131,072-token (128K) context window.

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate DeepSeek V3.1 into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("deepseek-v3-1", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("deepseek-v3-1", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("deepseek-v3-1", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Context window

131,072 tokens

Max output

8,192 tokens

Developer

DeepSeek
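The two limits above can be turned into a quick pre-flight check before sending a long prompt. The sketch below assumes the prompt and the completion share the 131,072-token window, as is common for chat APIs, and it uses a crude 4-characters-per-token estimate rather than the real DeepSeek tokenizer. The names `estimateTokens` and `fitsInContext` are hypothetical.

```typescript
// Limits from the specifications above.
const CONTEXT_WINDOW_TOKENS = 131_072;
const MAX_OUTPUT_TOKENS = 8_192;

// ASSUMPTION: roughly 4 characters per token. This is a rough heuristic for
// English text, not the DeepSeek tokenizer; use a real tokenizer when the
// limit must be exact.
const ASSUMED_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Returns true if the estimated prompt size plus the requested output
// budget stays within the context window.
function fitsInContext(prompt: string, maxTokens: number = MAX_OUTPUT_TOKENS): boolean {
  const requested = Math.min(maxTokens, MAX_OUTPUT_TOKENS);
  return estimateTokens(prompt) + requested <= CONTEXT_WINDOW_TOKENS;
}

// Tokens left for the prompt when the full output budget is reserved.
const inputBudget = CONTEXT_WINDOW_TOKENS - MAX_OUTPUT_TOKENS;
console.log(inputBudget); // 122880
console.log(fitsInContext("Explain quantum computing simply.")); // true
```

Because the estimate is approximate, it is safer to leave some headroom than to fill the window to the last token.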

Deep dive: DeepSeek V3.1

About DeepSeek

Founded 2023 · Hangzhou, China

DeepSeek AI was founded in July 2023 in Hangzhou by Liang Wenfeng, also co-founder of the High-Flyer quantitative hedge fund. The fund's pre-export-control GPU cluster financed DeepSeek's training runs. The lab is known for transparent technical reports and an aggressive open-weights strategy under the MIT license. Releases include DeepSeek Coder (Nov 2023), DeepSeek LLM 67B (Jan 2024), DeepSeekMath with GRPO (Feb 2024), DeepSeek V2, which introduced Multi-head Latent Attention (May 2024), DeepSeek V3 (December 2024, trained for about $5.6M of GPU-hours), DeepSeek R1 (January 2025), and DeepSeek V3.1 (2025), an incremental update that consolidates the base model and R1's reasoning capabilities into a unified hybrid model. The company has roughly 200 researchers and is privately backed by High-Flyer rather than venture capital. Its V3/R1 release triggered a global re-evaluation of frontier-AI training economics and a notable stock-market move in late January 2025.

Visit DeepSeek →

Architecture

Sparse Mixture-of-Experts Transformer (hybrid base + thinking modes)

DeepSeek V3.1 is a 2025 update of the V3 base that unifies chat (non-thinking) and reasoning (thinking) modes into a single hybrid checkpoint. It retains the V3 architecture (a sparse MoE Transformer with 671B total and 37B active parameters, using DeepSeekMoE routing and Multi-head Latent Attention) but expands the pretraining corpus and updates the post-training recipe. According to DeepSeek's release notes, V3.1 was continually pretrained on ~840B additional tokens of long-context data, extending effective context handling and improving long-document recall within the 128K window. Post-training merged the V3 chat data with R1-style long-CoT reasoning data plus tool-use and agentic trajectories.

V3.1 exposes two operating modes selected via the chat template: 'non-thinking' (V3-style fast responses) and 'thinking' (R1-style chain-of-thought before the answer), so developers can choose per request. Tool use and function calling are first-class and improved over both V3 and R1. The model was also strengthened on coding, agent benchmarks (SWE-bench, Terminal-Bench), and search-augmented reasoning. Weights are released under the MIT license, and the official DeepSeek API hosts both V3.1 and V3.1-Terminus checkpoints.

Parameters: 671B total, 37B active per token (architecture retained from V3)
Context: 128K tokens (131,072)
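To make "37B active out of 671B" concrete, the sketch below computes the active share and shows generic top-k expert gating, the basic idea behind sparse MoE layers. It is a toy illustration and not DeepSeekMoE's actual router, which differs in its details. The expert count, the value of k, and the router logits are invented for the example.

```typescript
// Share of parameters active per token, from the figures above.
const activeShare = 37 / 671;
console.log((activeShare * 100).toFixed(1) + "%"); // "5.5%"

// Numerically stable softmax over router scores.
function softmax(xs: number[]): number[] {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Toy top-k gating: keep the k highest-scoring experts and renormalise
// their weights so they sum to 1. Only those experts run for this token.
function routeTopK(
  routerLogits: number[],
  k: number
): { expert: number; weight: number }[] {
  const probs = softmax(routerLogits);
  const top = probs
    .map((p, expert) => ({ expert, weight: p }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k);
  const total = top.reduce((acc, e) => acc + e.weight, 0);
  return top.map((e) => ({ expert: e.expert, weight: e.weight / total }));
}

// ASSUMPTION: 8 hypothetical experts with 2 active per token.
const chosen = routeTopK([0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2], 2);
console.log(chosen.map((e) => e.expert)); // [ 1, 4 ]
```

Since only the selected experts execute, per-token compute tracks the active parameter count, while memory must still hold every expert.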

What it can do

Hybrid model: switchable thinking / non-thinking modes in one checkpoint
671B-parameter MoE with 37B active per token
128K context window, with continued pretraining on ~840B additional long-context tokens
Strong agentic and tool-use performance on SWE-bench Verified and Terminal-Bench
Function calling and parallel tool calls
Long-CoT reasoning inherited from R1
Open weights under MIT license
DeepSeek API priced at approximately 1/20th the cost of GPT-4o-class models
Compatible with vLLM, SGLang, llama.cpp, HuggingFace
Improved code editing and diff-format generation
Best for: budget-conscious agentic workloads, coding, hybrid reasoning, on-prem enterprise.
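The function-calling and parallel-tool-call items above can be sketched with the message shapes of the OpenAI chat-completions format, which an OpenAI-compatible API would be expected to follow. Whether and how the `railwail` SDK accepts a `tools` option is not stated on this page, so the example makes no network call and only shows the local side: a tool definition and a dispatcher for the calls a model returns. The `get_weather` tool and its stub are hypothetical.

```typescript
// Shapes follow the OpenAI chat-completions tool-calling format.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical tool definition, as it would appear in a request's `tools` array.
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Look up the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];
console.log(tools.map((t) => t.function.name)); // [ 'get_weather' ]

// Local implementations keyed by tool name. Stubbed for illustration.
const handlers: Record<string, (args: Record<string, unknown>) => string> = {
  get_weather: (args) => `Sunny in ${String(args.city)}`,
};

// Runs every tool call from one assistant turn and builds the reply messages.
function runToolCalls(calls: ToolCall[]): ToolMessage[] {
  return calls.map((call): ToolMessage => {
    const handler = handlers[call.function.name];
    const content = handler
      ? handler(JSON.parse(call.function.arguments))
      : `Unknown tool: ${call.function.name}`;
    return { role: "tool", tool_call_id: call.id, content };
  });
}

// Two parallel calls, as a model might return them in `message.tool_calls`.
const replies = runToolCalls([
  { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Berlin"}' } },
  { id: "call_2", type: "function", function: { name: "get_weather", arguments: '{"city":"Lisbon"}' } },
]);
console.log(replies.map((r) => r.content)); // [ 'Sunny in Berlin', 'Sunny in Lisbon' ]
```

Each reply carries the `tool_call_id` of the call it answers, which is how the model matches results to requests when several calls arrive in one turn.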

Training & License

Built on V3's 14.8T-token base, then continually pretrained on roughly 840B additional tokens biased toward long-context documents and code. Post-training combines V3 chat data with R1-style long-CoT and agentic tool-use trajectories.

License: MIT license for weights, code and tokenizer; commercial use permitted.

Known limitations

Sensitive Chinese political topics filtered
Large memory footprint requires multi-GPU inference
Text-only inputs (no native vision)
Knowledge cutoff approximately late 2024
Hybrid mode switching adds prompt-template complexity
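The multi-GPU requirement listed above follows from simple arithmetic on the 671B parameter count. The sketch below estimates memory for the weights alone, ignoring KV cache, activations, and framework overhead. The 80 GB per-GPU figure and the helper names are assumptions chosen for illustration.

```typescript
const TOTAL_PARAMS = 671e9; // total parameter count stated on this page

// Weight memory in GB (1 GB = 1e9 bytes) for a given bytes-per-parameter.
function weightMemoryGb(bytesPerParam: number): number {
  return (TOTAL_PARAMS * bytesPerParam) / 1e9;
}

// Minimum GPU count needed to hold the weights alone. Real deployments need
// more, since KV cache and activations also consume memory.
function minGpusForWeights(bytesPerParam: number, gpuMemoryGb: number): number {
  return Math.ceil(weightMemoryGb(bytesPerParam) / gpuMemoryGb);
}

console.log(weightMemoryGb(1)); // 671  (8-bit weights)
console.log(weightMemoryGb(2)); // 1342 (16-bit weights)

// ASSUMPTION: 80 GB of memory per GPU.
console.log(minGpusForWeights(1, 80)); // 9
console.log(minGpusForWeights(2, 80)); // 17
```

Although only 37B parameters are active per token, every expert must stay resident in memory, so the footprint is set by the total count rather than the active count.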

Related Models

View all Text & Chat

Bio_ClinicalBERT

huggingface

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

Biomedical NER (all entities)

huggingface

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Opus 4.8