How much does DeepSeek V4 Flash cost via Railwail?

No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of DeepSeek V4 Flash?

DeepSeek V4 Flash supports a 1.0M-token context window (1,048,575 tokens), enough for entire codebases or research papers in one prompt.

How fast is DeepSeek V4 Flash?

Latency depends on prompt length and load — typically 200ms to 2s for short prompts. We measure p50/p95 in real-time on /rankings.
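
For readers who want to reproduce p50/p95 figures from their own request logs, here is a minimal sketch. It assumes the nearest-rank percentile method; the method used on /rankings is not stated, and the sample latencies and the `percentile` helper are made up for illustration.

```typescript
// Hypothetical helper: nearest-rank percentile over latency samples (ms).
// This is one common definition, not necessarily the one used on /rankings.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("need at least one sample");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Illustrative latencies in milliseconds (invented data).
const latenciesMs = [210, 250, 300, 320, 400, 450, 600, 800, 1200, 1900];
console.log("p50:", percentile(latenciesMs, 50), "ms");
console.log("p95:", percentile(latenciesMs, 95), "ms");
```

With only a handful of samples, p95 collapses to the slowest request, so measure over a larger window before drawing conclusions.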

Is DeepSeek V4 Flash better than Bio_ClinicalBERT?

It depends on your use case. DeepSeek V4 Flash (DeepSeek) and Bio_ClinicalBERT (huggingface) are both strong choices in text & chat. Compare them side-by-side at /compare/deepseek-v4-flash-vs-bio-clinicalbert.

DeepSeek V4 Flash

New

DeepSeek

Text & Chat

Efficiency-optimized variant of DeepSeek V4. 284B MoE / 13B active, 1M context, ultra-low pricing for high-throughput workloads.

TL;DR·Last updated June 24, 2026

DeepSeek V4 Flash is a text & chat AI model from DeepSeek, priced at €0.000 per 1M input tokens, with a 1.0M-token context window.

About this model

DeepSeek-V4-Flash is the cost-efficient sibling of V4-Pro, released in April 2026 as part of the V4 Preview. It has 284B total / 13B active MoE parameters and the same 1M-token context window. It is designed for high-throughput agentic loops, RAG, and batch tasks where latency and cost matter more than raw capability. Recommended for production agents, classification at scale, and large-scale data extraction.

Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate DeepSeek V4 Flash into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("deepseek-v4-flash", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("deepseek-v4-flash", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("deepseek-v4-flash", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Context window

1,048,575 tokens

Max output

384,000 tokens

Developer

DeepSeek
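
The two limits above can be checked before a request is sent. The sketch below assumes that prompt tokens and requested output tokens share the context window, which is common for chat models but is not confirmed here; `fitsInContext` is a hypothetical helper, and the token counts would come from whatever tokenizer you use.

```typescript
// Limits taken from the specifications listed above.
const CONTEXT_WINDOW_TOKENS = 1_048_575;
const MAX_OUTPUT_TOKENS = 384_000;

// Hypothetical helper: does a prompt plus the requested max_tokens fit?
// Assumption: input and output tokens count against the same window.
function fitsInContext(promptTokens: number, maxTokens: number): boolean {
  if (promptTokens < 0 || maxTokens < 0) return false;
  if (maxTokens > MAX_OUTPUT_TOKENS) return false;
  return promptTokens + maxTokens <= CONTEXT_WINDOW_TOKENS;
}

console.log(fitsInContext(600_000, 384_000)); // 984,000 tokens in total
console.log(fitsInContext(700_000, 384_000)); // 1,084,000 tokens in total
```

If the check fails, either trim the prompt or lower `max_tokens` until the sum fits.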

Deep dive — DeepSeek AI's DeepSeek V4 Flash

About DeepSeek AI

Founded 2023 · Hangzhou, China

DeepSeek AI is a Chinese AI research lab founded in 2023 by Liang Wenfeng, founder of the High-Flyer quantitative hedge fund. The lab is funded primarily by High-Flyer's profits. Its mission is open frontier AI, with all flagship models released with open weights. Major releases include DeepSeek LLM (2023), DeepSeek-V2 (May 2024), DeepSeek-V3 (December 2024), DeepSeek-R1 (January 2025), DeepSeek-V3.1 (August 2025) and the DeepSeek V4 family (April 24, 2026), comprising V4-Pro and V4-Flash. DeepSeek is credited with popularising large-scale Reinforcement Learning with Verifiable Rewards (RLVR) and consistently tops open-weights leaderboards.

Architecture

Sparse Mixture-of-Experts Transformer (efficiency-optimized open-weights)

DeepSeek-V4-Flash was released April 24, 2026 as the efficiency-optimized sibling of V4-Pro. It is a Sparse MoE Transformer with 284B total parameters and 13B activated per token, retaining the full 1M-token native context window and 384K-token max output of the Pro variant at significantly lower inference cost. The model uses the same DeepSeek architectural stack: Multi-head Latent Attention (MLA), DeepSeekMoE with fine-grained expert specialization and shared experts, and FP8 mixed-precision training. Post-training combined supervised fine-tuning, RLVR on math/code/tool-use trajectories, and heavy distillation from the V4-Pro teacher model. V4 Flash is published with open weights under a permissive license and is designed for production-scale RAG, agentic loops and high-throughput workloads. At $0.112 input / $0.224 output per million tokens it undercuts every Western frontier model by an order of magnitude.

Parameters: 284B total / 13B active per token
Context: 1.0M tokens
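
To make the "284B total / 13B active" figure concrete, the sketch below computes the share of parameters used per token and a rough weight-storage size. The bytes-per-parameter value is an assumption: the text says training used FP8 mixed precision, but it does not say in what precision the weights are distributed. Both helper names are hypothetical.

```typescript
// Parameter counts from the text above.
const TOTAL_PARAMS = 284e9;
const ACTIVE_PARAMS = 13e9;

// Fraction of the model's parameters activated for each token.
function activeFraction(total: number, active: number): number {
  return active / total;
}

// Rough weight storage in decimal gigabytes.
// Assumption: every parameter is stored at the same width (1 byte for FP8).
function weightGigabytes(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9;
}

console.log((activeFraction(TOTAL_PARAMS, ACTIVE_PARAMS) * 100).toFixed(1) + "% active per token");
console.log(weightGigabytes(TOTAL_PARAMS, 1) + " GB at 1 byte per parameter");
```

Only the active share drives per-token compute, but all experts still have to be held in memory, which is why the storage estimate uses the total count.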

What it can do

1M token native context window with 384K max output
284B MoE / 13B active parameters
Ultra-low pricing ($0.112 / $0.224 per million tokens)
Distilled from DeepSeek V4-Pro teacher model
FP8-trained for compute efficiency
Multi-head Latent Attention for memory-efficient long context
Function calling and structured JSON output
Strong on math, STEM and coding for its size
Available via DeepSeek API, OpenRouter, Together and self-hosted with vLLM/SGLang
Open weights under a permissive license
Best for: production agents, RAG pipelines, high-throughput data extraction, on-premise inference under tight cost budgets.
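
The per-million-token prices quoted above translate into a per-request cost as sketched below. This uses the $0.112 / $0.224 figures from this page; billing through a particular provider may differ, and `estimateCostUsd` is a hypothetical helper, not part of any SDK.

```typescript
// Prices quoted in the text above, in USD per one million tokens.
const INPUT_USD_PER_MTOK = 0.112;
const OUTPUT_USD_PER_MTOK = 0.224;

// Hypothetical helper: estimate the cost of one request from its token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MTOK;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MTOK;
  return inputCost + outputCost;
}

// Example: a RAG call with 200,000 input tokens and 2,000 output tokens.
console.log("$" + estimateCostUsd(200_000, 2_000).toFixed(4)); // prints "$0.0228"
```

Multiplying the result by the expected number of calls per day gives a quick budget check for batch or agentic workloads.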

Training & License

Pretrained on the same multi-trillion-token mixture as V4-Pro. Post-training combines supervised fine-tuning, RLVR and distillation from the V4-Pro teacher model. The knowledge cutoff is approximately early 2026.

License: Open weights under a permissive license that allows commercial use. Hosted API access via deepseek.com.

Known limitations

Below V4-Pro on the hardest reasoning and coding benchmarks
Light built-in safety alignment relative to Western frontier models
No native vision or audio input (text-only)
Older deepseek-chat / deepseek-reasoner endpoints will be deprecated July 24, 2026
Some Chinese-language safety constraints apply

Related Models

Bio_ClinicalBERT

huggingface

The original Bio_ClinicalBERT from Alsentzer et al., a BERT model initialized from BioBERT and further pretrained on all MIMIC-III clinical notes. Served as a fill-mask endpoint it predicts masked tokens in clinical text and produces clinical embeddings. It is the standard encoder backbone behind many downstream clinical NLP fine-tunes.

€1.00

Biomedical NER (all entities)

huggingface

Token-classification model from d4data that tags 84 biomedical entity types in clinical and medical text, including disease, sign, symptom, medication, dosage, lab value, body part and procedure. Trained on the Maccrobat clinical case corpus on a DistilBERT base, so it runs cheaply for high-volume tagging.

€1.00

Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Opus 4.8