DeepSeek V4 Flash

Efficiency-optimized variant of DeepSeek V4. 284B MoE / 13B active, 1M context, ultra-low pricing for high-throughput workloads.

TL;DR · Last updated May 16, 2026

DeepSeek V4 Flash is a text & chat AI model from DeepSeek, priced at €0.000 per 1M input tokens, with a 1.0M-token context window.

About this model

DeepSeek-V4-Flash is the cost-efficient sibling of V4-Pro, released April 2026 as part of the V4 Preview. It has 284B total / 13B active MoE parameters and the same 1M-token context window as V4-Pro. It is designed for high-throughput agentic loops, RAG and batch tasks where latency and cost matter more than raw capability. Recommended for production agents, classification at scale and large-scale data extraction.

Pricing

Price per Generation
Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate DeepSeek V4 Flash into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("deepseek-v4-flash", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("deepseek-v4-flash", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("deepseek-v4-flash", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
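Because the API is described as OpenAI-compatible, the same request can in principle be built without the SDK. The sketch below only constructs the HTTP request and does not send it. The base URL is a hypothetical placeholder and the `/v1/chat/completions` path is an assumption drawn from the OpenAI convention; neither is confirmed by this page, so check the provider's documentation before relying on them.

```typescript
// Sketch only: BASE_URL is a hypothetical placeholder, and the
// /v1/chat/completions path is assumed from the OpenAI convention.
const BASE_URL = "https://api.example.com";

type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatOptions {
  temperature?: number;
  max_tokens?: number;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) an OpenAI-style chat completion request.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  options: ChatOptions = {},
): PreparedRequest {
  return {
    url: `${BASE_URL}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages, ...options }),
    },
  };
}

const req = buildChatRequest(
  "YOUR_API_KEY",
  "deepseek-v4-flash",
  [{ role: "user", content: "Hello!" }],
  { temperature: 0.7, max_tokens: 500 },
);
console.log(req.url);
```

To send the request, pass the two parts to `fetch(req.url, req.init)` and read `choices[0].message.content` from the JSON response, as in the SDK example above.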
Specifications
Context window: 1,048,575 tokens
Max output: 384,000 tokens
Developer: DeepSeek
Category: Text & Chat
Supported formats: text
Tags: deepseek, open-weights, moe, cost-efficient, long-context, 1m-context
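The two limits above can be combined into a simple pre-flight check before a request is sent. This is a minimal sketch that assumes the prompt and the generated output share the same context window, which is the usual convention but is not stated on this page; `clampMaxTokens` is a hypothetical helper name, and the prompt token count must come from your own tokenizer.

```typescript
// Limits copied from the specification list above.
const CONTEXT_WINDOW = 1_048_575;
const MAX_OUTPUT = 384_000;

// Return the largest max_tokens value that fits, assuming prompt and
// output tokens share one context window (an assumption, see lead-in).
function clampMaxTokens(promptTokens: number, requestedOutput: number): number {
  if (promptTokens < 0 || requestedOutput < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  if (promptTokens >= CONTEXT_WINDOW) {
    throw new RangeError("prompt alone exceeds the context window");
  }
  const remaining = CONTEXT_WINDOW - promptTokens;
  return Math.min(requestedOutput, MAX_OUTPUT, remaining);
}

console.log(clampMaxTokens(10_000, 500_000)); // capped by the max output limit
console.log(clampMaxTokens(900_000, 384_000)); // capped by the remaining context
```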

Deep dive — DeepSeek AI's DeepSeek V4 Flash

About DeepSeek AI
Founded 2023 · Hangzhou, China

DeepSeek AI is a Chinese AI research lab founded in 2023 by Liang Wenfeng, co-founder of the High-Flyer quantitative hedge fund. The lab is funded primarily by High-Flyer's profits. Its mission is open frontier AI, with all flagship models released with open weights. Major releases include DeepSeek LLM (2023), DeepSeek-V2 (May 2024), DeepSeek-V3 (December 2024), DeepSeek-R1 (January 2025), DeepSeek V3.1 (August 2025) and the DeepSeek V4 family (April 24, 2026), comprising V4-Pro and V4-Flash. DeepSeek is credited with popularising large-scale Reinforcement Learning with Verifiable Rewards (RLVR), and its models have frequently ranked at or near the top of open-weights leaderboards.

Architecture
Sparse Mixture-of-Experts Transformer (efficiency-optimized open-weights)

DeepSeek-V4-Flash was released April 24, 2026 as the efficiency-optimized sibling of V4-Pro. It is a Sparse MoE Transformer with 284B total parameters and 13B activated per token, retaining the full 1M-token native context window and 384K-token max output of the Pro variant at significantly lower inference cost. The model uses the same DeepSeek architectural stack: Multi-head Latent Attention (MLA), DeepSeekMoE with fine-grained expert specialization and shared experts, and FP8 mixed-precision training. Post-training combined supervised fine-tuning, RLVR on math/code/tool-use trajectories, and heavy distillation from the V4-Pro teacher model. V4 Flash is published with open weights under a permissive license and is designed for production-scale RAG, agentic loops and high-throughput workloads. At a quoted $0.112 input / $0.224 output per million tokens, it is priced far below most Western frontier models.
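To make the quoted per-token prices concrete, here is a small cost estimate. It uses only the two prices stated above; the workload in the example (10,000 requests of 8,000 input and 500 output tokens each) is invented for illustration, and actual billing may differ by provider.

```typescript
// Prices as quoted above, in USD per 1M tokens.
const INPUT_USD_PER_M = 0.112;
const OUTPUT_USD_PER_M = 0.224;

// Estimate the cost of a workload from its total token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_USD_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_USD_PER_M
  );
}

// Hypothetical batch: 10,000 requests, 8,000 input + 500 output tokens each.
const batchCost = estimateCostUsd(10_000 * 8_000, 10_000 * 500);
console.log(batchCost.toFixed(2)); // "10.08"
```

The 80M input tokens cost about $8.96 and the 5M output tokens about $1.12, so input volume dominates the bill for extraction-style workloads with short outputs.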

Parameters: 284B total / 13B active per token
Context: 1.0M tokens
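The gap between total and active parameters comes from sparse routing: each token is processed by a few selected experts rather than all of them. The toy sketch below shows generic top-k routing with one always-on shared expert. It is not DeepSeek's implementation; the expert count, the value of k, the renormalization step and the scalar "experts" are all invented for illustration.

```typescript
// Toy top-k expert routing with a shared expert. Generic illustration only;
// NOT DeepSeek's implementation. All sizes and functions here are made up.
type Expert = (x: number) => number;

function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Select the k most probable experts and renormalize their weights to sum to 1.
function routeTopK(logits: number[], k: number): { expert: number; weight: number }[] {
  const ranked = softmax(logits)
    .map((weight, expert) => ({ expert, weight }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k);
  const total = ranked.reduce((acc, r) => acc + r.weight, 0);
  return ranked.map((r) => ({ expert: r.expert, weight: r.weight / total }));
}

// Shared expert always runs; only the k selected routed experts are evaluated.
function moeForward(x: number, logits: number[], routed: Expert[], shared: Expert, k: number): number {
  const selected = routeTopK(logits, k);
  const routedOut = selected.reduce((acc, p) => acc + p.weight * routed[p.expert](x), 0);
  return shared(x) + routedOut;
}

const routedExperts: Expert[] = [(x) => x + 1, (x) => x * 2, (x) => x - 3, (x) => x * x];
const sharedExpert: Expert = (x) => 0.5 * x;
const routerLogits = [0.1, 2.0, -1.0, 1.5];

const picks = routeTopK(routerLogits, 2);
console.log(picks.map((p) => p.expert)); // experts 1 and 3 have the largest logits
const y = moeForward(3, routerLogits, routedExperts, sharedExpert, 2);
console.log(y);

// Parameter arithmetic from the figures above: 13B active out of 284B total.
const activeFraction = 13 / 284;
console.log((activeFraction * 100).toFixed(1) + "%"); // "4.6%"
```

With the stated figures, roughly 4.6% of the model's parameters are used for any single token, which is why inference cost tracks the 13B active count more closely than the 284B total.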
What it can do
  • 1M token native context window with 384K max output
  • 284B MoE / 13B active parameters
  • Ultra-low pricing ($0.112 / $0.224 per million tokens)
  • Distilled from DeepSeek V4-Pro teacher model
  • FP8-trained for compute efficiency
  • Multi-head Latent Attention for memory-efficient long context
  • Function calling and structured JSON output
  • Strong on math, STEM and coding for its size
  • Available via DeepSeek API, OpenRouter, Together and self-hosted with vLLM/SGLang
  • Open weights under a permissive license
  • Best for: production agents, RAG pipelines, high-throughput data extraction, on-premise inference under tight cost budgets.
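The function calling and structured JSON output capability listed above can be sketched as follows. The example assumes the host exposes OpenAI-style `tools` and returns `tool_calls` whose `arguments` field is a JSON string; this page does not confirm the exact schema. The `extract_invoice` tool and the mock response are hypothetical, and no network call is made.

```typescript
// Assumes the OpenAI-style tool-calling shape; the tool itself is hypothetical.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

// Tool definition that would be passed in the request's `tools` array.
const extractInvoiceTool = {
  type: "function" as const,
  function: {
    name: "extract_invoice",
    description: "Extract structured fields from invoice text.",
    parameters: {
      type: "object",
      properties: {
        vendor: { type: "string" },
        total: { type: "number" },
      },
      required: ["vendor", "total"],
    },
  },
};

interface InvoiceArgs {
  vendor: string;
  total: number;
}

// Validate model output instead of trusting it: returns null on any mismatch.
function parseInvoiceCall(call: ToolCall): InvoiceArgs | null {
  if (call.function.name !== extractInvoiceTool.function.name) return null;
  let raw: unknown;
  try {
    raw = JSON.parse(call.function.arguments);
  } catch {
    return null; // model emitted malformed JSON
  }
  if (typeof raw !== "object" || raw === null) return null;
  const obj = raw as Record<string, unknown>;
  if (typeof obj.vendor !== "string" || typeof obj.total !== "number") return null;
  return { vendor: obj.vendor, total: obj.total };
}

// Mock of what a tool call in a response might look like.
const mockCall: ToolCall = {
  id: "call_1",
  type: "function",
  function: { name: "extract_invoice", arguments: '{"vendor":"Acme","total":129.5}' },
};
console.log(parseInvoiceCall(mockCall));
```

Validating the parsed arguments matters in high-throughput extraction pipelines, where a small rate of malformed outputs would otherwise propagate silently into downstream data.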
Training & License

Pretrained on the same multi-trillion-token mixture as V4-Pro. Post-training combines supervised fine-tuning, RLVR and distillation from the V4-Pro teacher model. Knowledge cutoff approximately early 2026.

License: Open weights under a permissive license that allows commercial use. Hosted API access via deepseek.com.

Known limitations
  • Below V4-Pro on the hardest reasoning and coding benchmarks
  • Light built-in safety alignment relative to Western frontier models
  • No native vision or audio input (text-only)
  • Older deepseek-chat / deepseek-reasoner endpoints will be deprecated July 24, 2026
  • Some Chinese-language safety constraints apply
