DeepSeek V4 Flash
Efficiency-optimized variant of DeepSeek V4. 284B MoE / 13B active, 1M context, ultra-low pricing for high-throughput workloads.
DeepSeek V4 Flash is a text & chat AI model from DeepSeek, priced at €0.000 per 1M input tokens with a 1.0M-token context window.
About this model
Pricing
API Integration
Use our OpenAI-compatible API to integrate DeepSeek V4 Flash into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("deepseek-v4-flash", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("deepseek-v4-flash", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("deepseek-v4-flash", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: DeepSeek AI's DeepSeek V4 Flash
DeepSeek AI is a Chinese AI research lab founded in 2023 by Liang Wenfeng, founder of the High-Flyer quantitative hedge fund. The lab is funded primarily by High-Flyer's profits. Its mission is open frontier AI, with all flagship models released with open weights. Major releases include DeepSeek LLM (2023), DeepSeek-V2 (May 2024), DeepSeek-V3 (December 2024), DeepSeek-R1 (January 2025), DeepSeek V3.1 (August 2025) and the DeepSeek V4 family (April 24, 2026), comprising V4-Pro and V4-Flash. DeepSeek is credited with popularising large-scale Reinforcement Learning from Verifiable Rewards and consistently tops open-weights leaderboards.
DeepSeek-V4-Flash was released April 24, 2026 as the efficiency-optimized sibling of V4-Pro. It is a Sparse MoE Transformer with 284B total parameters and 13B activated per token, retaining the full 1M-token native context window and 384K-token max output of the Pro variant at significantly lower inference cost. The model uses the same DeepSeek architectural stack: Multi-head Latent Attention (MLA), DeepSeekMoE with fine-grained expert specialization and shared experts, and FP8 mixed-precision training. Post-training combined supervised fine-tuning, RLVR on math/code/tool-use trajectories, and heavy distillation from the V4-Pro teacher model. V4 Flash is published with open weights under a permissive license and is designed for production-scale RAG, agentic loops and high-throughput workloads. At $0.112 input / $0.224 output per million tokens it undercuts every Western frontier model by an order of magnitude.
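To make the per-token pricing concrete, here is a minimal sketch that turns the quoted $0.112 / $0.224 per-million-token rates into a per-request estimate. The helper name `estimateCostUSD` and the example token counts are illustrative assumptions, not part of any SDK, and actual billing may differ (for example through caching discounts or currency conversion).

```javascript
// Prices in USD per 1M tokens, taken from the figures quoted in the text above.
const PRICE_PER_MILLION_USD = { input: 0.112, output: 0.224 };

// Hypothetical helper (not part of the railwail SDK): estimate one request's cost in USD.
function estimateCostUSD(inputTokens, outputTokens, prices = PRICE_PER_MILLION_USD) {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  const inputCost = (inputTokens / 1e6) * prices.input;
  const outputCost = (outputTokens / 1e6) * prices.output;
  return inputCost + outputCost;
}

// Illustrative workload: a RAG call with a 20,000-token prompt and a 1,000-token answer.
const perCall = estimateCostUSD(20_000, 1_000);
console.log(`per call: $${perCall.toFixed(6)}`);
console.log(`per 1M calls: $${(perCall * 1_000_000).toFixed(2)}`);
```

At these rates the prompt dominates the bill for retrieval-heavy calls, so trimming retrieved context is usually the first lever to pull.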
- Parameters: 284B total / 13B active per token
- Context: 1.0M tokens
- 1M token native context window with 384K max output
- 284B MoE / 13B active parameters
- Ultra-low pricing ($0.112 / $0.224 per million tokens)
- Distilled from DeepSeek V4-Pro teacher model
- FP8-trained for compute efficiency
- Multi-head Latent Attention for memory-efficient long context
- Function calling and structured JSON output
- Strong on math, STEM and coding for its size
- Available via DeepSeek API, OpenRouter, Together and self-hosted with vLLM/SGLang
- Open weights under a permissive license
- Best for: production agents, RAG pipelines, high-throughput data extraction, on-premise inference under tight cost budgets.
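For long-context workloads such as RAG, it can help to check a request against the limits listed above before sending it. The sketch below is a rough budgeting helper under one loudly stated assumption: that the prompt and the completion share the 1M-token window. The page does not confirm this, and `planTokenBudget` is a hypothetical name, not an SDK function.

```javascript
// Limits quoted in the feature list above.
const CONTEXT_WINDOW_TOKENS = 1_000_000;
const MAX_OUTPUT_TOKENS = 384_000;

// Hypothetical helper. ASSUMPTION: prompt and completion tokens share the context window.
function planTokenBudget(promptTokens, requestedOutputTokens) {
  if (!Number.isInteger(promptTokens) || promptTokens < 0) {
    throw new RangeError("promptTokens must be a non-negative integer");
  }
  if (!Number.isInteger(requestedOutputTokens) || requestedOutputTokens <= 0) {
    throw new RangeError("requestedOutputTokens must be a positive integer");
  }
  // Space left in the window once the prompt is accounted for.
  const roomLeft = Math.max(0, CONTEXT_WINDOW_TOKENS - promptTokens);
  // Cap the output by the request, the model's output limit, and the remaining room.
  const maxTokens = Math.min(requestedOutputTokens, MAX_OUTPUT_TOKENS, roomLeft);
  return { fits: maxTokens > 0, maxTokens };
}

console.log(planTokenBudget(10_000, 500));
console.log(planTokenBudget(900_000, 384_000));
console.log(planTokenBudget(1_000_000, 500));
```

The resulting `maxTokens` value could then be passed as the `max_tokens` option shown in the `rw.chat` example earlier on this page. Token counts must come from a tokenizer matching the model; character-based estimates are only approximate.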
Pretrained on the same multi-trillion-token mixture as V4-Pro. Post-training combines supervised fine-tuning, RLVR and distillation from the V4-Pro teacher model. Knowledge cutoff approximately early 2026.
License: Open weights under a permissive license that allows commercial use. Hosted API access via deepseek.com.
Known limitations
- Below V4-Pro on the hardest reasoning and coding benchmarks
- Light built-in safety alignment relative to Western frontier models
- No native vision or audio input (text-only)
- Older deepseek-chat / deepseek-reasoner endpoints will be deprecated July 24, 2026
- Some Chinese-language safety constraints apply
Frequently asked questions
Related Models
Claude Opus 4
Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.
Claude Sonnet 4
Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.
DeepSeek V3.1
DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.
DeepSeek V4 Pro
DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.
Start using DeepSeek V4 Flash today
Get started with free credits. No credit card required. Access DeepSeek V4 Flash and 100+ other models through a single API.