DeepSeek V3
Powerful open-weight model from DeepSeek. Strong at coding, math, and Chinese/English tasks.
DeepSeek V3 is a text and chat AI model from DeepSeek, priced at €1.40 per 1M input tokens with a 64K-token context window.
Examples
See what DeepSeek V3 can generate
Explain how a hash map works internally, including collision handling
A hash map stores key-value pairs by computing a hash of the key to determine which 'bucket' (array index) the pair belongs in. When you insert a key, the hash function converts it to an integer, which is then mapped to an index via modulo with the array size. Collisions—when two keys map to the same index—are handled through chaining (storing a linked list at each bucket) or open addressing (probing for the next empty slot). Load factor triggers resizing: when the ratio of entries to buckets exceeds a threshold (typically 0.75), the array doubles in size and all entries are rehashed.
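As an illustration of the answer above (written for this page, not model output), chaining and load-factor resizing can be sketched in a few lines of JavaScript. The class name, the simple polynomial string hash and the default capacity are assumptions made for the sketch.

```javascript
// Minimal hash map with separate chaining and load-factor resizing.
// Illustrative only: class and method names are assumptions, not a library API.
class ChainedHashMap {
  constructor(capacity = 8, maxLoadFactor = 0.75) {
    this.buckets = Array.from({ length: capacity }, () => []);
    this.size = 0;
    this.maxLoadFactor = maxLoadFactor;
  }

  // Hash the key's string form to a non-negative 32-bit integer.
  hash(key) {
    let h = 0;
    for (const ch of String(key)) {
      h = (Math.imul(h, 31) + ch.codePointAt(0)) >>> 0;
    }
    return h;
  }

  // Map the hash to a bucket index via modulo with the array size.
  indexFor(key, capacity = this.buckets.length) {
    return this.hash(key) % capacity;
  }

  set(key, value) {
    const bucket = this.buckets[this.indexFor(key)];
    const entry = bucket.find((e) => e.key === key);
    if (entry) {
      entry.value = value; // existing key: overwrite in place
      return;
    }
    bucket.push({ key, value }); // empty bucket or collision: append to the chain
    this.size += 1;
    if (this.size / this.buckets.length > this.maxLoadFactor) this.resize();
  }

  get(key) {
    const entry = this.buckets[this.indexFor(key)].find((e) => e.key === key);
    return entry ? entry.value : undefined;
  }

  // Double the bucket array and rehash every entry.
  resize() {
    const old = this.buckets;
    const capacity = old.length * 2;
    this.buckets = Array.from({ length: capacity }, () => []);
    for (const bucket of old) {
      for (const e of bucket) this.buckets[this.indexFor(e.key, capacity)].push(e);
    }
  }
}
```

Open addressing would replace the per-bucket arrays with probing inside a single flat array; the resize logic stays the same.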
Translate this Chinese proverb and explain its meaning: 三人行,必有我师焉
Translation: 'Among three people walking together, there must be one who can be my teacher.' This proverb comes from Confucius' Analects (Lunyu) and conveys that everyone you meet has something valuable to teach you. It encourages humility and lifelong learning: no matter how knowledgeable you are, others possess skills, perspectives, or experiences worth learning from. It's a cornerstone of Chinese educational philosophy.
Pricing
API Integration
Use our OpenAI-compatible API to integrate DeepSeek V3 into your application.
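For readers who prefer plain HTTP, the sketch below shows what an OpenAI-style chat request generally looks like. It only builds the request and does not send it. The base URL is a placeholder, and the `/chat/completions` path and Bearer authentication are assumptions taken from the OpenAI convention rather than from this provider's documentation; `buildChatRequest` is a hypothetical helper.

```javascript
// Build (but do not send) an OpenAI-style chat completion request.
// ASSUMPTIONS: BASE_URL is a placeholder, and the path and Bearer auth follow
// the OpenAI convention. Check the provider's documentation for real values.
const BASE_URL = "https://api.example.com/v1"; // hypothetical

function buildChatRequest(apiKey, model, messages, options = {}) {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // Sampling options such as temperature and max_tokens sit beside model and messages.
      body: JSON.stringify({ model, messages, ...options }),
    },
  };
}

const { url, init } = buildChatRequest(
  "YOUR_API_KEY",
  "deepseek-v3",
  [{ role: "user", content: "Hello!" }],
  { temperature: 0.7, max_tokens: 500 },
);
// To send it: const res = await fetch(url, init); const data = await res.json();
```

The SDK snippet that follows is the shorter route for most applications.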
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple — just pass a string
const reply = await rw.run("deepseek-v3", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("deepseek-v3", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("deepseek-v3", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: DeepSeek's DeepSeek V3
DeepSeek AI was founded in July 2023 in Hangzhou by Liang Wenfeng, who is also a co-founder of the High-Flyer quantitative hedge fund. High-Flyer's GPU cluster (thousands of NVIDIA A100/H800 cards stockpiled before US export controls tightened) bootstrapped DeepSeek's training capacity. The lab gained global attention for highly efficient training recipes documented in transparent technical reports. Notable releases include DeepSeek Coder (Nov 2023), DeepSeek LLM 67B (Jan 2024), DeepSeekMath with GRPO reinforcement learning (Feb 2024), DeepSeek V2, which introduced Multi-head Latent Attention and adopted DeepSeekMoE (May 2024), DeepSeek V3 (December 2024) and DeepSeek R1 (January 2025). The V3 and R1 releases triggered global discussion when DeepSeek reported that training V3 took 2.788M H800 GPU-hours, or approximately $5.6M at the rental price it assumed, ten or more times cheaper than comparable Western frontier runs. All models are released under the MIT license. The company is privately funded by High-Flyer rather than venture capital and employs roughly 200 researchers, mostly recent PhDs from Chinese universities.
Visit DeepSeek →
DeepSeek V3 was released on 26 December 2024 with weights under the MIT license. It is a sparse Mixture-of-Experts Transformer with 671 billion total parameters, of which 37 billion are active per token. The architecture combines DeepSeekMoE (fine-grained routed experts plus shared experts, with load balancing that needs no auxiliary loss) and Multi-head Latent Attention (MLA), a low-rank KV-cache compression technique introduced in V2 that sharply reduces KV-cache memory during inference. V3 was pretrained on 14.8 trillion high-quality tokens spanning multilingual web text, code, books and scientific papers. The full training run used 2.788 million H800 GPU-hours, which DeepSeek reports as approximately $5.576M at $2 per GPU-hour. Training used multi-token prediction (MTP) as an auxiliary objective and FP8 mixed precision with custom CUDA kernels for the MoE routing. Post-training included supervised fine-tuning on 1.5M curated examples plus a reinforcement learning stage using GRPO. V3 achieves performance competitive with GPT-4o and Claude 3.5 Sonnet on most text and code benchmarks while costing approximately one tenth as much to operate, which made it the highest-performing open-weight non-reasoning model at launch.
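The headline numbers above are easy to check by hand. The short script below reproduces the training-cost figure and the share of parameters active per token, using only values quoted in the text; the variable names are illustrative.

```javascript
// Figures quoted in the text above.
const gpuHours = 2.788e6;   // H800 GPU-hours for the full training run
const usdPerGpuHour = 2;    // rental price assumed in DeepSeek's report
const totalParams = 671e9;  // total parameters
const activeParams = 37e9;  // parameters active per token

const trainingCostUsd = gpuHours * usdPerGpuHour;
const activeFraction = activeParams / totalParams;

console.log(`Training cost: $${(trainingCostUsd / 1e6).toFixed(3)}M`);       // $5.576M
console.log(`Active per token: ${(activeFraction * 100).toFixed(1)}%`);      // 5.5%
```

The cost is GPU rental arithmetic only and does not include research, ablations or staff.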
- Parameters: 671B total, 37B active per token
- Context: 128K tokens
- 671B-parameter MoE with 37B active per token
- 128K context window
- Pretrained on 14.8T tokens for ~$5.6M of compute
- DeepSeekMoE routing without auxiliary loss
- Multi-head Latent Attention for memory-efficient inference
- FP8 mixed-precision training with custom kernels
- Multi-token prediction (MTP) auxiliary objective
- Strong code generation on HumanEval, MBPP, LiveCodeBench
- Open weights under MIT license
- Compatible with vLLM, SGLang, llama.cpp, HuggingFace
- Best for: cost-efficient open-weight chat, coding, on-prem enterprise, research on MoE.
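The "routing without auxiliary loss" item in the list above can be pictured with a toy example. The sketch below is a heavily simplified illustration of the idea, not DeepSeek's implementation: each expert carries a bias that affects only which experts are selected, the mixing weights still come from the raw scores, and the bias is nudged after each batch according to load. The function names, the four-expert setup and the update rate are all assumptions.

```javascript
// Toy top-k expert routing with a selection-only bias.
// Simplified sketch: names, sizes and the update rate are illustrative assumptions.

// Pick the k experts with the highest (score + bias); weights use the raw scores.
function routeToken(scores, bias, k) {
  const ranked = scores
    .map((s, i) => ({ expert: i, score: s, biased: s + bias[i] }))
    .sort((a, b) => b.biased - a.biased)
    .slice(0, k);
  const total = ranked.reduce((sum, r) => sum + r.score, 0);
  return ranked.map((r) => ({ expert: r.expert, weight: r.score / total }));
}

// After a batch, lower the bias of overloaded experts and raise it for underloaded ones.
function updateBias(bias, load, rate = 0.01) {
  const mean = load.reduce((a, b) => a + b, 0) / load.length;
  return bias.map((b, i) => (load[i] > mean ? b - rate : load[i] < mean ? b + rate : b));
}

const scores = [0.9, 0.8, 0.3, 0.1];         // per-expert affinity for one token
let bias = [0, 0, 0, 0];
const picked = routeToken(scores, bias, 2);  // selects experts 0 and 1
bias = updateBias(bias, [10, 6, 0, 0]);      // mean load is 4, so experts 0 and 1 are overloaded
```

Because the bias never enters the mixing weights, balancing changes which experts are chosen without adding a term to the training loss.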
Pretrained on 14.8 trillion tokens of curated multilingual web text, code repositories, books and scientific papers. Knowledge cutoff is approximately mid-2024. Post-training uses 1.5M-example SFT followed by GRPO reinforcement learning on preference and verifiable-reward data.
License: MIT license for model weights, code and tokenizer. Commercial use permitted without restrictions.
Known limitations
- Refuses or evades certain political topics (Tiananmen, Taiwan)
- Large memory footprint (roughly 700GB of weights in FP8, about 1.3TB in BF16) limits self-hosting to multi-GPU clusters
- Text-only base; no native vision input
- Knowledge cutoff mid-2024
- Less battle-tested in production than GPT-4o/Claude
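The memory limitation in the list above follows from simple arithmetic on the parameter count. The sketch below estimates weight storage at one byte per parameter (FP8) and two bytes per parameter (BF16), and the minimum number of accelerators needed just to hold the weights. The 80GB device size is an assumption for illustration, and real deployments need extra headroom for the KV cache and activations.

```javascript
// Weight memory for a 671B-parameter model at different precisions.
// Counts weights only; KV cache, activations and runtime overhead are extra.
const params = 671e9;
const bytesPerParam = { fp8: 1, bf16: 2 };
const gpuMemoryGB = 80; // assumed accelerator size, for illustration

function weightGB(precision) {
  return (params * bytesPerParam[precision]) / 1e9;
}

function minGpus(precision) {
  return Math.ceil(weightGB(precision) / gpuMemoryGB);
}

console.log(weightGB("fp8"), minGpus("fp8"));   // 671 GB, at least 9 GPUs
console.log(weightGB("bf16"), minGpus("bf16")); // 1342 GB, at least 17 GPUs
```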
Frequently asked questions
Related Models
View all Text & Chat
Claude Opus 4
Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.
Claude Sonnet 4
Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.
DeepSeek V3.1
DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.
DeepSeek V4 Pro
DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.
Start using DeepSeek V3 today
Get started with free credits. No credit card required. Access DeepSeek V3 and 100+ other models through a single API.