DeepSeek V3

DeepSeek
Text & Chat

Powerful open-weight model from DeepSeek. Strong at coding, math, and Chinese/English tasks.

TL;DR · Last updated March 4, 2026

DeepSeek V3 is a text & chat AI model from DeepSeek, priced at €1.40 per 1M input tokens with a 64K-token context window.
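
To make the listed price concrete, here is a minimal sketch that estimates the input-side cost of a request. It uses only the €1.40 per 1M input tokens figure quoted above; the function and constant names are hypothetical, and output-token pricing is left out because this page does not state it.

```typescript
// Listed input price from this page: EUR 1.40 per 1,000,000 input tokens.
const INPUT_PRICE_EUR_PER_MILLION = 1.4;

// Estimate the input-side cost of one request. Output tokens are not
// included because their price is not stated on this page.
function estimateInputCostEur(inputTokens: number): number {
  if (!Number.isFinite(inputTokens) || inputTokens < 0) {
    throw new RangeError("inputTokens must be a non-negative number");
  }
  return (inputTokens / 1_000_000) * INPUT_PRICE_EUR_PER_MILLION;
}

// A prompt that fills the whole 64,000-token window.
const fullWindowCostEur = estimateInputCostEur(64_000);
console.log(fullWindowCostEur.toFixed(4)); // "0.0896"
```

Even a prompt that fills the entire context window costs under nine euro cents on the input side at this rate.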

Examples

See what DeepSeek V3 can generate

Algorithm Explanation

Explain how a hash map works internally, including collision handling

A hash map stores key-value pairs by computing a hash of the key to determine which 'bucket' (array index) the pair belongs in. When you insert a key, the hash function converts it to an integer, which is then mapped to an index via modulo with the array size. Collisions—when two keys map to the same index—are handled through chaining (storing a linked list at each bucket) or open addressing (probing for the next empty slot). Load factor triggers resizing: when the ratio of entries to buckets exceeds a threshold (typically 0.75), the array doubles in size and all entries are rehashed.
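
The explanation above can be sketched in a few lines of TypeScript. This is an illustrative toy, not production code: the class name, the polynomial string hash, and the use of plain arrays as chains (instead of linked lists) are all choices made for this sketch. In real code the built-in Map is the right tool.

```typescript
// Minimal chained hash map for string keys (illustrative only).
class ChainedHashMap<V> {
  private buckets: Array<Array<[string, V]>>;
  private count = 0;

  constructor(initialCapacity = 8, private readonly maxLoadFactor = 0.75) {
    this.buckets = ChainedHashMap.emptyBuckets<V>(initialCapacity);
  }

  private static emptyBuckets<T>(n: number): Array<Array<[string, T]>> {
    return Array.from({ length: n }, (): Array<[string, T]> => []);
  }

  // Simple polynomial string hash (an arbitrary choice for this sketch).
  private hash(key: string): number {
    let h = 0;
    for (let i = 0; i < key.length; i++) {
      h = (Math.imul(h, 31) + key.charCodeAt(i)) >>> 0;
    }
    return h;
  }

  // Map the hash to a bucket index via modulo with the array size.
  private indexFor(key: string, capacity: number): number {
    return this.hash(key) % capacity;
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[this.indexFor(key, this.buckets.length)];
    const entry = bucket.find(([k]) => k === key);
    if (entry) {
      entry[1] = value; // key already present: overwrite in place
      return;
    }
    bucket.push([key, value]); // collision or empty bucket: append to chain
    this.count++;
    if (this.count / this.buckets.length > this.maxLoadFactor) {
      this.resize(this.buckets.length * 2);
    }
  }

  get(key: string): V | undefined {
    const bucket = this.buckets[this.indexFor(key, this.buckets.length)];
    return bucket.find(([k]) => k === key)?.[1];
  }

  get size(): number { return this.count; }
  get capacity(): number { return this.buckets.length; }

  // Double the array and rehash every entry into its new bucket.
  private resize(newCapacity: number): void {
    const old = this.buckets;
    this.buckets = ChainedHashMap.emptyBuckets<V>(newCapacity);
    for (const bucket of old) {
      for (const [k, v] of bucket) {
        this.buckets[this.indexFor(k, newCapacity)].push([k, v]);
      }
    }
  }
}
```

With the default capacity of 8 and a 0.75 threshold, the seventh insert pushes the load factor to 0.875 and triggers a resize to 16 buckets.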

Bilingual Task

Translate this Chinese proverb and explain its meaning: 三人行,必有我师焉

Translation: 'Among three people walking together, there must be one who can be my teacher.' This proverb comes from Confucius' Analects (Lunyu) and conveys that everyone you meet has something valuable to teach you. It encourages humility and lifelong learning: no matter how knowledgeable you are, others possess skills, perspectives, or experiences worth learning from. It's a cornerstone of Chinese educational philosophy.

Pricing

Price per Generation
Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate DeepSeek V3 into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("deepseek-v3", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("deepseek-v3", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("deepseek-v3", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications
Context window: 64,000 tokens
Max output: 8,192 tokens
Avg. latency: 2.0s
Developer: DeepSeek
Category: Text & Chat
Tags: affordable, coding
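
The two token limits in the specifications interact when you choose `max_tokens`. The sketch below clamps a requested output length to both limits. It assumes that prompt and output tokens share the 64,000-token window, which is the common convention but is not confirmed on this page; the function name is hypothetical, and you need your own token count for the prompt.

```typescript
// Limits taken from the Specifications block on this page.
const CONTEXT_WINDOW_TOKENS = 64_000;
const MAX_OUTPUT_TOKENS = 8_192;

// Clamp a requested max_tokens value so that it respects the output cap and
// (by assumption) still fits in the window alongside the prompt.
function clampMaxTokens(promptTokens: number, requestedMaxTokens: number): number {
  if (!Number.isInteger(promptTokens) || promptTokens < 0) {
    throw new RangeError("promptTokens must be a non-negative integer");
  }
  if (!Number.isInteger(requestedMaxTokens) || requestedMaxTokens < 1) {
    throw new RangeError("requestedMaxTokens must be a positive integer");
  }
  const remaining = CONTEXT_WINDOW_TOKENS - promptTokens;
  if (remaining < 1) {
    throw new RangeError("prompt leaves no room for output");
  }
  return Math.min(requestedMaxTokens, MAX_OUTPUT_TOKENS, remaining);
}

console.log(clampMaxTokens(1_000, 500));     // 500
console.log(clampMaxTokens(1_000, 20_000));  // 8192
console.log(clampMaxTokens(60_000, 8_192));  // 4000
```

The result can be passed as the `max_tokens` option shown in the API example.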

Deep dive — DeepSeek's DeepSeek V3

About DeepSeek
Founded 2023 · Hangzhou, China

DeepSeek AI was founded in July 2023 in Hangzhou by Liang Wenfeng, who also co-founded the quantitative hedge fund High-Flyer. High-Flyer's GPU cluster (thousands of NVIDIA A100/H800 cards stockpiled before US export controls tightened) bootstrapped DeepSeek's training capacity. The lab gained global attention for highly efficient training recipes documented in detailed technical reports. Notable releases include DeepSeek Coder (Nov 2023), DeepSeek LLM 67B (Jan 2024), DeepSeekMath with GRPO reinforcement learning (Feb 2024), DeepSeek V2 with Multi-head Latent Attention and DeepSeekMoE (May 2024), DeepSeek V3 (Dec 2024) and DeepSeek R1 (Jan 2025). The V3 and R1 releases triggered global discussion when DeepSeek reported that V3 was trained on 2.788M H800 GPU-hours, roughly $5.6M at rental prices, a figure widely described as about ten times cheaper than comparable Western frontier runs. Recent releases such as R1 are published under the MIT license; several earlier models shipped under DeepSeek's own model license. The company is privately funded by High-Flyer rather than venture capital and reportedly employs roughly 200 researchers, mostly recent PhDs from Chinese universities.

Visit DeepSeek →
Architecture
Sparse Mixture-of-Experts Transformer (DeepSeekMoE + Multi-head Latent Attention)

DeepSeek V3 was released on 26 December 2024 with open weights. It is a sparse Mixture-of-Experts Transformer with 671 billion total parameters, of which 37 billion are active per token. The architecture combines DeepSeekMoE (fine-grained routed experts plus shared experts, kept balanced by a bias-based strategy rather than an auxiliary loss) and Multi-head Latent Attention (MLA), a low-rank KV-cache compression technique introduced in V2 that sharply reduces KV-cache memory during inference. V3 was pretrained on 14.8 trillion tokens spanning multilingual web text, code, books and scientific papers, using a total compute budget of 2.788 million H800 GPU-hours, which DeepSeek reports as approximately $5.576M at $2 per GPU-hour. The training run introduced multi-token prediction (MTP) as an auxiliary objective and used FP8 mixed-precision training with custom CUDA kernels for the MoE routing. Post-training included supervised fine-tuning on 1.5M curated examples plus a reinforcement learning stage using GRPO. On DeepSeek's reported benchmarks, V3 is competitive with GPT-4o and Claude 3.5 Sonnet on most text and code tasks at roughly one tenth of the operating cost, which made it the highest-performing open-weight non-reasoning model at launch.
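
The headline numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only figures quoted above (GPU-hours, the $2 per GPU-hour rate, and the parameter counts); the constant names are chosen for this example.

```typescript
// Figures as reported in the paragraph above.
const H800_GPU_HOURS = 2_788_000;  // total training compute
const USD_PER_GPU_HOUR = 2;        // rental rate assumed by DeepSeek
const TOTAL_PARAMS_B = 671;        // billions of parameters in the model
const ACTIVE_PARAMS_B = 37;        // billions of parameters used per token

// Training cost is simply GPU-hours times the hourly rate.
const trainingCostUsd = H800_GPU_HOURS * USD_PER_GPU_HOUR;

// Share of the model that does work for any single token.
const activeFraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B;

console.log(`$${(trainingCostUsd / 1e6).toFixed(3)}M`);    // "$5.576M"
console.log(`${(activeFraction * 100).toFixed(1)}%`);      // "5.5%"
```

Only about one parameter in eighteen is active for a given token, which is why per-token compute is far lower than the total parameter count suggests.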

Parameters: 671B total, 37B active per token
Context: 128K tokens (model maximum; this API serves 64,000 tokens, as listed under Specifications)
What it can do
  • 671B-parameter MoE with 37B active per token
  • 128K context window
  • Pretrained on 14.8T tokens for ~$5.6M of compute
  • DeepSeekMoE routing without auxiliary loss
  • Multi-head Latent Attention for memory-efficient inference
  • FP8 mixed-precision training with custom kernels
  • Multi-token prediction (MTP) auxiliary objective
  • Strong code generation on HumanEval, MBPP, LiveCodeBench
  • Open weights under MIT license
  • Compatible with vLLM, SGLang, llama.cpp, HuggingFace
  • Best for: cost-efficient open-weight chat, coding, on-prem enterprise, research on MoE.
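
The "routing without auxiliary loss" item is easier to picture with a toy example. The sketch below is a simplified illustration of the general idea, not DeepSeek's implementation: each expert carries a bias that is added to its affinity score only when choosing the top-k experts, while the gate weights come from the raw affinities. After each batch, biases of overloaded experts are nudged down and those of underloaded experts up. All names, the step size, and the use of the mean as the load target are assumptions, and shared experts (which always run) are omitted.

```typescript
interface RoutingResult {
  experts: number[];  // indices of the selected routed experts
  weights: number[];  // normalized gate weights, in the same order
}

// Select top-k experts by (affinity + bias); weight them by raw affinity.
// Affinities are assumed positive, for example sigmoid outputs.
function routeToken(affinities: number[], bias: number[], k: number): RoutingResult {
  if (affinities.length !== bias.length) {
    throw new RangeError("affinities and bias must have the same length");
  }
  if (!Number.isInteger(k) || k < 1 || k > affinities.length) {
    throw new RangeError("k must be between 1 and the number of experts");
  }
  const experts = affinities
    .map((s, i) => ({ i, score: s + bias[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.i);
  const total = experts.reduce((sum, i) => sum + affinities[i], 0);
  return { experts, weights: experts.map((i) => affinities[i] / total) };
}

// Nudge biases toward balance: busy experts down, idle experts up.
function updateBias(bias: number[], load: number[], step: number): number[] {
  const mean = load.reduce((a, b) => a + b, 0) / load.length;
  return bias.map((b, i) =>
    load[i] > mean ? b - step : load[i] < mean ? b + step : b
  );
}

const affinities = [0.9, 0.5, 0.4, 0.1];
console.log(routeToken(affinities, [0, 0, 0, 0], 2).experts);     // [0, 1]
console.log(routeToken(affinities, [-0.6, 0, 0, 0], 2).experts);  // [1, 2]
```

Because the bias never enters the gate weights, balancing changes which experts are picked without adding a loss term that competes with the language-modeling objective.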
Training & License

Pretrained on 14.8 trillion tokens of curated multilingual web text, code repositories, books and scientific papers. Knowledge cutoff is approximately mid-2024. Post-training uses 1.5M-example SFT followed by GRPO reinforcement learning on preference and verifiable-reward data.
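
For readers unfamiliar with GRPO, its core step is to score each sampled response relative to the other responses for the same prompt, which removes the need for a separate value model. The sketch below shows only that group-relative normalization, following the formulation in the DeepSeekMath paper; the clipped policy objective and KL penalty are left out, and the function name is hypothetical.

```typescript
// Given the rewards of several responses sampled for one prompt, return each
// response's advantage: (reward - group mean) / group standard deviation.
function groupRelativeAdvantages(rewards: number[]): number[] {
  if (rewards.length === 0) return [];
  const mean = rewards.reduce((a, b) => a + b, 0) / rewards.length;
  const variance =
    rewards.reduce((acc, r) => acc + (r - mean) ** 2, 0) / rewards.length;
  const std = Math.sqrt(variance);
  // If every response got the same reward there is no learning signal.
  if (std === 0) return rewards.map(() => 0);
  return rewards.map((r) => (r - mean) / std);
}

// Four sampled answers to one math problem, rewarded 1 if correct else 0.
console.log(groupRelativeAdvantages([1, 0, 0, 1])); // [1, -1, -1, 1]
```

Responses that beat the group average get a positive advantage and are reinforced; those below it are discouraged.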

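License: The code is MIT-licensed. The original December 2024 weights shipped under DeepSeek's own model license, which permits commercial use; the later V3-0324 update moved the weights to the MIT license.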

Known limitations
  • Refuses or evades certain political topics (Tiananmen, Taiwan)
  • Large memory footprint (roughly 670 GB of weights in FP8, about 1.3 TB in BF16) limits self-hosting to multi-GPU clusters
  • Text-only base; no native vision input
  • Knowledge cutoff mid-2024
  • Less battle-tested in production than GPT-4o/Claude
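
A back-of-the-envelope calculation shows why self-hosting needs a cluster. The sketch below derives the weight footprint from the 671B parameter count at one byte per parameter (FP8) and two bytes (BF16), then gives a lower bound on the number of GPUs needed just to hold the weights. The 80 GB per-GPU figure is an assumption for illustration, and the bound ignores KV cache, activations, and runtime overhead.

```typescript
const TOTAL_PARAMS = 671e9; // total parameter count from this page

// Bytes needed to store the weights at a given precision.
function weightBytes(params: number, bytesPerParam: number): number {
  return params * bytesPerParam;
}

// Lower bound on GPU count: weights only, no cache or activations.
function minGpusForWeights(weightGb: number, gpuMemoryGb: number): number {
  return Math.ceil(weightGb / gpuMemoryGb);
}

const fp8WeightGb = weightBytes(TOTAL_PARAMS, 1) / 1e9;   // 671
const bf16WeightGb = weightBytes(TOTAL_PARAMS, 2) / 1e9;  // 1342

console.log(fp8WeightGb, bf16WeightGb);
console.log(minGpusForWeights(fp8WeightGb, 80));   // 9
console.log(minGpusForWeights(bf16WeightGb, 80));  // 17
```

Even at FP8 the weights alone exceed a single eight-GPU node of 80 GB cards, before any memory is reserved for serving requests.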

Start using DeepSeek V3 today

Get started with free credits. No credit card required. Access DeepSeek V3 and 100+ other models through a single API.