Open-source reasoning models closed the gap with frontier closed-source models faster than almost anyone predicted. In May 2026, both DeepSeek V4 and Qwen 3 235B score within single-digit percentage points of Claude Opus 4.7 and GPT-5.4 on most of the benchmarks that matter for engineering work. They are not yet at parity, but they are close enough, and cheap enough, that the decision is no longer 'open vs closed' but 'which open' and 'what serving stack.'
This guide compares the two leading open-source reasoning models head-to-head: benchmarks, license terms, serverless API pricing across the three biggest providers, self-hosting compute math, tool-use capabilities, and the specific workloads each one wins. All numbers below are from the official model cards plus our own evaluation on a 5,000-prompt internal eval set running through Together AI, Fireworks, and DeepInfra in late April 2026.
The Two Models in One Paragraph
DeepSeek V4 (released January 2026, 671 billion total parameters, 37 billion active per token via Mixture-of-Experts) is the third major iteration in DeepSeek's V3 line. It maintains the V3 family's coding focus but adds a stronger general-purpose reasoning core and native function-calling. Qwen 3 235B (released March 2026, 235 billion total / 22 billion active) is Alibaba's flagship in the Qwen 3 series, optimized for multilingual quality and built around a thinking-mode reasoning system that can be toggled per request.
Model specs: DeepSeek V4 vs Qwen 3 235B (May 2026)
| Spec | DeepSeek V4 | Qwen 3 235B |
|---|---|---|
| Total parameters | 671B (MoE) | 235B (MoE) |
| Active parameters per token | 37B | 22B |
| Experts (total / active) | 256 / 8 | 128 / 4 |
| Architecture | DeepSeekMoE + MLA attention | Qwen3MoE + Grouped Query Attention |
| Context window | 128k tokens | 128k tokens (256k with YaRN) |
| Tokenizer | DeepSeek BPE (102k vocab) | Qwen tiktoken-compatible (152k vocab) |
| Training corpus size | ~14.8 trillion tokens | ~36 trillion tokens |
| Multilingual coverage | ~12 languages strong | ~119 languages claimed |
| Released under | Apache-2.0 derivative | Tongyi Qianwen License |
Two architectural notes worth flagging. First, both models are MoE: only a small fraction of weights are active per token, which is what makes serving them at a 20-40B-active footprint feasible. Second, DeepSeek's MLA (Multi-head Latent Attention) reduces KV-cache memory by ~85% vs vanilla attention, which is the single biggest reason DeepSeek V4 can run inference at competitive latency despite its 671B total size.
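The "small fraction" can be read straight off the spec table. A minimal sketch of that arithmetic follows; the `MoeSpec` type and `activeSharePct` function are illustrative names, not part of any library.

```typescript
// Parameter counts (in billions) copied from the spec table above.
interface MoeSpec {
  name: string;
  totalParamsB: number;
  activeParamsB: number;
}

const specs: MoeSpec[] = [
  { name: "DeepSeek V4", totalParamsB: 671, activeParamsB: 37 },
  { name: "Qwen 3 235B", totalParamsB: 235, activeParamsB: 22 },
];

// Share of weights that take part in a single token's forward pass, in percent.
function activeSharePct(spec: MoeSpec): number {
  return (spec.activeParamsB / spec.totalParamsB) * 100;
}

for (const spec of specs) {
  console.log(`${spec.name}: ${activeSharePct(spec).toFixed(1)}% of weights active per token`);
}
```

Per-token compute scales with the active count, but the full set of weights still has to sit in GPU memory, which is why the self-hosting footprint tracks total parameters rather than active ones.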
Benchmarks Head-to-Head
We focused on eight benchmarks that map to the workloads teams most often deploy open-source models for: general knowledge (MMLU-Pro), graduate-level science (GPQA Diamond), competition math (AIME 2025), coding (LiveCodeBench, SWE-bench Verified, HumanEval-X), tool use (BFCL, the Berkeley Function-Calling Leaderboard), and multilingual math (MGSM). Numbers come from the official model cards, replicated against our own eval where possible.
Benchmark scores (higher is better, May 2026)
| Benchmark | DeepSeek V4 | Qwen 3 235B | Claude Opus 4.7 | GPT-5.4 | Winner (OSS) |
|---|---|---|---|---|---|
| MMLU-Pro | 89.7% | 89.2% | 92.1% | 93.8% | DeepSeek V4 (narrow) |
| GPQA Diamond | 78.4% | 76.8% | 84.5% | 82.1% | DeepSeek V4 |
| AIME 2025 | 86.7% | 84.2% | 92.8% | 96.1% | DeepSeek V4 |
| LiveCodeBench (Aug 2025 to Apr 2026) | 73.4% | 68.2% | 71.8% | 69.4% | DeepSeek V4 |
| SWE-bench Verified | 67.8% | 61.5% | 74.5% | 68.2% | DeepSeek V4 |
| HumanEval-X (avg over 6 langs) | 92.4% | 90.1% | 93.6% | 92.1% | DeepSeek V4 |
| BFCL (function-calling) | 85.6% | 87.3% | 91.2% | 92.4% | Qwen 3 |
| MGSM (multilingual math) | 84.2% | 88.7% | 86.1% | 87.4% | Qwen 3 |
DeepSeek V4 wins six of eight benchmarks; Qwen 3 235B wins two, but the two it wins are meaningful. BFCL (function-calling reliability) and MGSM (multilingual math) are exactly the capabilities you need for agentic and non-English production workloads. The half-point gap on MMLU-Pro is statistical noise; the five-point gap on LiveCodeBench is real and reflects DeepSeek's continued specialization in code.
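For readers who want to re-check the tally, here is a small sketch that recomputes the per-benchmark gaps and the win count from the table. The `scores` object and `gap` helper are illustrative names introduced only for this example.

```typescript
// Scores (percent) copied from the benchmark table above.
const scores: Record<string, { deepseek: number; qwen: number }> = {
  "MMLU-Pro": { deepseek: 89.7, qwen: 89.2 },
  "GPQA Diamond": { deepseek: 78.4, qwen: 76.8 },
  "AIME 2025": { deepseek: 86.7, qwen: 84.2 },
  "LiveCodeBench": { deepseek: 73.4, qwen: 68.2 },
  "SWE-bench Verified": { deepseek: 67.8, qwen: 61.5 },
  "HumanEval-X": { deepseek: 92.4, qwen: 90.1 },
  "BFCL": { deepseek: 85.6, qwen: 87.3 },
  "MGSM": { deepseek: 84.2, qwen: 88.7 },
};

// Signed gap in percentage points (positive means DeepSeek V4 leads),
// rounded to one decimal to avoid floating-point noise.
function gap(benchmark: string): number {
  const s = scores[benchmark];
  if (!s) throw new Error(`unknown benchmark: ${benchmark}`);
  return Math.round((s.deepseek - s.qwen) * 10) / 10;
}

// Count wins per model (the table contains no ties).
const wins = { deepseek: 0, qwen: 0 };
for (const name of Object.keys(scores)) {
  if (gap(name) > 0) wins.deepseek++;
  else wins.qwen++;
  console.log(`${name}: ${gap(name)} points`);
}
console.log(wins);
```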
How close are these to Claude Opus 4.7 and GPT-5.4?
Measured against the better closed-source score on each benchmark, DeepSeek V4 is within roughly 7 percentage points everywhere except AIME 2025 (9.4 points behind GPT-5.4), and Qwen 3 235B is within roughly 8 points except on AIME 2025 (11.9 behind) and SWE-bench Verified (13.0 behind). On LiveCodeBench, DeepSeek V4 (73.4%) actually exceeds both Claude Opus 4.7 (71.8%) and GPT-5.4 (69.4%), the only contamination-free coding benchmark where open-source leads. On SWE-bench Verified, Claude Opus 4.7 still leads by 6.7 points, but DeepSeek V4 (67.8%) is within striking distance of GPT-5.4 (68.2%). For teams whose primary cost driver is the API bill, the quality-cost ratio of these open-source models is now compelling enough that they belong in production, not just in R&D.
Thinking mode and reasoning effort
Both models support a 'thinking mode' that adds 1-10 seconds of pre-response chain-of-thought, lifting scores on hard reasoning benchmarks. Qwen 3's thinking mode is opt-in per request via a flag; DeepSeek V4's reasoning style is more integrated into the base prompt. With thinking enabled:
With thinking mode enabled (HLE, AIME 2025)
| Model + mode | HLE | AIME 2025 |
|---|---|---|
| DeepSeek V4 (default) | 17.2% | 86.7% |
| DeepSeek V4 (extended reasoning) | 24.1% | 92.4% |
| Qwen 3 235B (default) | 15.8% | 84.2% |
| Qwen 3 235B (thinking mode on) | 22.6% | 90.8% |
Thinking mode adds about 7 percentage points on HLE and roughly 6-7 points on AIME 2025 for both models. For high-stakes reasoning tasks, the latency cost (3-8 seconds per request) is usually worth it. For chat workloads where the response should arrive in 2 seconds, leave it off.
Serverless API Pricing: Together, Fireworks, DeepInfra
The three biggest serverless hosts for open-source models (Together AI, Fireworks AI, DeepInfra) all serve both DeepSeek V4 and Qwen 3 235B. Pricing changes monthly. Below is the May 2026 snapshot.
Serverless API pricing, per 1M tokens (USD, May 2026)
| Provider | Model | Input | Output | Avg latency (p50) | Throughput |
|---|---|---|---|---|---|
| Together AI | DeepSeek V4 | $0.50 | $1.20 | 320 ms | 84 tok/s |
| Together AI | Qwen 3 235B | $0.30 | $0.85 | 290 ms | 92 tok/s |
| Fireworks AI | DeepSeek V4 | $0.45 | $1.10 | 280 ms | 98 tok/s |
| Fireworks AI | Qwen 3 235B | $0.27 | $0.79 | 260 ms | 106 tok/s |
| DeepInfra | DeepSeek V4 | $0.40 | $1.10 | 410 ms | 76 tok/s |
| DeepInfra | Qwen 3 235B | $0.25 | $0.72 | 380 ms | 82 tok/s |
DeepInfra has the lowest list prices (tying Fireworks only on DeepSeek V4 output), and Fireworks is fastest. Together is the most expensive on list price and sits between the other two on latency and throughput. For most production workloads, the right choice is provider-level rather than model-level: pick Fireworks if latency matters, DeepInfra if cost matters, Together if you want a single-vendor relationship that covers both models plus image generation and embeddings.
How much does this save vs closed-source?
Below is the same workload comparison we used in the Claude/GPT comparison, with DeepSeek V4 (Fireworks) and Qwen 3 235B (Fireworks) added.
Per-request cost on three workloads (USD, list pricing)
| Workload | Claude Opus 4.7 | GPT-5.4 | DeepSeek V4 | Qwen 3 235B |
|---|---|---|---|---|
| Chat turn (200in/400out) | $0.0330 | $0.0144 | $0.00053 | $0.00037 |
| Research agent (40kin/2kout) | $0.7500 | $0.3840 | $0.0220 | $0.0124 |
| Long-doc QA (250kin/1.5kout) | $3.8625 | $2.0480 | $0.1290 | $0.0794 |
DeepSeek V4 is 62× cheaper than Claude Opus 4.7 on the chat turn and 30× cheaper on long-doc QA. Qwen 3 235B is 89× cheaper than Claude on chat and 48× on long-doc. Even versus GPT-5.4, the cost-competitive closed-source flagship, DeepSeek V4 is 27× cheaper on chat and Qwen 3 235B is 38× cheaper. The 5-point quality gap on most benchmarks is exchanged for a 30-90× cost reduction; for many production workloads that trade is a no-brainer.
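The chat-turn row can be reproduced from the Fireworks list prices with a few lines of arithmetic. This is a sketch only: `Pricing` and `requestCostUsd` are illustrative names, and the Claude figure is taken from the table rather than computed.

```typescript
// USD per 1M tokens, as listed in the pricing table.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
}

// Cost of one request in USD, given token counts in and out.
function requestCostUsd(p: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1e6;
}

// Fireworks list prices, May 2026 snapshot.
const deepseekV4: Pricing = { inputPerM: 0.45, outputPerM: 1.1 };
const qwen3: Pricing = { inputPerM: 0.27, outputPerM: 0.79 };

// Chat turn: 200 tokens in, 400 tokens out.
const chatDeepseek = requestCostUsd(deepseekV4, 200, 400);
const chatQwen = requestCostUsd(qwen3, 200, 400);

// Claude Opus 4.7 chat-turn cost, copied from the workload table.
const chatClaude = 0.033;

console.log(`DeepSeek V4: $${chatDeepseek.toFixed(5)}, ${Math.round(chatClaude / chatDeepseek)}x cheaper than Claude`);
console.log(`Qwen 3 235B: $${chatQwen.toFixed(5)}, ${Math.round(chatClaude / chatQwen)}x cheaper than Claude`);
```

Swapping in your own token counts per request is usually more informative than the three canned workloads, since the input/output mix drives the ratio.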
Self-Hosting Compute Requirements
Once your monthly token volume crosses ~2-3 billion tokens, self-hosting starts to compete with serverless. Below is the practical hardware footprint for production-grade serving of each model with reasonable batch sizes.
Minimum production serving hardware (May 2026)
| Model | GPU config | VRAM total | Hourly cloud rate (Lambda/RunPod) | Capex est. (own hardware) |
|---|---|---|---|---|
| DeepSeek V4 (FP8, 8-bit experts) | 8× H100 80GB | 640 GB | $24.00/hr | ≈$240,000 |
| DeepSeek V4 (INT4 quantized) | 4× H100 80GB | 320 GB | $12.00/hr | ≈$120,000 |
| Qwen 3 235B (FP8) | 4× H100 80GB | 320 GB | $12.00/hr | ≈$120,000 |
| Qwen 3 235B (INT4) | 2× H100 80GB | 160 GB | $6.00/hr | ≈$60,000 |
| Qwen 3 30B-A3B (cheaper option) | 1× H100 80GB | 80 GB | $3.00/hr | ≈$30,000 |
Qwen 3 235B fits on 4 H100s at FP8 precision; DeepSeek V4 needs 8 H100s. At INT4 quantization (3-5% quality loss on most benchmarks), the footprint halves: Qwen 3 235B on 2 H100s, DeepSeek V4 on 4. INT4 is production-viable for both models per our internal eval, with the caveat that coding accuracy drops 1.5 points on DeepSeek V4 and 0.8 points on Qwen 3.
Throughput at scale
What hourly compute cost actually buys you, in tokens per second, with vLLM 0.7 or SGLang 0.4 as the serving stack:
Sustained throughput per GPU configuration
| Config | Sustained throughput | Max concurrent requests | Cost per 1M output tokens |
|---|---|---|---|
| DeepSeek V4, 8×H100 FP8 | ≈3,200 tok/s | 256 | $2.08 |
| DeepSeek V4, 4×H100 INT4 | ≈1,400 tok/s | 128 | $2.38 |
| Qwen 3 235B, 4×H100 FP8 | ≈2,400 tok/s | 192 | $1.39 |
| Qwen 3 235B, 2×H100 INT4 | ≈1,000 tok/s | 96 | $1.67 |
| Qwen 3 30B-A3B, 1×H100 | ≈1,800 tok/s | 128 | $0.46 |
Self-hosted, Qwen 3 235B at FP8 lands at roughly $1.39 per million output tokens, about 76% above Fireworks list ($0.79) but completely under your control. The DeepSeek V4 self-hosted cost ($2.08) is more expensive than Fireworks list ($1.10) until you factor in the per-request margin Fireworks needs. The break-even where self-hosting beats serverless is typically around 50-80% sustained utilization.
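The per-million-token figures in the throughput table follow directly from the hourly rate and the sustained throughput. The sketch below reproduces them and adds a `utilization` parameter, which is an assumption of this example (the share of each billed hour spent generating tokens), not something taken from the tables.

```typescript
// Hourly rates and sustained throughput are copied from the tables above.
interface ServingConfig {
  name: string;
  hourlyUsd: number; // rental cost for the whole GPU node
  tokensPerSec: number; // sustained output throughput at full load
}

// USD per 1M output tokens. `utilization` must be in (0, 1]: the assumed
// fraction of each billed hour the node actually spends generating tokens.
function costPerMillionTokens(cfg: ServingConfig, utilization = 1): number {
  if (utilization <= 0 || utilization > 1) {
    throw new RangeError("utilization must be in (0, 1]");
  }
  const millionTokensPerHour = (cfg.tokensPerSec * 3600 * utilization) / 1e6;
  return cfg.hourlyUsd / millionTokensPerHour;
}

const deepseekFp8: ServingConfig = { name: "DeepSeek V4, 8xH100 FP8", hourlyUsd: 24, tokensPerSec: 3200 };
const qwenFp8: ServingConfig = { name: "Qwen 3 235B, 4xH100 FP8", hourlyUsd: 12, tokensPerSec: 2400 };

for (const cfg of [deepseekFp8, qwenFp8]) {
  const full = costPerMillionTokens(cfg).toFixed(2);
  const half = costPerMillionTokens(cfg, 0.5).toFixed(2);
  console.log(`${cfg.name}: $${full} at full load, $${half} at 50% utilization`);
}
```

Halving utilization doubles the per-token cost, which is why sustained utilization matters more to the break-even than the hourly rate itself.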
Operational overhead: the silent cost of self-hosting
The list-price math always looks favorable for self-hosting, but the operational cost is non-trivial. Realistically you need: a dedicated ML platform engineer (≈$220k loaded cost), 24/7 on-call rotation (2-3 people minimum), monitoring (Grafana + Prometheus + Loki), automated failover, model-update pipeline, and a strategy for handling provider GPU shortages. We model this as ~$400k/year of fully-loaded overhead before any GPU bill. Self-hosting pays back only at scale (>10B tokens/month sustained) or when data residency / IP concerns force the issue.
Tool Use and Function Calling
Function-calling reliability is the make-or-break capability for agentic deployments. We tested both models on BFCL (Berkeley Function-Calling Leaderboard) and on a private 1,000-prompt suite of OpenAI-style tool definitions.
Function-calling reliability (May 2026)
| Test | DeepSeek V4 | Qwen 3 235B |
|---|---|---|
| BFCL: simple function | 92.3% | 94.1% |
| BFCL: parallel functions | 78.6% | 82.4% |
| BFCL: multi-step / chained calls | 76.8% | 81.2% |
| Private: valid JSON args first attempt | 94.7% | 96.3% |
| Private: correct function selected | 91.2% | 93.8% |
| Private: hallucinated function name | 2.1% | 1.4% |
Qwen 3 235B wins every function-calling metric. The gap is small (1-4 points) but it appears consistently across tests. For agent-heavy products where the model issues 10+ tool calls per session, Qwen 3's higher reliability compounds into noticeably fewer failed agent runs. Qwen 3 also supports strict-schema mode (similar to OpenAI's `response_format: 'json_schema'`); DeepSeek V4 supports JSON mode but not strict schema enforcement as of this writing.
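To see how a small per-call gap compounds, here is a rough sketch. It assumes every tool call succeeds independently with the same probability and that one failure sinks the session; neither assumption comes from our eval, and `sessionSuccessRate` is an illustrative name.

```typescript
// Probability that every one of `calls` tool calls succeeds, assuming each
// call succeeds independently with probability `perCallSuccess`.
function sessionSuccessRate(perCallSuccess: number, calls: number): number {
  return Math.pow(perCallSuccess, calls);
}

// Per-call rates: "valid JSON args first attempt" from the private suite above.
const deepseekSession = sessionSuccessRate(0.947, 10);
const qwenSession = sessionSuccessRate(0.963, 10);

console.log(`DeepSeek V4: ${(deepseekSession * 100).toFixed(1)}% of 10-call sessions clean`);
console.log(`Qwen 3 235B: ${(qwenSession * 100).toFixed(1)}% of 10-call sessions clean`);
```

Under those assumptions the 1.6-point per-call gap widens to roughly ten points per session (about 58% versus 69%). Real agents retry failed calls, so the absolute numbers overstate failures, but the direction of the effect holds.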
License Comparison: The Fine Print
Both models are 'open weights' in the practical sense (you can download them, fine-tune them, and serve them), but the licenses have meaningful differences.
License terms comparison
| Term | DeepSeek V4 | Qwen 3 235B |
|---|---|---|
| Base license | Apache 2.0 derivative ("DeepSeek License v3") | Tongyi Qianwen License |
| Commercial use | Allowed | Allowed (with caveats) |
| Distribution of fine-tunes | Allowed | Allowed with attribution |
| MAU threshold for re-licensing | None | 100M MAU triggers commercial license request |
| Restricted use cases | Military, weapons, CSAM, surveillance against fundamental rights | Same plus 'against Chinese national interests' clause |
| Modification disclosure | Not required | Recommended (not required) |
| Patent grant | Yes | Limited |
For most commercial products, both licenses are workable. The two practical considerations: (1) if your product serves >100M monthly active users, you must request a commercial license from Alibaba (most enterprises will already have a relationship); (2) the 'national interests' clause in the Qwen license is vague and has not been tested in court, which has made some enterprise legal teams uneasy. DeepSeek's license is cleaner from a Western enterprise compliance standpoint.
Export control and geopolitical risk
Both DeepSeek and Alibaba are China-based companies, and there is ongoing regulatory uncertainty in the EU and US about using LLMs trained by Chinese-affiliated entities for certain government or critical-infrastructure use cases. For the majority of commercial applications this is not a blocker, but if you are building for US federal contracts or EU critical-infrastructure customers, run the question past your compliance team before committing.
Use-Case Recommendation Matrix
When to choose which open-source model
| Use case | Pick | Why |
|---|---|---|
| Code generation, English codebase | DeepSeek V4 | Best LiveCodeBench + SWE-bench Verified among OSS |
| Code generation, multilingual codebase (CJK comments) | Qwen 3 235B | Better tokenizer for CJK code |
| Customer-facing chat in 5+ languages | Qwen 3 235B | MGSM 88.7%, broader language coverage |
| Agentic workflows with 10+ tool calls | Qwen 3 235B | BFCL 87.3%, strict-schema support |
| Long-context document QA (>32k tokens) | DeepSeek V4 | MLA attention reduces memory pressure |
| RAG-heavy production | Qwen 3 235B | Lower hallucination on grounded tasks |
| Math tutoring | DeepSeek V4 | AIME 2025 86.7% (92.4% with extended reasoning) |
| Self-hosted on a single 4×H100 box | Qwen 3 235B | Fits at FP8; DeepSeek V4 needs 8×H100 at FP8 |
| Serverless deployment at lowest cost | Qwen 3 235B on DeepInfra | $0.25/$0.72 per 1M |
| Maximum quality regardless of cost | DeepSeek V4 with extended reasoning | Closest OSS to closed-source flagship |
| Strict commercial license + Western legal review | DeepSeek V4 | Apache 2.0 derivative is cleaner |
| Fine-tuning for a private domain | Either | Both ship instruct + base checkpoints |
Migration and Integration
Both models are exposed through OpenAI-compatible APIs on every major serverless provider, so dropping them into existing OpenAI-SDK code is a one-line change. The standard pattern is below. Note that DeepSeek's own API also speaks the OpenAI dialect, so you can hit it directly if you want to skip the serverless layer.
```typescript
import OpenAI from "openai";

// Via Fireworks AI
const fw = new OpenAI({
  apiKey: process.env.FIREWORKS_API_KEY,
  baseURL: "https://api.fireworks.ai/inference/v1",
});
await fw.chat.completions.create({
  model: "accounts/fireworks/models/deepseek-v4",
  messages: [{ role: "user", content: "Refactor this Python function..." }],
});
await fw.chat.completions.create({
  model: "accounts/fireworks/models/qwen3-235b-a22b-instruct",
  messages: [{ role: "user", content: "Refactor this Python function..." }],
});

// Via DeepSeek directly
const ds = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY,
  baseURL: "https://api.deepseek.com",
});
await ds.chat.completions.create({
  model: "deepseek-v4",
  messages: [{ role: "user", content: "Refactor this Python function..." }],
});

// Or via Railwail (one key, both models, all providers)
const rw = new OpenAI({
  apiKey: process.env.RAILWAIL_API_KEY,
  baseURL: "https://railwail.com/v1",
});
await rw.chat.completions.create({
  model: "deepseek-v4", // or "qwen-3-235b"
  messages: [{ role: "user", content: "Refactor this Python function..." }],
});
```
Where the OpenAI compatibility breaks
Three differences will trip you up in production. First, both models clamp `temperature` at 1.5, so higher values have no additional effect. Second, DeepSeek V4's `tool_choice: 'required'` returns a tool call but does not enforce the strict-schema mode that OpenAI does, so your JSON-validation code still needs to run. Third, Qwen 3's `thinking` mode is exposed via a non-standard `chat_template_kwargs` parameter that not all serverless providers expose; if you need it, pick a provider that supports it (Fireworks does).
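A defensive wrapper for these three quirks might look like the sketch below. Every name here (`clampTemperature`, `parseToolArgs`, `qwenRequestBody`) is hypothetical, and the `enable_thinking` key inside `chat_template_kwargs` is an assumption to verify against your provider's documentation before relying on it.

```typescript
// Hypothetical helpers; none of these names come from a provider SDK.

const MAX_TEMPERATURE = 1.5; // clamp threshold described above

// Clamp explicitly so logged settings match what the model actually uses.
function clampTemperature(t: number): number {
  return Math.min(Math.max(t, 0), MAX_TEMPERATURE);
}

type ToolArgs = Record<string, unknown>;

// Returns parsed tool-call arguments, or null if the model produced invalid
// JSON, a non-object value, or omitted a required key. Callers can retry on null.
function parseToolArgs(raw: string, requiredKeys: string[]): ToolArgs | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return null;
  }
  const args = parsed as ToolArgs;
  return requiredKeys.every((key) => key in args) ? args : null;
}

// Builds a chat-completions request body with Qwen 3 thinking toggled.
// The `enable_thinking` key name is an assumption, not a documented guarantee.
function qwenRequestBody(model: string, prompt: string, thinking: boolean, temperature = 0.7) {
  return {
    model,
    temperature: clampTemperature(temperature),
    messages: [{ role: "user" as const, content: prompt }],
    chat_template_kwargs: { enable_thinking: thinking },
  };
}

console.log(parseToolArgs('{"city": "Paris"}', ["city"])); // parsed object
console.log(parseToolArgs('{"city": ', ["city"])); // null: truncated JSON
```

Checking only for required keys is deliberately minimal; a production validator would check value types against the full tool schema as well.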
Practical Production Notes
Fine-tuning
Both models support LoRA and QLoRA fine-tuning. DeepSeek V4 ships a base (non-instruct) checkpoint specifically so you can do your own instruction tuning; Qwen 3 235B ships both base and instruct. For a 10-million-token domain fine-tune, a single 4×H100 box trains for ~6 hours on Qwen 3 235B and ~12 hours on DeepSeek V4. Hugging Face's TRL and Axolotl both support both models out of the box.
Distillation into smaller models
Both vendors ship distilled smaller models that inherit much of the quality: DeepSeek V4-Lite (16B active), Qwen 3 30B-A3B, Qwen 3 7B. For a production stack, distilled models give you 80% of the quality at a 10-20× cost reduction. The standard pattern is to use the flagship for hard cases and the distilled model for everything else.
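One way to express the flagship-for-hard-cases pattern is a small routing function. The sketch below is illustrative only: the traits, the threshold of 10 tool calls, and the model ID strings are placeholders chosen for this example, not values from any provider.

```typescript
type Tier = "distilled" | "flagship";

// Signals the caller supplies about a request; both fields are hypothetical.
interface RequestTraits {
  expectedToolCalls: number; // tool calls the session is likely to need
  needsLongReasoning: boolean; // e.g. multi-step math or planning
}

// Placeholder model IDs; substitute whatever your provider actually exposes.
const MODEL_BY_TIER: Record<Tier, string> = {
  distilled: "qwen3-30b-a3b",
  flagship: "qwen3-235b",
};

// Escalate to the flagship only when the request looks hard.
function pickTier(traits: RequestTraits, toolCallThreshold = 10): Tier {
  const hard = traits.needsLongReasoning || traits.expectedToolCalls >= toolCallThreshold;
  return hard ? "flagship" : "distilled";
}

function pickModel(traits: RequestTraits): string {
  return MODEL_BY_TIER[pickTier(traits)];
}

console.log(pickModel({ expectedToolCalls: 1, needsLongReasoning: false }));
console.log(pickModel({ expectedToolCalls: 12, needsLongReasoning: false }));
```

Static rules like these are a starting point; many teams later replace them with an escalation step that retries on the flagship when the distilled model's answer fails validation.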
Prompt-cache discounts on serverless
Fireworks and Together both support 70% cache-hit discounts on stable prefixes. DeepInfra is rolling this out in Q3 2026. For agentic workloads with stable system prompts, this brings effective input pricing into the $0.10/1M-token range, where DeepSeek V4 effectively costs less than the cheapest closed-source small models.
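The effective input price depends on how much of your input actually hits the cache. A minimal sketch of that blend follows, where `cacheHitRate` (the share of input tokens served from cache) is an assumed workload property you would measure yourself and `effectiveInputPrice` is an illustrative name.

```typescript
// Blended input price in USD per 1M tokens.
// `cacheHitRate`: assumed share of input tokens served from cache (0 to 1).
// `discount`: cache-hit discount, 0.7 for the 70% figure quoted above.
function effectiveInputPrice(listPerM: number, cacheHitRate: number, discount = 0.7): number {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  return listPerM * (1 - cacheHitRate * discount);
}

// Fireworks list input prices from the pricing table.
const deepseekInput = 0.45;
const qwenInput = 0.27;

for (const hitRate of [0, 0.5, 0.9, 1]) {
  const ds = effectiveInputPrice(deepseekInput, hitRate).toFixed(3);
  const qw = effectiveInputPrice(qwenInput, hitRate).toFixed(3);
  console.log(`hit rate ${hitRate}: DeepSeek V4 $${ds}, Qwen 3 235B $${qw} per 1M input tokens`);
}
```

Output tokens are not discounted by prefix caching, so the saving matters most for input-heavy workloads such as agents with long system prompts.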
What Will Change by End of 2026
- **DeepSeek R2 (reasoning-first)**: DeepSeek's roadmap signals a reasoning-first variant in Q3 2026 that should close the gap with closed-source on HLE and AIME.
- **Qwen 4**: Alibaba's pace suggests a Qwen 4 family in late 2026; rumors point to a stronger MoE design with ~30B active parameters and Apache 2.0 licensing.
- **Serverless pricing wars**: DeepInfra is signaling another 20% cut by Q3. Fireworks is expected to match. Self-hosting break-even will move further to the right.
- **Native multimodal in OSS**: Both models are currently text-only at the flagship size. Vision-capable open-source flagships are widely expected in H2 2026, which would shift this comparison meaningfully.
Bottom Line
DeepSeek V4 is the stronger open-source model for code-heavy English production. Qwen 3 235B is the stronger open-source model for multilingual products, agentic workloads, and cost-sensitive self-hosting. The gap between either and the closed-source flagships is now small enough that the right architectural pattern for most production workloads in 2026 is to route most traffic to an open model and reserve a closed-source flagship for the hardest cases. The price gap (30-90×) is too large to ignore, and the quality gap is small enough that for the right workload you will not notice it.
Frequently Asked Questions
Is DeepSeek V4 better than Qwen 3 235B?
On most benchmarks, narrowly yes: DeepSeek V4 wins MMLU-Pro by 0.5 points, LiveCodeBench by 5.2 points, and SWE-bench Verified by 6.3 points. Qwen 3 235B wins function-calling reliability (BFCL +1.7 points) and multilingual math (MGSM +4.5 points). For English code, DeepSeek V4 is the default. For multilingual or agentic work, Qwen 3 235B is.
How much does it cost to use DeepSeek V4 vs Qwen 3 235B via API?
On Fireworks (May 2026 list pricing): DeepSeek V4 is $0.45 input / $1.10 output per 1M tokens; Qwen 3 235B is $0.27 input / $0.79 output. Qwen 3 235B is roughly 30-40% cheaper. Both are dramatically cheaper than closed-source: on a short chat turn, DeepSeek V4 is ~27× cheaper than GPT-5.4 and Qwen 3 235B is ~38× cheaper.
What hardware do I need to self-host DeepSeek V4?
At FP8 precision, 8× NVIDIA H100 80GB GPUs (640 GB total VRAM). Cloud rate is around $24/hour from Lambda or RunPod. At INT4 quantization (with ~3-5% quality loss), 4× H100 is sufficient. Sustained throughput at FP8 with vLLM is around 3,200 tokens/second.
What hardware do I need to self-host Qwen 3 235B?
At FP8 precision, 4× H100 80GB GPUs (320 GB total VRAM). Cloud rate is around $12/hour. At INT4, 2× H100 80GB is enough. Sustained throughput at FP8 is around 2,400 tokens/second.
Can I use DeepSeek V4 or Qwen 3 235B commercially?
Yes for both. DeepSeek V4 ships under an Apache 2.0 derivative with no MAU threshold. Qwen 3 235B ships under the Tongyi Qianwen License, which is also commercial-friendly, with the caveat that products serving over 100M monthly active users must request a separate commercial license from Alibaba. Both restrict military, weapons, and CSAM uses; Qwen also restricts uses 'against Chinese national interests.'
Which open-source LLM is best for coding?
DeepSeek V4. Among open-source models it leads on LiveCodeBench (73.4%), SWE-bench Verified (67.8%), and HumanEval-X. On LiveCodeBench specifically it exceeds Claude Opus 4.7 (71.8%) and GPT-5.4 (69.4%), making it the only open-source model that beats closed-source on a major contamination-free coding benchmark in May 2026.
Which is better for function calling and agentic workflows?
Qwen 3 235B. It scores 87.3% on BFCL and 81.2% on multi-step BFCL, slightly ahead of DeepSeek V4. It also supports strict-schema JSON output, which DeepSeek V4 does not. For agents with 10+ tool calls per session, Qwen 3's marginal reliability advantage compounds into noticeably fewer failed runs.
When does self-hosting beat serverless?
Typically around 50-80% sustained GPU utilization (≈10 billion monthly tokens). Below that, the operational overhead (ML platform engineer, on-call rotation, monitoring, model-update pipeline) outweighs the per-token savings. Self-hosting also pays back when data residency or IP concerns block sending data to third-party APIs.
Are DeepSeek V4 and Qwen 3 235B as good as Claude Opus 4.7 or GPT-5.4?
Close, but not on every benchmark. Against the better closed-source score, DeepSeek V4 is within roughly 7 percentage points everywhere except AIME 2025, and Qwen 3 235B is within roughly 8 points except on AIME 2025 and SWE-bench Verified. The gap is real but small. For 80% of production workloads (chat, summarization, classification, document QA, code generation) both open-source models perform indistinguishably from the closed-source flagships at 30-90× lower cost. The hardest cases (frontier reasoning, agentic engineering at the highest reliability) still favor closed-source by a margin worth paying for.
Can I fine-tune DeepSeek V4 or Qwen 3 235B?
Yes, both support LoRA and QLoRA fine-tuning. DeepSeek V4 ships a non-instruct base checkpoint specifically for this purpose; Qwen 3 ships both base and instruct. On a 4×H100 box, a 10-million-token domain fine-tune takes 6-12 hours. TRL and Axolotl both support both models.
Are these models multimodal?
No. Both DeepSeek V4 and Qwen 3 235B are text-only at the flagship size. Alibaba ships Qwen 3 VL variants for vision, but they are smaller (7B and 72B). DeepSeek has signaled native multimodal support in a future release. For vision work today, you would pair DeepSeek V4 or Qwen 3 235B with a separate vision model, or use a closed-source multimodal flagship.
How do I migrate from OpenAI API to DeepSeek V4 or Qwen 3 235B?
Both models are exposed through OpenAI-compatible APIs on Together, Fireworks, and DeepInfra (and on DeepSeek's own API for DeepSeek V4). Migration is a `baseURL` change and a `model` string change; the rest of your OpenAI SDK code stays identical. The only caveats: parallel tool use is sequential by default, and Qwen 3's `thinking` mode is exposed via a non-standard parameter.
Try Both Open-Source Models Now
Railwail exposes DeepSeek V4 and Qwen 3 235B alongside Claude Opus 4.7, GPT-5.4, and 100+ other models behind a single OpenAI-compatible endpoint. Pay per token at provider list prices, with no markup. Built-in routing lets you fall back to closed-source for hard cases or default everything to open-source for cost optimization. Start with free credits and run your own quality eval.
