How much does Microsoft Phi-3.5 MoE Instruct cost via Railwail?

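Microsoft Phi-3.5 MoE Instruct is currently listed as free per generation. There is no monthly minimum and no subscription, and new accounts start with €5 in free credits.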

What is the context window of Microsoft Phi-3.5 MoE Instruct?

Microsoft Phi-3.5 MoE Instruct supports a 131,072-token (128K) context window, enough for long books, technical manuals, and extended analysis.

How fast is Microsoft Phi-3.5 MoE Instruct?

Latency depends on prompt length and load, and is typically 200ms to 2s for short prompts. We measure p50/p95 latency in real time on /rankings.

Is Microsoft Phi-3.5 MoE Instruct better than Claude Opus 4?

It depends on your use case. Microsoft Phi-3.5 MoE Instruct (Microsoft) and Claude Opus 4 (Anthropic) are both strong choices in text & chat. Compare them side-by-side at /compare/phi-3-5-moe-instruct-vs-claude-opus-4.

Microsoft Phi-3.5 MoE Instruct

Name: Microsoft Phi-3.5 MoE Instruct
Brand: Together AI
SKU: phi-3-5-moe-instruct
Availability: InStock

Microsoft

Text & Chat

Mixture-of-experts Phi-3.5: 42B total / 6.6B active params. 128k context, multilingual.


TL;DR · Last updated May 16, 2026

Microsoft Phi-3.5 MoE Instruct is a text & chat AI model from Microsoft, priced at €0.000 per 1M input tokens, with a 131,072-token context window.


Pricing

Price per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Microsoft Phi-3.5 MoE Instruct into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("phi-3-5-moe-instruct", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("phi-3-5-moe-instruct", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("phi-3-5-moe-instruct", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
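
Because the API is described as OpenAI-compatible, the same call can presumably also be expressed as a plain chat-completions request without the SDK. The sketch below only builds the request; it does not send it. The base URL is a placeholder and the helper name `buildChatRequest` is hypothetical, neither comes from the Railwail documentation. The body fields (`model`, `messages`, `temperature`, `max_tokens`) follow the OpenAI chat-completions convention.

```typescript
// Sketch under assumptions: BASE_URL is a placeholder, not a documented endpoint.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatOptions {
  temperature?: number;
  max_tokens?: number;
}

const BASE_URL = "https://api.example.invalid/v1"; // hypothetical, replace with the real base URL

// Build (but do not send) an OpenAI-style chat-completions request.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
  options: ChatOptions = {}
) {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages, ...options }),
    },
  };
}

const req = buildChatRequest(
  "YOUR_API_KEY",
  "phi-3-5-moe-instruct",
  [{ role: "user", content: "Hello!" }],
  { temperature: 0.7, max_tokens: 500 }
);
// To send it, pass both parts to fetch: fetch(req.url, req.init)
```

Keeping request construction separate from the network call makes the payload easy to inspect and log before anything is sent.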

Specifications

Context window

131,072 tokens

Max output

4,096 tokens

Developer

Microsoft
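
The two limits above interact: a long prompt leaves less room for the reply. The following is a minimal sketch of that arithmetic, assuming the context window covers prompt and output tokens together (the usual convention, but not stated on this page). The function name `planTokens` is hypothetical, and the prompt token count is assumed to come from your own tokeniser.

```typescript
const CONTEXT_WINDOW = 131072; // tokens, from the specification above
const MAX_OUTPUT = 4096; // tokens, from the specification above

// Clamp a requested max_tokens value to what the limits allow.
// Assumption: prompt and output tokens share the same context window.
function planTokens(promptTokens: number, requestedOutput: number) {
  if (!Number.isInteger(promptTokens) || promptTokens < 0) {
    throw new RangeError("promptTokens must be a non-negative integer");
  }
  const remaining = CONTEXT_WINDOW - promptTokens;
  const maxTokens = Math.max(0, Math.min(requestedOutput, MAX_OUTPUT, remaining));
  return { fits: remaining > 0, maxTokens };
}

console.log(planTokens(1000, 500)); // { fits: true, maxTokens: 500 }
console.log(planTokens(1000, 10000)); // { fits: true, maxTokens: 4096 }
console.log(planTokens(130000, 4096)); // { fits: true, maxTokens: 1072 }
```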

Deep dive — Microsoft Research's Microsoft Phi-3.5 MoE Instruct

About Microsoft Research

Founded 1991 · Redmond, Washington, USA

Microsoft Research's Machine Learning Foundations group, led by Sébastien Bubeck and Ronen Eldan, drove the Phi series of small-but-capable language models. The Phi thesis is that synthetic 'textbook-quality' training data can produce small models that punch far above their weight on reasoning benchmarks. The series runs from Phi-1 (1.3B, code, 2023), Phi-1.5 (general reasoning, 2023) and Phi-2 (2.7B, 2023) to Phi-3 (Mini, Small and Medium dense models, April 2024) and Phi-3.5 (Mini, Vision and MoE, August 2024). Phi-3.5 MoE was Microsoft's first Mixture-of-Experts Phi variant, with 16 experts and top-2 routing (the '16x3.8B' configuration). Microsoft Research remains one of the largest industrial AI research organisations in the world, and Phi is one of its flagship open-weights AI projects.


Architecture

Mixture-of-Experts Decoder Transformer

Phi-3.5 MoE Instruct is a 16x3.8B Mixture-of-Experts decoder transformer: 16 experts with top-2 routing, so 6.6B of the 41.9B total parameters are active per token. The total is well below 16 × 3.8B because only the feed-forward blocks are replicated per expert, while attention and embedding weights are shared. The architecture uses 32 layers, a hidden size of 4,096, 32-head grouped-query attention with 8 KV heads, RoPE positional embeddings (theta=10000, extended for 128K context), SwiGLU activations, and a 32,064-token Llama-derived BPE tokeniser. Routing uses a sparse mixer with an auxiliary loss for expert balancing. The model was pretrained on 4.9 trillion tokens of heavily curated data; the Phi recipe emphasises synthetic 'textbook-quality' data generated from larger models and explicitly favours reasoning-dense content over breadth. Training used 512 H100 GPUs for 23 days. Post-training combines supervised fine-tuning and Direct Preference Optimisation (DPO) with explicit safety post-training. The model was released in August 2024 under the MIT license.
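
To make "top-2 routing" concrete, here is a generic sketch of top-2 gating for a single token: the router produces one score per expert, the two highest-scoring experts are selected, and their weights are renormalised to sum to 1. This is plain softmax top-2 gating for illustration only. It is not the sparse mixer the model actually uses, and the function names are hypothetical.

```typescript
// Numerically stable softmax over router scores.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick the two most probable experts and renormalise their weights.
// Generic top-2 gating, not Phi-3.5 MoE's sparse-mixer router.
function routeTop2(routerLogits: number[]): { expert: number; weight: number }[] {
  if (routerLogits.length < 2) {
    throw new RangeError("need at least two experts");
  }
  const ranked = softmax(routerLogits)
    .map((p, expert) => ({ expert, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, 2);
  const norm = ranked[0].p + ranked[1].p;
  return ranked.map((r) => ({ expert: r.expert, weight: r.p / norm }));
}

// 16 experts; expert 3 scores highest, expert 7 second.
const logits: number[] = new Array(16).fill(0);
logits[3] = 2;
logits[7] = 1;
console.log(routeTop2(logits)); // experts 3 and 7, weights of about 0.73 and 0.27
```

Only the two selected experts run their feed-forward block for this token, which is why the active parameter count is far smaller than the total.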

Parameters: 41.9B total, 6.6B active per token (16 experts, top-2 routing)
Context: 131,072 tokens (128K)
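
The total and active figures can be roughly reproduced from the architecture numbers. The sketch below is a back-of-the-envelope count, not an official breakdown. Two inputs are assumptions that do not appear on this page: a feed-forward intermediate size of 6,400 per expert, and separate (untied) input and output embedding matrices. Normalisation weights and biases are ignored as negligible.

```typescript
interface MoEConfig {
  layers: number;
  hidden: number;
  ffnHidden: number; // ASSUMPTION: 6400, not stated in the text above
  heads: number;
  kvHeads: number;
  experts: number;
  expertsPerToken: number;
  vocab: number;
}

function countParams(c: MoEConfig) {
  const headDim = c.hidden / c.heads;
  // Grouped-query attention: full-size Q and O, smaller K and V.
  const attention = 2 * c.hidden * c.hidden + 2 * c.hidden * c.kvHeads * headDim;
  // SwiGLU feed-forward block: gate, up and down projections.
  const expert = 3 * c.hidden * c.ffnHidden;
  const router = c.hidden * c.experts;
  // ASSUMPTION: untied input and output embeddings.
  const embeddings = 2 * c.vocab * c.hidden;
  const shared = c.layers * (attention + router) + embeddings;
  return {
    total: shared + c.layers * c.experts * expert,
    active: shared + c.layers * c.expertsPerToken * expert,
  };
}

const phi35moe: MoEConfig = {
  layers: 32, hidden: 4096, ffnHidden: 6400, heads: 32,
  kvHeads: 8, experts: 16, expertsPerToken: 2, vocab: 32064,
};
const counts = countParams(phi35moe);
console.log((counts.total / 1e9).toFixed(1)); // "41.9"
console.log((counts.active / 1e9).toFixed(1)); // "6.6"
```

Under these assumptions the expert feed-forward blocks account for about 40B of the total, which is why activating only 2 of 16 experts cuts the per-token count so sharply.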

What it can do

16-expert MoE — Microsoft's first MoE Phi variant
Only 6.6B active parameters per token, so per-token compute is low for a model of this total size
Punches above weight: matches Mixtral 8x7B (12.9B active) and Llama 3.1 8B on many benchmarks
Strong math and reasoning for active-param size (MMLU 78.9, GSM8K 88.7)
128K context window
Multilingual support for 22 languages
Open weights under permissive MIT license
Best for: cost-efficient reasoning, on-device inference (INT4 weights ~21GB), education and tutoring applications.

Training & License

Pretrained on 4.9 trillion tokens. The mix is heavily curated and includes filtered web data, synthetic 'textbook-quality' data generated from larger models, code, math and 22-language multilingual sources. Knowledge cutoff October 2023. Training used 512 NVIDIA H100 GPUs for 23 days. Post-training is supervised fine-tuning plus DPO with explicit safety post-training and red-team feedback.

License: MIT License for the open weights. Commercial use, redistribution and modification are permitted, provided the copyright and license notice is retained. It is one of the most permissive licenses among major open-weight LLMs.

Known limitations

All ~42B parameters must be held in memory, roughly 84GB in FP16, which is far heavier than the 6.6B active parameters suggest
MoE routing can cause latency spikes when a batch routes unevenly across experts
Knowledge breadth narrower than larger dense models — Phi trades breadth for reasoning
Behind frontier models on coding benchmarks despite strong math
Synthetic-data-heavy training can produce 'textbook-like' answers that don't match real-world tone
No vision modality (use Phi-3.5-Vision instead)
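
The memory limitation in the list above follows directly from the parameter count: every expert must be resident even though only two run per token. Here is a minimal sketch of the weight-only arithmetic. It ignores the KV cache, activations and quantisation overhead (scales, zero points), so real footprints will be somewhat higher. The helper name `weightMemoryGB` is hypothetical.

```typescript
const TOTAL_PARAMS = 41.9e9; // total parameters, from the figures above

const BYTES_PER_PARAM = { fp16: 2, int8: 1, int4: 0.5 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

// Weight storage only, in decimal gigabytes (1 GB = 1e9 bytes).
function weightMemoryGB(params: number, precision: Precision): number {
  return (params * BYTES_PER_PARAM[precision]) / 1e9;
}

console.log(weightMemoryGB(TOTAL_PARAMS, "fp16")); // 83.8
console.log(weightMemoryGB(TOTAL_PARAMS, "int8")); // 41.9
console.log(weightMemoryGB(TOTAL_PARAMS, "int4")); // 20.95
```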


Related Models


Claude Opus 4

Anthropic

Anthropic's most powerful model. Exceptional at complex analysis, agentic tasks, and extended reasoning.

Free

Claude Sonnet 4

Anthropic

Anthropic's most capable model. Excellent for complex analysis, coding, math, and creative writing.

Free

DeepSeek V3.1

DeepSeek

DeepSeek's refreshed V3.1 release. 671B MoE / 37B active. Tops open-weights leaderboards on coding and reasoning.

Free

DeepSeek V4 Pro

DeepSeek

DeepSeek's April 2026 flagship. 1.6T MoE / 49B active params, 1M context, rivals top closed-source models on STEM and coding at a fraction of the price.

Free

Start using Microsoft Phi-3.5 MoE Instruct today

Get started with free credits. No credit card required. Access Microsoft Phi-3.5 MoE Instruct and 100+ other models through a single API.

