Microsoft Phi-3.5 MoE Instruct

Microsoft
Text & Chat

Mixture-of-experts Phi-3.5: 42B total / 6.6B active params. 128k context, multilingual.

TL;DR · Last updated May 16, 2026

Microsoft Phi-3.5 MoE Instruct is a text & chat AI model from Microsoft, priced at €0.000 per 1M input tokens, with a 131,072-token (131.1K) context window.


Pricing

Price per Generation
Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Microsoft Phi-3.5 MoE Instruct into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: just pass a string
const reply = await rw.run("phi-3-5-moe-instruct", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("phi-3-5-moe-instruct", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("phi-3-5-moe-instruct", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
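Because the API is described as OpenAI-compatible, the same chat call can in principle be made without the SDK. The sketch below only builds the request and sends nothing. `buildChatRequest` is a hypothetical helper, the base URL is a placeholder supplied by the caller, and the `/chat/completions` route and Bearer header are assumed from OpenAI's conventions rather than confirmed by this page.

```typescript
// Hypothetical helper: builds a chat request in the OpenAI wire format.
// ASSUMPTIONS: baseUrl is a placeholder passed in by the caller; the
// /chat/completions route and Bearer auth follow OpenAI's conventions.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  messages: ChatMessage[],
  options: { temperature?: number; max_tokens?: number } = {}
): ChatRequest {
  return {
    // Strip trailing slashes so the route is appended exactly once.
    url: baseUrl.replace(/\/+$/, "") + "/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + apiKey,
      },
      body: JSON.stringify({ model: "phi-3-5-moe-instruct", messages, ...options }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`. Check the provider's own documentation for the real base URL before relying on this.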

Specifications

Context window: 131,072 tokens
Max output: 4,096 tokens
Developer: Microsoft
Category: Text & Chat
Supported formats: text
Tags: microsoft, open-weights, moe, multilingual, pricing-tbd
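
The context window and max output limits interact. Assuming, as is usual for decoder-only models, that prompt and completion share the context window, reserving the full output allowance leaves 131,072 - 4,096 = 126,976 tokens for the prompt. A small guard might look like the sketch below. `allowedMaxTokens` is a hypothetical name, and the prompt's token count has to come from a tokeniser, which is not shown.

```typescript
// Limits taken from the specifications listed above.
const CONTEXT_WINDOW = 131072;
const MAX_OUTPUT = 4096;

// Hypothetical helper: the largest max_tokens value that still fits for a
// prompt of promptTokens tokens. Assumes prompt and completion share the window.
function allowedMaxTokens(promptTokens: number, requested: number = MAX_OUTPUT): number {
  if (promptTokens < 0 || Math.floor(promptTokens) !== promptTokens) {
    throw new RangeError("promptTokens must be a non-negative integer");
  }
  const remaining = CONTEXT_WINDOW - promptTokens;
  return Math.max(0, Math.min(requested, MAX_OUTPUT, remaining));
}

console.log(allowedMaxTokens(1000));   // 4096
console.log(allowedMaxTokens(130000)); // 1072
```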

Deep dive: Microsoft Research's Phi-3.5 MoE Instruct

About Microsoft Research
Founded 1991 · Redmond, Washington, USA

Microsoft Research's Machine Learning Foundations group, led by Sébastien Bubeck and Ronen Eldan, drove the Phi series of small-but-capable language models. The Phi thesis is that synthetic 'textbook-quality' training data can produce small models that punch far above their weight on reasoning benchmarks. The series began with Phi-1 (1.3B, code, 2023) and continued with Phi-1.5 (general reasoning, 2023), Phi-2 (2.7B, 2023), Phi-3 (Mini, Small and Medium dense models, April 2024) and Phi-3.5 (Mini, Vision and MoE, August 2024). Phi-3.5 MoE was Microsoft's first Mixture-of-Experts Phi variant, with 16 experts of 3.8B parameters each and top-2 routing. Microsoft Research itself was founded in 1991 and remains one of the largest industrial AI research organisations in the world; Phi is one of its flagship open-weights AI projects.

Architecture
Mixture-of-Experts Decoder Transformer

Phi-3.5 MoE Instruct is a 16x3.8B Mixture-of-Experts decoder transformer: 16 experts, each approximately the size of Phi-3-Mini, with top-2 routing yielding 6.6B active parameters out of 41.9B total. The architecture uses 32 layers, a hidden size of 4,096, 32-head grouped-query attention with 8 KV heads, RoPE positional embeddings (theta=10000, extended for 128K context), SwiGLU activations, and a 32,064-token Llama-derived BPE tokeniser. Routing uses a sparse mixer with an auxiliary loss for expert balancing. The model was pretrained on 4.9 trillion tokens of heavily curated data, with the Phi recipe emphasising synthetic 'textbook-quality' data generated from larger models and explicitly oversampling reasoning-dense content at the expense of breadth. Training used 512 H100 GPUs for 23 days. Post-training consists of supervised fine-tuning plus Direct Preference Optimisation (DPO), with explicit safety post-training. The model was released in August 2024 under the MIT license.
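
To make "top-2 routing" concrete, here is a minimal sketch of generic top-2 gating for a single token. It is a stand-in for illustration, not the model's actual sparse-mixer router, and it omits the auxiliary balancing loss. The function names and the idea of passing experts as plain functions are assumptions made for this example.

```typescript
// Generic top-2 gating for one token. NOT the model's exact sparse-mixer router.
// routerLogits holds one score per expert.
function top2Route(routerLogits: number[]): { expert: number; weight: number }[] {
  if (routerLogits.length < 2) throw new RangeError("need at least two experts");
  // Expert indices sorted by descending logit; keep the best two.
  const order = routerLogits
    .map((_, i) => i)
    .sort((a, b) => routerLogits[b] - routerLogits[a]);
  const first = order[0];
  const second = order[1];
  // Softmax restricted to the two selected experts (numerically stable form).
  const ratio = Math.exp(routerLogits[second] - routerLogits[first]);
  const w1 = 1 / (1 + ratio);
  return [
    { expert: first, weight: w1 },
    { expert: second, weight: 1 - w1 },
  ];
}

// Weighted sum of the two selected experts' outputs; the other experts never run.
// Assumes every expert returns a vector of the same length as x.
function mixExperts(
  x: number[],
  experts: Array<(input: number[]) => number[]>,
  routerLogits: number[]
): number[] {
  const out = x.map(() => 0);
  for (const { expert, weight } of top2Route(routerLogits)) {
    const y = experts[expert](x);
    for (let i = 0; i < out.length; i++) out[i] += weight * y[i];
  }
  return out;
}
```

Only two of the 16 expert functions are evaluated per token, which is why the active parameter count is far smaller than the total.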

Parameters: 41.9B total, 6.6B active per token (16 experts of ~3.8B each, top-2 routing)
Context: 131.1K tokens
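
The total and active figures can be cross-checked with two linear equations. The sketch below assumes a simplified model in which every non-expert weight (attention, embeddings, router) is shared and always active. `splitMoeParams` is a hypothetical helper and the result is an estimate, not an official breakdown.

```typescript
// Solve  total  = shared + nExperts * perExpert
//        active = shared + topK     * perExpert
// ASSUMPTION: all non-expert weights are shared and always active.
// All quantities are in billions of parameters.
function splitMoeParams(totalB: number, activeB: number, nExperts: number, topK: number) {
  if (nExperts <= topK) throw new RangeError("nExperts must exceed topK");
  const perExpertB = (totalB - activeB) / (nExperts - topK);
  const sharedB = activeB - topK * perExpertB;
  return { perExpertB, sharedB };
}

const split = splitMoeParams(41.9, 6.6, 16, 2);
console.log(split.perExpertB.toFixed(2), split.sharedB.toFixed(2)); // prints: 2.52 1.56
```

Under these assumptions roughly 2.5B parameters are specific to each expert and about 1.6B are shared. This also explains why the total is well below 16 x 3.8B = 60.8B, and it suggests the "3.8B" label describes something closer to one dense path (shared weights plus a single expert, about 4.1B here) than the expert-only weights.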
What it can do
  • 16-expert MoE: Microsoft's first MoE Phi variant
  • Only 6.6B active parameters: cheap inference for an MoE model
  • Punches above weight: matches Mixtral 8x7B (12.9B active) and Llama 3.1 8B on many benchmarks
  • Strong math and reasoning for active-param size (MMLU 78.9, GSM8K 88.7)
  • 128K context window
  • Multilingual support for 22 languages
  • Open weights under permissive MIT license
  • Best for: cost-efficient reasoning, on-device inference (INT4 weights are roughly 21GB, i.e. 41.9B parameters at 4 bits each), education and tutoring applications.
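
The 128K context window has a memory cost of its own, separate from the weights. The sketch below estimates the key/value cache from the architecture figures quoted earlier (32 layers, hidden size 4,096, 32 heads, 8 KV heads). It assumes a head dimension of hidden size divided by head count and 2-byte (FP16) cache entries. The names are hypothetical and real runtimes may differ.

```typescript
interface AttnConfig {
  layers: number;
  hiddenSize: number;
  heads: number;
  kvHeads: number;
}

// Bytes of key/value cache for one sequence.
// ASSUMPTIONS: headDim = hiddenSize / heads; one K and one V vector are cached
// per KV head per layer; bytesPerValue = 2 corresponds to FP16.
function kvCacheBytes(cfg: AttnConfig, tokens: number, bytesPerValue: number = 2): number {
  const headDim = cfg.hiddenSize / cfg.heads;
  return 2 * cfg.layers * cfg.kvHeads * headDim * bytesPerValue * tokens;
}

const phiConfig: AttnConfig = { layers: 32, hiddenSize: 4096, heads: 32, kvHeads: 8 };
const GIB = 1024 * 1024 * 1024;

console.log(kvCacheBytes(phiConfig, 131072) / GIB);                    // 16
console.log(kvCacheBytes({ ...phiConfig, kvHeads: 32 }, 131072) / GIB); // 64
```

With 8 KV heads instead of 32, grouped-query attention cuts the full-context cache by a factor of four in this estimate.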
Training & License

Pretrained on 4.9 trillion tokens. The mix is heavily curated and includes filtered web data, synthetic 'textbook-quality' data generated from larger models, code, math and 22-language multilingual sources. Knowledge cutoff October 2023. Training used 512 NVIDIA H100 GPUs for 23 days. Post-training is supervised fine-tuning plus DPO with explicit safety post-training and red-team feedback.
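
The training figures above imply a compute budget that is easy to work out. The sketch below assumes all 512 GPUs ran for the full 23 days and that the 4.9 trillion tokens were processed once in that time. It is back-of-envelope arithmetic, not reported throughput, and `trainingStats` is a hypothetical helper.

```typescript
// Back-of-envelope training arithmetic.
// ASSUMPTIONS: every GPU is busy for the whole run; tokens are seen once.
function trainingStats(gpus: number, days: number, tokens: number) {
  const gpuHours = gpus * days * 24;
  const tokensPerGpuSecond = tokens / (gpuHours * 3600);
  return { gpuHours, tokensPerGpuSecond };
}

const stats = trainingStats(512, 23, 4.9e12);
console.log(stats.gpuHours); // 282624
console.log(Math.round(stats.tokensPerGpuSecond));
```

That is about 283,000 H100 GPU-hours, or roughly 4,800 tokens per second per GPU under these assumptions.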

License: MIT License for the open weights. Commercial use, redistribution and modification are permitted, subject only to retaining the copyright and license notice. This makes it one of the most permissive licenses among major open-weight LLMs.

Known limitations
  • All ~42B parameters must be held in memory (~80GB at FP16), much heavier than the 6.6B active parameters suggest
  • MoE routing can cause latency spikes on imbalanced batches
  • Knowledge breadth is narrower than in larger dense models: Phi trades breadth for reasoning
  • Behind frontier models on coding benchmarks despite strong math
  • Synthetic-data-heavy training can produce 'textbook-like' answers that don't match real-world tone
  • No vision modality (use Phi-3.5-Vision instead)
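
The first limitation follows from simple arithmetic: every expert must be loaded even though only two run per token. The sketch below is a weight-only estimate in decimal gigabytes. It ignores the KV cache, activations and quantisation overhead such as scales and zero points, and `weightGigabytes` is a hypothetical helper.

```typescript
// Weight-only memory estimate: parameters times bits per parameter.
// Result is in decimal GB (1e9 bytes); paramsBillions is in units of 1e9 parameters.
function weightGigabytes(paramsBillions: number, bitsPerParam: number): number {
  return (paramsBillions * bitsPerParam) / 8;
}

const formats = [
  { name: "FP16", bits: 16 },
  { name: "INT8", bits: 8 },
  { name: "INT4", bits: 4 },
];

for (const f of formats) {
  console.log(f.name, Math.round(weightGigabytes(41.9, f.bits)), "GB");
}
// prints: FP16 84 GB, INT8 42 GB, INT4 21 GB (one per line)
```

The FP16 result of about 84GB is in line with the ~80GB quoted above, while the 6.6B active parameters alone would need only about 13GB at FP16.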

Start using Microsoft Phi-3.5 MoE Instruct today

Get started with free credits. No credit card required. Access Microsoft Phi-3.5 MoE Instruct and 100+ other models through a single API.