Mistral Pixtral Large (124B)
Mistral's 124B multimodal flagship. 123B decoder + 1B vision encoder, 128k ctx, up to 30 images per request.
Mistral Pixtral Large (124B) is a multimodal AI model from Mistral AI, priced at €2.00 per 1M input tokens with a 131.1K-token context window.
Pricing
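As a rough illustration of the listed input price, the sketch below converts a token count into an estimated input cost. The helper name estimateInputCostEur is hypothetical and not part of any SDK. Output-token pricing is not stated on this page, so it is left out, and the full-window figure assumes the context is 128 x 1024 = 131,072 tokens.

```javascript
// Listed on this page: EUR 2.00 per 1M input tokens.
const INPUT_PRICE_EUR_PER_MILLION = 2.0;

// Hypothetical helper: estimated input cost in euros for a token count.
function estimateInputCostEur(inputTokens) {
  if (!Number.isFinite(inputTokens) || inputTokens < 0) {
    throw new RangeError("inputTokens must be a non-negative number");
  }
  return (inputTokens / 1_000_000) * INPUT_PRICE_EUR_PER_MILLION;
}

// A prompt that fills an assumed 131,072-token window:
console.log(estimateInputCostEur(131_072).toFixed(4)); // "0.2621"
```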
API Integration
Use our OpenAI-compatible API to integrate Mistral Pixtral Large (124B) into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("pixtral-large-124b", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("pixtral-large-124b", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("pixtral-large-124b", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Mistral AI's Mistral Pixtral Large (124B)
Mistral AI was founded in April 2023 in Paris by Arthur Mensch (CEO, ex-DeepMind), Guillaume Lample (Chief Scientist, ex-Meta FAIR LLaMA lead) and Timothée Lacroix (CTO, ex-Meta FAIR). The company is the European poster child for sovereign AI. It raised a record-setting $113M seed round (June 2023), followed by a $415M Series A (December 2023), a $645M Series B (June 2024) and a reported €1B+ funding round in 2025 at a valuation above $6B, with backers including Andreessen Horowitz, Lightspeed, General Catalyst, BPI France and Cisco. Mistral's model line includes Mistral 7B, Mixtral 8x7B and 8x22B (MoE), Mistral Large 1/2, Codestral, and the multimodal Pixtral family. Pixtral 12B launched in September 2024, and Pixtral Large 124B followed in November 2024 as the flagship multimodal model, released under the Mistral Research Licence and offered commercially via la Plateforme.
Pixtral Large is a 124B-parameter multimodal model built on top of Mistral Large 2 (a 123B decoder-only Transformer) and Mistral's custom Pixtral-ViT vision encoder (1B parameters). The vision encoder processes each image at its native aspect ratio in patches of 16x16 pixels and projects the resulting visual tokens directly into the LLM token stream (without cross-attention or a fixed-size resampler), which lets the model handle high-resolution images and many images per prompt in the same 131K-token context window. The vision encoder uses 2D rotary position embeddings and an attention mask that distinguishes patches from different images. Training used a multi-stage curriculum: vision-encoder pretraining on image-text pairs, joint vision-language pretraining, and multimodal supervised fine-tuning with chain-of-thought style instruction data. Pixtral Large posts top scores on MathVista, DocVQA and VQAv2, often matching GPT-4o and Claude 3.5 Sonnet at lower cost. The model is released under the Mistral Research Licence with a separate commercial licence available.
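To make the patching and masking description concrete, here is a minimal sketch, not Mistral's implementation. It assumes each 16x16 patch becomes exactly one visual token, rounds partial edge patches up, and ignores any resizing or special separator tokens the real encoder may add. The function names patchTokenCount and imageAttentionMask are hypothetical.

```javascript
const PATCH_SIZE = 16; // patch edge in pixels, as described above

// Number of patch tokens for one image at its native resolution.
// Assumption: partial patches at the right and bottom edges are rounded up.
function patchTokenCount(widthPx, heightPx) {
  const cols = Math.ceil(widthPx / PATCH_SIZE);
  const rows = Math.ceil(heightPx / PATCH_SIZE);
  return cols * rows;
}

// Block-diagonal attention mask over the concatenated patches of several
// images: mask[i][j] is true only when patches i and j belong to the same image.
function imageAttentionMask(tokensPerImage) {
  // owner[k] is the index of the image that patch k came from.
  const owner = tokensPerImage.flatMap((n, img) => Array(n).fill(img));
  return owner.map((a) => owner.map((b) => a === b));
}

console.log(patchTokenCount(1024, 768)); // 64 * 48 = 3072
console.log(JSON.stringify(imageAttentionMask([2, 1])));
// [[true,true,false],[true,true,false],[false,false,true]]
```

Under these assumptions a 1024x768 image costs about 3,072 tokens, which gives a feel for how quickly many high-resolution images consume the context window.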
- Parameters: 124B (123B LLM + 1B vision encoder)
- Context: 131.1K tokens
- Native variable-resolution image input (no fixed grid)
- Up to 30 images per request, 131K-token context
- Top-tier MathVista, DocVQA, VQAv2 and ChartQA scores
- Multilingual: English, German, French, Spanish, Italian, Portuguese, Dutch, Russian, Arabic, Hindi, Japanese, Chinese, Korean
- JSON output and tool use inherited from Mistral Large 2
- Open weights under Mistral Research Licence (commercial via separate licence)
- Best for: high-end open-weights multimodal apps, document AI, math-with-image, European data sovereignty
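Because the API is described as OpenAI-compatible, a multi-image prompt can plausibly be expressed with OpenAI-style content parts. The sketch below only builds the message object and enforces the 30-image cap listed above. buildImageMessage is a hypothetical helper, the URL is a placeholder, and whether rw.chat accepts image_url parts in this shape is an assumption to check against the provider's documentation.

```javascript
const MAX_IMAGES_PER_REQUEST = 30; // limit listed above

// Hypothetical helper: builds one OpenAI-style user message that combines
// a text question with one or more image URLs.
function buildImageMessage(question, imageUrls) {
  if (imageUrls.length > MAX_IMAGES_PER_REQUEST) {
    throw new RangeError(
      `Got ${imageUrls.length} images; the limit is ${MAX_IMAGES_PER_REQUEST}`
    );
  }
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      // One image_url part per image, in the order given.
      ...imageUrls.map((url) => ({ type: "image_url", image_url: { url } })),
    ],
  };
}

const message = buildImageMessage("What does this chart show?", [
  "https://example.com/chart.png", // placeholder URL
]);
console.log(message.content.length); // 2
```

If the provider accepts this format, the message could then be passed as rw.chat("pixtral-large-124b", [message]).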
Mistral has not disclosed exact dataset sizes. The technical report describes a 'multimodal corpus of public web data, licensed image-text pairs and curated instruction data' with a knowledge cutoff in 2024.
License: Weights under Mistral Research Licence (non-commercial). Commercial use via Mistral la Plateforme API or a paid Mistral Commercial Licence.
Known limitations
- Commercial use requires paid licence
- Serving 124B requires multi-GPU infrastructure
- No video or audio input
- Vision quality below GPT-4o on the very hardest charts
- Multilingual vision coverage lighter than text
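The multi-GPU point follows from simple arithmetic on the weights alone. The sketch below gives a lower bound under stated assumptions: 2 bytes per parameter (16-bit weights), 80 GB of memory per accelerator, and no allowance for the KV cache, activations or framework overhead. The helper name minGpusForWeights is hypothetical.

```javascript
// Hypothetical helper: minimum number of accelerators needed just to hold
// the model weights, ignoring KV cache, activations and runtime overhead.
function minGpusForWeights(paramCount, bytesPerParam, gpuMemoryGB) {
  const weightGB = (paramCount * bytesPerParam) / 1e9;
  return { weightGB, minGpus: Math.ceil(weightGB / gpuMemoryGB) };
}

// 124B parameters in 16-bit precision on 80 GB devices (both assumptions).
console.log(minGpusForWeights(124e9, 2, 80)); // { weightGB: 248, minGpus: 4 }
```

Real deployments typically need headroom beyond this bound, so the figure is best read as a floor rather than a sizing recommendation.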
Related Models
Claude Opus 4.7
Anthropic's April 2026 flagship. 87.6% on SWE-bench Verified, 3x higher image resolution, output self-verification, vision + reasoning.
Claude Sonnet 4.6
Anthropic's balanced mid-tier model from February 2026. Best price/performance for production workloads: 5x cheaper than Opus, near-flagship quality.
Depth Anything v2
Monocular depth-estimation model trained on 595k labeled and 62M unlabeled images. Strong zero-shot generalization in indoor and outdoor scenes.
GPT-5.4
OpenAI's unified flagship combining GPT and o-series reasoning into one model. 1M context, multimodal, top SWE-Bench Pro and OSWorld scores.
Start using Mistral Pixtral Large (124B) today
Get started with free credits. No credit card required. Access Mistral Pixtral Large (124B) and 100+ other models through a single API.