Gemini 3 Flash


Google's April 2026 fast multimodal model. Combines Gemini 3 Pro's reasoning with Flash-tier latency and price. Default model in the Gemini app.

TL;DR · Last updated May 16, 2026

Gemini 3 Flash is a multimodal AI model from Google DeepMind, priced at €0.001 per 1M input tokens, with a 1M-token (1,048,576) context window.
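
As a rough illustration of the input price quoted above, the sketch below converts a token count into an estimated cost in euros. The helper name `estimateInputCostEur` is hypothetical and not part of any SDK, and output-token pricing is left out because this page does not state it.

```javascript
// Input price quoted on this page: EUR 0.001 per 1,000,000 input tokens.
const EUR_PER_MILLION_INPUT_TOKENS = 0.001;

// Hypothetical helper (not part of any SDK): estimated input cost in euros.
function estimateInputCostEur(inputTokens) {
  if (!Number.isFinite(inputTokens) || inputTokens < 0) {
    throw new RangeError("inputTokens must be a non-negative number");
  }
  return (inputTokens / 1_000_000) * EUR_PER_MILLION_INPUT_TOKENS;
}

// Example: a prompt that fills the whole 1,048,576-token context window.
console.log(`EUR ${estimateInputCostEur(1_048_576).toFixed(6)}`);
```

Actual billing may differ (for example if output tokens or non-text inputs are priced separately), so treat the result as an estimate only.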

About this model

Announced on April 22, 2026, Gemini 3 Flash brings Pro-grade reasoning to the Flash latency tier. It offers a 1M-token context window, is multimodal across text, image, audio and video, and can generate up to 65K output tokens. It is the default model in the Gemini app and in AI Mode in Search. It is described as delivering PhD-level reasoning on common benchmarks at a fraction of the cost of Gemini 3.1 Pro. It is recommended for high-throughput agentic workflows, real-time multimodal chat, RAG and consumer applications.

Pricing

Price per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Gemini 3 Flash into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("gemini-3-flash", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("gemini-3-flash", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("gemini-3-flash", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
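
Because the API is described as OpenAI-compatible, a request presumably follows the chat completions JSON shape. The sketch below only builds and validates such a body. The helper `buildChatRequest` is hypothetical, and the endpoint URL and authentication header are not given on this page, so they are deliberately omitted.

```javascript
// Output limit taken from the Specifications on this page.
const MAX_OUTPUT_TOKENS = 65_536;

// Hypothetical helper: builds an OpenAI-style chat completions request body.
function buildChatRequest(model, messages, options = {}) {
  // Defaults mirror the values used in the SDK example above.
  const { temperature = 0.7, max_tokens = 500 } = options;
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new TypeError("messages must be a non-empty array");
  }
  if (!Number.isInteger(max_tokens) || max_tokens < 1 || max_tokens > MAX_OUTPUT_TOKENS) {
    throw new RangeError(`max_tokens must be an integer from 1 to ${MAX_OUTPUT_TOKENS}`);
  }
  return { model, messages, temperature, max_tokens };
}

const body = buildChatRequest("gemini-3-flash", [
  { role: "user", content: "Hello!" },
]);
console.log(JSON.stringify(body, null, 2));
```

Validating `max_tokens` on the client is a design choice for this sketch. It catches an oversized request before it is sent, but the service remains the authority on what it accepts.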
Specifications
Context window: 1,048,576 tokens
Max output: 65,536 tokens
Developer: Google DeepMind
Category: Multimodal
Supported formats: text, image, audio, video
Tags: google, deepmind, balanced, multimodal, low-latency, long-context, 1m-context
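
The two limits in the specifications can be checked before sending a request. The sketch below is a minimal pre-flight check under two labeled assumptions: that output tokens count against the same context window (this page does not say), and that English text averages roughly four characters per token (a common rule of thumb, not the model's real tokenizer). Both helper names are hypothetical.

```javascript
// Limits taken from the Specifications on this page.
const CONTEXT_LIMIT_TOKENS = 1_048_576;
const OUTPUT_LIMIT_TOKENS = 65_536;

// Assumption: about 4 characters per token. Real tokenizers will differ.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Assumption: prompt and output share one context window (conservative check).
function checkBudget(promptTokens, requestedOutputTokens) {
  const outputOk = requestedOutputTokens <= OUTPUT_LIMIT_TOKENS;
  const remaining = CONTEXT_LIMIT_TOKENS - promptTokens - requestedOutputTokens;
  const totalOk = remaining >= 0;
  return { outputOk, totalOk, ok: outputOk && totalOk, remaining };
}

const prompt = "Summarize the attached report in three bullet points.";
console.log(checkBudget(estimateTokens(prompt), 500));
```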

Deep dive — Google DeepMind's Gemini 3 Flash

About Google DeepMind
Founded 2010 · Mountain View, USA / London, UK

Google DeepMind is the merged AI research organisation formed in April 2023 by combining Google Brain with DeepMind. Demis Hassabis leads the unit as CEO. Flash variants have been Google's high-throughput tier since Gemini 1.5 Flash (May 2024), followed by Gemini 2.0 Flash (December 2024), Gemini 2.5 Flash (mid-2025) and Gemini 3 Flash (April 2026). Seminal work from the organisation and its predecessors includes 'Attention Is All You Need' (2017, from Google Brain and Google Research), AlphaGo (2016), AlphaFold (2018-2021, recognised by the 2024 Nobel Prize in Chemistry) and the Gemini Technical Report.

Architecture
Sparse Mixture-of-Experts Transformer (multimodal, latency-optimized)

Gemini 3 Flash was announced April 22, 2026 as the default Flash-tier model and the new default model in the Gemini app and AI Mode in Search. It is a natively multimodal Sparse MoE Transformer engineered to combine Gemini 3 Pro's reasoning quality with Flash-grade latency, efficiency and cost. Pretraining used Google's TPU v6e infrastructure on a multi-trillion-token mixture of web text, code, books, image-text pairs, audio and video frames. Post-training combined supervised fine-tuning, RLHF, RL against verifiable rewards and distillation from larger Gemini 3.1 Pro teacher models. The architecture preserves Gemini's native multimodality across text, image, audio and video, the full tool-use API and Search grounding, while running at a fraction of Pro pricing. Gemini 3 Flash is the recommended default for high-throughput agentic workflows and consumer-facing multimodal chat.

Parameters: Undisclosed (sparse MoE, smaller and sparser than Gemini 3.1 Pro)
Context: 1M tokens (1,048,576)
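
To make the sparse Mixture-of-Experts idea concrete, here is a generic top-k routing sketch. It is not Gemini's architecture, whose details are undisclosed. It only illustrates the general technique: a router scores the experts, the k best are selected, and their outputs are mixed with renormalized weights. In a real model the router logits come from a learned projection of the token; here they are passed in directly, and all names are illustrative.

```javascript
// Numerically stable softmax over an array of logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Picks the k highest-probability experts and renormalizes their weights.
function routeTopK(routerLogits, k) {
  const top = softmax(routerLogits)
    .map((p, expert) => ({ expert, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const total = top.reduce((s, e) => s + e.p, 0);
  return top.map(({ expert, p }) => ({ expert, weight: p / total }));
}

// Runs only the selected experts and sums their weighted outputs.
function moeForward(x, experts, routerLogits, k) {
  const out = new Array(x.length).fill(0);
  for (const { expert, weight } of routeTopK(routerLogits, k)) {
    const y = experts[expert](x);
    for (let i = 0; i < out.length; i++) out[i] += weight * y[i];
  }
  return out;
}

// Toy experts: each one scales the input vector by a different factor.
const experts = [1, 2, 3, 4].map((f) => (x) => x.map((v) => v * f));
console.log(moeForward([1, 2], experts, [0.1, 2.0, 1.5, -1.0], 2));
```

Only k of the experts run per input, which is where the compute savings of sparse models come from.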
What it can do
  • Pro-grade reasoning at Flash latency
  • 1,048,576 token context window
  • Natively multimodal: text, image, audio and video
  • Search grounding and Code Execution built into the API
  • Function calling, JSON schema and parallel tool calls
  • Default model in the Gemini app and AI Mode in Search
  • PhD-level reasoning on common benchmarks
  • Available via Vertex AI, AI Studio, Gemini Enterprise, Antigravity and the Gemini app
  • Strong long-video understanding (hour-long clips)
  • Cross-lingual fluency across 100+ languages
  • Best for: high-throughput agentic workflows, real-time multimodal chat, RAG, consumer applications.
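
Function calling with parallel tool calls, listed above, can be sketched in the OpenAI-style message format. This assumes the OpenAI-compatible API passes `tools` and `tool_calls` through unchanged, which this page does not confirm. The `get_weather` tool, its stubbed data and the `runToolCalls` helper are all hypothetical.

```javascript
// Tool definition in the OpenAI-style "tools" format (JSON Schema parameters).
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Look up the current temperature for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

// Local implementations keyed by tool name (stubbed data for illustration).
const handlers = new Map([
  ["get_weather", ({ city }) => ({ city, celsius: city === "London" ? 12 : 20 })],
]);

// Turns every tool call from one assistant turn into a "tool" result message.
function runToolCalls(toolCalls) {
  return toolCalls.map((call) => {
    const handler = handlers.get(call.function.name);
    const result = handler
      ? handler(JSON.parse(call.function.arguments))
      : { error: `unknown tool: ${call.function.name}` };
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });
}

// Two calls returned in a single assistant turn (the parallel case).
const exampleCalls = [
  { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"London"}' } },
  { id: "call_2", type: "function", function: { name: "get_weather", arguments: '{"city":"Paris"}' } },
];
console.log(runToolCalls(exampleCalls));
```

The resulting messages would be appended to the conversation and sent back so the model can compose its final answer. Each result is matched to its request through `tool_call_id`.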
Training & License

The model was pretrained on a multi-trillion-token mixture of web text, code, books, scientific papers, image-text pairs, audio and video frames, and was heavily distilled from larger Gemini 3.1 Pro teacher models. Post-training used supervised fine-tuning, RLHF and RL against verifiable rewards. The knowledge cutoff is late 2025.
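
Google's actual distillation recipe is not described here. As general background on what distilling from a teacher model means, the sketch below shows the classic soft-target formulation: the student is penalized by the KL divergence between temperature-softened teacher and student distributions, scaled by the squared temperature. Function names and the default temperature are illustrative assumptions.

```javascript
// Softmax over logits divided by a temperature (higher = softer distribution).
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// KL(teacher || student) on softened distributions, scaled by temperature^2.
function distillationLoss(teacherLogits, studentLogits, temperature = 2) {
  const p = softmaxWithTemperature(teacherLogits, temperature);
  const q = softmaxWithTemperature(studentLogits, temperature);
  let kl = 0;
  for (let i = 0; i < p.length; i++) {
    // Terms with zero teacher probability contribute nothing to the sum.
    if (p[i] > 0) kl += p[i] * Math.log(p[i] / q[i]);
  }
  return kl * temperature * temperature;
}

// A student that matches the teacher scores near zero; a mismatched one scores higher.
console.log(distillationLoss([2, 1, 0], [2, 1, 0]));
console.log(distillationLoss([2, 1, 0], [0, 1, 2]));
```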

License: Proprietary commercial license via Google AI Studio, Vertex AI and the Gemini app. Free tier available in the Gemini app and AI Mode in Search.

Known limitations
  • Below Gemini 3.1 Pro on the hardest reasoning and long-context benchmarks
  • Smaller context window than 3.1 Pro (1M vs 2M)
  • Vision can misread dense tables and handwriting
  • Region availability is rolling out in 2026
  • Audio output not yet supported

Start using Gemini 3 Flash today

Get started with free credits. No credit card required. Access Gemini 3 Flash and 100+ other models through a single API.