How much does Gemini 3 Flash cost via Railwail?

Input: €0.001 per 1M tokens. Output: €0.003 per 1M tokens. No monthly minimum, no subscription. Start with €5 free credits.
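To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a single request from the rates quoted above. The helper name `estimateCostEur` and the example token counts are illustrative assumptions and not part of the Railwail SDK.

```javascript
// Rates quoted in the answer above, in EUR per 1,000,000 tokens.
const INPUT_EUR_PER_MILLION = 0.001;
const OUTPUT_EUR_PER_MILLION = 0.003;

// Hypothetical helper (not an SDK function): estimated cost of one request.
function estimateCostEur(inputTokens, outputTokens) {
  const inputCost = (inputTokens / 1_000_000) * INPUT_EUR_PER_MILLION;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_EUR_PER_MILLION;
  return inputCost + outputCost;
}

// Example: a 200,000-token prompt that produces a 4,000-token reply.
const cost = estimateCostEur(200_000, 4_000);
console.log(cost.toFixed(6)); // prints "0.000212"
```

Actual billing may round or meter differently, so treat the result as an estimate.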

What is the context window of Gemini 3 Flash?

Gemini 3 Flash supports a 1M-token context window (1,048,576 tokens), enough for entire codebases or research papers in one prompt.

How fast is Gemini 3 Flash?

Latency depends on prompt length and load, and is typically 200 ms to 2 s for short prompts. We publish live p50/p95 measurements on /rankings.
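For readers unfamiliar with p50/p95, the sketch below computes both from a list of latency samples using the nearest-rank method. How /rankings computes its figures is not stated, so the method, the `percentile` helper and the sample values are all assumptions for illustration.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up latency samples in milliseconds (not real measurements).
const latenciesMs = [210, 250, 300, 320, 410, 480, 620, 900, 1400, 1900];

const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
console.log({ p50, p95 }); // prints { p50: 410, p95: 1900 }
```

The p95 value is dominated by the slowest requests, which is why it is usually reported alongside the median rather than instead of it.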

Is Gemini 3 Flash better than BLIP?

It depends on your use case. Gemini 3 Flash (Google DeepMind) and BLIP (Salesforce) are both strong choices in the multimodal category. Compare them side-by-side at /compare/gemini-3-flash-vs-blip-captioning.

Does Gemini 3 Flash support image input (vision)?

Yes. Gemini 3 Flash accepts image inputs in addition to text. Send images via the standard OpenAI-compatible `messages` array with `image_url` content blocks. Supported input types: text, image, audio, video.
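As a sketch of what such a message could look like, the snippet below builds a user message in the OpenAI chat format, with one text block and one `image_url` block. The helper `buildImageMessage` and the example URL are hypothetical, and whether the Railwail client accepts content arrays exactly like this is an assumption based on the OpenAI-compatibility claim above.

```javascript
// Hypothetical helper: builds an OpenAI-style user message that combines
// a text prompt with a single image reference.
function buildImageMessage(prompt, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder URL for illustration only.
const messages = [
  buildImageMessage("Describe this chart.", "https://example.com/chart.png"),
];
console.log(JSON.stringify(messages, null, 2));
```

If the client follows the OpenAI format, this array would be passed wherever a plain message list is accepted, for example as the second argument to `rw.chat`.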

Gemini 3 Flash



Google DeepMind

Multimodal

Google's April 2026 fast multimodal model. Combines Gemini 3 Pro's reasoning with Flash-tier latency and price. Default model in the Gemini app.


TL;DR · Last updated June 24, 2026

Gemini 3 Flash is a multimodal AI model from Google DeepMind, priced at €0.001 per 1M input tokens with a 1M-token context window.

About this model

Announced April 22, 2026, Gemini 3 Flash brings Pro-grade reasoning to the Flash latency tier. It offers a 1M-token context window, is fully multimodal (text, image, audio, video) and produces up to 65K output tokens. It is the default model in the Gemini app and AI Mode in Search, and delivers PhD-level reasoning on common benchmarks at a fraction of the cost of 3.1 Pro. Recommended for high-throughput agentic workflows, real-time multimodal chat, RAG and consumer applications.


Pricing

Price per Generation

Per generation: Free

API Integration

Use our OpenAI-compatible API to integrate Gemini 3 Flash into your application.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("gemini-3-flash", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("gemini-3-flash", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("gemini-3-flash", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Context window

1,048,576 tokens

Max output

65,536 tokens

Developer

Google DeepMind
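The limits listed above can be turned into a simple pre-flight check before sending a request. The sketch below is illustrative only: the four-characters-per-token estimate is a rough heuristic for English text rather than the model's real tokenizer, the assumption that output tokens count against the context window is not confirmed by this page, and `estimateTokens` and `planRequest` are hypothetical helper names.

```javascript
// Limits from the specifications above.
const CONTEXT_WINDOW_TOKENS = 1_048_576;
const MAX_OUTPUT_TOKENS = 65_536;

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: clamps the requested output length to the model limit
// and reports whether prompt plus output would fit in the context window.
function planRequest(promptText, requestedOutputTokens) {
  const promptTokens = estimateTokens(promptText);
  const maxTokens = Math.min(requestedOutputTokens, MAX_OUTPUT_TOKENS);
  const fits = promptTokens + maxTokens <= CONTEXT_WINDOW_TOKENS;
  return { promptTokens, maxTokens, fits };
}

console.log(planRequest("Summarise the attached report.", 100_000));
// prints { promptTokens: 8, maxTokens: 65536, fits: true }
```

The resulting `maxTokens` value could then be passed as `max_tokens` in the request options.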

Deep dive — Google DeepMind's Gemini 3 Flash

About Google DeepMind

Founded 2010 · Mountain View, USA / London, UK

Google DeepMind is the merged AI research organisation formed in April 2023 by combining Google Brain with DeepMind. Demis Hassabis leads the unit as CEO. Flash variants have been Google's high-throughput tier since Gemini 1.5 Flash (May 2024), followed by Gemini 2.0 Flash (December 2024), 2.5 Flash (mid-2025) and Gemini 3 Flash (April 2026). Seminal papers from the organisation and its predecessor groups include 'Attention Is All You Need' (2017, from Google Brain and Google Research), AlphaGo (2016), AlphaFold (2018-2021, Nobel Prize 2024) and the Gemini Technical Report.


Architecture

Sparse Mixture-of-Experts Transformer (multimodal, latency-optimized)

Gemini 3 Flash was announced April 22, 2026 as the default Flash-tier model and the new default model in the Gemini app and AI Mode in Search. It is a natively multimodal Sparse MoE Transformer engineered to combine Gemini 3 Pro's reasoning quality with Flash-grade latency, efficiency and cost. Pretraining used Google's TPU v6e infrastructure on a multi-trillion-token mixture of web text, code, books, image-text pairs, audio and video frames. Post-training combined supervised fine-tuning, RLHF, RL against verifiable rewards and distillation from larger Gemini 3.1 Pro teacher models. The architecture preserves Gemini's native multimodality across text, image, audio and video, the full tool-use API and Search grounding, while running at a fraction of Pro pricing. Gemini 3 Flash is the recommended default for high-throughput agentic workflows and consumer-facing multimodal chat.

Parameters: Undisclosed (sparse MoE, smaller and sparser than Gemini 3.1 Pro)
Context: 1.0M tokens

What it can do

Pro-grade reasoning at Flash latency
1,048,576 token context window
Natively multimodal: text, image, audio and video
Search grounding and Code Execution built into the API
Function calling, JSON schema and parallel tool calls
Default model in the Gemini app and AI Mode in Search
PhD-level reasoning on common benchmarks
Available via Vertex AI, AI Studio, Gemini Enterprise, Antigravity and the Gemini app
Strong long-video understanding (hour-long clips)
Cross-lingual fluency across 100+ languages
Best for: high-throughput agentic workflows, real-time multimodal chat, RAG, consumer applications.
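To illustrate the function-calling capability in the list above, the sketch below defines one tool in the OpenAI chat format and handles a tool call locally. The `get_weather` tool, its stub handler and the example call are made up for illustration, and whether the Railwail API forwards a `tools` array to the model is an assumption not confirmed by this page.

```javascript
// Tool definition in the OpenAI chat format (JSON Schema parameters).
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather", // hypothetical tool name
      description: "Return a short weather forecast for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

// Local implementations keyed by tool name (a stub for this example).
const handlers = {
  get_weather: ({ city }) => ({ city, forecast: "sunny" }),
};

// Executes one tool call and wraps the result as a `tool` role message.
function runToolCall(toolCall) {
  const handler = handlers[toolCall.function.name];
  if (!handler) throw new Error(`Unknown tool: ${toolCall.function.name}`);
  const args = JSON.parse(toolCall.function.arguments);
  return {
    role: "tool",
    tool_call_id: toolCall.id,
    content: JSON.stringify(handler(args)),
  };
}

// Example tool call shaped like an OpenAI-style model response.
const exampleCall = {
  id: "call_1",
  type: "function",
  function: { name: "get_weather", arguments: '{"city":"Berlin"}' },
};
const toolMessage = runToolCall(exampleCall);
console.log(toolMessage);
```

In a full loop, the `tool` message would be appended to the conversation and sent back to the model so it can produce a final answer.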

Training & License

Pretrained on a multi-trillion-token mixture of web text, code, books, scientific papers, image-text pairs, audio and video frames. Heavily distilled from larger Gemini 3.1 Pro teacher models. Post-training uses supervised fine-tuning, RLHF and RL against verifiable rewards. Knowledge cutoff in late 2025.

License: Proprietary commercial license via Google AI Studio, Vertex AI and the Gemini app. Free tier available in the Gemini app and AI Mode in Search.

Known limitations

Below Gemini 3.1 Pro on the hardest reasoning and long-context benchmarks
Smaller context window than 3.1 Pro (1M vs 2M)
Vision can misread dense tables and handwriting
Region availability is rolling out in 2026
Audio output not yet supported

Related Models


BLIP

Salesforce

Salesforce BLIP. Vision-language model for image captioning and visual question answering. Given an image it writes a short natural-language caption, or answers a question about the image when one is supplied. A widely used baseline for automatic captioning.

€1.00

CLIP Interrogator

Community

pharmapsychotic's CLIP Interrogator. Takes an image and produces a Stable-Diffusion-style text prompt by combining BLIP captioning with CLIP to rank likely subjects, artists, mediums and styles. Commonly used to reverse-engineer a prompt from an existing picture.

€1.00

Claude 3.5 Sonnet (vision)

Anthropic

Anthropic Claude 3.5 Sonnet with image input. 200k context, strong on dense documents, tables, charts and handwriting. Reliable structured extraction from screenshots and scans.

Free

Claude Opus 4.7