How much does Gemini Robotics-ER cost via Railwail?

Gemini Robotics-ER is listed at €0.000 per 1M input tokens, with no monthly minimum and no subscription. Start with €5 free credits.

What is the context window of Gemini Robotics-ER?

The context window of Gemini Robotics-ER has not been published.

How fast is Gemini Robotics-ER?

Latency depends on prompt length and load, typically 200 ms to 2 s for short prompts. We measure p50/p95 in real time on /rankings.
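As a rough illustration of what p50 and p95 mean, the sketch below computes them with the nearest-rank method. The `percentile` helper and the sample latencies are made up for illustration; they are not Railwail measurements, and the method used on /rankings may differ.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // copy, then sort ascending
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Illustrative latencies in milliseconds (hypothetical, not measured).
const latenciesMs = [210, 250, 300, 320, 400, 450, 600, 900, 1500, 1900];

console.log("p50:", percentile(latenciesMs, 50));
console.log("p95:", percentile(latenciesMs, 95));
```

The p95 figure is dominated by the slowest requests, which is why it is usually reported next to the median.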

Is Gemini Robotics-ER better than Gemini Robotics (2025)?

It depends on your use case. Gemini Robotics-ER (Google DeepMind) and Gemini Robotics (2025) (Google DeepMind) are both strong choices in VLA / robotics. Compare them side by side at /compare/gemini-robotics-er-vs-gemini-robotics-2025.

Does Gemini Robotics-ER support image input (vision)?

Yes. Gemini Robotics-ER accepts image inputs in addition to text. Send images via the standard OpenAI-compatible `messages` array with `image_url` content blocks. Supported input types: image, text.
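A minimal sketch of what such a `messages` array can look like, following the content-block shape of the OpenAI chat format. The `buildImageMessages` helper and the example URL are hypothetical, and whether the Railwail client accepts array-valued `content` for this model is an assumption.

```typescript
// Content blocks in the OpenAI-compatible chat format.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image_url"; image_url: { url: string } };
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string | Array<TextBlock | ImageBlock>;
};

// Hypothetical helper: one user message pairing a question with one image.
function buildImageMessages(question: string, imageUrl: string): ChatMessage[] {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Placeholder URL, for illustration only.
const messages = buildImageMessages(
  "Where is the red mug?",
  "https://example.com/scene.jpg",
);
console.log(JSON.stringify(messages, null, 2));
```

The resulting array would be passed wherever a plain message history is accepted, once the model is reachable through an API.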

Gemini Robotics-ER

Google DeepMind

VLA / Robotics

Embodied-reasoning variant of Gemini Robotics. Enhanced 3D spatial reasoning and trajectory planning.

Research-only model

Gemini Robotics-ER runs on physical robot hardware and is not exposed via the Railwail API yet.

Not API-accessible

TL;DR · Last updated June 24, 2026

Gemini Robotics-ER is a VLA / robotics AI model from Google DeepMind, listed at €0.000 per 1M input tokens. Its context window has not been published.

Pricing

Price per generation: Free

API Integration

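The snippet below uses our OpenAI-compatible API client. Gemini Robotics-ER is research-only and not yet exposed via the Railwail API, so calls with the `gemini-robotics-er` model ID are not expected to succeed until access is enabled.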

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("gemini-robotics-er", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("gemini-robotics-er", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("gemini-robotics-er", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Developer: Google DeepMind

Deep dive — Google DeepMind's Gemini Robotics-ER

About Google DeepMind

Founded 2010 · London, UK / Mountain View, USA

Google DeepMind announced Gemini Robotics-ER (Embodied Reasoning) alongside Gemini Robotics in March 2025. While Gemini Robotics is the action-producing VLA, Gemini Robotics-ER is the reasoning-focused sibling: a Vision-Language Model variant of Gemini 2.0 specialised for spatial understanding, 3D grounding, point/box prediction, trajectory planning and code generation for robotics. It is designed to be combined with classical motion planners, low-level controllers or with the Gemini Robotics VLA itself. DeepMind positions Gemini Robotics-ER as a 'reasoning brain' that a robot stack can call with multimodal prompts to decompose tasks, locate objects in 2D / 3D, and emit waypoints or Python control code. As with Gemini Robotics, access is limited to research and partner programs.

Architecture

Vision-Language Model for Embodied Reasoning (no end-to-end action head)

Gemini Robotics-ER is a fine-tuned variant of Gemini 2.0 specialised for embodied perception and planning rather than direct control. The architecture preserves the multimodal Transformer backbone of Gemini 2.0 (image, video, text, code) but is post-trained on a curated corpus of embodied tasks: object detection in 2D and 3D, point and bounding-box prediction, grasp prediction, motion-trajectory generation, and code-as-policy outputs that call robot APIs. It can accept egocentric robot camera streams and a natural-language task description, then produce structured outputs such as pixel-space points to grasp, 3D coordinates relative to the camera, planning steps, or Python snippets that drive a downstream controller. In combination with the Gemini Robotics VLA, Robotics-ER provides high-level reasoning while the VLA handles closed-loop low-level actions.
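To make the "pixel-space points" output concrete, here is a sketch of how a robot stack might turn a point prediction into image pixels. The JSON shape (`point` as `[y, x]`, normalised to 0..1000, plus a `label`) is an assumption for illustration, since this page does not specify the output format; `toPixels` is a hypothetical helper.

```typescript
// Assumed output shape: [{ "point": [y, x], "label": "..." }, ...]
// with y and x each normalised to the range 0..1000.
type PointPrediction = { point: [number, number]; label: string };
type PixelPoint = { label: string; u: number; v: number };

// Hypothetical helper: parse the model's JSON text and scale each point
// to pixel coordinates (u = column, v = row) for a given image size.
function toPixels(raw: string, width: number, height: number): PixelPoint[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  return parsed.map((item) => {
    const { point, label } = item as PointPrediction;
    if (!Array.isArray(point) || point.length !== 2 || typeof label !== "string") {
      throw new Error("malformed prediction");
    }
    const [y, x] = point;
    return {
      label,
      u: Math.round((x / 1000) * width),
      v: Math.round((y / 1000) * height),
    };
  });
}

// Illustrative model output for a 640x480 camera frame.
const sample = '[{"point": [500, 250], "label": "mug handle"}]';
console.log(toPixels(sample, 640, 480));
```

Validating the structure before use matters here because the points feed a physical controller, where a malformed coordinate is more costly than a rejected response.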

Parameters: Undisclosed (Gemini 2.0-class)
Context: unknown

What it can do

Spatial reasoning over 2D / 3D scenes
Point and bounding-box prediction for objects and grasps
Trajectory waypoint generation
Code-as-policy generation (Python that calls robot APIs)
Compositional task planning from natural language
Pair with motion planners or with Gemini Robotics VLA
Multimodal context: images, video, text, robot state
Improved zero-shot performance on embodied QA benchmarks
Best for: planning and grounding modules in research robotics stacks.
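As an illustration of pairing waypoint output with a downstream controller, the sketch below densifies a sparse list of waypoints by linear interpolation. The `Waypoint` type and `densify` function are hypothetical; a real stack would hand the waypoints to a proper motion planner with collision and kinematic checks rather than interpolate in a straight line.

```typescript
type Waypoint = { x: number; y: number; z: number };

// Split each segment between consecutive waypoints into `steps` equal
// parts, so a controller receives closely spaced targets.
function densify(waypoints: Waypoint[], steps: number): Waypoint[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new Error("steps must be a positive integer");
  }
  if (waypoints.length < 2) return [...waypoints];
  const out: Waypoint[] = [waypoints[0]];
  for (let i = 0; i < waypoints.length - 1; i++) {
    const a = waypoints[i];
    const b = waypoints[i + 1];
    for (let s = 1; s <= steps; s++) {
      const t = s / steps; // fraction of the way from a to b
      out.push({
        x: a.x + (b.x - a.x) * t,
        y: a.y + (b.y - a.y) * t,
        z: a.z + (b.z - a.z) * t,
      });
    }
  }
  return out;
}

// Two illustrative waypoints in metres, split into four segments.
const path = densify(
  [
    { x: 0, y: 0, z: 0 },
    { x: 1, y: 2, z: 4 },
  ],
  4,
);
console.log(path);
```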

Training & License

Gemini 2.0 multimodal pretraining plus embodied post-training on object-detection, 3D grounding, grasp prediction, trajectory planning, and code-generation tasks for robotic control. Draws on Google's internal robot datasets and curated public embodied datasets.

License: Research-only / partner access through Google DeepMind. Not publicly downloadable.

Known limitations