Gemini Robotics-ER

Google DeepMind
VLA / Robotics

Embodied-reasoning variant of Gemini Robotics. Enhanced 3D spatial reasoning and trajectory planning.

Research-only model
Gemini Robotics-ER runs on physical robot hardware and is not exposed via the Railwail API yet.
TL;DR · Last updated May 16, 2026

Gemini Robotics-ER is a VLA / robotics AI model from Google DeepMind, priced at €0.000 per 1M input tokens with an unknown context window.


Pricing

Price per Generation
Per generation: Free

API Integration

Gemini Robotics-ER is not yet exposed via the Railwail API. Once it is, our OpenAI-compatible API can be used to integrate it into your application as follows.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("gemini-robotics-er", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("gemini-robotics-er", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("gemini-robotics-er", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Specifications
Developer: Google DeepMind
Category: VLA / Robotics
Supported Formats: image, text
Tags: google, deepmind, gemini, vla, robotics, research-only, weights-closed, embodied-reasoning

Deep dive — Google DeepMind's Gemini Robotics-ER

About Google DeepMind
Founded 2010 · London, UK / Mountain View, USA

Google DeepMind announced Gemini Robotics-ER (Embodied Reasoning) alongside Gemini Robotics in March 2025. While Gemini Robotics is the action-producing VLA, Gemini Robotics-ER is the reasoning-focused sibling: a Vision-Language Model variant of Gemini 2.0 specialised for spatial understanding, 3D grounding, point/box prediction, trajectory planning and code generation for robotics. It is designed to be combined with classical motion planners, low-level controllers or with the Gemini Robotics VLA itself. DeepMind positions Gemini Robotics-ER as a 'reasoning brain' that a robot stack can call with multimodal prompts to decompose tasks, locate objects in 2D / 3D, and emit waypoints or Python control code. As with Gemini Robotics, access is limited to research and partner programs.

Architecture
Vision-Language Model for Embodied Reasoning (no end-to-end action head)

Gemini Robotics-ER is a fine-tuned variant of Gemini 2.0 specialised for embodied perception and planning rather than direct control. The architecture preserves the multimodal Transformer backbone of Gemini 2.0 (image, video, text, code) but is post-trained on a curated corpus of embodied tasks: object detection in 2D and 3D, point and bounding-box prediction, grasp prediction, motion-trajectory generation, and code-as-policy outputs that call robot APIs. It can accept egocentric robot camera streams and a natural-language task description, then produce structured outputs such as pixel-space points to grasp, 3D coordinates relative to the camera, planning steps, or Python snippets that drive a downstream controller. In combination with the Gemini Robotics VLA, Robotics-ER provides high-level reasoning while the VLA handles closed-loop low-level actions.
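One of the structured outputs described above is a pixel-space point. As a minimal sketch, a downstream stack could turn such a point into camera-frame 3D coordinates, assuming a pinhole camera model and a depth reading at that pixel. The `PixelPoint` shape, the intrinsics values, and the `backProject` name are hypothetical illustrations, not part of any published Gemini Robotics-ER interface.

```typescript
// Hypothetical shape of a model-emitted pixel point (an assumption, not a documented schema).
interface PixelPoint { label: string; u: number; v: number; }

// Pinhole camera intrinsics, in pixels.
interface Intrinsics { fx: number; fy: number; cx: number; cy: number; }

interface Point3D { x: number; y: number; z: number; }

// Back-project a pixel with known depth (metres along the optical axis)
// into the camera frame using the pinhole model.
function backProject(p: PixelPoint, depthM: number, k: Intrinsics): Point3D {
  if (!(depthM > 0)) {
    throw new Error(`invalid depth for ${p.label}: ${depthM}`);
  }
  return {
    x: ((p.u - k.cx) * depthM) / k.fx,
    y: ((p.v - k.cy) * depthM) / k.fy,
    z: depthM,
  };
}

// Illustrative values only: a 640x480 camera and a point 0.5 m away.
const intrinsics: Intrinsics = { fx: 600, fy: 600, cx: 320, cy: 240 };
const grasp: PixelPoint = { label: "mug handle", u: 440, v: 180 };
const graspInCamera = backProject(grasp, 0.5, intrinsics);
console.log(graspInCamera);
```

The result is expressed in the camera frame; a real stack would still need the camera-to-robot extrinsic transform before handing the point to a motion planner.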

Parameters: Undisclosed (Gemini 2.0-class)
Context: Unknown
What it can do
  • Spatial reasoning over 2D / 3D scenes
  • Point and bounding-box prediction for objects and grasps
  • Trajectory waypoint generation
  • Code-as-policy generation (Python that calls robot APIs)
  • Compositional task planning from natural language
  • Pair with motion planners or with Gemini Robotics VLA
  • Multimodal context: images, video, text, robot state
  • Improved zero-shot performance on embodied QA benchmarks
  • Best for: planning and grounding modules in research robotics stacks.
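To illustrate how trajectory waypoints might be paired with a motion planner, the sketch below densifies a sparse waypoint list by linear interpolation so that no step exceeds a maximum length. It assumes the model's waypoints have already been converted to metres in a planar workspace frame; the `Waypoint` type and `densify` function are hypothetical and stand in for whatever a real planner would do (collision checking and kinematics are out of scope here).

```typescript
// A planar waypoint in metres; the field names are illustrative assumptions.
interface Waypoint { x: number; y: number; }

// Linearly interpolate between consecutive sparse waypoints so that
// no segment of the returned path is longer than maxStepM.
function densify(sparse: Waypoint[], maxStepM: number): Waypoint[] {
  if (maxStepM <= 0) throw new Error("maxStepM must be positive");
  if (sparse.length === 0) return [];
  const dense: Waypoint[] = [sparse[0]];
  for (let i = 1; i < sparse.length; i++) {
    const a = sparse[i - 1];
    const b = sparse[i];
    const dist = Math.hypot(b.x - a.x, b.y - a.y);
    // At least one step per segment so every sparse waypoint is kept.
    const steps = Math.max(1, Math.ceil(dist / maxStepM));
    for (let s = 1; s <= steps; s++) {
      const t = s / steps;
      dense.push({ x: a.x + t * (b.x - a.x), y: a.y + t * (b.y - a.y) });
    }
  }
  return dense;
}

// Illustrative sparse path: move right, then up.
const sparsePath: Waypoint[] = [
  { x: 0, y: 0 },
  { x: 0.5, y: 0 },
  { x: 0.5, y: 0.25 },
];
const densePath = densify(sparsePath, 0.125);
console.log(densePath.length, densePath[densePath.length - 1]);
```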
Training & License

Gemini 2.0 multimodal pretraining plus embodied post-training on object-detection, 3D grounding, grasp prediction, trajectory planning, and code-generation tasks for robotic control. Draws on Google's internal robot datasets and curated public embodied datasets.

License: Research-only / partner access through Google DeepMind. Not publicly downloadable.

Known limitations
  • No direct low-level action output
  • Requires downstream controller or planner
  • Closed model - no public weights or API
  • Spatial reasoning still imperfect on cluttered scenes
  • Latency too high for tight inner control loops
  • Generalisation depends on prompt and tool stack
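The latency and downstream-controller limitations suggest a two-rate design: a slow reasoning step that updates targets occasionally, and a fast inner loop that never waits on it. The simulation below is a minimal sketch of that pattern under simplifying assumptions: a one-dimensional position, a proportional controller, and a `slowPlanner` stub that returns fixed targets in place of a real model call. All names and numbers are hypothetical.

```typescript
// Stand-in for a slow reasoning call. In a real stack this would be an
// asynchronous request to a reasoning model; here it is a fixed table.
function slowPlanner(planIndex: number): number {
  const targets = [1.0, 2.0];
  return targets[Math.min(planIndex, targets.length - 1)];
}

// Fast inner loop: one proportional-control update per tick.
function controlTick(position: number, target: number, gain: number): number {
  return position + gain * (target - position);
}

// Run the controller every tick, but consult the planner only every ticksPerPlan ticks.
function simulate(totalTicks: number, ticksPerPlan: number, gain: number): number[] {
  const positions: number[] = [];
  let position = 0;
  let target = position;
  for (let tick = 0; tick < totalTicks; tick++) {
    if (tick % ticksPerPlan === 0) {
      target = slowPlanner(tick / ticksPerPlan);
    }
    position = controlTick(position, target, gain);
    positions.push(position);
  }
  return positions;
}

const positions = simulate(40, 20, 0.5);
console.log(positions[19].toFixed(3), positions[39].toFixed(3));
```

The controller keeps tracking the most recent target between planner updates, which is why a reasoning model with high latency can still sit above a tight control loop.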
