Gemini Robotics (2025)

Google DeepMind
VLA / Robotics

Google DeepMind's vision-language-action model based on Gemini 2.0. Generalist robot policy with strong dexterity.

Research-only model
Gemini Robotics (2025) runs on physical robot hardware and is not exposed via the Railwail API yet.
TL;DR · Last updated May 16, 2026

Gemini Robotics (2025) is a VLA / robotics AI model from Google DeepMind, priced at €0.000 per 1M input tokens with an unknown context window.

Pricing

Price per generation: Free

API Integration

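Railwail offers an OpenAI-compatible API. Gemini Robotics (2025) is not exposed through it yet, so the snippet below shows the call pattern that would apply to the model ID gemini-robotics-2025 if it becomes available.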

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("gemini-robotics-2025", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("gemini-robotics-2025", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("gemini-robotics-2025", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Specifications
Developer: Google DeepMind
Category: VLA / Robotics
Supported Formats: image, text
Tags: google, deepmind, gemini, vla, robotics, research-only, weights-closed

Deep dive — Google DeepMind's Gemini Robotics (2025)

About Google DeepMind
Founded 2010 · London, UK / Mountain View, USA

Google DeepMind is the research organisation formed by merging DeepMind (founded in London in 2010) with Google Brain, and it is responsible for the Gemini family of frontier models. Its robotics group has a long lineage of generalist-robot work, from SayCan and RT-1 through RT-2 and the Open X-Embodiment collaboration. In March 2025 the team announced Gemini Robotics, a Vision-Language-Action (VLA) model built on Gemini 2.0 that brings multimodal reasoning, web knowledge and code into low-level robot control. Gemini Robotics is positioned as a generalist foundation model for dexterous bimanual manipulation and is being trialled with hardware partners including Apptronik (Apollo humanoid), Agility Robotics (Digit) and Boston Dynamics. The model is research-only and not publicly available, but it is accompanied by a public tech report and demo gallery.

Architecture
Vision-Language-Action (VLA) transformer on top of Gemini 2.0 multimodal foundation

Gemini Robotics is a Vision-Language-Action model built from Gemini 2.0 by adding an action decoder that converts multimodal context (camera images, language instructions, optional robot state) into continuous low-level control commands. The backbone retains Gemini's web-scale multimodal pretraining (text, image, video, code), so the policy inherits broad world knowledge and language understanding. On top of that, the model is fine-tuned on a large multi-embodiment robot demonstration corpus that includes Aloha 2 bimanual setups, third-party humanoids (Apptronik Apollo, Agility Digit) and Google's own manipulation platforms. Rather than emitting discrete action tokens, it outputs continuous action chunks at high frequency and is trained to produce smooth, reactive behaviour. A companion 'ER' (Embodied Reasoning) variant exposes intermediate reasoning, point/box predictions, and 3D grounding to drive planners.
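To make the idea of action chunks concrete, here is a minimal sketch of a closed control loop that executes only the first few actions of each predicted chunk before observing again. It is not the Gemini Robotics interface, which is not public: the types Observation, Action and Policy, the function runChunkedControl and the stub policy are all hypothetical names invented for illustration.

```typescript
// Hypothetical types: none of these names come from Google DeepMind or Railwail.
type Observation = { image: Uint8Array; instruction: string; jointPositions: number[] };
type Action = number[]; // one continuous command per joint
type Policy = (obs: Observation) => Action[]; // returns a chunk of future actions

// Query the policy, execute the first `executePerChunk` actions of the chunk,
// then re-observe and re-plan. Returns how many times the policy was queried.
function runChunkedControl(
  policy: Policy,
  observe: () => Observation,
  apply: (action: Action) => void,
  totalSteps: number,
  executePerChunk: number,
): number {
  let steps = 0;
  let policyQueries = 0;
  while (steps < totalSteps) {
    const chunk = policy(observe());
    policyQueries++;
    const n = Math.min(executePerChunk, chunk.length, totalSteps - steps);
    if (n <= 0) {
      throw new Error("nothing to execute: empty chunk or non-positive executePerChunk");
    }
    for (const action of chunk.slice(0, n)) {
      apply(action);
      steps++;
    }
  }
  return policyQueries;
}

// Toy stand-in for the model: always proposes 8 steps nudging joint 0 upward.
const stubPolicy: Policy = (obs) =>
  Array.from({ length: 8 }, (_, i) => [(obs.jointPositions[0] ?? 0) + 0.01 * (i + 1)]);

// Simulated robot state: a single joint that simply takes the commanded value.
let joint = 0;
const applied: Action[] = [];
const queries = runChunkedControl(
  stubPolicy,
  () => ({ image: new Uint8Array(0), instruction: "pick up the cup", jointPositions: [joint] }),
  (action) => {
    joint = action[0] ?? joint;
    applied.push(action);
  },
  10, // total control steps
  4, // actions executed per chunk before re-planning
);
console.log(queries, applied.length); // prints "3 10"
```

Executing fewer actions than the chunk contains trades extra policy queries for faster reaction to new observations. The values used here (chunks of 8, 4 executed) are arbitrary and say nothing about the real model's settings.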

Parameters: Undisclosed (Gemini 2.0-class backbone with action head)
Context window: Unknown
What it can do
  • Generalist VLA built on Gemini 2.0 multimodal backbone
  • Bimanual dexterous manipulation (Aloha 2, humanoids)
  • Zero-shot generalisation to new objects and instructions
  • Reactive closed-loop control with low latency
  • Natural-language instruction following with chain-of-thought
  • Multi-embodiment: arms, humanoids, mobile bases
  • Tight integration with Gemini Robotics-ER for planning and grounding
  • Public demos: folding origami, packing lunch boxes, slam-dunking miniature balls
  • Best for: research collaborations on dexterous manipulation.
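As an illustration of how a planner might consume point predictions from Gemini Robotics-ER, the sketch below converts normalised points into pixel coordinates. The output shape is an assumption, not something this page specifies: it supposes each prediction is a label plus a [y, x] point scaled to 0..1000. The names NormalisedPoint, PixelPoint, scale and toPixels are hypothetical.

```typescript
// Assumed prediction shape: a label plus a [y, x] point normalised to 0..1000.
type NormalisedPoint = { label: string; point: [number, number] };
type PixelPoint = { label: string; x: number; y: number };

// Map a 0..1000 coordinate onto an axis of `size` pixels, clamped to valid indices.
function scale(value: number, size: number): number {
  const px = Math.round((value / 1000) * size);
  return Math.min(size - 1, Math.max(0, px));
}

// Convert every prediction to pixel coordinates for an image of the given size.
function toPixels(points: NormalisedPoint[], width: number, height: number): PixelPoint[] {
  return points.map(({ label, point: [y, x] }) => ({
    label,
    x: scale(x, width),
    y: scale(y, height),
  }));
}

const pixels = toPixels([{ label: "cup handle", point: [250, 500] }], 640, 480);
console.log(pixels); // one entry: label "cup handle", x 320, y 120
```

A downstream planner could pass these pixel targets to a grasp or motion module. If the real output uses a different axis order or scale, only scale and the destructuring in toPixels would need to change.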
Training & License

Training data: a multi-embodiment robot demonstration corpus that combines Google's internal datasets (RT-1/RT-2 lineage, Aloha 2 bimanual), partner humanoid data (Apptronik Apollo, Agility Digit) and Open X-Embodiment-style cross-embodiment teleoperation. The model also inherits Gemini 2.0's web-scale multimodal pretraining (text, image, video, code).

License: Research-only - not publicly released; access via Google DeepMind research collaborations with selected hardware partners.

Known limitations
  • Not publicly available - partner research access only
  • No public weights or API
  • Performance numbers limited to lab demonstrations
  • Generalisation across very different embodiments still bounded
  • Compute requirements undisclosed but high
  • Real-time deployment requires on-board acceleration
