Gemini Robotics (2025)
Google DeepMind's vision-language-action model based on Gemini 2.0. Generalist robot policy with strong dexterity.
Gemini Robotics (2025) is a VLA / robotics AI model from Google DeepMind, priced at €0.000 per 1M input tokens with an unknown context window.
API Integration
Use our OpenAI-compatible API to integrate Gemini Robotics (2025) into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("gemini-robotics-2025", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("gemini-robotics-2025", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("gemini-robotics-2025", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Google DeepMind's Gemini Robotics (2025)
Google DeepMind is the merged research organisation of DeepMind (London, 2010) and Google Brain, responsible for the Gemini frontier-model family. Its robotics group has a long lineage of generalist-robot work, from SayCan and RT-1 through RT-2 and the Open X-Embodiment collaboration. In March 2025 the team announced Gemini Robotics, an advanced Vision-Language-Action (VLA) model built on top of Gemini 2.0 that brings multimodal reasoning, web knowledge and code into low-level robot control. Gemini Robotics is positioned as a generalist foundation model for dexterous bimanual manipulation and is being trialled with hardware partners including Apptronik (Apollo humanoid), Agility Robotics (Digit) and Boston Dynamics. The model is research-only and not publicly available, but it is accompanied by a public tech report and demo gallery.
Gemini Robotics is a Vision-Language-Action model built from Gemini 2.0 by adding an action decoder that converts multimodal context (camera images, language instructions, optional robot state) into continuous low-level control commands. The backbone retains Gemini's web-scale multimodal pretraining (text, image, video, code), so the policy inherits broad world knowledge and language understanding. On top of that, the model is fine-tuned on a large multi-embodiment robot demonstration corpus including Aloha 2 bimanual setups, third-party humanoids (Apptronik Apollo, Agility Digit) and Google's own manipulation platforms. It outputs continuous action chunks at high frequency and is trained to produce smooth, reactive behaviour rather than discrete action tokens. A companion 'ER' (Embodied Reasoning) variant exposes intermediate reasoning, point/box predictions, and 3D grounding to drive planners.
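The paragraph above says the action decoder emits continuous action chunks for reactive, closed-loop control. A common way to consume such chunks is receding-horizon execution: run only the first few actions of each chunk, then re-observe and re-query the policy. The sketch below illustrates that technique under stated assumptions; it is not Gemini Robotics' interface, which is not public. `ChunkingPolicy`, `Robot`, `runClosedLoop`, the toy robot and the toy policy are hypothetical, and the chunk length of 8 and replan interval of 4 are arbitrary.

```typescript
// Hypothetical types: the real Gemini Robotics interface is not public.
type Observation = {
  instruction: string;  // natural-language command
  jointState: number[]; // optional proprioceptive state
};
type Action = number[]; // one continuous control command (e.g. joint deltas)

interface ChunkingPolicy {
  // Returns a chunk of future actions for the given observation.
  predictChunk(obs: Observation): Action[];
}

interface Robot {
  observe(instruction: string): Observation;
  apply(action: Action): void;
}

// Receding-horizon execution: apply only the first `executeSteps` actions of
// each chunk, then re-observe and re-query so the policy can react to changes.
// Returns the number of policy queries made.
function runClosedLoop(
  policy: ChunkingPolicy,
  robot: Robot,
  instruction: string,
  totalSteps: number,
  executeSteps: number,
): number {
  let applied = 0;
  let queries = 0;
  while (applied < totalSteps) {
    const obs = robot.observe(instruction);
    const chunk = policy.predictChunk(obs);
    queries++;
    const n = Math.min(executeSteps, chunk.length, totalSteps - applied);
    if (n <= 0) throw new Error("nothing to execute: empty chunk or non-positive executeSteps");
    for (let i = 0; i < n; i++) {
      robot.apply(chunk[i]);
      applied++;
    }
  }
  return queries;
}

// Toy stand-ins for illustration only: a one-joint robot and a policy that
// always proposes 8 forward steps of 0.5.
class ToyRobot implements Robot {
  position = 0;
  observe(instruction: string): Observation {
    return { instruction, jointState: [this.position] };
  }
  apply(action: Action): void {
    this.position += action[0];
  }
}

const toyPolicy: ChunkingPolicy = {
  predictChunk: () => Array.from({ length: 8 }, () => [0.5]),
};

const robot = new ToyRobot();
const queries = runClosedLoop(toyPolicy, robot, "move forward", 20, 4);
console.log(queries, robot.position); // 5 10
```

Executing fewer actions than a chunk contains trades extra policy queries for faster reaction to changes in the scene; executing the whole chunk does the reverse.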
- Parameters: Undisclosed (Gemini 2.0-class backbone with action head)
- Context: unknown
- Generalist VLA built on Gemini 2.0 multimodal backbone
- Bimanual dexterous manipulation (Aloha 2, humanoids)
- Zero-shot generalisation to new objects and instructions
- Reactive closed-loop control with low latency
- Natural-language instruction following with chain-of-thought
- Multi-embodiment: arms, humanoids, mobile bases
- Tight integration with Gemini Robotics-ER for planning and grounding
- Public demos: folding origami, packing a lunch box, slam-dunking a miniature basketball
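The Gemini Robotics-ER variant is described as producing point and box predictions for planners, and a planner has to map those into image pixels before it can act on them. The sketch assumes ER follows the convention used in Gemini 2.0 spatial-understanding prompts: points as `[y, x]` and boxes as `[ymin, xmin, ymax, xmax]`, normalized to the range 0 to 1000. That convention, the JSON reply shape, and the helper names `pointToPixels` and `boxToPixels` are assumptions for illustration.

```typescript
// Assumed output convention (as in Gemini 2.0 spatial-understanding prompts):
// points are [y, x] and boxes are [ymin, xmin, ymax, xmax], normalized to 0..1000.
type NormPoint = [number, number];
type NormBox = [number, number, number, number];

interface PixelPoint { x: number; y: number; }
interface PixelBox { left: number; top: number; right: number; bottom: number; }

const SCALE = 1000;

// Multiply before dividing so integer inputs stay exact where possible.
function pointToPixels([y, x]: NormPoint, width: number, height: number): PixelPoint {
  return { x: (x * width) / SCALE, y: (y * height) / SCALE };
}

function boxToPixels([ymin, xmin, ymax, xmax]: NormBox, width: number, height: number): PixelBox {
  return {
    left: (xmin * width) / SCALE,
    top: (ymin * height) / SCALE,
    right: (xmax * width) / SCALE,
    bottom: (ymax * height) / SCALE,
  };
}

// Example: a hypothetical pointing result parsed from a JSON reply,
// mapped onto a 640x480 camera image.
const reply = '[{"point": [500, 250], "label": "mug handle"}]';
const parsed = JSON.parse(reply) as { point: NormPoint; label: string }[];
const px = pointToPixels(parsed[0].point, 640, 480);
console.log(parsed[0].label, px); // mug handle { x: 160, y: 240 }

const box = boxToPixels([100, 200, 300, 400], 640, 480);
console.log(box); // { left: 128, top: 48, right: 256, bottom: 144 }
```

Keeping the conversion in one place makes it easy to swap if the actual output convention turns out to differ (for example `[x, y]` order or a 0 to 1 range).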
- Best for: research collaborations on dexterous manipulation.
Training data: a multi-embodiment robot demonstration corpus combining Google's internal datasets (RT-1/RT-2 lineage, Aloha 2 bimanual), partner humanoid data (Apptronik Apollo, Agility Digit) and Open X-Embodiment-style cross-embodiment teleoperation data. The model also inherits Gemini 2.0's web-scale multimodal pretraining (text, image, video, code).
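Mixing arms, bimanual rigs and humanoids in one corpus raises a practical question: their action vectors have different sizes. One approach used in cross-embodiment work is to pad every action to a shared dimension and mask the padded slots out of the loss. Whether Gemini Robotics does this is not disclosed; the sketch only illustrates the padding-and-masking technique, and the 14-D and 7-D action sizes, the embodiment names and the helpers `padAction` and `maskedMse` are hypothetical.

```typescript
// Hypothetical embodiment specs; real action spaces are not disclosed.
interface Embodiment { name: string; actionDim: number; }

interface PaddedAction { values: number[]; mask: boolean[]; }

// Zero-pad an action to the shared dimension and mark which slots are real.
function padAction(action: number[], maxDim: number): PaddedAction {
  if (action.length > maxDim) {
    throw new Error(`action dim ${action.length} exceeds shared dim ${maxDim}`);
  }
  const values = new Array<number>(maxDim).fill(0);
  const mask = new Array<boolean>(maxDim).fill(false);
  action.forEach((v, i) => {
    values[i] = v;
    mask[i] = true;
  });
  return { values, mask };
}

// Masked mean-squared error: padded slots contribute nothing to the loss.
function maskedMse(pred: number[], target: PaddedAction): number {
  let sum = 0;
  let count = 0;
  for (let i = 0; i < target.values.length; i++) {
    if (!target.mask[i]) continue;
    const d = pred[i] - target.values[i];
    sum += d * d;
    count++;
  }
  return count === 0 ? 0 : sum / count;
}

const embodiments: Embodiment[] = [
  { name: "bimanual-rig", actionDim: 14 },
  { name: "single-arm", actionDim: 7 },
];
const sharedDim = Math.max(...embodiments.map((e) => e.actionDim));

// A 7-D single-arm target padded to the shared 14-D space.
const target = padAction([1, 1, 1, 1, 1, 1, 1], sharedDim);
// Prediction is 0 on the real dims and junk (5) on the padded dims.
const pred = Array.from({ length: sharedDim }, (_, i) => (i < 7 ? 0 : 5));
const loss = maskedMse(pred, target);
console.log(sharedDim, loss); // 14 1
```

The junk values in the padded slots do not affect the loss, which is the point of the mask: a single output head can serve every embodiment without penalising dimensions a given robot does not have.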
License: Research-only - not publicly released; access via Google DeepMind research collaborations with selected hardware partners.
Known limitations
- Not publicly available - partner research access only
- No public weights or API
- Performance numbers limited to lab demonstrations
- Generalisation across very different embodiments still bounded
- Compute requirements undisclosed but high
- Real-time deployment requires on-board acceleration
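The real-time limitation can be made concrete with a latency budget: while one policy query is in flight, the controller keeps ticking at its control rate, so roughly ceil(latency × rate) actions must already be queued. The sketch computes that bound, assuming asynchronous inference in which the next chunk is requested as soon as the current one starts executing. The 200 ms latency and 50 Hz rate are illustrative values, not reported figures for Gemini Robotics, and the function names are hypothetical.

```typescript
// Number of control ticks that elapse while one policy query is in flight.
function stepsDuringInference(latencyMs: number, controlHz: number): number {
  return Math.ceil((latencyMs * controlHz) / 1000);
}

// If the next chunk is requested when the current chunk starts executing,
// the current chunk must cover every tick until the next one arrives;
// otherwise the controller runs out of actions and stalls.
function chunkCoversLatency(chunkLength: number, latencyMs: number, controlHz: number): boolean {
  return chunkLength >= stepsDuringInference(latencyMs, controlHz);
}

// Illustrative numbers only: 200 ms inference latency, 50 Hz control loop.
const needed = stepsDuringInference(200, 50);
const shortOk = chunkCoversLatency(8, 200, 50);
const longOk = chunkCoversLatency(16, 200, 50);
console.log(needed, shortOk, longOk); // 10 false true
```

Faster on-board acceleration lowers the latency term directly, which in turn shortens the minimum chunk length and lets the policy react to new observations sooner.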
Related Models
Gemini Robotics-ER
Embodied-reasoning variant of Gemini Robotics. Enhanced 3D spatial reasoning and trajectory planning.
Google RT-2-X
Google's VLA from the RT-X collaboration. Trained on Open X-Embodiment data (22 robot embodiments, 527 skills); shows positive transfer across robots.
LeRobot SmolVLA
Hugging Face's 450M-parameter VLA pretrained on 487 community LeRobot datasets. Runs on consumer GPUs.
NVIDIA Cosmos-Predict-1
NVIDIA's world foundation model for physical AI. Diffusion-based video prediction for robotics simulation.