How much does Google RT-2-X cost via Railwail?

There is no monthly minimum and no subscription. Google RT-2-X is listed as free per generation, and you start with €5 in free credits.

What is the context window of Google RT-2-X?

The context window of Google RT-2-X is listed as unknown.

How fast is Google RT-2-X?

Latency depends on prompt length and load, and is typically 200 ms to 2 s for short prompts. We measure p50/p95 latency in real time on /rankings.
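
As a rough illustration of what p50 and p95 mean, the sketch below computes nearest-rank percentiles over a list of latency samples. The sample values and the `percentile` helper are made up for this example; it is not the code behind /rankings.

```javascript
// Hypothetical helper: nearest-rank percentile over latency samples (ms).
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: the smallest value with at least p% of samples at or below it.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up latency samples in milliseconds.
const latenciesMs = [200, 250, 300, 400, 1200, 2000, 350, 220, 280, 500];
console.log("p50:", percentile(latenciesMs, 50));
console.log("p95:", percentile(latenciesMs, 95));
```

With only ten samples, p95 is simply the slowest request, which is why tail percentiles are usually read over a much larger window.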

Is Google RT-2-X better than Gemini Robotics (2025)?

It depends on your use case. Google RT-2-X (Google DeepMind) and Gemini Robotics (2025) (Google DeepMind) are both strong choices in VLA / robotics. Compare them side-by-side at /compare/rt-2-x-vs-gemini-robotics-2025.

Does Google RT-2-X support image input (vision)?

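Yes. Google RT-2-X accepts image inputs in addition to text. On the Railwail API, images are sent via the standard OpenAI-compatible `messages` array with `image_url` content blocks; note that RT-2-X itself is research-only and not exposed via the API yet. Supported input types: image, text.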
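
For reference, a message that mixes text and an image in the OpenAI-compatible format looks roughly like the sketch below. The `buildVisionMessage` helper, the instruction text, and the example URL are hypothetical, and whether Railwail accepts exactly this shape for a given model is an assumption. RT-2-X is not callable today, so this only illustrates the payload.

```javascript
// Hypothetical helper: builds one user message with a text part and an image part,
// following the OpenAI-style content-block layout.
function buildVisionMessage(instruction, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: instruction },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder instruction and URL, for illustration only.
const messages = [
  buildVisionMessage("Pick up the red block.", "https://example.com/camera-frame.jpg"),
];
console.log(JSON.stringify(messages, null, 2));
```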

Google RT-2-X

Google DeepMind

VLA / Robotics

Google's VLA from the RT-X collaboration. Trained on Open-X-Embodiment (22 robots, 527 skills), it shows positive transfer across robot embodiments.

Research-only model

Google RT-2-X runs on physical robot hardware and is not exposed via the Railwail API yet.

Not API-accessible

Read the research

TL;DR · Last updated June 24, 2026

Google RT-2-X is a VLA / robotics AI model from Google DeepMind. It is listed at €0.000 per 1M input tokens, and its context window is unknown.

Pricing

Price per Generation

Per generation: Free

API Integration

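Google RT-2-X is not exposed via the Railwail API yet. Once it is, our OpenAI-compatible API is how you would integrate it into your application; the snippets below show the intended call pattern.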

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("rt-2-x", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("rt-2-x", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("rt-2-x", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Developer: Google DeepMind

Deep dive — Google DeepMind's Google RT-2-X

About Google DeepMind

Founded 2010 · London, UK / Mountain View, USA

RT-2-X is Google DeepMind's Robotic Transformer 2 (RT-2) model retrained on the Open-X-Embodiment dataset, the first large-scale, multi-institution effort to assemble a unified robot-learning dataset spanning many labs and robots. Open-X-Embodiment was organised in 2023 by Google DeepMind together with 21+ academic and industry institutions (Stanford, UC Berkeley, CMU, MIT, Toyota Research Institute, etc.), producing the RT-X dataset of ~1 million trajectories across 22 robot embodiments. RT-2-X extends RT-2's Vision-Language-Action recipe, which uses a PaLM-E / PaLI-X style VLM as the backbone and emits actions as text tokens, to this cross-embodiment corpus. It demonstrated positive transfer across robots and, at the time of release, a new state of the art on generalist manipulation. RT-2-X is research-only and not publicly callable; it remains a key academic reference and the conceptual parent of subsequent open VLAs.

Visit Google DeepMind →

Architecture

Vision-Language-Action transformer (PaLI / PaLM-E backbone, discrete action tokens)

RT-2-X follows the RT-2 design: a large Vision-Language Model (PaLI-X or PaLM-E) is co-fine-tuned on web-scale vision-language data and on robot demonstration data. Robot actions are tokenised as strings of natural-language-like tokens: each action dimension is binned and rendered as a token. The same next-token prediction objective therefore trains the model on both internet-scale image-text data and robot trajectories, so the resulting policy inherits web knowledge (object semantics, OCR, common sense) and routes it to motor commands. RT-2-X is the version of this recipe trained on the Open-X-Embodiment / RT-X dataset (~1 million trajectories across 22 robot embodiments) rather than only on Google's internal kitchen-robot dataset. Public results report 5B and 55B variants, with the 55B model showing the strongest generalisation, especially when prompted with unseen language commands or unseen object combinations.
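
The action tokenisation described above can be sketched as uniform binning of each action dimension, with the bin indices rendered as a space-separated string. This is a minimal illustration, not the released implementation: the bin count of 256, the per-dimension ranges, and all function names are assumptions made for the example.

```javascript
// Illustrative sketch only: bin count and action ranges are assumptions,
// not a published RT-2-X configuration.
const NUM_BINS = 256; // assumed number of uniform bins per action dimension

// Map one continuous value in [low, high] to an integer bin index in [0, numBins - 1].
function toBin(value, low, high, numBins = NUM_BINS) {
  const clamped = Math.min(Math.max(value, low), high);
  const scaled = (clamped - low) / (high - low); // normalised to [0, 1]
  return Math.min(numBins - 1, Math.floor(scaled * numBins));
}

// Map a bin index back to the centre of its bin (lossy inverse of toBin).
function fromBin(bin, low, high, numBins = NUM_BINS) {
  return low + ((bin + 0.5) / numBins) * (high - low);
}

// Render an action vector as a space-separated string of bin indices.
function encodeAction(action, ranges) {
  return action.map((v, i) => toBin(v, ranges[i][0], ranges[i][1])).join(" ");
}

// Parse such a string back into approximate continuous values.
function decodeAction(text, ranges) {
  return text
    .split(" ")
    .map((tok, i) => fromBin(Number(tok), ranges[i][0], ranges[i][1]));
}

// Hypothetical 3-dimensional action: two deltas in [-1, 1] and a gripper value in [0, 1].
const exampleRanges = [[-1, 1], [-1, 1], [0, 1]];
const encoded = encodeAction([-1, 0, 1], exampleRanges);
console.log(encoded); // "0 128 255"
console.log(decodeAction(encoded, exampleRanges));
```

Because the encoded action is just a short string of integers, a language model can emit it with the same next-token decoding it uses for text; the cost is a quantisation error of at most half a bin width per dimension.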

Parameters: Up to 55B (RT-2-X variants: 5B and 55B)
Context: unknown

What it can do

Generalist VLA trained on Open-X-Embodiment (22 robots)
Inherits web-scale knowledge from PaLI / PaLM-E backbones
Discrete action-token decoding (text-like vocabulary)
Positive transfer across robot embodiments
Strong emergent semantic reasoning (e.g. 'pick up the extinct animal')
5B and 55B parameter variants
Reference architecture for the modern VLA paradigm
Co-training on internet data + robot demos
Best for: research, citation, conceptual baseline for VLAs.

Training & License

Co-trained on internet-scale vision-language data (PaLI / PaLM-E corpora) plus ~1 million robot trajectories from the Open-X-Embodiment (RT-X) dataset across 22 robot embodiments. Action targets are continuous controls discretised into tokens.

License: Research-only - Google DeepMind has not publicly released the RT-2-X weights, code or API. Some Open-X-Embodiment data and smaller RT-X reproductions are available, but the proprietary RT-2-X checkpoints are not.

Known limitations