How much does RDT-1B cost via Railwail?

No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of RDT-1B?

The context window of RDT-1B is not specified.

Latency depends on prompt length and load, typically 200 ms to 2 s for short prompts. We measure p50/p95 in real time on /rankings.
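As a rough illustration of what p50 and p95 mean, the sketch below computes both from a list of latency samples using the nearest-rank method. The function name `percentile` and the sample values are assumptions made for this example; this is not Railwail's measurement code.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b); // numeric ascending sort
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical latency samples in milliseconds (illustrative, not measured).
const latenciesMs = [210, 250, 300, 320, 400, 450, 600, 800, 1200, 1900];

const p50 = percentile(latenciesMs, 50);
const p95 = percentile(latenciesMs, 95);
console.log({ p50, p95 });
```

The p95 value is much more sensitive to slow outliers than the p50 value, which is why the two are usually reported together.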

Is RDT-1B better than Gemini Robotics (2025)?

It depends on your use case. RDT-1B (Custom) and Gemini Robotics (2025) (Google DeepMind) are both strong choices in VLA / robotics. Compare them side by side at /compare/rdt-1b-vs-gemini-robotics-2025.

Does RDT-1B support image input (vision)?

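Yes, at the model level: RDT-1B takes camera images as input alongside a text instruction. Because the model is not exposed via the Railwail API yet, images cannot currently be sent through the OpenAI-compatible `messages` array with `image_url` content blocks. Supported input modalities: image, text.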

RDT-1B

Name: RDT-1B
Brand: Custom
SKU: rdt-1b
Availability: InStock

Custom

VLA / Robotics

Tsinghua's 1B-parameter diffusion-transformer policy for bimanual manipulation. Predicts the next 64 actions per inference.

Research-only model

RDT-1B runs on physical robot hardware and is not exposed via the Railwail API yet.


TL;DR · Last updated June 24, 2026

RDT-1B is a VLA / robotics AI model developed at Tsinghua University (TSAIL / IIIS), listed at €0.000 per 1M input tokens with an unspecified context window.

Direct API access coming soon

Pricing

Price per Generation

Per generation: Free

API Integration

Once direct API access is available, RDT-1B can be integrated into your application through our OpenAI-compatible API.

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("rdt-1b", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("rdt-1b", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("rdt-1b", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Developer

Custom

Deep dive — Tsinghua University (TSAIL / IIIS)'s RDT-1B

About Tsinghua University (TSAIL / IIIS)

Founded 1911 · Beijing, China

Robotics Diffusion Transformer (RDT) is a generalist bimanual manipulation policy developed at Tsinghua University's TSAIL / Institute for Interdisciplinary Information Sciences (IIIS), home of Jun Zhu's diffusion-modelling group. RDT-1B, introduced in October 2024, is one of the first publicly released billion-scale diffusion-based Vision-Language-Action models, specifically designed for two-arm robots such as Aloha, Mobile Aloha and a custom bimanual platform used by the authors. The project is positioned as a Chinese academic counterpart to π-0 and OpenVLA, with open weights released on Hugging Face under a permissive licence and the explicit aim of enabling fully reproducible bimanual VLA research.

Visit Tsinghua University (TSAIL / IIIS) →

Architecture

Diffusion Transformer Vision-Language-Action policy for bimanual manipulation

RDT-1B is a 1-billion-parameter Diffusion Transformer (DiT) trained as a Vision-Language-Action policy. Inputs are multi-view RGB observations (left + right + overhead), proprioception for both arms and any gripper / mobile-base degrees of freedom, plus a natural-language instruction encoded by a text encoder. The conditioning tokens are fed through a transformer trunk, while a diffusion head denoises continuous action chunks for both arms in a unified action space, allowing coordinated dual-arm motion. Training proceeds in two stages: a large multi-robot pretraining phase on >1M episodes drawn from public datasets including Open-X-Embodiment and curated bimanual corpora, followed by fine-tuning on the authors' own 6,000-episode bimanual dataset spanning ~300 tasks. RDT-1B reports strong results on dexterous bimanual tasks such as folding T-shirts, pouring, and tool use.

Parameters: 1B
Context: unknown
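Because the policy predicts a chunk of 64 future actions per inference, a controller does not need to call the model at every control step. The sketch below shows one common way to consume such chunks: execute some prefix of each chunk, then query the policy again. The names `Policy`, `dummyPolicy` and `runEpisode` are hypothetical, and the stand-in policy returns dummy values rather than real model output; this is not the authors' control code.

```typescript
type Action = number[]; // one command vector (layout is hypothetical)
type Policy = (stepIndex: number) => Action[]; // returns a chunk of future actions

const CHUNK_SIZE = 64; // from the page: 64 actions predicted per inference

// Stand-in policy that returns CHUNK_SIZE dummy actions. A real policy would
// condition on camera images, proprioception and the language instruction.
const dummyPolicy: Policy = (stepIndex) =>
  Array.from({ length: CHUNK_SIZE }, (_, i) => [stepIndex + i]);

// Run `totalSteps` control steps, re-querying the policy after executing
// `executePerChunk` actions and discarding the unused tail of each chunk.
function runEpisode(
  policy: Policy,
  totalSteps: number,
  executePerChunk: number
): { executed: Action[]; inferenceCalls: number } {
  if (executePerChunk < 1 || executePerChunk > CHUNK_SIZE) {
    throw new Error("executePerChunk must be between 1 and CHUNK_SIZE");
  }
  const executed: Action[] = [];
  let inferenceCalls = 0;
  while (executed.length < totalSteps) {
    const chunk = policy(executed.length);
    inferenceCalls++;
    const remaining = totalSteps - executed.length;
    const n = Math.min(executePerChunk, remaining, chunk.length);
    if (n === 0) throw new Error("policy returned an empty chunk");
    executed.push(...chunk.slice(0, n));
  }
  return { executed, inferenceCalls };
}

const fullChunks = runEpisode(dummyPolicy, 200, 64);
const halfChunks = runEpisode(dummyPolicy, 200, 32);
console.log(fullChunks.inferenceCalls, halfChunks.inferenceCalls);
```

Executing fewer actions per chunk costs more inference calls but lets the policy react to new observations sooner; the right trade-off depends on the robot and task.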

What it can do

1B-parameter Diffusion Transformer VLA
Designed for bimanual manipulation (Aloha-class robots)
Trained on >1M cross-embodiment episodes + 6k bimanual demos
Continuous action chunks for both arms in a unified space
Diffusion head produces smooth coordinated motion
Open weights on Hugging Face (permissive licence)
Strong results on folding, pouring and tool use
Reproducible training and evaluation code
Best for: bimanual manipulation research, two-arm fine-tuning.
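To make the idea of a unified action space more concrete, the sketch below maps robot-specific action parts into one fixed-length vector and records which entries are actually used. The slot layout, the 16-dimensional size, and the names `SLOTS` and `toUnified` are assumptions chosen for illustration; the page does not describe RDT-1B's actual layout.

```typescript
// Hypothetical unified layout: fixed slots for each arm's joints and gripper.
const UNIFIED_DIM = 16;
const SLOTS = {
  leftArmJoints: { start: 0, length: 7 },
  leftGripper: { start: 7, length: 1 },
  rightArmJoints: { start: 8, length: 7 },
  rightGripper: { start: 15, length: 1 },
} as const;

type SlotName = keyof typeof SLOTS;

interface UnifiedAction {
  values: number[]; // length UNIFIED_DIM, zero where the robot has no value
  mask: number[]; // 1 where the robot provides a value, otherwise 0
}

// Copy each provided part into its slot and mark those entries in the mask.
function toUnified(parts: Partial<Record<SlotName, number[]>>): UnifiedAction {
  const values = new Array<number>(UNIFIED_DIM).fill(0);
  const mask = new Array<number>(UNIFIED_DIM).fill(0);
  for (const name of Object.keys(parts) as SlotName[]) {
    const part = parts[name];
    if (part === undefined) continue;
    const slot = SLOTS[name];
    if (part.length > slot.length) {
      throw new Error(`${name} has too many values`);
    }
    part.forEach((v, i) => {
      values[slot.start + i] = v;
      mask[slot.start + i] = 1;
    });
  }
  return { values, mask };
}

// A single-arm robot with six joints and a gripper fills only the left slots.
const singleArm = toUnified({
  leftArmJoints: [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
  leftGripper: [1],
});
const usedDims = singleArm.mask.reduce((a, b) => a + b, 0);
console.log(usedDims);
```

A shared layout like this lets episodes from different robots be batched together, with the mask indicating which dimensions carry real data.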

Training & License

Pretraining on >1 million robot episodes from Open-X-Embodiment and other public datasets, followed by fine-tuning on a curated bimanual dataset of ~6,000 episodes covering ~300 tasks collected with Aloha-class hardware.

License: Open weights released on Hugging Face under a permissive (CC-BY-NC-style) licence; primarily intended for research use.

Known limitations