RDT-1B

Custom
VLA / Robotics

Tsinghua's 1B diffusion-transformer bimanual manipulation policy. Predicts next 64 actions per inference.

Research-only model
RDT-1B runs on physical robot hardware and is not exposed via the Railwail API yet.
Not API-accessible
Read the research
TL;DR · Last updated May 16, 2026

RDT-1B is a VLA / robotics AI model from Custom, priced at €0.000 per 1M input tokens, with an unknown context window.

Try RDT-1B

Direct API access coming soon

Pricing

Price per Generation
Per generation
Free

API Integration

Use our OpenAI-compatible API to integrate RDT-1B into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: just pass a string
const reply = await rw.run("rdt-1b", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("rdt-1b", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("rdt-1b", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Specifications
Developer
Custom
Category
VLA / Robotics
Supported Formats
image
text
Tags
tsinghua
vla
robotics
bimanual
research-only
open-weights
diffusion

Deep dive: Tsinghua University (TSAIL / IIIS)'s RDT-1B

About Tsinghua University (TSAIL / IIIS)
Founded 1911 · Beijing, China

Robotics Diffusion Transformer (RDT) is a generalist bimanual manipulation policy developed at Tsinghua University's TSAIL / Institute for Interdisciplinary Information Sciences (IIIS), home of Jun Zhu's diffusion-modelling group. RDT-1B, introduced in October 2024, is one of the first publicly released billion-scale diffusion-based Vision-Language-Action models, specifically designed for two-arm robots such as Aloha, Mobile Aloha and a custom bimanual platform used by the authors. The project is positioned as a Chinese academic counterpart to π-0 and OpenVLA, with open weights released on Hugging Face under a permissive licence and the explicit aim of enabling fully reproducible bimanual VLA research.

Visit Tsinghua University (TSAIL / IIIS) →
Architecture
Diffusion Transformer Vision-Language-Action policy for bimanual manipulation

RDT-1B is a 1-billion-parameter Diffusion Transformer (DiT) trained as a Vision-Language-Action policy. Its inputs are multi-view RGB observations (left + right + overhead), proprioception for both arms and any gripper / mobile-base degrees of freedom, and a natural-language instruction encoded by a text encoder. The conditioning tokens are fed through a transformer trunk, while a diffusion head denoises continuous action chunks for both arms in a unified action space, which allows coordinated dual-arm motion. Training proceeds in two stages: large-scale multi-robot pretraining on >1M episodes drawn from public datasets, including Open-X-Embodiment and curated bimanual corpora, followed by fine-tuning on the authors' own 6,000-episode bimanual dataset spanning ~300 tasks. The authors report strong results on dexterous bimanual tasks such as folding T-shirts, pouring, and tool use.
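
To make the "diffusion head denoises continuous action chunks" step concrete, here is a minimal sketch of iterative denoising over a 64-action chunk. It uses a generic DDIM-style deterministic update, which may differ from the sampler RDT-1B actually uses. The 14-value action size, the cosine noise schedule, the 5 sampling steps and the stand-in denoiser are all assumptions for illustration; the real model would replace `toyDenoiser` with a network conditioned on images, proprioception and the instruction.

```typescript
// Toy DDIM-style sampler for an action chunk. Illustrative only: the
// "denoiser" below is a stand-in function, not the RDT-1B network.

type Chunk = number[][]; // shape [horizon][actionDim]
type Denoiser = (noisy: Chunk, step: number) => Chunk; // predicts the clean chunk

const HORIZON = 64;    // actions per inference, as stated on this page
const ACTION_DIM = 14; // assumption: 7 values per arm on a two-arm robot

// Small seeded generator (LCG) so runs are reproducible.
function makeRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rand: () => number): number {
  const u1 = 1 - rand(); // in (0, 1], keeps the logarithm finite
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Assumed noise schedule: alphaBar is 1 at i = 0 (clean) and near 0 at i = steps.
function alphaBar(i: number, steps: number): number {
  const c = Math.cos((i / steps) * (Math.PI / 2) * 0.999);
  return c * c;
}

function sampleChunk(predictClean: Denoiser, steps: number, rand: () => number): Chunk {
  // Start from pure Gaussian noise.
  let x: Chunk = [];
  for (let k = 0; k < HORIZON; k++) {
    const row: number[] = [];
    for (let d = 0; d < ACTION_DIM; d++) row.push(gaussian(rand));
    x.push(row);
  }
  for (let t = steps; t >= 1; t--) {
    const abT = alphaBar(t, steps);
    const abPrev = alphaBar(t - 1, steps);
    const clean = predictClean(x, t);
    // Recover the implied noise, then re-noise the clean estimate to level t - 1.
    x = x.map((row, k) =>
      row.map((v, d) => {
        const eps = (v - Math.sqrt(abT) * clean[k][d]) / Math.sqrt(1 - abT);
        return Math.sqrt(abPrev) * clean[k][d] + Math.sqrt(1 - abPrev) * eps;
      })
    );
  }
  return x;
}

// Stand-in target: a smooth sinusoidal trajectory per action dimension.
function targetChunk(): Chunk {
  const out: Chunk = [];
  for (let k = 0; k < HORIZON; k++) {
    const row: number[] = [];
    for (let d = 0; d < ACTION_DIM; d++) row.push(Math.sin((2 * Math.PI * k) / HORIZON + d));
    out.push(row);
  }
  return out;
}

// Stand-in denoiser: ignores its input and always predicts the target.
const toyDenoiser: Denoiser = () => targetChunk();

const chunk = sampleChunk(toyDenoiser, 5, makeRandom(42));
console.log(chunk.length, chunk[0].length); // 64 14
```

Because the stand-in denoiser always predicts the same trajectory, the sampler lands exactly on it at the last step. With a learned denoiser the prediction changes at every step, which is why several steps are needed.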

Parameters
1B
Context
unknown
What it can do
  • 1B-parameter Diffusion Transformer VLA
  • Designed for bimanual manipulation (Aloha-class robots)
  • Trained on >1M cross-embodiment episodes + 6k bimanual demos
  • Continuous action chunks for both arms in a unified space
  • Diffusion head produces smooth coordinated motion
  • Open weights on Hugging Face (permissive licence)
  • Strong results on folding, pouring and tool use
  • Reproducible training and evaluation code
  • Best for: bimanual manipulation research, two-arm fine-tuning.
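
The "unified space" idea in the list above can be sketched as a fixed-layout vector in which every physical quantity has its own slot, plus a mask marking which slots a given robot actually provides. The slot names and the 16-slot size below are hypothetical and chosen for brevity; they are not the layout RDT-1B uses.

```typescript
// Sketch of a fixed-layout "unified" action vector with a validity mask.
// Slot names and the 16-slot size are hypothetical, not RDT-1B's layout.

function buildSlots(): string[] {
  const slots: string[] = [];
  const arms = ["left", "right"];
  for (let a = 0; a < arms.length; a++) {
    for (let j = 0; j < 6; j++) slots.push(`${arms[a]}_joint_${j}`);
    slots.push(`${arms[a]}_gripper`);
  }
  slots.push("base_linear", "base_angular");
  return slots;
}

const SLOTS = buildSlots();

interface UnifiedAction {
  values: number[]; // one entry per slot, 0 where the robot has no such DOF
  mask: number[];   // 1 where the slot is populated, 0 where it is padding
}

function toUnified(action: Record<string, number>): UnifiedAction {
  const values = SLOTS.map(() => 0);
  const mask = SLOTS.map(() => 0);
  Object.keys(action).forEach((name) => {
    const i = SLOTS.indexOf(name);
    if (i < 0) throw new Error(`unknown slot: ${name}`);
    values[i] = action[name];
    mask[i] = 1;
  });
  return { values, mask };
}

// A single-arm robot fills only the right-arm slots; the rest stay masked out.
const singleArm = toUnified({
  right_joint_0: 0.1,
  right_joint_1: -0.4,
  right_joint_2: 0.3,
  right_joint_3: 0.0,
  right_joint_4: 1.2,
  right_joint_5: -0.7,
  right_gripper: 1.0,
});

console.log(singleArm.mask.join("")); // 0000000111111100
```

Giving every embodiment the same vector shape lets a single policy train on data from different robots, with the mask telling the loss which entries to ignore.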
Training & License

Pretraining on >1 million robot episodes from Open-X-Embodiment and other public datasets, followed by fine-tuning on a curated bimanual dataset of ~6,000 episodes covering ~300 tasks collected with Aloha-class hardware.

License: Open weights released on Hugging Face under the MIT licence; primarily intended for research use.

Known limitations
  • Primarily targets bimanual Aloha-class hardware
  • Requires diffusion sampling at inference (multiple steps)
  • Limited language reasoning compared to LLM-backed VLAs
  • Generalisation to single-arm or mobile platforms needs adapters
  • Mostly indoor-lab evaluation
  • Smaller pretraining text corpus than RT-2-X / OpenVLA
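
The multi-step sampling cost listed above is partly offset by action chunking: one inference yields many actions. A rough budget check, with every number a hypothetical placeholder rather than a measured RDT-1B figure, might look like this:

```typescript
// Back-of-envelope check: can chunk inference keep up with the control loop?
// All numbers are hypothetical placeholders, not measured RDT-1B figures.

interface Budget {
  samplingSteps: number;   // denoising steps per inference
  msPerStep: number;       // forward-pass time per denoising step, in ms
  actionsExecuted: number; // actions run from each chunk before re-planning
  controlHz: number;       // robot control frequency
}

// Time to produce one chunk.
function inferenceMs(b: Budget): number {
  return b.samplingSteps * b.msPerStep;
}

// Time the robot spends executing the actions taken from one chunk.
function executionMs(b: Budget): number {
  return (b.actionsExecuted * 1000) / b.controlHz;
}

// Inference keeps up if the next chunk is ready before the current one runs out.
function keepsUp(b: Budget): boolean {
  return inferenceMs(b) <= executionMs(b);
}

const fewSteps: Budget = { samplingSteps: 5, msPerStep: 40, actionsExecuted: 64, controlHz: 25 };
const manySteps: Budget = { samplingSteps: 100, msPerStep: 40, actionsExecuted: 64, controlHz: 25 };

console.log(inferenceMs(fewSteps), executionMs(fewSteps)); // 200 2560
console.log(keepsUp(fewSteps), keepsUp(manySteps)); // true false
```

Executing fewer actions per chunk makes the robot more reactive but shrinks the execution window, so the sampling-step count and the re-planning interval have to be tuned together.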

Start using RDT-1B today

Get started with free credits. No credit card required. Access RDT-1B and 100+ other models through a single API.