Octo Base

UC Berkeley
VLA / Robotics

Berkeley/Stanford 93M transformer diffusion policy. Pretrained on 800k Open-X-Embodiment episodes.

Research-only model
Octo Base runs on physical robot hardware and is not exposed via the Railwail API yet.
TL;DR · Last updated May 16, 2026

Octo Base is a VLA / robotics AI model from UC Berkeley, priced at €0.000 per 1M input tokens, with an unknown context window.

Direct API access coming soon

Pricing

Price per Generation
Per generation: Free

API Integration

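Octo Base is not yet exposed via the Railwail API. Once direct access is available, our OpenAI-compatible API can be used to integrate it into your application; the snippet below shows the general client pattern.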

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: just pass a string
const reply = await rw.run("octo-base", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("octo-base", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("octo-base", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications
Developer: UC Berkeley
Category: VLA / Robotics
Supported Formats: image, text
Tags: berkeley, stanford, vla, robotics, research-only, open-weights, small

Deep dive: UC Berkeley / Stanford (Octo Model Team)'s Octo Base

About UC Berkeley / Stanford (Octo Model Team)
Founded 2023 · Berkeley & Stanford, California, USA

The Octo project is a collaboration of academic labs led by Sergey Levine (UC Berkeley BAIR) and Chelsea Finn (Stanford IRIS), with contributions from CMU, Google DeepMind, and Toyota Research Institute. Octo was first released in May 2024 alongside the Open-X-Embodiment dataset effort. Its goal is a generalist, fully open-source robot policy that any researcher can fine-tune on a new robot in hours. Octo introduced the recipe of a transformer policy with a diffusion action head trained on 800k cross-embodiment demonstrations, and it has become a de facto baseline in academic VLA / generalist-policy research. The team released both Octo-Small (27M) and Octo-Base (93M) under Apache-2.0, alongside code, checkpoints, and a fine-tuning toolkit.

Architecture
Transformer policy with diffusion action head (Vision-Language-Action)

Octo-Base is a transformer-based generalist robot policy. Inputs are tokenised RGB views and a natural-language instruction (encoded with a T5-base text encoder), interleaved with learnable readout tokens. The transformer trunk consumes this sequence and emits action latents that are decoded by a diffusion head producing continuous action chunks (default 4-step lookahead, 7-DoF end-effector deltas). The model was pretrained on roughly 800k demonstrations from 25 datasets in the Open-X-Embodiment collection, covering 9 robots, both single-arm and bimanual setups. Octo is intentionally embodiment-agnostic: action and proprioception spaces are encoded via shared adapters so the same backbone can be fine-tuned to new robots with as little as a few hundred demos. The diffusion head gives smooth, multimodal trajectories that outperform discrete-token VLAs on dexterous tasks at this scale.
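
To make the input layout described above more concrete, here is a minimal sketch of how one timestep's token sequence and the resulting action chunk size could be counted. It is not taken from the Octo codebase: the function names, the 16-pixel patch size, the 16 language tokens, the single readout token, and the 128x128 wrist view are all assumptions chosen for illustration. Only the 256x256 primary view, the 4-step chunk, and the 7-DoF action come from the description above.

```typescript
// Illustrative sketch only: names and default sizes are assumptions,
// not taken from the Octo codebase.

type TokenKind = "language" | "image" | "readout";

interface TokenGroup {
  kind: TokenKind;
  label: string;
  count: number; // tokens this group contributes to the sequence
}

interface CameraSpec {
  label: string;
  size: number; // side length of the square image, in pixels
}

// Patch tokens for a square image cut into square, non-overlapping patches.
function patchTokenCount(imageSize: number, patchSize: number): number {
  if (imageSize <= 0 || patchSize <= 0 || imageSize % patchSize !== 0) {
    throw new Error("imageSize must be a positive multiple of patchSize");
  }
  const perSide = imageSize / patchSize;
  return perSide * perSide;
}

// Lay out one timestep: instruction tokens, one group per camera, then readouts.
function buildTokenLayout(
  languageTokens: number,
  cameras: CameraSpec[],
  patchSize: number,
  readoutTokens: number,
): TokenGroup[] {
  const layout: TokenGroup[] = [
    { kind: "language", label: "instruction", count: languageTokens },
  ];
  for (const camera of cameras) {
    layout.push({
      kind: "image",
      label: camera.label,
      count: patchTokenCount(camera.size, patchSize),
    });
  }
  layout.push({ kind: "readout", label: "action_readout", count: readoutTokens });
  return layout;
}

function totalTokens(layout: TokenGroup[]): number {
  return layout.reduce((sum, group) => sum + group.count, 0);
}

// Number of continuous values in one action chunk (horizon x action dimension).
function actionChunkSize(horizon: number, actionDim: number): number {
  return horizon * actionDim;
}

// Assumed values: 16 language tokens, 16-pixel patches, one readout token.
const exampleLayout = buildTokenLayout(
  16,
  [
    { label: "primary", size: 256 },
    { label: "wrist", size: 128 },
  ],
  16,
  1,
);
console.log(exampleLayout.map((g) => g.label + ":" + g.count).join(" "));
console.log("total tokens:", totalTokens(exampleLayout));
console.log("action values per chunk:", actionChunkSize(4, 7));
```

Under these assumed sizes the image patches dominate the sequence, which is one reason a capped input resolution keeps a model of this size cheap to run.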

Parameters: 93M
Context: unknown
What it can do
  • Generalist VLA policy across many robot embodiments
  • Trained on ~800k demos from Open-X-Embodiment
  • Diffusion action head produces smooth continuous actions
  • Natural-language instruction conditioning (T5 encoder)
  • Multi-view image inputs (primary + wrist cameras)
  • Designed for fast fine-tuning on new robots and tasks
  • Apache-2.0 open weights, code and recipes
  • Strong academic baseline for VLA papers
  • Best for: research, fine-tuning to new embodiments, generalist-policy benchmarks.
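
Because the policy emits short chunks of continuous actions, one common way to drive a robot with such a model is a receding-horizon loop: query the policy, execute the first few actions of the chunk, then re-observe and re-plan. The sketch below illustrates that pattern with a stub policy. All type and function names are hypothetical and belong to neither Octo nor the Railwail client, and the 7-value action layout is assumed.

```typescript
// Illustrative sketch: every name here is hypothetical.

type ActionVector = number[]; // assumed 7 values: 6 end-effector deltas + gripper

interface RobotObservation {
  instruction: string;
  primaryImage: Uint8Array;
  wristImage?: Uint8Array;
}

// A policy maps one observation to a chunk of future actions.
type ChunkPolicy = (obs: RobotObservation) => ActionVector[];

// Execute up to `executePerChunk` actions from each chunk, then re-plan.
// Returns how many times the policy was queried.
function runRecedingHorizon(
  policy: ChunkPolicy,
  getObservation: () => RobotObservation,
  applyAction: (action: ActionVector) => void,
  totalSteps: number,
  executePerChunk: number,
): number {
  if (executePerChunk < 1) {
    throw new Error("executePerChunk must be >= 1");
  }
  let steps = 0;
  let queries = 0;
  while (steps < totalSteps) {
    const chunk = policy(getObservation());
    queries += 1;
    const n = Math.min(executePerChunk, chunk.length, totalSteps - steps);
    if (n === 0) {
      throw new Error("policy returned an empty chunk");
    }
    for (let i = 0; i < n; i++) {
      applyAction(chunk[i]);
      steps += 1;
    }
  }
  return queries;
}

// Stub standing in for a real model: always returns 4 all-zero 7-DoF actions.
const stubPolicy: ChunkPolicy = () =>
  Array.from({ length: 4 }, () => new Array<number>(7).fill(0));

const appliedActions: ActionVector[] = [];
const policyQueries = runRecedingHorizon(
  stubPolicy,
  () => ({ instruction: "pick up the block", primaryImage: new Uint8Array(0) }),
  (action) => {
    appliedActions.push(action);
  },
  10, // total control steps
  2, // actions executed per chunk before re-planning
);
console.log("policy queries:", policyQueries, "actions applied:", appliedActions.length);
```

Executing fewer actions per chunk makes the loop more reactive at the cost of more policy queries; executing the whole chunk does the opposite.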
Training & License

~800,000 robot trajectories drawn from 25 Open-X-Embodiment-compatible datasets across 9 robot embodiments, including Franka, WidowX (the Bridge dataset), the Everyday Robots arm (RT-1), and Berkeley UR5. Trained on TPU v4 / v5 hardware.

License: Apache-2.0 - fully open weights, code, and dataset references. Research-friendly; commercial use permitted under the licence.

Known limitations
  • Modest 93M scale - underperforms 7B+ VLAs on hard generalisation
  • Optimised for 7-DoF end-effector control - bimanual humanoid action spaces need adapters
  • Limited language reasoning relative to LLM-backed VLAs
  • Image resolution capped (256x256)
  • Trained mostly on Western lab data - geographic bias
  • Long-horizon planning requires external prompt decomposition
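
In practice, the 256x256 cap means camera frames have to be downsampled before they reach the policy. The following is a minimal sketch of a nearest-neighbour resize over a flat, row-major RGB buffer. The function name and buffer layout are assumptions for illustration, and a real pipeline would more likely use an image library with proper filtering and aspect-ratio handling.

```typescript
// Illustrative sketch: nearest-neighbour resize of a row-major, interleaved
// pixel buffer (e.g. RGBRGB...). Names and layout are assumptions.
function resizeNearest(
  src: Uint8Array,
  srcW: number,
  srcH: number,
  dstW: number = 256,
  dstH: number = 256,
  channels: number = 3,
): Uint8Array {
  if (src.length !== srcW * srcH * channels) {
    throw new Error("source buffer size does not match srcW * srcH * channels");
  }
  const dst = new Uint8Array(dstW * dstH * channels);
  for (let y = 0; y < dstH; y++) {
    // Map each destination row back to the nearest source row.
    const sy = Math.min(srcH - 1, Math.floor((y * srcH) / dstH));
    for (let x = 0; x < dstW; x++) {
      const sx = Math.min(srcW - 1, Math.floor((x * srcW) / dstW));
      const srcIndex = (sy * srcW + sx) * channels;
      const dstIndex = (y * dstW + x) * channels;
      for (let c = 0; c < channels; c++) {
        dst[dstIndex + c] = src[srcIndex + c];
      }
    }
  }
  return dst;
}

// A blank 640x480 RGB frame squeezed to the default 256x256 target.
const cameraFrame = new Uint8Array(640 * 480 * 3);
const resizedFrame = resizeNearest(cameraFrame, 640, 480);
console.log("resized buffer length:", resizedFrame.length);
```

Note that this version stretches non-square frames to a square target; cropping to a square first would preserve the aspect ratio at the cost of discarding the image edges.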
