Octo Small

UC Berkeley
VLA / Robotics

Compact 27M-parameter variant of Octo. Faster inference on consumer GPUs, designed for low-latency control.

Research-only model
Octo Small runs on physical robot hardware and is not exposed via the Railwail API yet.
TL;DR · Last updated May 16, 2026

Octo Small is a VLA / robotics AI model from UC Berkeley, listed at €0.000 per 1M input tokens; its context window is unknown.


Direct API access coming soon

Pricing

Price per Generation
Per generation: Free

API Integration

Once direct API access is available, Octo Small is intended to be integrated into your application through Railwail's OpenAI-compatible API:

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: just pass a string
const reply = await rw.run("octo-small", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("octo-small", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("octo-small", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications
Developer: UC Berkeley
Category: VLA / Robotics
Supported Formats: image, text
Tags: berkeley, vla, robotics, research-only, open-weights, small, consumer-gpu

Deep dive: Octo Small by UC Berkeley / Stanford (Octo Model Team)

About UC Berkeley / Stanford (Octo Model Team)
Founded 2023 · Berkeley & Stanford, California, USA

Octo-Small is the compact 27M-parameter variant of the Octo generalist robot policy released by the Octo Model Team, a collaboration led by UC Berkeley and Stanford (the Sergey Levine and Chelsea Finn labs) with contributors from Google DeepMind, CMU and Toyota Research Institute. Octo-Small was released alongside Octo-Base in May 2024 to give researchers a CPU- and edge-friendly option that still benefits from the same pretraining recipe on ~800k Open-X-Embodiment demonstrations. It is widely used in academic teaching, robotics coursework and rapid prototyping where the 93M-parameter Base model is too heavy for the available hardware.

Architecture
Compact transformer policy with diffusion action head (Vision-Language-Action)

Octo-Small is architecturally identical to Octo-Base but uses a smaller transformer trunk (~27M parameters total). Inputs are tokenised RGB observations from primary and wrist cameras and a T5-base-encoded language instruction, plus learnable readout tokens. The transformer fuses these tokens, and its outputs at the readout-token positions are decoded by a diffusion head into chunks of continuous 7-DoF end-effector actions. It is pretrained on the same ~800k demonstrations from 25 Open-X-Embodiment-compatible datasets across 9 robot embodiments. Despite being ~3.4x smaller than Octo-Base, the small variant keeps the diffusion-policy output and the embodiment-agnostic input adapters, making it suitable as a fast baseline and as a starting point for fine-tuning to new robots on modest GPUs.
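The diffusion head can be pictured as a standard DDPM reverse (denoising) loop: start from Gaussian noise shaped like an action chunk and repeatedly subtract predicted noise, conditioned on the transformer's readout embedding. The TypeScript below is a minimal sketch under stated assumptions, not Octo's implementation: `predictNoise` is a hypothetical stand-in for the learned head, and the linear beta schedule, 20 denoising steps and 4-step chunk length are illustrative values rather than Octo-Small's confirmed configuration.

```typescript
// Minimal DDPM reverse-sampling sketch for one continuous action chunk.
// ASSUMPTIONS (hypothetical, not Octo's real code): the noise predictor,
// the linear beta schedule, the step count and the chunk length.

type Chunk = number[][]; // [chunkLength][actionDim]

// Small seeded PRNG (mulberry32) so runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rng: () => number): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical noise predictor epsilon_theta(x_t, t, readout).
type NoisePredictor = (x: Chunk, t: number, readout: number[]) => Chunk;

function sampleActionChunk(
  predictNoise: NoisePredictor,
  readout: number[],
  chunkLength = 4,
  actionDim = 7,
  steps = 20,
  seed = 0,
): Chunk {
  const rng = mulberry32(seed);
  // Linear beta schedule, alpha_t = 1 - beta_t, alphaBar_t = prod(alpha_1..alpha_t).
  const betas = Array.from({ length: steps }, (_, i) => 1e-4 + (0.02 - 1e-4) * (i / (steps - 1)));
  const alphas = betas.map((b) => 1 - b);
  const alphaBars: number[] = [];
  let prod = 1;
  for (const a of alphas) {
    prod *= a;
    alphaBars.push(prod);
  }

  // x_T ~ N(0, I)
  let x: Chunk = Array.from({ length: chunkLength }, () =>
    Array.from({ length: actionDim }, () => gaussian(rng)));

  // x_{t-1} = (x_t - beta_t / sqrt(1 - alphaBar_t) * eps) / sqrt(alpha_t) + sigma_t * z
  for (let t = steps - 1; t >= 0; t--) {
    const eps = predictNoise(x, t, readout);
    const coef = betas[t] / Math.sqrt(1 - alphaBars[t]);
    const sigma = t > 0 ? Math.sqrt(betas[t]) : 0; // no noise added on the final step
    x = x.map((row, i) => row.map((v, j) =>
      (v - coef * eps[i][j]) / Math.sqrt(alphas[t]) + (sigma > 0 ? sigma * gaussian(rng) : 0)));
  }
  return x;
}

// Usage with a trivial stand-in predictor; a real head is a trained network.
const zeroNoise: NoisePredictor = (x) => x.map((row) => row.map(() => 0));
const chunk = sampleActionChunk(zeroNoise, new Array(8).fill(0)); // placeholder readout
console.log(chunk.length, chunk[0].length); // 4 7
```

Producing a whole chunk per sampling call, rather than one action per forward pass, is what lets the controller execute several steps between policy queries.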

Parameters: 27M
Context: unknown
What it can do
  • Compact 27M generalist VLA policy
  • Same training recipe and dataset as Octo-Base
  • Continuous action chunks via diffusion head
  • Runs on a single consumer GPU and many edge devices
  • Fast fine-tuning on new robots / tasks
  • Natural-language instruction conditioning
  • Apache-2.0 open weights and code
  • Reproducible baseline for academic VLA work
  • Best for: edge robotics, teaching, fast iteration.
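To see why the small variant fits on consumer GPUs and edge devices, the sketch below estimates the memory needed just to hold the weights at a few numeric precisions, using the 27M and 93M parameter counts quoted above. It is back-of-the-envelope arithmetic only: activations, any separately loaded text encoder, optimizer state and framework overhead are ignored, so the figures are lower bounds, not measured requirements.

```typescript
// Back-of-the-envelope weight-memory estimate (weights only).
// Excludes activations, any separately loaded text encoder,
// optimizer state and framework overhead.

const BYTES_PER_PARAM = { fp32: 4, fp16: 2, int8: 1 } as const;
type Precision = keyof typeof BYTES_PER_PARAM;

function weightMegabytes(params: number, precision: Precision): number {
  return (params * BYTES_PER_PARAM[precision]) / 1e6; // decimal megabytes
}

const models = { "Octo-Small": 27e6, "Octo-Base": 93e6 };

for (const [name, params] of Object.entries(models)) {
  const row = (Object.keys(BYTES_PER_PARAM) as Precision[])
    .map((p) => `${p}: ${weightMegabytes(params, p).toFixed(0)} MB`)
    .join(", ");
  console.log(`${name} -> ${row}`);
}
// Octo-Small -> fp32: 108 MB, fp16: 54 MB, int8: 27 MB
// Octo-Base -> fp32: 372 MB, fp16: 186 MB, int8: 93 MB

// Size ratio behind the "~3.4x smaller" figure.
console.log((models["Octo-Base"] / models["Octo-Small"]).toFixed(2)); // 3.44
```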
Training & License

~800,000 cross-embodiment robot demonstrations from 25 Open-X-Embodiment datasets (same corpus as Octo-Base). Trained on TPU hardware with the public Octo recipe.

License: Apache-2.0. Weights, code and recipes are fully open, and both research and commercial use are permitted under the license.

Known limitations
  • Lower accuracy than Octo-Base and modern 7B VLAs
  • Limited capacity for long-horizon language reasoning
  • Trained at low image resolution (256x256)
  • Few-shot transfer to drastically new robots still requires demos
  • Single text encoder (T5) limits prompt richness
  • No native multi-modal sensors beyond RGB
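Because the model was trained at 256x256, camera frames at other resolutions need to be downscaled before they reach the policy. The sketch below shows one simple way to do that: nearest-neighbour resizing of an interleaved RGB buffer. The helper name and buffer layout are hypothetical, Octo's own data pipeline may use a different interpolation method, and this version simply stretches the frame (aspect-ratio handling such as centre cropping is left out).

```typescript
// Nearest-neighbour resize of an interleaved RGB image (row-major, HWC, uint8).
// Hypothetical helper: Octo's real preprocessing may resize differently.

function resizeNearestRGB(
  src: Uint8Array, srcW: number, srcH: number,
  dstW = 256, dstH = 256,
): Uint8Array {
  if (src.length !== srcW * srcH * 3) throw new Error("buffer size mismatch");
  const dst = new Uint8Array(dstW * dstH * 3);
  for (let y = 0; y < dstH; y++) {
    // Sample the source pixel under the centre of each destination pixel.
    const sy = Math.min(srcH - 1, Math.floor(((y + 0.5) * srcH) / dstH));
    for (let x = 0; x < dstW; x++) {
      const sx = Math.min(srcW - 1, Math.floor(((x + 0.5) * srcW) / dstW));
      const s = (sy * srcW + sx) * 3;
      const d = (y * dstW + x) * 3;
      dst[d] = src[s];         // R
      dst[d + 1] = src[s + 1]; // G
      dst[d + 2] = src[s + 2]; // B
    }
  }
  return dst;
}

// Example: downscale a 640x480 frame to the 256x256 training resolution.
const frame = new Uint8Array(640 * 480 * 3).fill(128);
const small = resizeNearestRGB(frame, 640, 480);
console.log(small.length); // 196608 (256 * 256 * 3)
```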

