How much does Octo Small cost via Railwail?

Octo Small is listed at €0.000 per 1M input tokens, with no monthly minimum and no subscription. Start with €5 in free credits.

What is the context window of Octo Small?

The context window of Octo Small is not specified.

How fast is Octo Small?

Latency depends on prompt length and load, typically 200 ms to 2 s for short prompts. We measure p50/p95 latency in real time on /rankings.
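For readers unfamiliar with p50/p95, the sketch below shows one common way to compute them from a list of latency samples (the nearest-rank method). It is illustrative only: the `percentile` helper and the sample values are assumptions made for this example, not Railwail's measurement code or real measurements.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
// `percentile` and the sample values are hypothetical, for illustration only.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in (0, 100]");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

const latenciesMs = [210, 250, 300, 320, 400, 450, 600, 800, 1200, 1900];
const p50 = percentile(latenciesMs, 50); // median latency
const p95 = percentile(latenciesMs, 95); // tail latency
console.log({ p50, p95 });
```

p50 describes the typical request, while p95 captures the slow tail that a mean would hide.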

Is Octo Small better than Gemini Robotics (2025)?

It depends on your use case. Octo Small (UC Berkeley) and Gemini Robotics (2025) (Google DeepMind) are both strong choices in VLA / robotics. Compare them side-by-side at /compare/octo-small-vs-gemini-robotics-2025.

Does Octo Small support image input (vision)?

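Yes. Octo Small accepts image inputs in addition to text. The model is not yet exposed via the Railwail API, so there is no hosted endpoint to send images to today; once access is available, images would be sent via the standard OpenAI-compatible `messages` array with `image_url` content blocks. Supported input types: image, text.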
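As a rough sketch of what an OpenAI-style `messages` entry with an `image_url` content block looks like, the snippet below builds one user message that combines a text instruction with an image. The helper name `buildImageMessage` and the example URL are hypothetical, and whether a future Octo Small endpoint will accept exactly this payload is an assumption, since the model is not currently API-accessible.

```typescript
// Content-block types for an OpenAI-style chat `messages` array.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextBlock | ImageBlock> };

// Build one user message carrying an instruction plus one image.
// `imageUrl` may be an https URL or a base64 data URL.
// (Hypothetical helper, not part of the railwail package.)
function buildImageMessage(instruction: string, imageUrl: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: instruction },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Example values are placeholders.
const userMessage = buildImageMessage(
  "Pick up the red block.",
  "https://example.com/wrist-camera.png",
);
console.log(JSON.stringify(userMessage, null, 2));
```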

Octo Small

UC Berkeley

VLA / Robotics

Compact 27M variant of Octo. Faster inference on consumer GPUs, designed for low-latency control.

Research-only model

Octo Small runs on physical robot hardware and is not exposed via the Railwail API yet.

Not API-accessible

TL;DR · Last updated June 24, 2026

Octo Small is a VLA / robotics AI model from UC Berkeley, listed at €0.000 per 1M input tokens. Its context window is not specified.

Try Octo Small

Direct API access coming soon

Pricing

Price per Generation

Per generation: Free

API Integration

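Octo Small is not yet exposed via the Railwail API. Once direct access is available, use our OpenAI-compatible API to integrate Octo Small into your application, following the pattern below.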

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("octo-small", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("octo-small", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("octo-small", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Developer

UC Berkeley

Deep dive — UC Berkeley / Stanford (Octo Model Team)'s Octo Small

About UC Berkeley / Stanford (Octo Model Team)

Founded 2023 · Berkeley & Stanford, California, USA

Octo-Small is the compact 27M-parameter variant of the Octo generalist robot policy released by the Octo Model Team, a collaboration led by UC Berkeley and Stanford (the Sergey Levine and Chelsea Finn labs) with contributors from Google DeepMind, CMU and Toyota Research Institute. Octo-Small was released alongside Octo-Base in May 2024 to give researchers a CPU/edge-friendly option that still benefits from the same pretraining recipe on ~800k Open-X-Embodiment demonstrations. It is widely used in academic teaching, robotics coursework, and rapid prototyping where the 93M Base model is too heavy for the available hardware.

Architecture

Compact transformer policy with diffusion action head (Vision-Language-Action)

Octo-Small is architecturally identical to Octo-Base but uses a smaller transformer trunk (~27M parameters total). Inputs are tokenised RGB observations from primary and wrist cameras and a T5-base-encoded language instruction, plus learnable readout tokens. The transformer fuses these tokens and emits action latents that are decoded by a diffusion head into continuous 7-DoF end-effector action chunks. It is pretrained on the same ~800k demonstrations from 25 Open-X-Embodiment-compatible datasets across 9 robot embodiments. Despite being ~3.4x smaller than Octo-Base, the small variant retains the diffusion-policy output and the embodiment-agnostic input adapters, making it suitable as a fast baseline and a starting point for fine-tuning to new robots on modest GPUs.
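To make the diffusion-head step more concrete, here is a minimal sketch of how a generic DDPM-style sampler turns Gaussian noise into a chunk of 7-dimensional actions by repeated denoising. It is not Octo's implementation: every name (`sampleActionChunk`, `NoisePredictor`, the beta schedule, the 20 steps) is a hypothetical choice for illustration, the learned network is replaced by an oracle that already knows the target chunk, and the conditioning on the transformer's readout tokens is omitted.

```typescript
// Toy DDPM-style sampler for a chunk of 7-DoF actions. All names are hypothetical.
type ActionChunk = number[][]; // shape [horizon][ACTION_DIM]
type NoisePredictor = (x: ActionChunk, t: number) => ActionChunk;

const ACTION_DIM = 7;

// Small seeded PRNG so the sketch is reproducible.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rand: () => number): number {
  const u = 1 - rand(); // in (0, 1], so Math.log(u) is finite
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Cumulative products of alpha_t = 1 - beta_t.
function cumulativeAlphas(betas: number[]): number[] {
  let running = 1;
  return betas.map((b) => (running *= 1 - b));
}

// Reverse process: start from noise and denoise step by step.
function sampleActionChunk(
  predictNoise: NoisePredictor,
  betas: number[],
  horizon: number,
  rand: () => number,
): ActionChunk {
  const alphaBars = cumulativeAlphas(betas);
  let x: ActionChunk = Array.from({ length: horizon }, () =>
    Array.from({ length: ACTION_DIM }, () => gaussian(rand)),
  );
  for (let t = betas.length - 1; t >= 0; t--) {
    const eps = predictNoise(x, t);
    const coef = betas[t] / Math.sqrt(1 - alphaBars[t]);
    const sigma = t > 0 ? Math.sqrt(betas[t]) : 0; // no noise on the final step
    x = x.map((row, i) =>
      row.map(
        (v, j) => (v - coef * eps[i][j]) / Math.sqrt(1 - betas[t]) + sigma * gaussian(rand),
      ),
    );
  }
  return x;
}

// Demo: 20 denoising steps with a linear beta schedule (assumed values).
const betas = Array.from({ length: 20 }, (_, i) => 0.01 + (0.49 * i) / 19);
const alphaBars = cumulativeAlphas(betas);
// Placeholder target: 4 identical actions; the per-dimension meaning is not specified here.
const targetChunk: ActionChunk = Array.from({ length: 4 }, () => [0.02, 0, 0, 0, 0, 0, 1]);
// Stand-in for the learned network: recovers the exact noise relative to the target.
const oracle: NoisePredictor = (x, t) =>
  x.map((row, i) =>
    row.map(
      (v, j) => (v - Math.sqrt(alphaBars[t]) * targetChunk[i][j]) / Math.sqrt(1 - alphaBars[t]),
    ),
  );
const chunk = sampleActionChunk(oracle, betas, targetChunk.length, seededRandom(0));
console.log(chunk[0].map((v) => v.toFixed(3)));
```

Because the oracle predicts the noise exactly, the final denoising step lands on the target chunk; a trained head would instead predict the noise from the observation and instruction tokens, so its output would only approximate a good action sequence.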

Parameters: 27M
Context: unknown

What it can do

Compact 27M generalist VLA policy
Same training recipe and dataset as Octo-Base
Continuous action chunks via diffusion head
Runs on a single consumer GPU and many edge devices
Fast fine-tuning on new robots / tasks
Natural-language instruction conditioning
Apache-2.0 open weights and code
Reproducible baseline for academic VLA work
Best for: edge robotics, teaching, fast iteration.

Training & License

~800,000 cross-embodiment robot demonstrations from 25 Open-X-Embodiment datasets (same corpus as Octo-Base). Trained on TPU hardware with the public Octo recipe.

License: Apache-2.0. Fully open weights, code, and recipes; research and commercial use are permitted under the license.

Known limitations