LeRobot SmolVLA

Custom
VLA / Robotics

HuggingFace's 450M VLA pretrained on 487 community LeRobot datasets. Runs on consumer GPUs.

Research-only model
LeRobot SmolVLA runs on physical robot hardware and is not exposed via the Railwail API yet.
Not API-accessible
TL;DR · Last updated May 16, 2026

LeRobot SmolVLA is a VLA / robotics AI model from Custom, priced at €0.000 per 1M input tokens, with an unknown context window.

Try LeRobot SmolVLA

Direct API access coming soon

Pricing

Price per Generation
Per generation: Free

API Integration

Once direct API access is available, you will be able to use our OpenAI-compatible API to integrate LeRobot SmolVLA into your application.

Install
npm install railwail
JavaScript / TypeScript
import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple: just pass a string
const reply = await rw.run("smolvla", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("smolvla", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("smolvla", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications
Developer: Custom
Category: VLA / Robotics
Supported Formats: image, text
Tags: huggingface, lerobot, vla, robotics, research-only, open-weights, small, consumer-gpu

Deep dive: Hugging Face (LeRobot team)'s LeRobot SmolVLA

About Hugging Face (LeRobot team)
Founded 2016 · New York, USA / Paris, France

SmolVLA is the flagship Vision-Language-Action model of Hugging Face's LeRobot project, an open-source robotics framework that brings the Transformers / Datasets philosophy to physical-AI research. SmolVLA was released in mid-2025 as a deliberately compact 450M-parameter VLA designed to be trainable and runnable on consumer hardware while still benefiting from community-scale pretraining. It is trained on 487 publicly contributed LeRobot community datasets (teleoperation episodes uploaded by hobbyists, university labs and small robotics companies), making it the first community-data-driven open VLA. The release includes pretraining and fine-tuning code, model checkpoints under Apache-2.0, and tight integration with the LeRobot framework, datasets hosted on the Hugging Face Hub, and the SO-100 / SO-ARM-100 low-cost robot arms.

Visit Hugging Face (LeRobot team) →
Architecture
Compact Vision-Language-Action transformer (action-chunk regression)

SmolVLA is a 450M-parameter transformer that combines a SmolVLM-style vision-language encoder with an action expert that regresses continuous action chunks. The vision-language tower is initialised from the open SmolVLM family (compact VLMs released by Hugging Face) and fuses multi-view RGB observations with the natural-language instruction. A smaller action-prediction head consumes the resulting tokens together with proprioception and outputs a short chunk of continuous joint or end-effector actions. The model is pretrained on 487 LeRobot-format community datasets covering single-arm, dual-arm and mobile-base setups, with a strong tilt toward the popular SO-100 and Koch low-cost teleoperation arms. After pretraining, users fine-tune on their own LeRobot recordings for a specific robot and task. The whole stack is designed so that pretraining runs on a few H100s and fine-tuning runs on a single consumer GPU.
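The action-chunk idea described above can be illustrated with a small control loop. This is only a sketch under stated assumptions, not SmolVLA's or LeRobot's actual code: `mockPolicy` and `runEpisode` are hypothetical names, and the mock policy ignores images and language entirely and just interpolates joint positions toward a goal. What it shows is the control pattern: the policy is queried once, returns several continuous actions, and the controller executes the whole chunk before observing again.

```javascript
// Hypothetical stand-in for a VLA policy (NOT the SmolVLA or LeRobot API).
// Given an observation, it returns a chunk of `chunkSize` continuous actions,
// here joint-position targets that move halfway to the goal over the chunk.
function mockPolicy(observation, chunkSize) {
  const chunk = [];
  for (let k = 1; k <= chunkSize; k++) {
    const fraction = (k / chunkSize) * 0.5;
    chunk.push(
      observation.joints.map((q, i) => q + (observation.goal[i] - q) * fraction)
    );
  }
  return chunk;
}

// Query the policy once, execute every action in the returned chunk,
// then re-observe and query again until the step budget is used up.
function runEpisode(policy, start, goal, { chunkSize = 4, maxSteps = 16 } = {}) {
  let joints = start.slice();
  let policyCalls = 0;
  const trajectory = [];
  while (trajectory.length < maxSteps) {
    const chunk = policy({ joints, goal }, chunkSize);
    policyCalls += 1;
    for (const action of chunk) {
      if (trajectory.length >= maxSteps) break;
      joints = action; // a real robot would send this to its motor controller
      trajectory.push(action);
    }
  }
  return { trajectory, policyCalls, finalJoints: joints };
}

const result = runEpisode(mockPolicy, [0, 0], [1, -1]);
console.log(result.policyCalls, result.trajectory.length); // prints: 4 16
```

With a chunk size of 4, the 16 control steps need only 4 policy calls, which is the practical appeal of predicting chunks rather than single actions: the expensive model runs less often than the control loop.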

Parameters: 450M
Context: unknown
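A rough back-of-the-envelope calculation helps explain why a 450M-parameter model fits on consumer GPUs. The sketch below estimates the memory taken by the weights alone, assuming the usual byte widths for each numeric format; `weightMemoryGB` is an illustrative helper, and the figures exclude activations, gradients and optimizer state, which add substantially to the total during fine-tuning.

```javascript
// Parameter count from the spec sheet.
const PARAMS = 450e6;

// Bytes per parameter for common numeric formats (standard widths).
const BYTES_PER_PARAM = { fp32: 4, bf16: 2, int8: 1 };

// Weights-only footprint in GB (1 GB = 1e9 bytes).
// Ignores activations, gradients and optimizer state.
function weightMemoryGB(params, dtype) {
  return (params * BYTES_PER_PARAM[dtype]) / 1e9;
}

for (const dtype of Object.keys(BYTES_PER_PARAM)) {
  console.log(`${dtype}: ${weightMemoryGB(PARAMS, dtype).toFixed(2)} GB`);
}
// prints:
// fp32: 1.80 GB
// bf16: 0.90 GB
// int8: 0.45 GB
```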
What it can do
  • Compact 450M open VLA pretrained on community data
  • Trained on 487 LeRobot community datasets
  • SmolVLM-style vision-language tower + action expert
  • Continuous action-chunk regression
  • Runs fine-tuning on a single consumer GPU
  • Tight integration with LeRobot framework on Hugging Face
  • Apache-2.0 licence on weights and code
  • Strong baseline for SO-100 and Koch low-cost arms
  • Best for: hobbyists, educators, low-cost robot research.
Training & License

Training data: 487 publicly contributed LeRobot-format community datasets hosted on the Hugging Face Hub, dominated by teleoperation episodes from low-cost arms (SO-100, Koch) but also including dual-arm and mobile setups. The total scale is on the order of millions of frames.

License: Apache-2.0 - fully open weights, code, and datasets (where contributors used compatible licences). Designed for both research and commercial use.

Known limitations
  • Modest scale - underperforms 7B VLAs on hard tasks
  • Dataset skew toward SO-100 / Koch low-cost arms
  • Limited language reasoning vs LLM-backed VLAs
  • Sensor coverage is mostly single RGB camera setups
  • Community data quality varies
  • Long-horizon behaviour limited without prompt decomposition

Start using LeRobot SmolVLA today

Get started with free credits. No credit card required. Access LeRobot SmolVLA and 100+ other models through a single API.