How much does LeRobot SmolVLA cost via Railwail?

No monthly minimum, no subscription. Start with €5 free credits.

What is the context window of LeRobot SmolVLA?

The context window of LeRobot SmolVLA is not published on this page.

How fast is LeRobot SmolVLA?

Latency depends on prompt length and load — typically 200ms to 2s for short prompts. We measure p50/p95 in real-time on /rankings.

Is LeRobot SmolVLA better than Gemini Robotics (2025)?

It depends on your use case. LeRobot SmolVLA (Hugging Face) and Gemini Robotics (2025) (Google DeepMind) are both strong choices in the VLA / robotics category. Compare them side-by-side at /compare/smolvla-vs-gemini-robotics-2025.

Does LeRobot SmolVLA support image input (vision)?

Yes. LeRobot SmolVLA takes image inputs (RGB camera observations) together with a natural-language instruction. It is not yet exposed via the Railwail API, so images cannot currently be sent to it through the OpenAI-compatible `messages` array. Supported input types: image, text.

LeRobot SmolVLA

Custom

VLA / Robotics

Hugging Face's 450M-parameter VLA, pretrained on 487 community LeRobot datasets. Runs on consumer GPUs.

Research-only model

LeRobot SmolVLA runs on physical robot hardware and is not exposed via the Railwail API yet.

Not API-accessible


TL;DR · Last updated June 24, 2026

LeRobot SmolVLA is a VLA / robotics AI model from Hugging Face, listed at €0.000 per 1M input tokens. Its context window is not published.

Try LeRobot SmolVLA

Direct API access coming soon

Pricing

Price per Generation

Per generation: Free

API Integration

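Once direct API access is available, our OpenAI-compatible API can be used to integrate LeRobot SmolVLA into your application. Until then, the client pattern shown here applies to the models that are already API-accessible.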

Install

npm install railwail

JavaScript / TypeScript

import railwail from "railwail";

const rw = railwail("YOUR_API_KEY");

// Simple — just pass a string
const reply = await rw.run("smolvla", "Hello! What can you do?");
console.log(reply);

// With message history
const reply2 = await rw.run("smolvla", [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);

// Full response with usage info
const res = await rw.chat("smolvla", [
  { role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);

Specifications

Developer

Custom

Deep dive: LeRobot SmolVLA by Hugging Face (LeRobot team)

About Hugging Face (LeRobot team)

Founded 2016 · New York, USA / Paris, France

SmolVLA is the flagship Vision-Language-Action model of Hugging Face's LeRobot project, an open-source robotics framework that brings the Transformers / Datasets philosophy to physical-AI research. SmolVLA was released in mid-2025 as a deliberately compact 450M-parameter VLA designed to be trainable and runnable on consumer hardware while still benefiting from community-scale pretraining. It is trained on 487 publicly contributed LeRobot community datasets (teleoperation episodes uploaded by hobbyists, university labs and small robotics companies), making it the first community-data-driven open VLA. The release includes pretraining and fine-tuning code, model checkpoints under Apache-2.0, and tight integration with the LeRobot framework, datasets hosted on the Hugging Face Hub, and the SO-100 / SO-ARM-100 low-cost robot arms.


Architecture

Compact Vision-Language-Action transformer (action-chunk regression)

SmolVLA is a 450M-parameter transformer that combines a SmolVLM-style vision-language encoder with an action expert that regresses continuous action chunks. The vision-language tower is initialised from the open SmolVLM family (compact VLMs released by Hugging Face) and fuses multi-view RGB observations with the natural-language instruction. A smaller action-prediction head consumes the resulting tokens together with proprioception and outputs a short chunk of continuous joint or end-effector actions. The model is pretrained on 487 LeRobot-format community datasets covering single-arm, dual-arm and mobile-base setups, with a strong tilt toward the popular SO-100 and Koch low-cost teleoperation arms. After pretraining, users fine-tune on their own LeRobot recordings for a specific robot and task. The whole stack is designed so that pretraining runs on a few H100s and fine-tuning runs on a single consumer GPU.
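To make the idea of action chunks more concrete, here is a minimal sketch of a chunked control loop: the policy is queried once, the returned chunk of actions is executed step by step, and only then is the robot re-observed. Everything in it is an assumption for illustration. The type names, `makeDummyPolicy`, `runEpisode`, the chunk size of 4, and the toy dynamics are hypothetical and are not part of LeRobot or the Railwail client, and the stand-in policy is simple arithmetic, not the SmolVLA model.

```typescript
// Hypothetical types for illustration; none of these names come from LeRobot.
type Action = number[]; // one continuous command, e.g. joint targets

interface Observation {
  images: number[][];       // placeholder for multi-view RGB frames
  proprioception: number[]; // current joint state
  instruction: string;      // natural-language task description
}

// A chunked policy maps one observation to several future actions.
type ChunkPolicy = (obs: Observation) => Action[];

// Stand-in policy: moves every joint toward zero in `chunkSize` equal steps.
function makeDummyPolicy(chunkSize: number): ChunkPolicy {
  return (obs) =>
    Array.from({ length: chunkSize }, (_, k) =>
      obs.proprioception.map((q) => q * (1 - (k + 1) / chunkSize)),
    );
}

// Execute each chunk open-loop, then re-observe and query the policy again.
function runEpisode(
  policy: ChunkPolicy,
  initialState: number[],
  instruction: string,
  totalSteps: number,
): { trajectory: Action[]; policyCalls: number } {
  let state = [...initialState];
  const trajectory: Action[] = [];
  let policyCalls = 0;

  while (trajectory.length < totalSteps) {
    const chunk = policy({ images: [], proprioception: state, instruction });
    policyCalls += 1;
    if (chunk.length === 0) throw new Error("policy returned an empty chunk");

    for (const action of chunk) {
      if (trajectory.length >= totalSteps) break;
      state = action; // toy dynamics: the robot reaches the commanded target exactly
      trajectory.push(action);
    }
  }
  return { trajectory, policyCalls };
}

const result = runEpisode(makeDummyPolicy(4), [1.0, -2.0], "pick up the cube", 10);
console.log(`policy calls: ${result.policyCalls}, steps: ${result.trajectory.length}`);
```

With a chunk size of 4 and 10 control steps, the policy is called 3 times instead of 10. Fewer model calls per executed action is the usual motivation for predicting chunks, at the cost of reacting more slowly to changes that occur within a chunk.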

Parameters: 450M
Context: unknown
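As a rough sanity check on the consumer-GPU claim, the memory needed just to hold the weights can be estimated as parameter count times bytes per parameter. The sketch below uses the 450M figure from this page. The precisions listed are assumptions chosen for illustration, not a statement about how the checkpoints are stored, and the estimate ignores activations, gradients, and optimizer state, which add substantially to memory use during fine-tuning.

```typescript
// Rough weight-memory estimate: parameters x bytes per parameter.
const PARAMS = 450e6; // parameter count quoted on this page

// Assumed numeric precisions, for illustration only.
const PRECISIONS: Array<[name: string, bytesPerParam: number]> = [
  ["fp32", 4],
  ["fp16", 2],
  ["int8", 1],
];

// Returns the weight footprint in gigabytes (1 GB = 1e9 bytes).
function weightGigabytes(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9;
}

for (const [name, bytes] of PRECISIONS) {
  console.log(`${name}: ${weightGigabytes(PARAMS, bytes).toFixed(2)} GB`);
}
```

Under these assumptions the weights alone take about 1.8 GB in fp32 and about 0.9 GB in fp16, which is small next to the memory of typical consumer GPUs.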

What it can do

Compact 450M open VLA pretrained on community data
Trained on 487 LeRobot community datasets
SmolVLM-style vision-language tower + action expert
Continuous action-chunk regression
Runs fine-tuning on a single consumer GPU
Tight integration with LeRobot framework on Hugging Face
Apache-2.0 licence on weights and code
Strong baseline for SO-100 and Koch low-cost arms
Best for: hobbyists, educators, low-cost robot research.

Training & License

487 publicly contributed LeRobot-format community datasets hosted on the Hugging Face Hub, dominated by teleoperation episodes from low-cost arms (SO-100, Koch) but also including dual-arm and mobile setups. Total scale on the order of millions of frames.

License: Apache-2.0 - fully open weights, code, and datasets (where contributors used compatible licences). Designed for both research and commercial use.

Known limitations