Physical Intelligence π-0
Physical Intelligence's flagship VLA flow-matching policy. Generalist robot control, pretrained on 10k+ hrs robot data.
Physical Intelligence π-0 is a VLA / robotics AI model from Physical Intelligence, priced at €0.000 per 1M input tokens, with an unknown context window.
Pricing
API Integration
Use our OpenAI-compatible API to integrate Physical Intelligence π-0 into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple — just pass a string
const reply = await rw.run("pi-0-pi", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("pi-0-pi", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("pi-0-pi", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Physical Intelligence (PI)'s π-0
Physical Intelligence (PI) is a San Francisco robot-foundation-model startup founded in 2024 by Sergey Levine, Chelsea Finn, Karol Hausman, Brian Ichter, Suraj Nair, Lachy Groom and others, with the explicit mission of building general-purpose foundation models for robots. The company raised $400M in late 2024 (backers include Jeff Bezos, Thrive Capital and Lux Capital) at a multi-billion-dollar valuation. π-0 ('pi-zero') is PI's flagship VLA model, announced in October 2024 as a generalist robot policy that can drive many different robots through dexterous manipulation tasks - folding laundry, bussing tables, packing boxes - using a single set of weights and natural-language instructions. The model is documented in a public technical report, and a subset of weights and code is released in the open-source 'openpi' GitHub repository to support reproducible research.
π-0 is a Vision-Language-Action model built on a 3B-parameter PaliGemma multimodal backbone (Gemma LLM + SigLIP vision tower), with a dedicated 'action expert' transformer head trained as a flow-matching policy. Inputs are multi-view RGB images, robot proprioception, and a natural-language instruction. The action expert outputs chunks of continuous actions at high frequency via flow matching - a recently popular alternative to diffusion that can produce smooth trajectories with few integration steps. π-0 is pretrained on roughly 10,000 hours of multi-embodiment robot teleoperation data collected by PI and partners, plus Open-X-Embodiment data and offline simulation. Post-training fine-tuning specialises the policy for individual robots and tasks. The model handles cross-embodiment action spaces (Aloha bimanual, Franka, mobile manipulators, humanoids) through learned action-space adapters.
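To make the flow-matching step concrete, here is a minimal sketch of how an action chunk could be sampled by Euler integration of a velocity field, starting from noise. The names `sampleActionChunk` and `toyVelocity`, the step count and the chunk shape are illustrative assumptions, not part of π-0 or openpi. In π-0 the velocity would come from the action-expert network conditioned on images, proprioception and the instruction; the toy field below only stands in for it.

```javascript
// Toy stand-in for a learned velocity field (an assumption, not π-0's network).
// It points from the current sample toward a fixed target chunk, which is the
// ideal velocity for a straight-line path from noise (t = 0) to data (t = 1).
function toyVelocity(actions, t, target) {
  return actions.map((row, i) =>
    row.map((a, j) => (target[i][j] - a) / (1 - t))
  );
}

// Integrate da/dt = v(a, t) from t = 0 to t = 1 with fixed-step Euler updates.
// `velocityFn` is evaluated at t = 0, dt, ..., 1 - dt, so t never reaches 1.
function sampleActionChunk(velocityFn, noise, numSteps) {
  const dt = 1 / numSteps;
  let actions = noise.map((row) => row.slice()); // copy, do not mutate input
  for (let k = 0; k < numSteps; k++) {
    const t = k * dt;
    const v = velocityFn(actions, t);
    actions = actions.map((row, i) => row.map((a, j) => a + dt * v[i][j]));
  }
  return actions;
}

// Hypothetical sizes: a chunk of 4 timesteps for a robot with 2 action dims.
const target = [[0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5]];
// Uniform noise for brevity; Gaussian noise is the more usual starting point.
const noise = target.map((row) => row.map(() => Math.random() * 2 - 1));
const chunk = sampleActionChunk((a, t) => toyVelocity(a, t, target), noise, 10);
console.log(chunk);
```

With this idealised velocity field the integration lands on the target chunk regardless of the starting noise. A trained network only approximates such a field, so real samples vary with the noise and the conditioning.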
- Parameters: ~3B (PaliGemma backbone + action expert)
- Context: unknown
- Generalist VLA driving multiple robot embodiments
- Flow-matching action head producing smooth continuous actions
- 3B PaliGemma backbone (Gemma LLM + SigLIP vision)
- Natural-language instruction conditioning
- Long-horizon dexterous manipulation (laundry folding, bussing tables)
- Multi-view + proprioception inputs
- Reactive closed-loop control at high frequency
- Partially open-sourced via the openpi GitHub repo
- Best for: dexterous manipulation research, multi-task generalist policies.
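The closed-loop, chunked control listed above can be sketched as a receding-horizon loop: the policy predicts a chunk of future actions, the robot executes only the first few, and the policy is queried again with a fresh observation. This is a minimal illustration under stated assumptions. `runChunkedControl`, the `robot` interface (`observe`, `apply`) and the toy policy are hypothetical and are not taken from openpi or the railwail client.

```javascript
// Query the policy for a chunk of future actions, execute only the first
// `executeSteps` of them, then re-observe and re-plan. Returns the number
// of policy calls made over `totalSteps` control steps.
function runChunkedControl(policy, robot, totalSteps, executeSteps) {
  let policyCalls = 0;
  let step = 0;
  while (step < totalSteps) {
    const observation = robot.observe();
    const chunk = policy(observation); // array of future actions
    policyCalls++;
    const n = Math.min(executeSteps, chunk.length, totalSteps - step);
    if (n <= 0) throw new Error("policy returned an empty chunk");
    for (let i = 0; i < n; i++) {
      robot.apply(chunk[i]);
      step++;
    }
  }
  return policyCalls;
}

// Toy 1-D robot: the state is a position, an action is a position delta.
const robot = {
  x: 0,
  observe() { return this.x; },
  apply(delta) { this.x += delta; },
};

// Toy policy: plan 8 equal steps that would reach the goal at x = 1.
const chunkLength = 8;
const toyPolicy = (x) => Array(chunkLength).fill((1 - x) / chunkLength);

// Execute 4 of every 8 planned actions, for 20 control steps in total.
const calls = runChunkedControl(toyPolicy, robot, 20, 4);
console.log(calls, robot.x);
```

Executing only part of each chunk trades extra inference calls for faster reaction to new observations; executing the whole chunk does the opposite.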
Training data: approximately 10,000 hours of curated teleoperation data collected on a fleet of Physical Intelligence robots, augmented with Open-X-Embodiment cross-embodiment data and offline simulation. Modalities: multi-view RGB, proprioception and natural-language instructions.
License: Partially open-source - core π-0 model weights and inference code released via the openpi GitHub repository under permissive terms for research; commercial deployment is governed by Physical Intelligence's own licence and partnerships.
Known limitations
- Generalisation outside trained tasks still limited
- Long-horizon tasks need careful prompt structure
- High-frequency action chunks require capable on-board compute
- Open-source release lags internal latest checkpoint
- Embodiment adapters needed for very different robots
- Documentation skews toward research rather than production
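The on-board compute limitation comes down to simple timing arithmetic: if the robot executes k actions from each chunk at a control rate of f Hz, the next chunk must be ready within k / f seconds or the robot stalls. The sketch below works through that budget. The function names and the example numbers (50 Hz, 25 executed steps, the latencies) are illustrative assumptions, not measured figures for π-0.

```javascript
// Time covered by the executed part of one chunk, in milliseconds.
// This is the upper bound on inference latency if the next chunk is
// computed while the current one is being executed.
function inferenceBudgetMs(controlHz, executedSteps) {
  return (1000 * executedSteps) / controlHz;
}

// True when a given inference latency fits inside that budget.
function keepsUp(latencyMs, controlHz, executedSteps) {
  return latencyMs <= inferenceBudgetMs(controlHz, executedSteps);
}

// Illustrative numbers only: 50 Hz control, 25 actions executed per chunk.
const budget = inferenceBudgetMs(50, 25);
console.log(budget);                // budget in ms
console.log(keepsUp(80, 50, 25));   // fast accelerator (hypothetical latency)
console.log(keepsUp(600, 50, 25));  // slow device (hypothetical latency)
```

Raising the control rate or executing fewer actions per chunk shrinks the budget proportionally, which is why slower hardware tends to force longer open-loop execution between re-plans.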
Frequently asked questions
Related Models
Gemini Robotics (2025)
Google DeepMind's vision-language-action model based on Gemini 2.0. Generalist robot policy with strong dexterity.
Gemini Robotics-ER
Embodied-reasoning variant of Gemini Robotics. Enhanced 3D spatial reasoning and trajectory planning.
Google RT-2-X
Google's VLA from the RT-X collaboration. Trained on Open-X-Embodiment (22 robots, 527 skills), showing positive transfer across embodiments.
LeRobot SmolVLA
Hugging Face's 450M VLA pretrained on 487 community LeRobot datasets. Runs on consumer GPUs.