Physical Intelligence Pi-0-FAST
Autoregressive π-0 variant using the FAST action tokenizer. Faster inference at competitive task success.
Physical Intelligence Pi-0-FAST is a VLA / robotics AI model from Physical Intelligence, priced at €0.000 per 1M input tokens with an unknown context window.
Pricing
API Integration
Use our OpenAI-compatible API to integrate Physical Intelligence Pi-0-FAST into your application.
npm install railwail
import railwail from "railwail";
const rw = railwail("YOUR_API_KEY");
// Simple: just pass a string
const reply = await rw.run("pi-0-fast", "Hello! What can you do?");
console.log(reply);
// With message history
const reply2 = await rw.run("pi-0-fast", [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "Explain quantum computing simply." },
]);
console.log(reply2);
// Full response with usage info
const res = await rw.chat("pi-0-fast", [
{ role: "user", content: "Hello!" },
], { temperature: 0.7, max_tokens: 500 });
console.log(res.choices[0].message.content);
console.log(res.usage);
Deep dive: Physical Intelligence (PI)'s Pi-0-FAST
Physical Intelligence (PI) was founded in 2024 in San Francisco by Sergey Levine, Chelsea Finn, Karol Hausman and other co-founders, with a mission to build foundation models for general-purpose robots. π-0-FAST, released in early 2025, is the autoregressive variant of the π-0 VLA introduced together with the FAST action tokenizer. FAST (Frequency-space Action Sequence Tokenization) uses Discrete Cosine Transform compression to encode entire action chunks as a small number of discrete tokens, allowing a standard autoregressive VLM head to play the role of a robot policy without diffusion or flow-matching sampling. PI publishes π-0-FAST checkpoints and tokenizer code via the openpi GitHub repository alongside the flow-matching π-0 family, giving researchers both autoregressive and flow-matching policy baselines from the same backbone and data.
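As a rough illustration of the DCT step, the JavaScript sketch below transforms one action dimension of a 16-step chunk and rounds the scaled coefficients to integers. The function names, the chunk length and the `scale` value of 10 are assumptions made for this example, not values taken from the openpi code. The published FAST tokenizer also applies byte-pair encoding on top of the quantised coefficients, which is omitted here.

```javascript
// Orthonormal DCT-II of a 1-D signal (one action dimension over time).
function dct2(x) {
  const N = x.length;
  const out = new Array(N);
  for (let k = 0; k < N; k++) {
    let sum = 0;
    for (let n = 0; n < N; n++) {
      sum += x[n] * Math.cos((Math.PI / N) * (n + 0.5) * k);
    }
    out[k] = sum * Math.sqrt((k === 0 ? 1 : 2) / N);
  }
  return out;
}

// Scale-and-round quantisation; `scale` is an illustrative hyperparameter.
function quantise(coeffs, scale) {
  return coeffs.map((c) => Math.round(c * scale));
}

// A smooth 16-step trajectory: a joint moving linearly from 0 to 1.
const chunk = Array.from({ length: 16 }, (_, n) => n / 15);

const tokens = quantise(dct2(chunk), 10);
const nonZero = tokens.filter((t) => t !== 0).length;

console.log(tokens);
console.log(`${nonZero} of ${tokens.length} coefficients are non-zero`);
```

For a smooth trajectory like this one, most high-frequency coefficients round to zero, which is what lets a whole chunk be described by a handful of discrete values.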
π-0-FAST keeps the PaliGemma 3B backbone (Gemma LLM + SigLIP vision tower) from π-0 but replaces the flow-matching action expert with an autoregressive decoder that emits actions as FAST tokens. FAST encodes a chunk of continuous actions by applying a Discrete Cosine Transform along the time axis and quantising the resulting frequency coefficients, yielding a compact discrete representation that captures both fast and slow motion components efficiently. The VLM is then trained with a standard next-token objective to predict these action tokens given image observations, proprioception and a natural-language instruction. This makes π-0-FAST architecturally similar to OpenVLA / RT-2-X (token-output VLA) but with a much more sample-efficient action codebook. Reported results show π-0-FAST matching or outperforming the flow-matching π-0 on many benchmarks while simplifying inference to one autoregressive decoding pass per action chunk, with no iterative flow-matching sampling.
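The next-token decoding of action tokens can be sketched as a greedy loop. The snippet below is a minimal illustration with a stub in place of the VLM: `EOS`, `decodeActionTokens`, the token ids and the prefix layout are all hypothetical and do not reflect the openpi API.

```javascript
const EOS = 0; // hypothetical end-of-chunk token id

// Greedy autoregressive decoding: ask the model for one token at a time,
// append it to the context, and stop at EOS or at the token budget.
function decodeActionTokens(nextToken, prefix, maxTokens) {
  const sequence = [...prefix];
  const actionTokens = [];
  for (let i = 0; i < maxTokens; i++) {
    const token = nextToken(sequence); // one model call per generated token
    if (token === EOS) break;
    actionTokens.push(token);
    sequence.push(token);
  }
  return actionTokens;
}

// Stub model: replays a fixed script and counts how often it is called.
// The prefix stands in for image, instruction and proprioception tokens.
const prefix = [101, 102, 103];
const script = [7, 3, 9, 4];
let modelCalls = 0;

function stubNextToken(sequence) {
  modelCalls += 1;
  const step = sequence.length - prefix.length;
  return step < script.length ? script[step] : EOS;
}

const chunkTokens = decodeActionTokens(stubNextToken, prefix, 32);
console.log(chunkTokens, modelCalls);
```

Because each generated token costs one model call, decoding time grows with the number of tokens per chunk, which is why a compact action encoding matters for this kind of policy.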
- Parameters: ~3B (PaliGemma backbone with FAST action head)
- Context: unknown
- Autoregressive VLA variant of π-0 using FAST action tokens
- FAST tokenizer compresses action chunks via DCT
- Single set of weights for many robot embodiments
- Same PaliGemma 3B backbone as π-0 and π-0.5
- Matches or exceeds flow-matching π-0 on key benchmarks
- Easier integration with standard LLM serving stacks
- Open-source code and weights via openpi repository
- Compatible with existing autoregressive fine-tuning recipes
- Best for: research on autoregressive VLAs and action tokenisation.
Trained on the same multi-embodiment teleoperation corpus used for π-0 (more than 10,000 hours), plus Open-X-Embodiment data, with the FAST tokenizer providing the discrete action target instead of continuous flow-matching trajectories.
License: Partially open-source via the openpi GitHub repository; research weights and FAST tokenizer published, commercial deployment governed by Physical Intelligence directly.
Known limitations
- Discrete action tokens can quantise away very fine motion detail
- Long action chunks still bottlenecked by autoregressive decoding
- Generalisation outside training distribution still limited
- Requires the FAST tokenizer for new action spaces
- Open-source release trails the newest internal checkpoint
- Documentation primarily targets researchers
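The first limitation above, loss of fine motion detail, can be reproduced with a toy round trip through a quantised DCT. This is a sketch under stated assumptions: the helper names, the 16-step chunk and the two `scale` values are chosen for illustration and are not taken from the FAST tokenizer itself.

```javascript
// Orthonormal DCT-II and its inverse for a 1-D signal.
function dct2(x) {
  const N = x.length;
  return x.map((_, k) => {
    let sum = 0;
    for (let n = 0; n < N; n++) {
      sum += x[n] * Math.cos((Math.PI / N) * (n + 0.5) * k);
    }
    return sum * Math.sqrt((k === 0 ? 1 : 2) / N);
  });
}

function idct2(X) {
  const N = X.length;
  return X.map((_, n) => {
    let sum = 0;
    for (let k = 0; k < N; k++) {
      const norm = Math.sqrt((k === 0 ? 1 : 2) / N);
      sum += X[k] * norm * Math.cos((Math.PI / N) * (n + 0.5) * k);
    }
    return sum;
  });
}

// Quantise in frequency space, then reconstruct the time-domain signal.
function roundTrip(x, scale) {
  const quantised = dct2(x).map((c) => Math.round(c * scale));
  return idct2(quantised.map((q) => q / scale));
}

function maxAbsError(a, b) {
  return Math.max(...a.map((v, i) => Math.abs(v - b[i])));
}

// A constant pose of 0.5 with a fine 0.01-amplitude wiggle on top.
const fine = Array.from({ length: 16 }, (_, n) => 0.5 + 0.01 * (n % 2 === 0 ? 1 : -1));

const coarseError = maxAbsError(fine, roundTrip(fine, 10)); // coarse quantisation
const fineError = maxAbsError(fine, roundTrip(fine, 1000)); // fine quantisation
console.log({ coarseError, fineError });
```

With the coarse setting the wiggle's coefficients all round to zero, so the reconstruction is a flat line and the error equals the wiggle amplitude. The finer setting keeps the detail but leaves more non-zero coefficients to encode, which is the trade-off behind this limitation.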
Related Models
Gemini Robotics (2025)
Google DeepMind's vision-language-action model based on Gemini 2.0. Generalist robot policy with strong dexterity.
Gemini Robotics-ER
Embodied-reasoning variant of Gemini Robotics. Enhanced 3D spatial reasoning and trajectory planning.
Google RT-2-X
Google's VLA from the RT-X collaboration. Trained on Open-X-Embodiment (22 robots, 527 skills), showing positive transfer.
LeRobot SmolVLA
Hugging Face's 450M VLA pretrained on 487 community LeRobot datasets. Runs on consumer GPUs.
Start using Physical Intelligence Pi-0-FAST today
Get started with free credits. No credit card required. Access Physical Intelligence Pi-0-FAST and 100+ other models through a single API.