Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diverse mixture of real-world data to ensure robust and intelligent robot learning. We review the data collection approach, which uses human operators and teleoperation rigs, the potential of synthetic data and reinforcement learning in enhancing robotic capabilities, and much more. We also introduce the team’s new FAST tokenizer, which opens the door to a fully Transformer-based model and significant improvements in learning and generalization. Finally, we cover the open-sourcing of π0 and future directions for their research.
🎧 / 🎥 Listen or watch the full episode on our page: https://twimlai.com/go/719.
🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1
🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/
📖 CHAPTERS
===============================
0:00 - Introduction
2:14 - Physical Intelligence
3:47 - Key challenges in robotic learning
6:13 - Reinforcement learning in π0 and robotic foundation models
8:36 - π0 VLM model architecture
15:33 - π0 model recipe
18:39 - Pre-training dataset
22:47 - Post-training
24:23 - Laundry folding demo
31:32 - Scaling laws on π0 model
34:57 - FAST
40:26 - Open sourcing π0
43:37 - Other robot types
46:27 - Future directions
🔗 LINKS & RESOURCES
===============================
π0: Our First Generalist Policy - https://www.physicalintelligence.company/blog/pi0
Open Sourcing π0 - https://www.physicalintelligence.company/blog/openpi
FAST: Efficient Robot Action Tokenization - https://www.physicalintelligence.company/research/fast
FAST: Efficient Action Tokenization for Vision-Language-Action Models - https://arxiv.org/abs/2501.09747
📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦 Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5