s1: A High-Performance Reasoning Model Trained for Under $50 [Niklas Muennighoff] - 721

2,440 listens
Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “s1: Simple Test-Time Scaling.” We explore the motivations behind s1, as well as how it compares to OpenAI's o1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well as s1's data curation process, its training recipe, and its use of model distillation from Google Gemini and DeepSeek R1. We explore the novel "budget forcing" technique developed in the paper, which allows the model to think longer on harder problems and optimize test-time compute for better performance. Additionally, we cover the evaluation benchmarks used, the comparison between supervised fine-tuning and reinforcement learning, and similar projects like the Hugging Face Open R1 project. Finally, we discuss the open-sourcing of s1 and its future directions.

🎧 / 🎥 Listen or watch the full episode on our page: https://twimlai.com/go/721

🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Follow us on LinkedIn: https://www.linkedin.com/company/twimlai/
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/

📖 CHAPTERS
===============================
00:00 - Introduction
1:56 - S1 and o1 models
2:42 - Approaches to test time scaling
6:45 - Comparison of S1 and R1 models with o1 model
9:19 - Dataset curation
16:53 - Metrics
18:14 - Budget forcing
23:51 - “Wait” insertion
29:06 - Decontaminating samples in datasets
30:12 - Rejection sampling
32:05 - Open-sourcing S1
33:03 - Other model families
35:20 - Biases in model families
35:49 - Evaluation
36:56 - RL versus SFT
39:12 - RL in R1
40:04 - RL in training recipe
46:12 - Future directions

🔗 LINKS & RESOURCES
===============================
s1: Simple test-time scaling - https://arxiv.org/abs/2501.19393
s1.1-32B - https://huggingface.co/simplescaling/s1.1-32B

📸 Camera: https://amzn.to/3TQ3zsg
🎙️ Microphone: https://amzn.to/3t5zXeV
🚦 Lights: https://amzn.to/3TQlX49
🎛️ Audio Interface: https://amzn.to/3TVFAIq
🎚️ Stream Deck: https://amzn.to/3zzm7F5