Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.com/shaw
Here, I discuss the technical details behind the recent “advanced reasoning” models trained with large-scale reinforcement learning, i.e. OpenAI's o1 and DeepSeek-R1.
📰 Read more: https://shawhin.medium.com/how-to-train-llms-to-think-like-o1-deepseek-r1-eabc21c8842d?source=friends_link&sk=ec3e7ca77cd47f76ce38015c87ba5084
References
[1] https://openai.com/index/learning-to-reason-with-llms/
[2] https://arxiv.org/abs/2501.12948
[3] https://youtu.be/7xTGNNLPyMI
[4] https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
[5] https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf
Intro - 0:00
OpenAI's o1 - 0:33
Test-time Compute - 1:33
"Thinking" Tokens - 3:50
DeepSeek Paper - 5:58
Reinforcement Learning - 7:22
R1-Zero: Prompt Template - 9:28
R1-Zero: Reward - 10:53
R1-Zero: GRPO (technical) - 12:53
R1-Zero: Results - 20:00
DeepSeek R1 - 23:32
Step 1: SFT with CoT - 24:47
Step 2: R1-Zero Style RL - 26:14
Step 3: SFT with Mixed Data - 27:03
Step 4: RL & RLHF - 28:26
Accessing DeepSeek Models - 29:18
Conclusions - 30:10
Homepage: https://www.shawhintalebi.com/