Here's an overview of the DeepSeek R1 paper. I read the paper this week and was fascinated by the methods; however, it was a bit difficult to follow what was going on with all the models being used.
I found a neat map of the methodology, which I'll use in this tutorial to walk you through the paper.
I still strongly recommend that you read the paper itself:
📌 PAPER: https://arxiv.org/pdf/2501.12948
and also check out these two videos for the GRPO bit:
📌 https://www.youtube.com/watch?v=XMnxKGVnEUc&ab_channel=UmarJamil
📌 https://www.youtube.com/watch?v=bAWV_yrqx4w&ab_channel=YannicKilcher
Btw, the map I'm using is over here:
https://www.reddit.com/r/LocalLLaMA/comments/1i66j4f/deepseekr1_training_pipeline_visualized/
Table of contents
- Introduction: 0:00
- DeepSeek-R1-Zero path: 2:23
- Reinforcement learning setup: 3:59
- Group Relative Policy Optimization (GRPO): 7:03
- DeepSeek-R1-Zero results: 11:40
- Cold-start supervised fine-tuning: 15:30
- Consistency reward for CoT: 16:19
- Supervised fine-tuning data generation: 17:17
- Reinforcement learning with neural reward model: 19:47
- Distillation: 21:26
- Conclusion: 24:34
----
Join the newsletter for weekly AI content: https://yacinemahdid.com
Join the Discord for general discussion: https://discord.gg/QpkxRbQBpf
----
Follow Me Online Here:
GitHub: https://github.com/yacineMahdid
LinkedIn: https://www.linkedin.com/in/yacinemahdid/
----
Have a great week! 👋