I Trained an LLM to Think Deeper (Here's How)

Turns out reinforcement learning is all you need

Check out my prior video on RL: https://youtu.be/qTY4Rr-x5q0?si=pgTpw9r9xwkuZJM6

Resources:
Code: https://github.com/ALucek/GRPO-Training/tree/main
Model: https://huggingface.co/AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K
DeepSeek-R1 Paper: https://arxiv.org/pdf/2501.12948
DeepSeek Math Paper: https://arxiv.org/pdf/2402.03300
Unsloth Reasoning Blog: https://unsloth.ai/blog/r1-reasoning
Willccbb’s GRPO Demo: https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

Chapters:
00:00 - LLM Reasoning
01:44 - PPO Context
05:07 - GRPO Algorithm
07:24 - DeepSeek-R1-Zero Training
10:41 - DeepSeek-R1 Training
14:41 - Training: Model Loading
19:17 - Training: Dataset Prep
21:24 - Training: Reward Functions
23:11 - Training: GRPO Trainer 
24:05 - Training: Outcome and Inference
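
GRPO, as described in the DeepSeek Math paper, drops PPO's learned value model. For each prompt it samples a group of completions and scores each one with a reward function. It then normalizes each reward against the group's mean and standard deviation to get that completion's advantage.

Below is a minimal standard-library sketch of that scoring step, assuming a GSM8K-style setup. The <reasoning>/<answer> output tags, the function names, and the toy completions are hypothetical and chosen for illustration; this is not code from the linked repo. The sketch uses the population standard deviation, and real trainers may use a different estimator.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional


def extract_answer(text: str) -> Optional[str]:
    """Pull the final answer from a hypothetical <answer>...</answer> tag."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    return match.group(1) if match else None


def gsm8k_reference(solution: str) -> str:
    """GSM8K reference solutions end with '#### <final answer>'."""
    return solution.split("####")[-1].strip()


def correctness_reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the tagged answer matches exactly, else 0.0."""
    return 1.0 if extract_answer(completion) == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward by its group's mean and (population) std.

    eps avoids division by zero when every completion gets the same reward,
    in which case all advantages are 0 and the group gives no signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy reference solution in GSM8K's "#### answer" format.
    reference = gsm8k_reference("48 / 2 = 24 clips in May.\n#### 72")
    # One prompt's group of sampled completions (made up for illustration).
    group = [
        "<reasoning>48 + 24 = 72</reasoning><answer>72</answer>",
        "<reasoning>48 * 2 = 96</reasoning><answer>96</answer>",
        "The answer is 72.",  # right number, but missing the expected tags
        "<reasoning>48 + 24</reasoning><answer>72</answer>",
    ]
    rewards = [correctness_reward(c, reference) for c in group]
    print("rewards:", rewards)
    print("advantages:", [round(a, 3) for a in group_relative_advantages(rewards)])
```

Completions that beat their group's average get a positive advantage and are reinforced, while the rest are pushed down. The tag-less third completion shows how a strict format check can penalize a numerically correct answer.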

#ai #datascience #programming