I Trained an LLM to Think Deeper (Here's How)

8,676 plays
Turns out reinforcement learning is all you need.

Check out my prior video on RL: https://youtu.be/qTY4Rr-x5q0?si=pgTpw9r9xwkuZJM6

Resources:
Code: https://github.com/ALucek/GRPO-Training/tree/main
Model: https://huggingface.co/AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K
DeepSeek-R1 Paper: https://arxiv.org/pdf/2501.12948
DeepSeek Math Paper: https://arxiv.org/pdf/2402.03300
Unsloth Reasoning Blog: https://unsloth.ai/blog/r1-reasoning
Willccbb’s GRPO Demo: https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

Chapters:
00:00 - LLM Reasoning
01:44 - PPO Context
05:07 - GRPO Algorithm
07:24 - DeepSeek-R1-Zero Training
10:41 - DeepSeek-R1 Training
14:41 - Training: Model Loading
19:17 - Training: Dataset Prep
21:24 - Training: Reward Functions
23:11 - Training: GRPO Trainer
24:05 - Training: Outcome and Inference

#ai #datascience #programming