Curious how a 1.5B-parameter model can solve maths problems better than far larger models? In this video, I demonstrate how DeepSeek R1 leverages lengthy chains of thought to enhance its mathematical reasoning. We take a close look at how DeepSeek R1 prompts are structured and generated according to the R1 paper, then reproduce these chain-of-thought prompts using the DeepSeek R1 cold-start method and my own maths compiler to create synthetic training data.
I then walk through the entire fine-tuning process, step by step, showing how even a relatively modest model can outperform bulkier rivals using DeepSeek R1's cold-start technique. If you're fascinated by AI breakthroughs or simply enjoy seeing a thorough training pipeline, this detailed behind-the-scenes session is for you.
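As a rough illustration (not the actual chuk-math code, and the exact record layout in the video may differ), a single cold-start record might pair a prompt with a completion whose worked reasoning is wrapped in the <think> tags discussed in the video and whose result sits in an <answer> tag:

```python
# Hypothetical sketch (not the actual chuk-math code): build one cold-start
# training record where the worked reasoning sits inside <think> tags and the
# final result inside <answer> tags, then emit it as a JSON line.
import json


def make_coldstart_example(problem: str, steps: list[str], answer: str) -> dict:
    """Wrap a worked maths problem as a prompt/completion pair."""
    chain_of_thought = "\n".join(steps)
    return {
        "prompt": f"Solve the following problem step by step: {problem}",
        "completion": f"<think>\n{chain_of_thought}\n</think>\n<answer>{answer}</answer>",
    }


if __name__ == "__main__":
    record = make_coldstart_example(
        problem="(3 + 4) * 2",
        steps=[
            "First evaluate the parentheses: 3 + 4 = 7.",
            "Then multiply by 2: 7 * 2 = 14.",
            "Check the result: 14 / 2 = 7 and 7 - 4 = 3, so the steps are consistent.",
        ],
        answer="14",
    )
    # One JSON object per line, ready to append to a cold-start .jsonl file.
    print(json.dumps(record))
```

Running this prints a single JSON line; generating many such records and appending them to a .jsonl file gives a cold-start dataset suitable for supervised fine-tuning.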
GitHub repo for the maths compiler: https://github.com/chrishayuk/chuk-math
GitHub repo for the verifiers: https://github.com/chrishayuk/verifiers
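The fine-tuning step can then be sketched along the following lines, assuming the cold-start records above have been written to a coldstart.jsonl file with prompt/completion fields; this uses Hugging Face TRL's SFTTrainer as a stand-in and is not necessarily the exact tooling shown in the video.

```python
# Hedged sketch: supervised fine-tuning of Qwen2.5-1.5B on the cold-start
# JSONL using Hugging Face TRL. File name and hyperparameters are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# prompt/completion records produced by the data-generation step
train_dataset = load_dataset("json", data_files="coldstart.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # or the base "Qwen/Qwen2.5-1.5B"
    train_dataset=train_dataset,
    args=SFTConfig(output_dir="qwen2.5-1.5b-coldstart"),
)
trainer.train()
```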
00:00 - Intro
01:10 - DeepSeek R1 Chat
03:35 - DeepSeek R1 Ollama
04:44 - Think Tags
05:04 - DeepSeek R1 paper
13:45 - Generating synthetic long chains of thought
15:25 - Translating the CoT to natural language
18:40 - Self-Reflection and Self-Correction
22:50 - Generating sample data
30:06 - Testing the base Qwen2.5-1.5B model
30:52 - Fine-Tuning Qwen2.5-1.5B with our Cold-Start data
34:52 - Chatting with our Fine-Tuned Model
39:55 - Conclusion