Make language models do what you want!
Resources:
Miro Board: https://miro.com/app/board/uXjVLLDU3as=/?share_link_id=110094813997
Maxime Labonne’s ORPO Fine-Tuning Guide: https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html
DPO Paper: https://arxiv.org/pdf/2305.18290
ORPO Paper: https://arxiv.org/pdf/2403.07691
Colab Notebook: https://colab.research.google.com/drive/1KV9AFAfhQCSjF8Ej4rI2ejDmx5AUnqHq?usp=sharing
Model Trained: https://huggingface.co/AdamLucek/Orpo-Llama-3.2-1B-15k
Great Blog on DPO: https://medium.com/@joaolages/direct-preference-optimization-dpo-622fc1f18707
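
For a rough idea of what the Colab notebook walks through, here is a minimal ORPO fine-tuning sketch using TRL's ORPOTrainer (the approach used in Labonne's guide above). The dataset, 15k subset size, and hyperparameters below are illustrative assumptions, not the exact values from the video:

# Minimal ORPO fine-tuning sketch with TRL's ORPOTrainer (illustrative values,
# not the exact notebook: dataset, subset size, and hyperparameters are assumptions).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer, setup_chat_format

model_name = "meta-llama/Llama-3.2-1B"  # base model fine-tuned in the video

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# The base Llama model has no chat template, so attach one (ChatML) before preference training.
model, tokenizer = setup_chat_format(model, tokenizer)

# Preference data: each row pairs a prompt with a chosen and a rejected response.
# orpo-dpo-mix-40k is the dataset from the linked guide; the 15k subset mirrors the model name.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
dataset = dataset.shuffle(seed=42).select(range(15_000))

config = ORPOConfig(
    output_dir="orpo-llama-3.2-1b",
    beta=0.1,                        # weight on the odds-ratio preference term
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    max_length=1024,
    max_prompt_length=512,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,      # older TRL versions take tokenizer= instead
)
trainer.train()
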
Chapters:
00:00 - Intro
00:27 - LLM Lifecycle Overview
04:03 - Supervised Fine-Tuning
07:44 - Reinforcement Learning from Human Feedback
11:18 - Direct Preference Optimization
13:49 - Odds Ratio Preference Optimization
17:09 - Applying ORPO to Train Llama-3.2-1B
24:23 - Closing Thoughts
#ai #coding #datascience