Make AI Think Like YOU: A Guide to LLM Alignment

Make language models do what you want!

Resources:
- Miro Board: https://miro.com/app/board/uXjVLLDU3as=/?share_link_id=110094813997
- Maxime Labonne's ORPO Fine-Tuning Guide: https://mlabonne.github.io/blog/posts/2024-04-19_Fine_tune_Llama_3_with_ORPO.html
- DPO Paper: https://arxiv.org/pdf/2305.18290
- ORPO Paper: https://arxiv.org/pdf/2403.07691
- Colab Notebook: https://colab.research.google.com/drive/1KV9AFAfhQCSjF8Ej4rI2ejDmx5AUnqHq?usp=sharing
- Model Trained: https://huggingface.co/AdamLucek/Orpo-Llama-3.2-1B-15k
- Great Blog on DPO: https://medium.com/@joaolages/direct-preference-optimization-dpo-622fc1f18707

Chapters:
00:00 - Intro
00:27 - LLM Lifecycle Overview
04:03 - Supervised Fine Tuning
07:44 - Reinforcement Learning from Human Feedback
11:18 - Direct Preference Optimization
13:49 - Odds Ratio Preference Alignment
17:09 - Applying ORPO to Train Llama-3.2-1B
24:23 - Closing Thoughts

#ai #coding #datascience
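As a taste of what the ORPO chapter covers: the ORPO paper linked above adds an odds-ratio penalty to the standard supervised fine-tuning loss, where the odds of a response are p / (1 - p) and the penalty is -log sigmoid(log of the chosen/rejected odds ratio). Below is a minimal sketch of just that penalty term with hypothetical sequence probabilities (not the video's actual training code, which uses the linked Colab notebook):

```python
import math

def odds(p: float) -> float:
    # Odds of a response given its (average) probability under the model.
    return p / (1.0 - p)

def orpo_or_loss(p_chosen: float, p_rejected: float) -> float:
    # Odds-ratio term from the ORPO paper:
    # L_OR = -log sigmoid(log(odds(chosen) / odds(rejected)))
    log_odds_ratio = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    sigmoid = 1.0 / (1.0 + math.exp(-log_odds_ratio))
    return -math.log(sigmoid)

# Hypothetical probabilities: the penalty shrinks as the model
# prefers the chosen response over the rejected one.
loss_good = orpo_or_loss(p_chosen=0.8, p_rejected=0.2)
loss_bad = orpo_or_loss(p_chosen=0.2, p_rejected=0.8)
print(loss_good, loss_bad)
```

In full ORPO training this term is scaled by a weight (lambda) and added to the cross-entropy loss on the chosen response, which is why no separate reward model or reference model is needed.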