Lecture 05 • Reinforcement Learning for Language Models
This is the fifth lecture in the Language Models and Intelligent Agentic Systems course, run by Meridian Cambridge in collaboration with the Cambridge Centre for Data Driven Discovery (C2D3).
This lecture covers the basics of reinforcement learning (RL) for LLMs. During RL, a reward function is used to specify the desired behaviour of the LLM, and the LLM is then trained to generate responses to prompts that achieve high rewards. Here, we cover a basic algorithm for performing RL on LLMs, REINFORCE. We discuss the relationship between RL for language models and RL in classical settings. Finally, we cover chain-of-thought prompting and how to train reasoning abilities into LLMs.
The slides for the lecture can be found here: https://tinyurl.com/LMaIAS
Meridian's Website: https://www.meridiancambridge.org/
Edward's channel: @EdwardJamesYoung
Meridian's course webpage: https://www.meridiancambridge.org/language-models-course
C2D3's course webpage: https://www.c2d3.cam.ac.uk/events/LLM_series_2025