Lecture 05 • Reinforcement Learning for Language Models

This is the fifth lecture in the Language Models and Intelligent Agentic Systems course, run by Meridian Cambridge in collaboration with the Cambridge Centre for Data Driven Discovery (C2D3). 

This lecture covers the basics of reinforcement learning (RL) for LLMs. During RL, a reward function is used to specify the desired behaviour of the LLM. The LLM is then trained to generate responses to prompts that achieve high rewards. We cover REINFORCE, a basic algorithm for performing RL on LLMs, and discuss the relationship between RL for language models and RL in classical settings. Finally, we cover chain-of-thought prompting and how to train reasoning abilities within LLMs.
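As a rough illustration of the REINFORCE idea described above, here is a minimal sketch on a toy problem. It is not taken from the lecture or slides. It assumes a "policy" that is just a softmax over three canned responses (a stand-in for an LLM), a hypothetical reward function `reward_fn` that scores one response as desirable, and a running-average baseline to reduce variance. All names and hyperparameters are illustrative.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_fn(response):
    # Hypothetical reward: 1.0 for the desired response, 0.0 otherwise.
    return 1.0 if response == "helpful" else 0.0


def reinforce(responses, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy over `responses` with the REINFORCE update."""
    rng = random.Random(seed)
    logits = [0.0] * len(responses)  # toy stand-in for model parameters
    baseline = 0.0                   # running average of observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a response from the current policy.
        idx = rng.choices(range(len(responses)), weights=probs)[0]
        reward = reward_fn(responses[idx])
        advantage = reward - baseline
        # For a softmax policy, d/d(logit_j) of log pi(idx) is
        # (1 if j == idx else 0) - probs[j]. REINFORCE scales this
        # score-function gradient by the (baselined) reward.
        for j in range(len(logits)):
            grad_log_prob = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * advantage * grad_log_prob
        # Update the baseline only after it has been used for this step.
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)


RESPONSES = ["helpful", "rude", "off-topic"]
final_probs = reinforce(RESPONSES)
for response, p in zip(RESPONSES, final_probs):
    print(f"{response}: {p:.3f}")
```

In this sketch, the policy shifts probability mass toward the response that the reward function favours. With a real LLM, the logits would be replaced by the model's parameters and the log-probability gradient would be computed over the sampled token sequence, typically by automatic differentiation.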

The slides for the lecture can be found here: https://tinyurl.com/LMaIAS

Meridian's Website: https://www.meridiancambridge.org/
Edward's channel: @EdwardJamesYoung
Meridian's course webpage: https://www.meridiancambridge.org/language-models-course
C2D3's course webpage: https://www.c2d3.cam.ac.uk/events/LLM_series_2025
