Stanford CS25: V5 | On the Biology of a Large Language Model, Josh Batson of Anthropic
May 13, 2025
Large language models do many things, and it's not clear from black-box interactions how they do them. We will discuss recent progress in mechanistic interpretability, an approach to understanding models based on decomposing them into pieces, understanding the role of the pieces, and then understanding behaviors based on how those pieces fit together.
We will focus on the methods and findings of On the Biology of a Large Language Model, with some additional excursions and speculations. We hope to shed light on important behaviors like hallucination, planning, reasoning, (un)faithfulness, and emergent capabilities, and close with some suggestions for further research.
Speaker: Joshua Batson leads the circuits effort of the Anthropic mechanistic interpretability team. Before Anthropic, he worked on viral genomics and computational microscopy at the Chan Zuckerberg Biohub. His academic training is in pure mathematics.
More about the course can be found here: https://web.stanford.edu/class/cs25/
View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM