Jeremy Bernstein - Depths of First Order Optimization

Deep learning optimizers are often motivated through a mix of convex and approximate second-order theory. In this talk, I will argue that to build faster and more scalable training methods, we need to develop a deeper understanding of basic first-order optimization. I will begin by surveying popular theoretical approaches to optimization such as natural gradient descent, mirror descent and the Gauss-Newton method, with a focus on the assumptions and limitations of each approach. Next, I will argue that norm-based steepest descent, a first-order theory, overcomes many of these limitations. For the right choice of norm, I will show that we can directly obtain the benefits of two successful but poorly understood methods called Shampoo and muP. These ideas contributed to the proposal and development of the Muon optimizer, which has set speed records for training NanoGPT. I will conclude by introducing the modular norm, a means of systematically assigning a norm to any neural network as a function of the network architecture, and by discussing opportunities for further progress.

Jeremy Bernstein is a postdoc in CSAIL at MIT advised by Phillip Isola. His goal is to uncover the computational and statistical laws of natural and artificial intelligence, and thereby design learning systems that are more efficient, more automatic and more useful in practice.

This session is brought to you by the Cohere Labs Open Science Community, a space where ML researchers, engineers, linguists, social scientists, and lifelong learners connect and collaborate with each other. We'd like to extend a special thank you to Anier Velasco Sotomayor, Thang Chu, and Andrej Jovanović, leads of our ML Theory group, for their dedication in organizing this event.

If you're interested in sharing your work, we welcome you to join us! Simply fill out the form at https://forms.gle/ALND9i6KouEEpCnz6 to express your interest in becoming a speaker.
Join the Cohere Labs Open Science Community to see a full list of upcoming events (https://tinyurl.com/CohereLabsCommunityApp).