A New Physics-Inspired Theory of Deep Learning | Optimal initialization of Neural Nets

A special video about recent exciting developments in mathematical deep learning! 🔥 Make sure to check out the video if you want a quick visual summary of the contents of the book “The Principles of Deep Learning Theory” https://deeplearningtheory.com/.

SPONSOR: Aleph Alpha 👉 https://app.aleph-alpha.com/

17:38 ERRATUM: Boris Hanin reached out to us and made this point: "I found the explanations to be crisp and concise, except for one point. Namely, I am pretty sure the description you give of why MLPs become linear models at infinite width is not quite correct. It is not true that they are equivalent to a random feature model in which features are the post-activations of the final hidden layer and that activations in previous layers don’t move. Instead, what happens is that the full vector of activations in each layer moves by an order 1 amount. However, while the Jacobian of the model output with respect to its parameters remains order 1, the Hessian goes to zero. Put another way, the whole neural network can be replaced by its linearization around the start of training. In the resulting linear model all parameters move to fit the data."

Check out our daily #MachineLearning Quiz Questions: https://www.youtube.com/c/AICoffeeBreak/community
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

📕 The book: Roberts, Daniel A., Sho Yaida, and Boris Hanin. The Principles of Deep Learning Theory. Cambridge University Press, 2022. https://arxiv.org/abs/2106.10165
📜 MAGMA paper: https://arxiv.org/abs/2112.05253

Outline:
00:00 The Principles of Deep Learning Theory (Book)
02:12 Neural networks and black boxes
05:35 Large-width limit
07:59 How to get the large-width limit and forward propagation recap
13:11 Why we need non-Gaussianity
16:28 No wiring for infinite-width networks
17:13 No representation learning for infinite-width networks
19:31 Layer recursion
22:36 Experimental verification
24:09 The Renormalisation Group
26:08 Fixed points
28:45 Stability
31:15 Experimental verification (activation functions)
34:57 Outro and thanks
35:26 Sponsor: Aleph Alpha

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏 Don Rosenthal, Dres. Trost GbR, banana.dev -- Kyle Morris, Julián Salazar, Edvard Grødem, Vignesh Valliappan, Kevin Tsai, Mutual Information, Mike Ton

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Music 🎵: It's Only Worth It if You Work for It (Instrumental) - NEFFEX
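P.S. To make Boris Hanin's erratum above a bit more concrete: "replacing the network by its linearization around the start of training" means training the first-order Taylor expansion f(x; θ) ≈ f(x; θ₀) + J(x; θ₀)·(θ − θ₀) instead of the full network. Below is a minimal, hypothetical JAX sketch of that idea (not from the video or the book); the layer widths, toy inputs, and parameter perturbation are arbitrary choices for illustration.

# Sketch: compare a wide MLP with its linearization around random initialization theta0.
import jax
import jax.numpy as jnp

def init_mlp(key, widths):
    """Random MLP weights with 1/sqrt(fan_in) scaling (illustrative choice)."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (n_in, n_out)) / jnp.sqrt(n_in))
    return params

def mlp(params, x):
    """Forward pass: tanh hidden layers, linear readout."""
    h = x
    for W in params[:-1]:
        h = jnp.tanh(h @ W)
    return h @ params[-1]

key = jax.random.PRNGKey(0)
widths = [4, 512, 512, 1]                 # wide hidden layers (arbitrary sizes)
params0 = init_mlp(key, widths)           # initialization theta0
x = jax.random.normal(jax.random.PRNGKey(1), (8, 4))  # toy inputs

# jax.linearize returns f(theta0) and the Jacobian-vector product at theta0,
# i.e. exactly the linear-in-parameters model described in the erratum.
f0, jvp_at_theta0 = jax.linearize(lambda p: mlp(p, x), params0)

# Move every parameter by a small amount delta and evaluate the linearized model.
delta = jax.tree_util.tree_map(lambda W: 1e-3 * jnp.ones_like(W), params0)
f_lin = f0 + jvp_at_theta0(delta)

# Compare with the true network at theta0 + delta: for wide layers the two stay
# close, consistent with the parameter Hessian shrinking as width grows.
params = jax.tree_util.tree_map(lambda W, d: W + d, params0, delta)
f_true = mlp(params, x)
print("max |f_true - f_lin|:", float(jnp.max(jnp.abs(f_true - f_lin))))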