Vanishing Gradients: Why Training RNNs is Hard

Here, we run down how RNNs are trained via backpropagation through time and see how this algorithm is plagued by the problems of vanishing and exploding gradients. We present both an intuitive and a mathematical picture, flying through the relevant calculus and linear algebra (so feel free to pause at certain bits!). A short code sketch of the effect follows the links below.

Timestamps
--------------------
00:00 Introduction
00:46 RNN refresher
03:42 Gradient calculation of W
06:50 Exploding and vanishing gradients
07:35 Linear algebra perspective
12:20 Solutions

Links
---------
- Papers on vanishing and exploding gradients:
  https://www.bioinf.jku.at/publications/older/2304.pdf
  https://ieeexplore.ieee.org/document/279181
  https://arxiv.org/abs/1211.5063
- Long short-term memory paper: https://www.bioinf.jku.at/publications/older/2604.pdf
- RNN paper (Elman networks): https://onlinelibrary.wiley.com/doi/10.1207/s15516709cog1402_1
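As a quick companion to the linear-algebra perspective covered in the video, here is a minimal NumPy sketch (not code from the video itself) of why the backpropagated gradient shrinks or blows up exponentially. For a simple linear recurrence h_t = W h_{t-1}, each step of backpropagation through time multiplies the gradient by W^T, so its norm is governed by the spectral radius of W. The hidden size, number of steps, and weight scales used here are arbitrary demo values.

```python
# Illustrative sketch: repeated multiplication by W^T during backprop through time
# makes the gradient norm decay or grow exponentially with the spectral radius of W.
import numpy as np

rng = np.random.default_rng(0)
hidden_size, steps = 64, 100

def backprop_norms(weight_scale):
    """Norm of the backpropagated gradient after each of `steps` time steps."""
    # Random recurrent matrix whose spectral radius is approximately `weight_scale`.
    W = weight_scale * rng.standard_normal((hidden_size, hidden_size)) / np.sqrt(hidden_size)
    grad = rng.standard_normal(hidden_size)   # gradient dL/dh_T arriving at the last step
    norms = []
    for _ in range(steps):
        grad = W.T @ grad                     # one step of backprop through time
        norms.append(np.linalg.norm(grad))
    return norms

for scale in (0.5, 1.5):
    norms = backprop_norms(scale)
    print(f"spectral radius ~{scale}: |grad| after 10 steps = {norms[9]:.2e}, "
          f"after {steps} steps = {norms[-1]:.2e}")
# Typical output: the 0.5 case collapses toward zero (vanishing gradients),
# while the 1.5 case grows by many orders of magnitude (exploding gradients).
```

In a real tanh-based Elman RNN each Jacobian also carries a diagonal factor 1 - h_t^2, which is at most 1 and so can only make the vanishing case worse; the exponential behaviour itself comes from the repeated multiplication by W^T, exactly as in this linear toy example.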