Here we run down how RNNs are trained via backpropagation through time, and see why this algorithm is plagued by vanishing and exploding gradients. We present both an intuitive and a mathematical picture, flying through the relevant calculus and linear algebra (so feel free to pause at certain bits!).
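To make the exploding/vanishing behaviour concrete, here is a minimal NumPy sketch (not code from the video; the state size, horizon, and scale values are illustrative assumptions). In a linear RNN h_t = W h_{t-1}, backpropagation through time applies W^T to the gradient once per step, so its norm scales roughly like (largest singular value of W)^steps:

    import numpy as np

    # BPTT in a linear RNN h_t = W h_{t-1}: the gradient at the first
    # step is W^T applied `steps` times to the gradient at the last step.
    np.random.seed(0)
    steps, n = 50, 8                     # horizon and state size (illustrative)
    for scale in (0.9, 1.1):             # largest singular value of W
        Q, _ = np.linalg.qr(np.random.randn(n, n))
        W = scale * Q                    # all singular values equal `scale`
        g = np.ones(n)                   # stand-in for dL/dh_T
        for _ in range(steps):
            g = W.T @ g                  # one step back through time
        print(scale, np.linalg.norm(g))  # ~0.9^50 shrinks; ~1.1^50 blows up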
Timestamps
--------------------
00:00 Introduction
00:46 RNN refresher
03:42 Gradient calculation of W
06:50 Exploding and vanishing gradients
07:35 Linear algebra perspective
12:20 Solutions
Links
---------
- Papers on vanishing and exploding gradients:
  https://www.bioinf.jku.at/publications/older/2304.pdf
  https://ieeexplore.ieee.org/document/279181
  https://arxiv.org/abs/1211.5063
- Long short-term memory paper: https://www.bioinf.jku.at/publications/older/2604.pdf
- RNN paper (Elman networks): https://onlinelibrary.wiley.com/doi/10.1207/s15516709cog1402_1
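For the exploding case, the third paper above (Pascanu et al.) proposes clipping the gradient norm before each update. A minimal sketch (the threshold here is an arbitrary assumption, not a value from the video):

    import numpy as np

    def clip_gradient(g, threshold=5.0):
        # Rescale the gradient whenever its norm exceeds the threshold,
        # so an exploding gradient cannot blow up the parameter update.
        norm = np.linalg.norm(g)
        return g * (threshold / norm) if norm > threshold else g

Clipping only tames exploding gradients; the vanishing case is what motivates gated architectures like the LSTM linked above.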