Here we run down how RNNs are trained via backpropagation through time, and see why this algorithm is plagued by vanishing and exploding gradients. We present both an intuitive and a mathematical picture, flying through the relevant calculus and linear algebra (so feel free to pause at certain bits!).
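To make the exploding/vanishing behaviour concrete, here is a minimal NumPy sketch (not code from the video; the state size, horizon, and scale values are illustrative assumptions). In a linear RNN h_t = W h_{t-1}, backpropagation through time applies W^T to the gradient once per step, so its norm scales roughly like (largest singular value of W)^steps:

    import numpy as np

    # BPTT in a linear RNN h_t = W h_{t-1}: the gradient at the first
    # step is W^T applied `steps` times to the gradient at the last step.
    np.random.seed(0)
    steps, n = 50, 8                     # horizon and state size (illustrative)
    for scale in (0.9, 1.1):             # largest singular value of W
        Q, _ = np.linalg.qr(np.random.randn(n, n))
        W = scale * Q                    # all singular values equal `scale`
        g = np.ones(n)                   # stand-in for dL/dh_T
        for _ in range(steps):
            g = W.T @ g                  # one step back through time
        print(scale, np.linalg.norm(g))  # ~0.9^50 shrinks; ~1.1^50 blows up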
Timestamps
--------------------
00:00 Introduction
00:46 RNN refresher
03:42 Gradient calculation of W
06:50 Exploding and vanishing gradients
07:35 Linear algebra perspective
12:20 Solutions
Links
---------
- Papers on vanishing and exploding gradients:
  https://www.bioinf.jku.at/publications/older/2304.pdf
  https://ieeexplore.ieee.org/document/279181
  https://arxiv.org/abs/1211.5063
- Long short-term memory paper: https://www.bioinf.jku.at/publications/older/2604.pdf
- RNN paper (Elman networks): https://onlinelibrary.wiley.com/doi/10.1207/s15516709cog1402_1
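For the exploding case, the third paper above (Pascanu et al.) proposes clipping the gradient norm before each update. A minimal sketch (the threshold here is an arbitrary assumption, not a value from the video):

    import numpy as np

    def clip_gradient(g, threshold=5.0):
        # Rescale the gradient whenever its norm exceeds the threshold,
        # so an exploding gradient cannot blow up the parameter update.
        norm = np.linalg.norm(g)
        return g * (threshold / norm) if norm > threshold else g

Clipping only tames exploding gradients; the vanishing case is what motivates gated architectures like the LSTM linked above.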