When to add and when to concatenate positional embeddings? What are the arguments for learning positional encodings? When should you hand-craft them? Ms. Coffee Bean answers these questions in this video.
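For reference, here is a minimal NumPy sketch of the two options contrasted in the video (the shapes and variable names are illustrative, not taken from the video):

```python
import numpy as np

def sinusoidal_positions(seq_len, dim):
    """Fixed sinusoidal encodings as in Vaswani et al. (2017)."""
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    i = np.arange(dim // 2)[None, :]                # (1, dim // 2)
    angles = pos / np.power(10000.0, 2 * i / dim)   # (seq_len, dim // 2)
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(angles)                   # even dims: sine
    enc[:, 1::2] = np.cos(angles)                   # odd dims: cosine
    return enc

seq_len, d_model, d_pos = 10, 512, 64
tokens = np.random.randn(seq_len, d_model)          # stand-in token embeddings

# Adding: positions share the token dimensions; width stays d_model.
added = tokens + sinusoidal_positions(seq_len, d_model)       # (10, 512)

# Concatenating: positions get their own dimensions; width grows.
concatenated = np.concatenate(
    [tokens, sinusoidal_positions(seq_len, d_pos)], axis=-1)  # (10, 576)
```

Adding keeps the model width and parameter count unchanged but superimposes position and content in the same dimensions; concatenating keeps them in separate dimensions at the cost of a wider model.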
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
Outline:
00:00 Concatenated vs. added positional embeddings
04:49 Learned positional embeddings (see the sketch after this outline)
06:48 Ms. Coffee Bean's deepest insight ever
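And a minimal PyTorch sketch of learned (rather than hand-crafted) positional embeddings, the approach used by BERT and ViT; the class and argument names are illustrative:

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """One trainable vector per position, updated by backprop like any other weight."""

    def __init__(self, max_len: int, dim: int):
        super().__init__()
        self.pos = nn.Embedding(max_len, dim)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, dim)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        return token_embeddings + self.pos(positions)  # broadcasts over batch
```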
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, help us boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
📺 Positional embeddings explained:
https://youtu.be/1biZfFLPRSY
📺 Fourier Transform instead of attention:
https://youtu.be/j7pWPdGEfMA
📺 Transformer explained:
https://youtu.be/FWFA4DGuzSc
Papers 📄:
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in Neural Information Processing Systems, pp. 5998–6008, 2017. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Wang, Yu-An, and Yun-Nung Chen. "What do position embeddings learn? An empirical study of pre-trained language model positional encoding." arXiv preprint arXiv:2010.04903 (2020). https://arxiv.org/pdf/2010.04903.pdf
Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020). https://arxiv.org/abs/2010.11929
✍️ Arabic subtitles by Ali Haidar Ahmad: https://www.linkedin.com/in/ali-ahmad-0706a51bb/
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research