This video covers everything about self-attention in the Vision Transformer (ViT) and its implementation from scratch.
I explain everything happening inside attention in the Vision Transformer through visualizations, and walk through what an implementation of self-attention from scratch looks like in PyTorch.
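For a rough idea of the kind of code built in the video, here is a minimal multi-head self-attention sketch in PyTorch (the names, dimensions, and structure below are my own illustrative assumptions, not the exact code from the linked repo):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Minimal multi-head self-attention over a sequence of patch embeddings."""
    def __init__(self, embed_dim=768, num_heads=12):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One linear layer producing Q, K, V together (combined Wq, Wk, Wv)
        self.qkv = nn.Linear(embed_dim, embed_dim * 3)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        B, N, C = x.shape  # batch, number of patch tokens, embedding dim
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        # Scaled dot-product scores: how relevant each patch is to every other patch
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        # Weighted sum of values -> context representation for every patch
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: 196 patch tokens + 1 class token, each a 768-dim embedding
tokens = torch.randn(2, 197, 768)
print(SelfAttention()(tokens).shape)  # torch.Size([2, 197, 768])
```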
I cover the Vision Transformer (ViT) in three parts:
1. Patch Embedding in Vision Transformer VIT -
https://youtu.be/lBicvB4iyYU
2. Self Attention in Vision Transformer VIT - This video
3. Building Vision Transformer and visualizations -
https://www.youtube.com/watch?v=G6_IA5vKXRI
*Paper Link* - https://tinyurl.com/exai-vit-paper
*Implementation* - https://tinyurl.com/exai-vit-code
*Other Good Resources*
Yannic Kilcher | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained) -
https://www.youtube.com/watch?v=TrdevFK_am4
AI Coffee Break with Letitia | An image is worth 16x16 words: ViT | Vision Transformer explained -
https://www.youtube.com/watch?v=DVoHvmww2lQ
James Briggs | Vision Transformers (ViT) Explained + Fine-tuning in Python -
https://www.youtube.com/watch?v=qU7wO02urYU
A good place to understand the general transformer further - https://tinyurl.com/exai-vit-transformer
*Timestamps*:
00:00 Intro
00:33 Intuition of What is Attention & Why it's Helpful
03:23 Inside Attention - What is Relevant
07:53 Inside Attention - Building Context Representation
08:45 Building Context Representation For All Patches
09:45 Why Multi Head Attention
11:15 Building Context Representation For Multi Head Attention
12:35 Combining Wq, Wk, Wv Matrices
13:34 Shapes of Every Matrix in Attention
14:48 Implementation Parts of Attention
15:12 PyTorch Implementation for Attention in Vision Transformer ViT
18:26 Outro
*Subscribe to Channel* - https://tinyurl.com/exai-channel-link
Background Track - Fruits of Life by Jimena Contreras
Email - [email protected]