ATTENTION | An Image is Worth 16x16 Words | Vision Transformers (ViT) Explanation and Implementation

4,695 listens
This video covers everything about self-attention in Vision Transformers (ViT) and its implementation from scratch. I go over all the details, explain everything happening inside attention in a Vision Transformer through visualizations, and show what an implementation of self-attention from scratch looks like in PyTorch.

I cover the Vision Transformer (ViT) in three parts:
1. Patch Embedding in Vision Transformer (ViT) - https://youtu.be/lBicvB4iyYU
2. Self-Attention in Vision Transformer (ViT) - This video
3. Building the Vision Transformer and visualizations - https://www.youtube.com/watch?v=G6_IA5vKXRI

*Paper Link* - https://tinyurl.com/exai-vit-paper
*Implementation* - https://tinyurl.com/exai-vit-code

*Other Good Resources*
Yannic Kilcher | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained) - https://www.youtube.com/watch?v=TrdevFK_am4
AI Coffee Break with Letitia | An image is worth 16x16 words: ViT | Vision Transformer explained - https://www.youtube.com/watch?v=DVoHvmww2lQ
James Briggs | Vision Transformers (ViT) Explained + Fine-tuning in Python - https://www.youtube.com/watch?v=qU7wO02urYU
A good place to understand the general transformer further - https://tinyurl.com/exai-vit-transformer

*Timestamps*:
00:00 Intro
00:33 Intuition of What Attention Is & Why It's Helpful
03:23 Inside Attention - What Is Relevant
07:53 Inside Attention - Building the Context Representation
08:45 Building the Context Representation for All Patches
09:45 Why Multi-Head Attention
11:15 Building the Context Representation for Multi-Head Attention
12:35 Combining the Wq, Wk, Wv Matrices
13:34 Shapes of Every Matrix in Attention
14:48 Implementation Parts of Attention
15:12 PyTorch Implementation of Attention in Vision Transformer (ViT)
18:26 Outro

*Subscribe to Channel* - https://tinyurl.com/exai-channel-link
Background Track - Fruits of Life by Jimena Contreras
Email - [[email protected]](mailto:[email protected])
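The pipeline the video walks through (relevance scores from queries and keys, a softmax to normalize them, a weighted sum over values to build each patch's context representation, with Wq, Wk, and Wv combined into a single projection) can be sketched as a minimal single-head PyTorch module. This is an illustrative sketch under those assumptions, not the video's exact code; class and variable names here are hypothetical:

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Minimal single-head self-attention over patch embeddings (sketch)."""

    def __init__(self, embed_dim):
        super().__init__()
        # Combined Wq, Wk, Wv as one linear layer (one matmul instead of three)
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)
        self.scale = embed_dim ** -0.5

    def forward(self, x):
        # x: (batch, num_patches, embed_dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Relevance of every patch to every other patch: (batch, patches, patches)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)  # each row sums to 1
        # Context representation: attention-weighted sum of value vectors
        return self.proj(attn @ v)


x = torch.randn(2, 16, 64)  # 2 images, 16 patches, 64-dim embeddings
out = SelfAttention(64)(x)
print(out.shape)  # torch.Size([2, 16, 64])
```

A multi-head version would split the embedding into several smaller heads, run this same computation per head, and concatenate the results before the output projection.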