This video covers everything about self-attention in the Vision Transformer (ViT) and its implementation from scratch.
I explain everything happening inside attention in the Vision Transformer through visualizations, and walk through what an implementation of self-attention from scratch looks like in PyTorch.
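For a rough idea of the kind of code built in the video, here is a minimal multi-head self-attention sketch in PyTorch (the names, dimensions, and structure below are my own illustrative assumptions, not the exact code from the linked repo):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Minimal multi-head self-attention over a sequence of patch embeddings."""
    def __init__(self, embed_dim=768, num_heads=12):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One linear layer producing Q, K, V together (combined Wq, Wk, Wv)
        self.qkv = nn.Linear(embed_dim, embed_dim * 3)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        B, N, C = x.shape  # batch, number of patch tokens, embedding dim
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        # Scaled dot-product scores: how relevant each patch is to every other patch
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        attn = attn.softmax(dim=-1)
        # Weighted sum of values -> context representation for every patch
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Example: 196 patch tokens + 1 class token, each a 768-dim embedding
tokens = torch.randn(2, 197, 768)
print(SelfAttention()(tokens).shape)  # torch.Size([2, 197, 768])
```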
I cover the Vision Transformer (ViT) in three parts:
1. Patch Embedding in Vision Transformer VIT -
https://youtu.be/lBicvB4iyYU
2. Self Attention in Vision Transformer VIT - This video
3. Building Vision Transformer and visualizations -
https://www.youtube.com/watch?v=G6_IA5vKXRI
*Paper Link* - https://tinyurl.com/exai-vit-paper
*Implementation* - https://tinyurl.com/exai-vit-code
*Other Good Resources*
Yannic Kilcher | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (Paper Explained) -
https://www.youtube.com/watch?v=TrdevFK_am4
AI Coffee Break with Letitia | An image is worth 16x16 words: ViT | Vision Transformer explained -
https://www.youtube.com/watch?v=DVoHvmww2lQ
James Briggs | Vision Transformers (ViT) Explained + Fine-tuning in Python -
https://www.youtube.com/watch?v=qU7wO02urYU
A good place to understand the general transformer further - https://tinyurl.com/exai-vit-transformer
*Timestamps*:
00:00 Intro
00:33 Intuition of What is Attention & Why it's Helpful
03:23 Inside Attention - What is Relevant
07:53 Inside Attention - Building Context Representation
08:45 Building Context Representation For All Patches
09:45 Why Multi Head Attention
11:15 Building Context Representation For Multi Head Attention
12:35 Combining Wq, Wk, Wv Matrices
13:34 Shapes of Every Matrix in Attention
14:48 Implementation Parts of Attention
15:12 PyTorch Implementation for Attention in Vision Transformer ViT
18:26 Outro
*Subscribe to Channel* - https://tinyurl.com/exai-channel-link
Background Track - Fruits of Life by Jimena Contreras
Email - [email protected]