SWIN transformer (image recognition)

This video covers the SWIN transformer, a model trained for image classification that is also used as a backbone in a variety of other tasks, replacing ResNet/ViT. It currently serves as the backbone of SOTA object detection models like DINO.

This is another video from my "Modern Object Detection" series: https://www.youtube.com/playlist?list=PL1HdfW5-F8AQlPZCJBq2gNjERTDEAl8v3

Important links:
- Original paper: https://arxiv.org/pdf/2103.14030.pdf
- My previous video about ViT: https://youtu.be/NcbbPuRjMeE

00:00 - Intro
00:50 - Motivation, "Image Tokenization" Problem
08:14 - Hierarchical Patches Architecture
10:40 - Shifted Windows Attention
17:26 - Relative Positional Bias
21:58 - Results
26:00 - Next Up