Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (paper illustrated)

Paper Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification (86.4 top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val). Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones.

Paper Link: https://arxiv.org/pdf/2103.14030.pdf
Official Code: https://github.com/microsoft/Swin-Transformer

Video Outline:
0:00 - Introduction
0:50 - Backbone for vision tasks
2:21 - Swin Transformer - Architecture
5:31 - Multi-Headed Self Attention (MSA)
6:51 - Swin Transformer Block
7:15 - Shifted Windows
8:40 - Swin Architecture and variants
9:45 - Results

**AI Bites**
YouTube: https://www.youtube.com/c/AIBites
Twitter: https://twitter.com/ai_bites
Patreon: https://www.patreon.com/ai_bites
Github: https://github.com/ai-bites

Related Videos:
Vision Transformers (ViT): https://youtu.be/3B6q4xnuFUE
Data Efficient Image Transformer (DeiT): https://youtu.be/HobIo2oT0xY

📚 📚 📚 BOOKS I HAVE READ, REFER AND RECOMMEND 📚 📚 📚
📖 Deep Learning by Ian Goodfellow - https://amzn.to/3Wnyixv
📙 Pattern Recognition and Machine Learning by Christopher M. Bishop - https://amzn.to/3ZVnQQA
📗 Machine Learning: A Probabilistic Perspective by Kevin Murphy - https://amzn.to/3kAqThb
📘 Multiple View Geometry in Computer Vision by R Hartley and A Zisserman - https://amzn.to/3XKVOWi

Music: https://www.bensound.com
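To make the shifted-window scheme from the abstract concrete, here is a minimal sketch in PyTorch. It is not the official implementation from the repository linked above: helper names such as window_partition, window_reverse, and swin_block are illustrative assumptions, and the attention mask that keeps tokens from different regions apart after the cyclic shift is omitted for brevity. The idea it shows is the one from the paper: partition the feature map into non-overlapping windows, run multi-head self-attention independently inside each window, and use a cyclic shift (torch.roll) to realize the shifted-window step.

```python
# Minimal sketch (assumed names, mask omitted) of window-based self-attention
# with a cyclic shift, in the spirit of Swin Transformer's W-MSA / SW-MSA.
import torch
import torch.nn as nn


def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into (num_windows*B, window_size**2, C)."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)


def window_reverse(windows, window_size, H, W):
    """Inverse of window_partition: back to (B, H, W, C)."""
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.reshape(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


class WindowAttention(nn.Module):
    """Standard multi-head self-attention applied independently inside each window."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x_windows):  # (num_windows*B, window_size**2, C)
        out, _ = self.attn(x_windows, x_windows, x_windows)
        return out


def swin_block(x, attn, window_size, shift):
    """One attention step: optionally cyclic-shift the map, attend within
    windows, then shift back. shift=0 gives regular windows (W-MSA),
    shift>0 gives shifted windows (SW-MSA)."""
    if shift > 0:
        x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    B, H, W, C = x.shape
    windows = window_partition(x, window_size)
    windows = attn(windows)
    x = window_reverse(windows, window_size, H, W)
    if shift > 0:
        x = torch.roll(x, shifts=(shift, shift), dims=(1, 2))
    return x


if __name__ == "__main__":
    x = torch.randn(1, 56, 56, 96)                   # stage-1 sized feature map
    attn = WindowAttention(dim=96, num_heads=3)
    y = swin_block(x, attn, window_size=7, shift=0)  # regular windows
    y = swin_block(y, attn, window_size=7, shift=3)  # shifted windows
    print(y.shape)                                   # torch.Size([1, 56, 56, 96])
```

Because attention never crosses window boundaries within a single block, the cost grows with the number of windows rather than quadratically with the number of tokens, which is where the linear complexity in image size comes from; alternating regular and shifted blocks is what restores the cross-window connections.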