Learn how to build a custom image classification model using Swin Transformer.
Github: https://github.com/AarohiSingla/Swin-Transformer
#####################################################
For queries: comment in the comments section or email me at
[email protected]
#####################################################
Swin Transformer is a deep learning architecture that combines the strengths of Transformers and convolutional neural networks (CNNs). It was introduced in the 2021 research paper "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows."
Traditionally, Transformers have been highly successful in natural language processing because self-attention captures long-range dependencies, but they were slower to catch on in computer vision: the cost of global self-attention grows quadratically with the number of image patches, which makes high-resolution images expensive. CNNs, on the other hand, have excelled in computer vision tasks by leveraging local spatial hierarchies and translation invariance.
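To give a rough sense of that computational cost, here is a small sketch that counts the query-key pairs a single global self-attention layer would score if an image were cut into square patches and every patch attended to every other patch. The function name and the default 4-pixel patch size are assumptions chosen for illustration, not values taken from the video or the repo.

```python
def global_attention_pairs(image_size: int, patch_size: int = 4) -> int:
    """Count query-key pairs for full self-attention over square patches.

    Hypothetical helper for illustration only.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    # Number of patch tokens in a square image.
    tokens = (image_size // patch_size) ** 2
    # Every token attends to every token, so the count is tokens squared.
    return tokens * tokens


for size in (224, 448):
    print(size, global_attention_pairs(size))
```

Doubling the side length of the image quadruples the number of patches and multiplies the number of attention pairs by 16, which is why plain global attention gets expensive quickly on large images.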
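Swin Transformer aims to bridge this gap with a hierarchical vision Transformer that can efficiently handle large-scale image data. It splits the input image into small non-overlapping patches and computes self-attention only within local windows of patches, which keeps the cost linear in image size. Its key mechanism, "shifted windows," offsets the window grid between consecutive Transformer layers so that information flows across window boundaries, while patch-merging steps between stages build the hierarchy and widen the receptive field until the model captures global dependencies.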
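A minimal sketch of the shifted-window idea as described in the Swin Transformer paper: the grid of patch tokens is divided into non-overlapping windows, and in alternate layers the grid is cyclically shifted before partitioning so that each new window mixes tokens from several of the previous windows. This is plain Python on a toy 4x4 grid of token ids, not the code from the repo; the function names, grid size, and window size are assumptions for illustration.

```python
def window_partition(grid, window):
    """Split a square 2D grid (list of rows) into non-overlapping
    window x window blocks, in row-major order."""
    size = len(grid)
    if size % window != 0:
        raise ValueError("grid size must be divisible by window size")
    windows = []
    for top in range(0, size, window):
        for left in range(0, size, window):
            block = [grid[r][left:left + window]
                     for r in range(top, top + window)]
            windows.append(block)
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    size = len(grid)
    return [[grid[(r + shift) % size][(c + shift) % size]
             for c in range(size)]
            for r in range(size)]


# Toy 4x4 grid where each entry stands for one patch token id (0..15).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

regular = window_partition(grid, 2)                   # windows for one layer
shifted = window_partition(cyclic_shift(grid, 1), 2)  # windows for the next layer

print(regular[0])  # first regular window
print(shifted[0])  # first shifted window
```

In the regular layout the first window holds tokens 0, 1, 4, and 5. After the shift, the first window holds tokens 5, 6, 9, and 10, which came from four different regular windows, so attention in the next layer connects them. In the actual model, windows that wrap around the image border also use an attention mask so that tokens that are not real neighbors do not attend to each other; that detail is left out here.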
#computervision #transformers