Implement and Train ViT From Scratch for Image Recognition - PyTorch

19,347 plays
We will implement ViT (Vision Transformer) and train our implementation on the MNIST dataset to classify images! Video where I explain the ViT paper and GitHub below ↓

Want to support the channel? Hit that like button and subscribe!

ViT (Vision Transformer) - An Image Is Worth 16x16 Words (Paper Explained)
https://www.youtube.com/watch?v=8phM16htKbU

GitHub Link of the Code
https://github.com/uygarkurt/ViT-PyTorch

Notebook
https://github.com/uygarkurt/ViT-PyTorch/blob/main/vit-implementation.ipynb

ViT (Vision Transformer) is introduced in the paper: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
https://arxiv.org/abs/2010.11929

What should I implement next? Let me know in the comments!

00:00:00 Introduction
00:00:09 Paper Overview
00:02:41 Imports and Hyperparameter Definitions
00:11:09 Patch Embedding Implementation
00:19:36 ViT Implementation
00:29:00 Dataset Preparation
00:51:16 Train Loop
01:09:27 Prediction Loop
01:12:05 Classifying Our Own Images

Buy me a coffee! ☕️
https://ko-fi.com/uygarkurt
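
For a feel of what the patch embedding step works with, here is a minimal sketch in plain Python of how an image is cut into the "words" ViT feeds to the transformer. It is not the code from the video or the notebook (those use PyTorch and may do this differently). The helper name split_into_patches and the patch size of 4 are assumptions made up for this illustration; only the 28x28 MNIST image size and the 16x16 patches on 224x224 images from the paper are standard.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    # Walk the image in patch-sized steps, row-major over the patch grid.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


IMG_SIZE = 28    # MNIST images are 28x28 grayscale
PATCH_SIZE = 4   # assumed for illustration; not taken from the video

# Placeholder pixel values so each position is easy to identify.
image = [[r * IMG_SIZE + c for c in range(IMG_SIZE)] for r in range(IMG_SIZE)]

patches = split_into_patches(image, PATCH_SIZE)
num_patches = len(patches)    # (28 // 4) ** 2 patches
patch_dim = len(patches[0])   # 4 * 4 pixels per patch, single channel
seq_len = num_patches + 1     # plus one classification token, as in the paper

print(num_patches, patch_dim, seq_len)  # 49 16 50
```

In a real ViT each flattened patch is then projected to the embedding dimension by a learned linear layer, a classification token is prepended, and position embeddings are added. With the paper's 16x16 patches on a 224x224 image, the same arithmetic gives 196 patches.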