We will implement ViT (Vision Transformer) in PyTorch and train our implementation on the MNIST dataset to classify images! The video where I explain the ViT paper and the GitHub links are below ↓
Want to support the channel? Hit that like button and subscribe!
ViT (Vision Transformer) - An Image Is Worth 16x16 Words (Paper Explained)
https://www.youtube.com/watch?v=8phM16htKbU
GitHub Link of the Code
https://github.com/uygarkurt/ViT-PyTorch
Notebook
https://github.com/uygarkurt/ViT-PyTorch/blob/main/vit-implementation.ipynb
ViT (Vision Transformer) was introduced in the paper "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale":
https://arxiv.org/abs/2010.11929
What should I implement next? Let me know in the comments!
00:00:00 Introduction
00:00:09 Paper Overview
00:02:41 Imports and Hyperparameter Definitions
00:11:09 Patch Embedding Implementation
00:19:36 ViT Implementation
00:29:00 Dataset Preparation
00:51:16 Train Loop
01:09:27 Prediction Loop
01:12:05 Classifying Our Own Images
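The heart of the video is the patch embedding step: the image is cut into non-overlapping patches, each patch is linearly projected into a token, a class token is prepended, and positional embeddings are added. Below is a minimal sketch of that step for MNIST-sized inputs, assuming a 28x28 single-channel image, a patch size of 4, and an embedding dimension of 16 (illustrative values; the hyperparameters used in the video may differ).

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Sketch of a ViT patch embedding (illustrative hyperparameters)."""

    def __init__(self, in_channels=1, patch_size=4, embed_dim=16, img_size=28):
        super().__init__()
        # A strided convolution splits the image into non-overlapping
        # patches and linearly projects each one in a single step.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2
        # Learnable class token and positional embeddings, as in the paper.
        self.cls_token = nn.Parameter(torch.randn(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches + 1, embed_dim))

    def forward(self, x):
        x = self.proj(x)                   # (B, embed_dim, 7, 7)
        x = x.flatten(2).transpose(1, 2)   # (B, 49, embed_dim): one token per patch
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)     # prepend the class token
        return x + self.pos_embed          # add learnable positions

x = torch.randn(8, 1, 28, 28)              # a batch of MNIST-sized images
tokens = PatchEmbedding()(x)
print(tokens.shape)                        # torch.Size([8, 50, 16])
```

The resulting token sequence is what gets fed into a standard Transformer encoder; the classification head then reads out the class-token position.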
Buy me a coffee! ☕️
https://ko-fi.com/uygarkurt