This is my (mostly) live coding video, where I implement from scratch a diffusion model that generates 32 x 32 RGB images. The tutorial assumes basic knowledge of deep learning and Python.
Links:
- The Jupyter notebook built in this video: https://github.com/dtransposed/code_videos/blob/main/01_Diffusion_Models_Tutorial
- My website: https://dtransposed.github.io
- My Twitter: https://twitter.com/dtransposed
Sources:
- Lil'Log - What are Diffusion Models?: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
- Understanding Diffusion Models: A Unified Perspective: https://arxiv.org/abs/2208.11970
- Denoising Diffusion Probabilistic Models: https://arxiv.org/abs/2006.11239
Timestamps:
0:00 Introduction
0:32 Theoretical background
13:13 Live Coding - Forward diffusion
41:29 Live Coding - Training loop
1:00:05 Live Coding - Overfitting one batch
1:03:36 Live Coding - Reverse diffusion
1:13:40 Live Coding - Training on the CIFAR-10 dataset
1:17:24 Live Coding - Result evaluation
1:19:40 (Bonus) Quick explanation of the UNet architecture used in the tutorial