Tiled (general) Matrix Multiplication from scratch in CUDA C.
Code Repo: https://github.com/tgautam03/CUDA-C/tree/master/05_tiled_mat_mul
Notes: https://0mean1sigma.com/chapter-4-memory-coalescing-and-tiled-matrix-multiplication/
Animations: https://github.com/tgautam03/0Mean1Sigma/tree/master/CUDA_04
00:00 Introduction
00:41 Standard Matrix Multiplication
01:41 Tiled Matrix Multiplication Algorithm
03:24 Tiled Matrix Multiplication Code
05:53 General (Tiled) Matrix Multiplication
08:11 Demo
08:26 Next Video: Tensor Cores!