Matrix multiplication on a GPU using CUDA C/C++.
Code Repository: https://github.com/tgautam03/xGeMM
Video Notes and Code Explainers: https://0mean1sigma.com/xgemm/
Animations: https://github.com/tgautam03/0Mean1Sigma/tree/master/2024-11-28-GPU-Programming
Other Projects: https://0mean1sigma.com/mini-projects/
Useful References:
https://siboehm.com/articles/22/CUDA-MMM
https://leimao.github.io/article/CUDA-Matrix-Multiplication-Optimization/
Chapters:
00:00 - Introduction
00:36 - Step 1 (Basic CUDA C/C++)
03:02 - Step 2 (Memory Coalescing)
05:57 - Step 3 (GPU Shared Memory)
06:57 - Step 4 (Thread Registers)
09:18 - Step 5 (More Thread Registers)
10:43 - Step 6 (Vectorized Memory Accesses)
12:02 - Final Thoughts