/Using the GPU can substantially speed up all kinds of numerical problems. Conventional wisdom dictates that for fast numerics you need to be a C/C++ wizz. It turns out that you can get quite far with only python. In this video, I explain how you can use cupy together with numba to perform calculations on NVIDIA GPU's. Production quality is not the best, but I hope you may find it useful.
00:00 Introduction: GPU programming in python, why?
06:52 Cupy intro
08:39 Cupy demonstration in Google colab
19:54 Cupy summary
20:21 Numba.cuda and kernels intro
25:07 Grids, blocks and threads
27:12 Matrix multiplication kernel
29:20 Tiled matrix multiplication kernel and shared memory
34:31 Numba.cuda demonstration in Google colab
44:25 Final remarks
Edit 3/9/2021: the notebook is use for demonstration can be found here https://colab.research.google.com/drive/15IDLiUMRJbKqZUZPccyigudINCD5uZ71?usp=sharing
Edit 9/9/2021: at
23:56 one of the grid elements should be labeled 1,3 instead of 1,2. Thanks to _______ for pointing this out.