CUDA Crash Course: GPU Performance Optimizations Part 1
In this video, we walk through a step-by-step performance optimization of matrix multiplication in CUDA!
Spreadsheet: https://docs.google.com/spreadsheets/d/14v58GFyOEeTPk1DiftyfqxREULLuz7xLaPu9Ekipnc8/edit?usp=sharing
For code samples: http://github.com/coffeebeforearch
For live content: http://twitch.tv/CoffeeBeforeArch