NVIDIA CUDA Tutorial 8: Intro to Shared Memory
Wow, this has been a tricky tute. I originally tried to cover much more, with some coding at the end, but it was too long to be interesting. Then I split the coding into a separate tute and concentrated on the theory side, but it was still way too long.
Shared memory is a very intricate topic; it's at the very core of what CUDA programming is all about. I eventually decided there's no point brushing over this stuff: shared memory deserves more attention. This tutorial is a short intro. It covers how to allocate shared memory, a little about what shared memory is (fast on-chip memory that all the threads in a block can read and write), and an illustration of the dreaded race condition that arises when resources are shared among parallel threads.
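For readers who want something concrete to go with the theory, here is a minimal sketch, assuming a CUDA-capable GPU, nvcc, and unified memory via `cudaMallocManaged`. All kernel and helper names (`reverseStatic`, `reverseDynamic`, `countRacy`, `countAtomic`, `reverseWorks`, `runCounter`) are hypothetical illustrations, not code from this series. It shows static allocation (size fixed at compile time), dynamic allocation (size passed as the third launch parameter), and a race condition where every thread does an unsynchronized read-modify-write on one shared counter. The `atomicAdd` fix is one common remedy, swapped in here for illustration; it is not necessarily the approach later tutes take.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Static allocation: size fixed at compile time, so launch with <= 256 threads.
__global__ void reverseStatic(int *data) {
    __shared__ int tile[256];        // one copy per block, visible to all its threads
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();                 // wait until every thread has written its element
    data[t] = tile[blockDim.x - 1 - t];
}

// Dynamic allocation: size comes from the third <<<...>>> launch parameter.
__global__ void reverseDynamic(int *data) {
    extern __shared__ int tileDyn[];
    int t = threadIdx.x;
    tileDyn[t] = data[t];
    __syncthreads();
    data[t] = tileDyn[blockDim.x - 1 - t];
}

// Race condition: threads read, add, and write the same shared int without
// coordination, so many increments overwrite each other and are lost.
__global__ void countRacy(int *out) {
    __shared__ int counter;
    if (threadIdx.x == 0) counter = 0;
    __syncthreads();
    counter = counter + 1;           // unsynchronized read-modify-write
    __syncthreads();
    if (threadIdx.x == 0) *out = counter;
}

// One possible fix (assumption: an atomic on shared memory is acceptable here).
__global__ void countAtomic(int *out) {
    __shared__ int counter;
    if (threadIdx.x == 0) counter = 0;
    __syncthreads();
    atomicAdd(&counter, 1);          // each increment is applied indivisibly
    __syncthreads();
    if (threadIdx.x == 0) *out = counter;
}

// Reverses 0..255 with one block; returns true if the result is correct.
bool reverseWorks(bool dynamic) {
    const int n = 256;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;
    if (dynamic) reverseDynamic<<<1, n, n * sizeof(int)>>>(data);
    else         reverseStatic<<<1, n>>>(data);
    cudaDeviceSynchronize();
    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (data[i] == n - 1 - i);
    cudaFree(data);
    return ok;
}

// Runs one block of 256 threads that each add 1 to a shared counter.
int runCounter(bool useAtomic) {
    int *out;
    cudaMallocManaged(&out, sizeof(int));
    if (useAtomic) countAtomic<<<1, 256>>>(out);
    else           countRacy<<<1, 256>>>(out);
    cudaDeviceSynchronize();
    int result = *out;
    cudaFree(out);
    return result;
}

int main() {
    printf("static reverse ok:  %d\n", reverseWorks(false));
    printf("dynamic reverse ok: %d\n", reverseWorks(true));
    printf("racy counter:   %d (256 threads incremented it)\n", runCounter(false));
    printf("atomic counter: %d\n", runCounter(true));
    return 0;
}
```

Compile with something like `nvcc shared_demo.cu -o shared_demo`. The racy count is not deterministic and will usually come out well below 256, while the atomic version should report 256 every time. Error checking on the CUDA calls is omitted to keep the sketch short.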
Next tute we'll look in more detail at how shared memory is organized and how to get the most performance out of it. After that, we will be in an excellent position to optimize the algorithm we looked at last tute.
Sorry in advance if you're one of those folks who like a bit of code in the tutes. We'll get back to coding, but this topic needs a foundation first. Cheers all!
Facebook:
https://www.facebook.com/pages/WhatsaCreel/167732956665435