This is a list of useful libraries and resources for CUDA development.
-
Optimizing Parallel Reduction in CUDA - This presentation shows how to implement a fast, yet relatively simple, reduction algorithm.
-
CUDA C/C++ BASICS - This presentation explains the concepts of CUDA kernels, memory management, threads, thread blocks, shared memory, and thread synchronization. It shows a simple addition kernel and an optimized 1D stencil kernel.
-
Advanced CUDA - Optimizing to Get 20x Performance - This presentation covers: Tesla 10-Series Architecture, Particle