GPUs
Marcelo Forets edited this page Nov 13, 2020 · 2 revisions
- XSpeed: presentation slides, webpage.
Two main approaches to leveraging GPUs:
- High-level approach:
  - Use a standard sequential algorithm (computing the support function in a given direction, looping over all directions) that calls GPU libraries for heavy operations such as matrix computations.
  - The gain here is limited by Amdahl's law and by host-device memory transfers.
- Low-level approach:
  - Decompose the problem (computing the support function in a given direction, in parallel over all directions) and execute the algorithm as a kernel on the GPU.
  - If there is no thread divergence and little memory transfer, the speedup is linear in the number of GPU cores. This is where the major gain lies.
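The Amdahl's law bound on the high-level approach can be made concrete with a small sketch (the 95% parallel fraction and the core count below are illustrative assumptions, not figures from this page):

```python
# Amdahl's law: if a fraction p of the runtime is parallelizable over
# n cores, the overall speedup is bounded by 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Illustrative numbers: even with 95% of the work offloaded to
# thousands of GPU cores, the remaining sequential 5% caps the
# overall speedup near 1 / 0.05 = 20x.
print(round(amdahl_speedup(0.95, 4096), 1))  # -> 19.9
```

This is why the low-level approach, which parallelizes the whole computation rather than offloading individual heavy operations, is the one with linear scaling potential.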
Challenges to computing the support function in a kernel:
- Potential obstacle: the SIMD architecture. Operations are parallel only over data: all cores in a block either execute the same instruction or wait.
- The performance gain on the GPU hinges directly on thread divergence: conditional statements drastically slow down the GPU, since threads in the same block must wait for each other until they reach the same instruction again.
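To see why the support function fits the SIMD model well, here is a minimal sketch (plain Python standing in for a GPU kernel; the vertex representation and the example box are illustrative assumptions). For a polytope given by its vertices, rho(d) = max over vertices v of <v, d>, and each direction can be evaluated independently by the same branch-free expression:

```python
# Support function of a polytope given by its vertex list:
# rho(d) = max over vertices v of the inner product <v, d>.
def support(vertices, d):
    return max(sum(vi * di for vi, di in zip(v, d)) for v in vertices)

def support_batch(vertices, directions):
    # Each direction is independent. In a low-level GPU implementation,
    # one thread per direction would evaluate this same expression, so
    # all threads in a block execute identical instructions on
    # different data -- no divergence, only a data-parallel reduction.
    return [support(vertices, d) for d in directions]

# Unit box in 2D with vertices (+-1, +-1); its support function in
# direction d is |d1| + |d2|.
box = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
dirs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(support_batch(box, dirs))  # -> [1.0, 1.0, 2.0]
```

The `max` reduction contains no data-dependent branching, which is what keeps all threads of a block on the same instruction stream.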