For this puzzle, it seems that shared CUDA arrays are lazily initialized. Since `SIZE` is 15 and `TPB` is 8, the last element of Block 1 is not overwritten, so its value is carried over from Block 0, even after calling `cuda.syncthreads()`. Is this normal behavior, or is it actually a leak?
My original implementation assumed that the shared array would be initialized to 0, so it was safe to perform the sum over the entire block. Given the above findings, there are two options for an unaligned block: 1) fill the rest with 0, or 2) check the index bounds in the sum ops (`(i+step) < size`). While both pass the test case, the latter gives a weird dependency graph for Block 1, as attached, which raises my concern.
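To make the two options concrete, here is a rough CPU-only model in plain Python. It is not Numba code and does not run on a GPU. The single `shared` list reused across blocks is an assumption that only mimics the carry-over I observed, and `block_sums` and `mode` are hypothetical names I made up for this sketch.

```python
TPB = 8    # threads per block, as in the puzzle
SIZE = 15  # input length, deliberately not a multiple of TPB


def block_sums(a, size, mode):
    """CPU model of a per-block tree reduction.

    mode is "naive", "zero_fill" (option 1) or "bounds_check" (option 2).
    """
    n_blocks = (size + TPB - 1) // TPB
    out = [0] * n_blocks
    # ASSUMPTION: one buffer reused by every block stands in for shared
    # memory that is never cleared, so untouched slots keep old values.
    shared = [0] * TPB
    for block in range(n_blocks):
        # Load phase: each "thread" copies one in-range element.
        for local_i in range(TPB):
            i = block * TPB + local_i
            if i < size:
                shared[local_i] = a[i]
            elif mode == "zero_fill":
                shared[local_i] = 0  # option 1: pad the tail with zeros
        # On a device, cuda.syncthreads() would separate these phases.
        step = 1
        while step < TPB:
            for local_i in range(0, TPB, 2 * step):
                i = block * TPB + local_i
                # option 2: skip partners that lie past the end of the input
                if mode != "bounds_check" or (i + step) < size:
                    shared[local_i] += shared[local_i + step]
            step *= 2
        out[block] = shared[0]
    return out


a = list(range(SIZE))
print(block_sums(a, SIZE, "naive"))         # [28, 84]: stale 7 from Block 0 leaks in
print(block_sums(a, SIZE, "zero_fill"))     # [28, 77]
print(block_sums(a, SIZE, "bounds_check"))  # [28, 77]
```

In this model both fixes give the same sums. The difference is only that option 2 skips some additions in the last block, which might be why its dependency graph looks irregular.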