For debugging on a CUDA-less machine, having a CPU backend for the kernel call op would be useful.
We can basically just make a new func with outermost parallel fors over blocks and threads, then similarly JIT-compile it and insert it into a custom call expr.
For now that should suffice if there's no shmem or sync; we can do a Polygeist-style lowering after GB.
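A minimal sketch of the idea, assuming a 1-D grid and a kernel body that receives its block and thread coordinates as plain arguments rather than reading `blockIdx`/`threadIdx`. The names `launch_on_cpu` and `saxpy_kernel` are hypothetical and do not belong to any existing API. With no shared memory and no sync, every (block, thread) iteration is independent, so both loops could run in parallel. The sketch runs them sequentially to keep things clear.

```python
from typing import Callable, Sequence

# Hypothetical kernel signature: a CUDA-style body that is handed its
# block/thread coordinates explicitly instead of reading intrinsics.
Kernel = Callable[[int, int, int, Sequence], None]


def launch_on_cpu(kernel: Kernel, grid_dim: int, block_dim: int, *args) -> None:
    """Run `kernel` once per (block, thread) pair over a 1-D grid.

    Without shared memory or a barrier the iterations are independent,
    so both loops are legal "parallel fors"; they run sequentially here.
    """
    for block_idx in range(grid_dim):        # outer parallel-for over blocks
        for thread_idx in range(block_dim):  # inner parallel-for over threads
            kernel(block_idx, thread_idx, block_dim, args)


def saxpy_kernel(block_idx: int, thread_idx: int, block_dim: int, args: Sequence) -> None:
    a, x, y, n = args
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:                               # bounds guard, as in CUDA
        y[i] = a * x[i] + y[i]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
launch_on_cpu(saxpy_kernel, 2, 4, 2.0, x, y, len(x))  # 8 "threads" cover 5 elements
print(y)  # [12.0, 14.0, 16.0, 18.0, 20.0]
```

In the real op, the two loops would correspond to the outermost parallel fors in the generated func, which would then be JIT-compiled like the GPU path.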
The better way is likely to go to SPIR-V and OpenCL+PoCL. A CPU backend as you describe is still tricky, since you need to forward variables between the loops.
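To show why variables must be forwarded, here is a sketch of one common way to handle a barrier on the CPU: split the thread loop at the `__syncthreads()` so that every thread finishes the first phase before any thread starts the second. A thread-local value computed before the barrier and used after it (`v` below) no longer lives in a single loop body, so it has to be stored in a per-thread array. The kernel and the function name `reverse_block_cpu` are hypothetical illustrations, not code from any project.

```python
def reverse_block_cpu(x: list, block_dim: int) -> list:
    """CPU emulation of a hypothetical CUDA kernel:

        __shared__ float tile[BLOCK];
        float v = x[gid] * 2;            // thread-local, used after the barrier
        tile[t] = x[gid];
        __syncthreads();
        out[gid] = tile[BLOCK - 1 - t] + v;
    """
    n = len(x)
    assert n % block_dim == 0, "sketch assumes full blocks"
    out = [0.0] * n
    for b in range(n // block_dim):      # blocks remain independent
        tile = [0.0] * block_dim         # shared memory: one copy per block
        v_fwd = [0.0] * block_dim        # forwarded copy of each thread's `v`
        # Phase 1: every thread's code before the barrier.
        for t in range(block_dim):
            gid = b * block_dim + t
            v_fwd[t] = x[gid] * 2
            tile[t] = x[gid]
        # The barrier is the boundary between the two thread loops.
        # Phase 2: every thread's code after the barrier.
        for t in range(block_dim):
            gid = b * block_dim + t
            out[gid] = tile[block_dim - 1 - t] + v_fwd[t]
    return out


print(reverse_block_cpu([1.0, 2.0, 3.0, 4.0], 2))  # [4.0, 5.0, 10.0, 11.0]
```

Doing this automatically means finding every value that is live across each barrier and materializing it like `v_fwd`, including barriers nested inside control flow. That is where a plain "outer parallel for" lowering stops being sufficient.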