This is a must before the project goes public, so that we can compare fairly against the leading machine learning frameworks.
This will inevitably require generating new kernels on the fly (from a template if needed) and compiling them. Two options exist if we want to support both Nvidia and AMD devices:
Write separate backends for each of the two vendors (probably focusing on Nvidia initially). Things like nvptx and ocl would probably be the starting point for this.
Have a unified backend using HIP, which would allow us to generate the same code for both vendors.
Additionally, for comparisons we would definitely need to link against cuBLAS and cuDNN on Nvidia devices. It would be nice to have someone with more GPU programming expertise suggest or comment on the better way to approach this.
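To make the "generate kernels from a template" idea concrete, here is a minimal sketch of the generation step only. It assumes a simple element-wise kernel; the names `KERNEL_TEMPLATE` and `generate_elementwise_kernel` are hypothetical and not part of this project. The emitted source uses only `__global__`, `blockIdx`, `blockDim` and `threadIdx`, which both the CUDA and HIP kernel languages accept, so the same string could in principle be passed to a runtime compiler such as NVRTC or hipRTC. The compile and launch steps are not shown.

```python
from string import Template

# Hypothetical template for an element-wise binary kernel.
# The body sticks to constructs shared by CUDA and HIP device code.
KERNEL_TEMPLATE = Template("""\
extern "C" __global__
void ${name}(const ${dtype}* x, const ${dtype}* y, ${dtype}* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = ${expr};
    }
}
""")


def generate_elementwise_kernel(name: str, dtype: str, expr: str) -> str:
    """Fill in the template and return kernel source as a string."""
    return KERNEL_TEMPLATE.substitute(name=name, dtype=dtype, expr=expr)


# Example: generate a float32 addition kernel.
source = generate_elementwise_kernel("add_f32", "float", "x[i] + y[i]")
print(source)
```

With the unified HIP option a single generator like this would serve both vendors. With separate backends, each backend would likely need its own template or its own lowering from a shared description.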