CUDA Ico Backend Refactoring (Persistence, Encapsulation) #1038
Comments
## Technical Description

This PR enables globals for the unstructured backends. Furthermore, an unreported issue where globals were not propagated from the wrapper class to the stencil class was fixed. Additionally, a bug in the unstructured cuda codegen was fixed when translating stencils that only use dense dimensions.

### Resolves / Enhances

Fixes #1030
Fixes #1028

### Notes

The methods to set and get globals in the cuda backend are on the inner stencils. This will be addressed in [this issue](#1038). Also, a method to communicate globals from FORTRAN will need to be devised (not addressed yet).

### Testing

New tests in dawn4py and a new unstructured integration test to test the correct operation of the `CXXNaiveIco` backend. `CudaIco` backend tested manually.

### Dependencies

This PR is independent.
This is less pressing now because we offer various
This may require more discussion after all. In our current scope, each stencil instantiation may only hold a single stencil. From a conceptual standpoint, wrapping the stencil is thus not necessary.
Further discussions revealed that we may want to introduce distinct
…1089)

## Technical Description

Currently, temporary fields are allocated in the constructor of the generated class. Since the API functions to the FORTRAN and cpp drivers hold the stencil on the stack, this leads to memory (de-)allocation on each call. This is fine for debugging, but not for production runs. Thus, this PR keeps that behavior for the convenience wrappers starting from host memory, but introduces static `setup` and `free` functions which have to be called by the host when using the production interface which assumes device pointers.

Additionally, since the APIs are touched either way, globals can now be communicated from FORTRAN to the CUDA backend.

Furthermore, this PR contains a small refactoring and removes the (now) superfluous template parameter from the generated stencil class.

### Resolves / Enhances

Addresses part of #1038
Fixes #1042

### Testing

Since this affects the CUDA-ico backend, this is tested by `icondusk-e2e`.
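The `setup`/`free` idea described above can be illustrated with a minimal sketch. This is not the code dawn generates: the class name `DiffusionStencil`, the `run` signature, the `isSetup` helper, and the arithmetic are all hypothetical, and a host-side `std::vector` stands in for the device allocation that a CUDA backend would presumably perform with `cudaMalloc`. The point is only that temporaries are allocated once and reused across calls instead of being allocated in a constructor on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated stencil; all names are illustrative.
class DiffusionStencil {
  // Temporary storage shared by all calls. A real CUDA backend would hold
  // device pointers here; a host vector is used so the sketch runs anywhere.
  static std::vector<double> tmp_;
  static bool is_setup_;

public:
  // Allocate the temporaries once, before the time loop starts.
  static void setup(std::size_t num_cells) {
    tmp_.assign(num_cells, 0.0);
    is_setup_ = true;
  }

  // Release the temporaries once, after the time loop has finished.
  static void free() {
    tmp_.clear();
    tmp_.shrink_to_fit();
    is_setup_ = false;
  }

  static bool isSetup() { return is_setup_; }

  // Production-style entry point: operates on caller-owned buffers and
  // reuses tmp_ instead of allocating on each call.
  static void run(const double* in, double* out, std::size_t num_cells) {
    assert(is_setup_ && tmp_.size() == num_cells);
    for (std::size_t i = 0; i < num_cells; ++i) tmp_[i] = 2.0 * in[i];
    for (std::size_t i = 0; i < num_cells; ++i) out[i] = tmp_[i] + 1.0;
  }
};

std::vector<double> DiffusionStencil::tmp_;
bool DiffusionStencil::is_setup_ = false;
```

A host driver under these assumptions would call `DiffusionStencil::setup(n)` once, call `run` in every time step, and call `DiffusionStencil::free()` at shutdown.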
The generated code for the CUDA Ico backend is poorly encapsulated. The outer class does not properly wrap the inner class, so the driver has to access the generated name of the stencil directly. This is cumbersome while a stencil is evolving, because the generated id changes with the stencil structure. For this issue: generate code closer to the unstructured naive interface, with proper forwarding to the inner stencil classes.
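One possible shape for such forwarding is sketched below. Every name here is an assumption made for illustration (`diffusion`, `stencil_42`, `Globals`, `set_dt`, `get_dt`); none of them is taken from the actual generated code. The inner class carries the structure-dependent generated name, while the outer class exposes a stable interface so that a driver never has to spell the inner name.

```cpp
#include <cassert>

// Hypothetical globals struct; the field name is illustrative.
struct Globals {
  double dt = 0.0;
};

// Outer wrapper with a stable name that drivers can rely on.
class diffusion {
  // Inner stencil whose generated name depends on the stencil structure.
  // It is private, so no driver code can refer to it directly.
  class stencil_42 {
    Globals globals_;

  public:
    void setGlobals(const Globals& g) { globals_ = g; }
    const Globals& getGlobals() const { return globals_; }
    double run(double x) const { return x * globals_.dt; }
  };

  stencil_42 inner_;

public:
  // Forward global setters and getters to the inner stencil.
  void set_dt(double dt) {
    Globals g = inner_.getGlobals();
    g.dt = dt;
    inner_.setGlobals(g);
  }
  double get_dt() const { return inner_.getGlobals().dt; }

  // Forward execution to the inner stencil.
  double run(double x) const { return inner_.run(x); }
};
```

With this layout, a change in the stencil structure that renames `stencil_42` would only affect the generated class itself, not the code that calls `diffusion::set_dt` or `diffusion::run`.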