CUDA Ico Backend Refactoring (Persistence, Encapsulation) #1038

mroethlin · 2020-10-13T06:46:20Z

The encapsulation in the generated code for CUDA Ico is quite bad. The outer class does not properly wrap the inner class, and the driver needs to access the generated name of the stencil directly. This is cumbersome while a stencil is evolving, since the id in that name changes with the stencil structure. Goal of this issue: generate code closer to the unstructured naive interface, with proper forwarding to the inner stencil classes.
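A minimal sketch of the kind of forwarding asked for here, written in plain C++ so it runs without a GPU. The names `diffusion`, `stencil_42` and the global `dt` are hypothetical and are not taken from the generated code; the only point is that the driver talks to the outer class and never has to spell the generated inner name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a generated stencil wrapper class.
class diffusion {
  // Inner class whose name carries a generated id; private, so hidden from the driver.
  class stencil_42 {
    double dt_ = 0.0;  // hypothetical global

  public:
    void set_dt(double dt) { dt_ = dt; }
    double get_dt() const { return dt_; }
    void run(double *field, std::size_t n) const {
      assert(field != nullptr);
      for (std::size_t i = 0; i < n; ++i) {
        field[i] += dt_;
      }
    }
  };

  stencil_42 inner_;

public:
  // The outer class forwards every call, so drivers never name stencil_42.
  void set_dt(double dt) { inner_.set_dt(dt); }
  double get_dt() const { return inner_.get_dt(); }
  void run(double *field, std::size_t n) const { inner_.run(field, n); }
};
```

With this shape, a change to the stencil structure that renames the inner class would leave driver code untouched.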

## Technical Description

This PR enables globals for the unstructured backends. Furthermore, an unreported issue where globals were not propagated from the wrapper class to the stencil class was fixed. Additionally, a bug in the unstructured cuda codegen was fixed when translating stencils that only use dense dimensions.

### Resolves / Enhances

Fixes #1030
Fixes #1028

### Notes

The methods to set and get globals in the cuda backend are on the inner stencils. This will be addressed in [this issue](#1038). Also, a method to communicate globals from FORTRAN will need to be devised (not addressed yet).

### Testing

New tests in dawn4py and a new unstructured integration test to test the correct operation of the `CXXNaiveIco` backend. `CudaIco` backend tested manually.

### Dependencies

This PR is independent.

mroethlin · 2020-12-07T10:04:43Z

This is less pressing now because we offer various run_STENCILNAME wrappers, which enable the driver to run the stencil without holding a stencil object itself. However, one problem with this approach is that the stencil object does not persist across calls to these run functions. This may impose severe performance penalties when the stencil needs to allocate fields in its constructor.
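The persistence problem can be illustrated with a small host-only sketch. Everything here is hypothetical (the class layout, the `allocations` counter, the name `run_stencil`), and plain `new[]` stands in for whatever allocation the real backend performs; it only shows that a wrapper holding the stencil on the stack pays for one allocation per call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stencil that allocates a temporary field in its constructor.
class stencil {
  double *tmpField_;
  std::size_t size_;

public:
  static int allocations;  // counts constructor allocations, for illustration only

  explicit stencil(std::size_t size)
      : tmpField_(new double[size]), size_(size) {
    ++allocations;
  }
  ~stencil() { delete[] tmpField_; }
  stencil(const stencil &) = delete;
  stencil &operator=(const stencil &) = delete;

  void run(double *out) {
    assert(out != nullptr);
    for (std::size_t i = 0; i < size_; ++i) {
      tmpField_[i] = static_cast<double>(i);
      out[i] = 2.0 * tmpField_[i];
    }
  }
};
int stencil::allocations = 0;

// Sketch of a run_STENCILNAME-style wrapper: the stencil lives on the stack,
// so the temporary field is allocated and freed again on every call.
void run_stencil(double *out, std::size_t size) {
  stencil s(size);
  s.run(out);
}
```

Calling `run_stencil` three times performs three allocations and three deallocations, which is the cost described above.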

mroethlin · 2020-12-07T10:27:52Z

This may require more discussion after all. In our current scope, each stencil instantiation may only hold a single stencil. From a conceptual standpoint, wrapping the stencil is thus not necessary.

mroethlin · 2020-12-07T11:42:49Z

Further discussions revealed that we may want to introduce distinct setup and run functions. This would enable us to do clean timings further down the road and ensure proper separation of concerns (the first call to run would otherwise need to allocate temp fields). It would be preferable if both setup and run return void, so that we don't need to introduce an opaque c_ptr to be returned to FORTRAN, which would then need to be managed on the FORTRAN end. A current rough sketch of what this might look like:

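#include <cstddef>

class stencil {
  // Persistent state shared by all instances; allocated once in setup().
  static double *tmpField;
  static std::size_t size_;

public:
  stencil() {}
  // Allocate the temporary field once, before the first call to run().
  static void setup(std::size_t size) {
    size_ = size;
    tmpField = new double[size];
  }
  void run() {
    for (std::size_t i = 0; i < size_; i++) {
      tmpField[i] = static_cast<double>(i);
    }
  }
};
// Static data members need out-of-class definitions to link.
double *stencil::tmpField = nullptr;
std::size_t stencil::size_ = 0;

// Free functions exposed to the driver; both return void.
void setup(std::size_t size) { stencil::setup(size); }
void run() { stencil().run(); }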
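For contrast, here is a hedged sketch of the alternative that the void-returning design avoids: an interface that hands an opaque handle back to the caller. All names (`stencil_state`, `stencil_create`, `stencil_run`, `stencil_destroy`) are hypothetical and were not proposed in this thread. On the FORTRAN side the returned `void *` would presumably be received as a `c_ptr` that the driver has to store and pass back.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical state that the caller would have to keep alive via a handle.
struct stencil_state {
  double *tmpField;
  std::size_t size;
};

extern "C" {
// Returns an opaque handle that the caller owns from now on.
void *stencil_create(std::size_t size) {
  return new stencil_state{new double[size], size};
}

void stencil_run(void *handle) {
  assert(handle != nullptr);
  stencil_state *s = static_cast<stencil_state *>(handle);
  for (std::size_t i = 0; i < s->size; ++i) {
    s->tmpField[i] = static_cast<double>(i);
  }
}

// The caller must hand the handle back exactly once, or the memory leaks.
void stencil_destroy(void *handle) {
  stencil_state *s = static_cast<stencil_state *>(handle);
  if (s != nullptr) {
    delete[] s->tmpField;
    delete s;
  }
}
}
```

Keeping the state in static members, as in the sketch above, moves this lifetime management into the generated code at the price of allowing only one live instance per stencil.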

…1089)

## Technical Description

Currently, temporary fields are allocated in the constructor of the generated class. Since the API functions to the FORTRAN and cpp drivers hold the stencil on the stack, this leads to memory (de-)allocation on each call. This is fine for debugging, but not for production runs. Thus, this PR keeps that behavior for the convenience wrappers starting from host memory, but introduces static `setup` and `free` functions which have to be called by the host when using the production interface which assumes device pointers. Additionally, since the APIs are touched either way, globals can now be communicated from FORTRAN to the CUDA backend. Furthermore, this PR contains a small refactoring and removes the (now) superfluous template parameter from the generated stencil class.

### Resolves / Enhances

Addresses part of #1038
Fixes #1042

### Testing

Since this affects the CUDA-ico backend this is tested by `icondusk-e2e`
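A rough host-only sketch of the `setup` / run / `free` lifecycle described in this PR text. This is not the generated code: `new[]` stands in for device allocation, and the helper names `setup_stencil`, `run_stencil`, `free_stencil`, `is_set_up` and the `allocations` counter are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

class stencil {
  // Persistent temporary field, shared by every instance of the stencil.
  static double *tmpField_;
  static std::size_t size_;

public:
  static int allocations;  // illustration only

  static void setup(std::size_t size) {
    assert(tmpField_ == nullptr);  // calling setup twice without free would leak
    tmpField_ = new double[size];
    size_ = size;
    ++allocations;
  }

  static void free() {
    delete[] tmpField_;
    tmpField_ = nullptr;
    size_ = 0;
  }

  static bool is_set_up() { return tmpField_ != nullptr; }

  // Production-style run: reuses the persistent temporary, allocates nothing.
  void run(double *out) {
    assert(is_set_up() && out != nullptr);
    for (std::size_t i = 0; i < size_; ++i) {
      tmpField_[i] = static_cast<double>(i);
      out[i] = tmpField_[i] + 1.0;
    }
  }
};
double *stencil::tmpField_ = nullptr;
std::size_t stencil::size_ = 0;
int stencil::allocations = 0;

// Void-returning entry points of the kind a driver could call.
void setup_stencil(std::size_t size) { stencil::setup(size); }
void run_stencil(double *out) { stencil().run(out); }
void free_stencil() { stencil::free(); }
```

The stencil object itself is still created on the stack in `run_stencil`, but because the temporary lives in static storage, repeated calls trigger no further allocation until `free_stencil` is called.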

mroethlin added backend dawn related enhancement ICON refactoring labels Oct 13, 2020

mroethlin mentioned this issue Oct 13, 2020

Enabling Globals for Unstructured Backends #1039

Merged

mroethlin changed the title ~~Proper Encapsulation for CUDA Ico Backend~~ CUDA Ico Backend Refactoring (Persistence, Encapsulation) Dec 7, 2020

mroethlin mentioned this issue Jan 11, 2021

Introduce Persistent Temporary Fields and Mesh to the Cuda Backend #1089

Merged
