Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support OpenCL PSy layer for GOcean 1.0 #174

Closed
arporter opened this issue May 24, 2018 · 16 comments
Closed

Support OpenCL PSy layer for GOcean 1.0 #174

arporter opened this issue May 24, 2018 · 16 comments
Assignees

Comments

@arporter
Copy link
Member

In this issue we will extend PSyclone to support the (optional) generation of OpenCL code in the PSy layer. We will use the clfortran module so that we can stick with Fortran.
Note that transforming existing kernels into OpenCL will be the subject of a separate issue.

We will target the GOcean 1.0 API first since that is what is required in EuroEXA.

@arporter
Copy link
Member Author

Instead of calling kernel subroutines we must call clEnqueueNDRangeKernel with a kernel object as argument. We need some way of obtaining that kernel object in the generated PSy layer. Obviously we have the name of the kernel from its meta-data. We could then use this to make a call into (some) infrastructure that returns the associated kernel object.

@arporter
Copy link
Member Author

arporter commented Jun 4, 2018

I've had a change of plan and decided to go with using a Transformation to toggle whether or not to generate OpenCL. This allows it to be applied on a per-Schedule/Invoke basis rather than globally for an Algorithm file. Currently the transformation just sets Node._opencl = True for the Schedule. Children of the Schedule must currently first look-up their owning Schedule in order to determine whether or not OpenCL is enabled. Alternatively, I could change the setter for Node._opencl to cascade the setting to all child nodes.
When OpenCL is enabled, Loop.gen_code() should do nothing and Kernel.gen_code() should generate a clEnqueueNDRangeKernel call.

@arporter
Copy link
Member Author

arporter commented Jun 4, 2018

More importantly, I need to work out how we are going to set the kernel arguments which, in OpenCL, is done via (many) API calls. Happily, I think I can generate the necessary code using the kernel meta-data and Algorithm argument list. I think the generated code should then be called within the if(first_time) block at the start of the Invoke. Possibly we may need to ensure this is only ever done once for each kernel (but then again, it may not matter).

@arporter
Copy link
Member Author

arporter commented Jun 8, 2018

For reference, the code to set kernel arguments looks like this (in Fortran):

arg_idx = 0
ierr = clSetKernelArg(kernel, arg_idx, sizeof(nx), C_LOC(nx))
call check_status("clSetKernelArg", ierr)
arg_idx = arg_idx + 1
ierr = clSetKernelArg(kernel, arg_idx, sizeof(ssha_device), &
     C_LOC(ssha_device))
call check_status("clSetKernelArg", ierr)
arg_idx = arg_idx + 1

The scalar argument (nx) is just the local copy whereas for fields it is the pointer to the buffer on the device that is required. There's no mention of types - just addresses and sizes.

@arporter
Copy link
Member Author

arporter commented Jun 8, 2018

We will need to create a routine to set the kernel arguments for each kernel that we come across. Given that we run PSyclone separately on each Algorithm file, there's currently no way to avoid creating such a routine several times. That's probably not a big deal though. I think the logical place to do this is after we've created the Algorithm and PSy code since at that point we know what all our kernels are.

arporter added a commit that referenced this issue Jun 15, 2018
@arporter
Copy link
Member Author

I've put the necessary code into PSy.gen_code() as that enables me to prevent producing the same routine several times (if a PSy layer makes multiple uses of the same kernel). There's nothing to stop a single invoke from calling the same kernel multiple times with different arguments - we must therefore call the set-kernel args routine before each kernel is launched. We could subsequently improve on this by checking the argument lists of each kernel call.

arporter added a commit that referenced this issue Jun 15, 2018
arporter added a commit that referenced this issue Jun 15, 2018
arporter added a commit that referenced this issue Jun 15, 2018
arporter added a commit that referenced this issue Jun 15, 2018
arporter added a commit that referenced this issue Jun 15, 2018
arporter added a commit that referenced this issue Jun 15, 2018
arporter added a commit that referenced this issue Jun 15, 2018
arporter added a commit that referenced this issue Jun 15, 2018
@arporter
Copy link
Member Author

arporter commented Jun 15, 2018

I think I have all the basics covered now with the exception of the creation of buffers on the device. At the moment I assume the infrastructure has set-up field%device_ptr but I don't think that has to be the case. Although I could in theory do this from PSyclone (although making sure the same field was associated with the same bit of device memory between Algorithm files would be tricky), there would still be the issue of output. Currently PSyclone has no knowledge of when data is required on the CPU and therefore it makes sense to leave that to the infrastructure. That being so, I need to re-factor the dl_esm_inf code to use the newly-extracted opencl code (that is now in lib/opencl).

@arporter
Copy link
Member Author

I don't want dl_esm_inf to have to depend on PSyclone though so I think this means we nead YAR (yet another repository) containing the Fortran->OpenCL interface code. In a spirit of collaboration I've done a google for such a code and found hiCL but that is based on C/C++ with a Fortran wrapper. It also hasn't been updated for two years.

@arporter
Copy link
Member Author

I've created https://github.com/stfc/FortCL and moved my OpenCL wrapper code into there. This has the advantage that it's pure Fortran. dl_esm_inf now has FortCL as a submodule.

@arporter
Copy link
Member Author

Have now brought branch up-to-date with master and used the new CharDeclGen to declare my list of kernel names correctly. Generated OpenCL code does not quite compile just yet.

@arporter
Copy link
Member Author

In order to reduce code duplication I've added an is_literal argument to psyGen.args_filter(). I can then use this in several places in gocean1p0 when dealing with scalar arguments. Scalar arguments are now passed into the kernel-argument-setting routines as required.
Generated OpenCL version now compiles!

@rupertford
Copy link
Collaborator

Whoop!

@arporter
Copy link
Member Author

Generated (PSycloneBench) code now runs as far as the point where I need to set a kernel argument that is a grid property. I now realise that I haven't created the buffers on the device for grid properties...

@arporter
Copy link
Member Author

Code now runs through but some kernels fail because arguments aren't set. On closer inspection I realise that in Fortran, the momentum kernels have use model_mod, only: rdt, cbfr, visc. When I ported this kernel to OpenCL I made those scalars into kernel arguments (which is the way it has to be in OpenCL).

@arporter
Copy link
Member Author

When we have a tool for converting Fortran kernels to OpenCL then we will be able to capture information on any such conversion of module variables into kernel arguments. In fact, we could do it now as part of the meta-data parsing (since we parse the kernel source anyway). I can break that out as a separate Issue.
I think the only other problem left to solve is that NEMOLite2D uses a fake 'built-in' to copy fields. For OpenCL we need to do a clEnqueueCopyBuffer. However, the gocean1.0 API does not actually support built-ins- for now I've simply commented-out the field-copy kernels from the invoke.

@rupertford
Copy link
Collaborator

PR #216 has been merged to master. Closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants