Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for generating an OpenCL PSy layer #216

Merged
merged 68 commits into from
Jan 31, 2019
Merged
Show file tree
Hide file tree
Changes from 63 commits
Commits
Show all changes
68 commits
Select commit Hold shift + click to select a range
bb0a42a
#174 add opencl option and generate use statements in psy module
arporter May 24, 2018
dd3309c
#174 add first OCL test for gocean
arporter Jun 4, 2018
585eafb
#174 add new OCLTrans() transformation
arporter Jun 4, 2018
6c0a570
#174 rm opencl from constructor and have as member of Schedule
arporter Jun 4, 2018
9f161c0
#174 generate code to set kernel args
arporter Jun 15, 2018
e71012f
#174 no explicit loops for ocl
arporter Jun 15, 2018
b540c75
#174 add call to set kernel args before each kernel call
arporter Jun 15, 2018
c998f56
#174 add Fortran OpenCLn utility code
arporter Jun 15, 2018
aeb3563
#174 add option to read kernel filename from env var. Allow add_kerne…
arporter Jun 15, 2018
1de9969
#174 add the target attribute to DeclGen
arporter Jun 15, 2018
7b776ff
#174 add if(first_time) section to psy
arporter Jun 15, 2018
26331cb
#174 add calls to copy fields to device
arporter Jun 15, 2018
26b074b
#174 add expected arg-setting code to test [skip ci]
arporter Jun 15, 2018
0ccd814
#174 add new gen_ocl_init() to create ocl-initialisation routine in P…
arporter Jun 15, 2018
646f8f0
#174 add x-failing test for psy_init()
arporter Jun 15, 2018
a9fd14d
#174 rm opencl arg from factory and generate
arporter Jun 22, 2018
5973b30
#174 support grid-property kernel args
arporter Jun 22, 2018
97c2186
#174 add property decorator to OCLTrans.name
arporter Jun 22, 2018
5838fc4
#174 use c_sizeof instead of sizeof. Avoid setting scalar args [skip ci]
arporter Jun 27, 2018
7fa79e5
#174 WIP updating declarations of quantities in PSy layer [skip ci]
arporter Jun 27, 2018
03e1c2f
#174 bring branch up-to-date with master (after declgen refactor)
arporter Jul 11, 2018
125f173
#174 rm duplicate target attribute after merge [skip ci]
arporter Jul 11, 2018
17e0b55
#174 correct type of kernel-name list [skip ci]
arporter Jul 11, 2018
9d86ace
#174 mv gen of set-kern-args into gocean1p0 [skip ci]
arporter Jul 12, 2018
594fdbe
#174 ensure ocl-init routine only executed once [skip ci]
arporter Jul 17, 2018
91ec86c
#174 create device buffer for each field
arporter Jul 20, 2018
3dc8fee
#174 add arg index and name to setkernelarg message [skip ci]
arporter Jul 20, 2018
1fd2f24
#174 add code to set kernel argument for nx [skip ci]
arporter Jul 20, 2018
d3fd8d5
#174 add code to set-up grid property arrays on device [skip ci]
arporter Jul 20, 2018
92b3319
#174 correct kind of globalsize and use c_sizeof to calculate device …
arporter Jul 20, 2018
5b1b707
#174 change to generate code for new FortCL interface [skip ci]
arporter Jul 20, 2018
ab91844
#174 bring tests up-to-date now that we set the nx argument [skip ci]
arporter Sep 20, 2018
dd26b4d
#174 bring branch up-to-date with master
arporter Sep 20, 2018
8091518
#216 fix test failures
arporter Sep 20, 2018
9fd8c5d
#216 add test for setting float scalar arg [skip ci]
arporter Sep 20, 2018
b552e76
#216 cover missed lines and remove _opencl from Node class
arporter Sep 20, 2018
d70c698
#216 rm OpenCL wrapper code now in FortCL [skip ci]
arporter Sep 20, 2018
3ade38f
#216 use namespace manager for kernel name [skip ci]
arporter Sep 20, 2018
070e2d5
#216 WIP using namespace manager for vars in generated code [skip ci]
arporter Sep 20, 2018
651fd97
#216 add xfailing test for const scalar kernel args
arporter Sep 20, 2018
d8633a4
#216 add code and test that we reject passing scalars by value
arporter Sep 21, 2018
335f75a
#216 create new gen_data_on_ocl_device() routine
arporter Oct 1, 2018
d700114
#216 fix errors due to break-out of gen_data_on_ocl_device() [skip ci]
arporter Nov 2, 2018
7fd4774
#216 bring branch up-to-date with master
arporter Nov 2, 2018
e2a7626
#216 tidy opencl_test for pylint
arporter Nov 2, 2018
d018160
#216 add test that OCLTrans() reject non-GOcean schedules
arporter Nov 2, 2018
d38168e
#216 tidy docstrings in gocean1p0 and psyGen
arporter Nov 2, 2018
774ea35
#216 bring up-to-date with master
arporter Dec 21, 2018
9887fed
#216 add OpenCL eg [skip ci]
arporter Dec 21, 2018
13b25cb
#216 update copyright info in fortran files [skip ci]
arporter Dec 21, 2018
72dd82f
#216 fix test failures due to addition of go_ suffix on master
arporter Dec 21, 2018
d89f8d3
#216 update user-guide with new OCLTrans() [skip ci]
arporter Dec 21, 2018
0268b0e
#216 add doc to developers guide [skip ci]
arporter Dec 21, 2018
a502225
#216 use namespace manager for arg-setter routine variables
arporter Dec 21, 2018
1ed37a5
#216 make cmd-queue assumption more explicit [skip ci]
arporter Dec 21, 2018
fc6a8f6
#216 tidy-up docstrings and set Memento
arporter Jan 16, 2019
aab8abf
pr #216. Merge branch 'master' into opencl_psy
rupertford Jan 22, 2019
25af70e
#216 bring branch up-to-date with master
arporter Jan 25, 2019
35049bd
#216 improve doc strings following review [skip ci]
arporter Jan 25, 2019
2553e8e
#216 ensure GOKern.gen_code calls rename_and_write to fix tests
arporter Jan 25, 2019
d886b83
#216 update developers and transformation docs [skip ci]
arporter Jan 25, 2019
266a2b7
#216 tidy for pylint
arporter Jan 25, 2019
24822c1
#216 tidy for pylint and improve docstrings
arporter Jan 30, 2019
fa09889
#216 rm duplicated GOKern.find_grid_access() method
arporter Jan 31, 2019
a5ec76f
#216 use raw_arg_list() method in GOKern.gen_code()
arporter Jan 31, 2019
60e4cc8
pr #216. Updating changelog and documentation pdf ready for merge to …
rupertford Jan 31, 2019
a53f4f9
pe #216 Merge branch 'master' into opencl_psy and update changelog an…
rupertford Jan 31, 2019
e7c5c2b
pr 216 Merge branch 'master' into opencl_psy and update changelog and…
rupertford Jan 31, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
136 changes: 136 additions & 0 deletions doc/developers.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1489,3 +1489,139 @@ Of course, a given field may already be on the device (and have been
updated) due to a previous Invoke. In this case, the fact that the
OpenACC run-time does not copy over the now out-dated host version of
the field is essential for correctness.

.. _opencl_dev:

OpenCL Support
##############

PSyclone is able to generate an OpenCL :cite:`opencl` version of
PSy-layer code for the GOcean 1.0 API. Such code may then be executed
on devices such as GPUs and FPGAs (Field-Programmable Gate
Arrays). Since OpenCL code is very different to that which PSyclone
normally generates, its creation is handled by ``gen_ocl`` methods
instead of the normal ``gen_code``. Which of these to use is
determined by the value of the ``Schedule.opencl`` flag. In turn,
this is set at a user level by the ``transformations.OCLTrans``
transformation.

The PSyKAl model of calling kernels for pre-determined iteration
spaces is a natural fit to OpenCL's concept of an
``NDRangeKernel``. However, the kernels themselves must be created or
loaded at runtime, their arguments explicitly set and any arrays
copied to the compute device. All of this 'boilerplate' code is
generated by PSyclone. In order to minimise the changes required, the
generated code is still Fortran and makes use of the FortCL library
(https://github.com/stfc/FortCL) to access OpenCL functionality. We
could of course generate the PSy layer in C instead but this would
require further extension of PSyclone.

Consider the following invoke::
rupertford marked this conversation as resolved.
Show resolved Hide resolved

call invoke( compute_cu(CU_fld, p_fld, u_fld) )

When creating the OpenCL PSy layer for this invoke, PSyclone creates
three subroutines instead of the usual one. The first, ``psy_init``
is responsible for ensuring that a valid kernel object is created
for each kernel called by the invoke, e.g.::

use fortcl, only: ocl_env_init, add_kernels
...
! Initialise the OpenCL environment/device
CALL ocl_env_init
! The kernels this PSy layer module requires
kernel_names(1) = "compute_cu_code"
! Create the OpenCL kernel objects. Expects to find all of the
! compiled kernels in PSYCLONE_KERNELS_FILE.
CALL add_kernels(1, kernel_names)

As indicated in the comment, the ``FortCL::add_kernels`` routine
expects to find all kernels in a pre-compiled file pointed to by the
PSYCLONE_KERNELS_FILE environment variable. (A pre-compiled file is
used instead of run-time kernel compilation in order to support
execution on FPGAs.)

The second routine created by PSyclone sets the kernel arguments, e.g.::

SUBROUTINE compute_cu_code_set_args(kernel_obj, nx, cu_fld, p_fld, u_fld)
USE clfortran, ONLY: clSetKernelArg
USE iso_c_binding, ONLY: c_sizeof, c_loc, c_intptr_t
...
INTEGER(KIND=c_intptr_t), target :: cu_fld, p_fld, u_fld
INTEGER(KIND=c_intptr_t), target :: kernel_obj
INTEGER, target :: nx
! Set the arguments for the compute_cu_code OpenCL Kernel
ierr = clSetKernelArg(kernel_obj, 0, C_SIZEOF(nx), C_LOC(nx))
ierr = clSetKernelArg(kernel_obj, 1, C_SIZEOF(cu_fld), C_LOC(cu_fld))
...
END SUBROUTINE compute_cu_code_set_args

The third routine generated is the ususal psy-layer routine that is
responsible for calling all of the kernels. However, it must now also
call ``psy_init``, create buffers on the compute device (if they are
not already present) and copy data over::

SUBROUTINE invoke_compute_cu(...)
...
IF (first_time) THEN
first_time = .false.
CALL psy_init
num_cmd_queues = get_num_cmd_queues()
cmd_queues => get_cmd_queues()
kernel_compute_cu_code = get_kernel_by_name("compute_cu_code")
END IF
globalsize = (/p_fld%grid%nx, p_fld%grid%ny/)
! Ensure field data is on device
IF (.NOT. cu_fld%data_on_device) THEN
size_in_bytes = int(p_fld%grid%nx*p_fld%grid%ny, 8)* &
c_sizeof(cu_fld%data(1,1))
! Create buffer on device
cu_fld%device_ptr = create_rw_buffer(size_in_bytes)
ierr = clEnqueueWriteBuffer(cmd_queues(1), cu_fld%device_ptr, &
CL_TRUE, 0_8, size_in_bytes, &
C_LOC(cu_fld%data), 0, C_NULL_PTR, &
C_LOC(write_event))
cu_fld%data_on_device = .true.
END IF
...

Note that we use the ``data_on_device`` member of the field derived
type (implemented in github.com/stfc/dl_esm_inf) to keep track of
whether a given field has been copied to the compute device. Once all
of this setup is done, the kernel itself is launched by calling
``clEnqueueNDRangeKernel``::

ierr = clEnqueueNDRangeKernel(cmd_queues(1), kernel_compute_cu_code, &
2, C_NULL_PTR, C_LOC(globalsize), &
C_NULL_PTR, 0, C_NULL_PTR, C_NULL_PTR)

Limitations
rupertford marked this conversation as resolved.
Show resolved Hide resolved
===========

Currently PSyclone can only generate the OpenCL version of the PSy
layer. Execution of the resulting code requires that the kernels
themselves be converted from Fortran to OpenCL (a dialect of C) and at
present this must be done manually. Since all data accessed by an
OpenCL kernel must be passed as an argument, this conversion must also
convert any accesses to module data into routine arguments.
Work is in progress to support kernel transformation and this will be
made available in a future PSyclone release.

In OpenCL, all tasks to be performed (whether copying data or kernel
execution) are associated with a command queue. Tasks submitted to
different command queues may then be executed concurrently,
potentially giving greater performance. The OpenCL PSy code currently
generated by PSyclone makes use of just one command queue but again,
this could be extended in the future.

The current implementation only supports the conversion of a whole
Invoke to use OpenCL. In the future we may refine this functionality
so that it may be applied to just a subset of kernels within an
Invoke.

Since PSyclone knows nothing about the I/O performed by a model, the
task of ensuring that the correct data is written out by a model
(including when doing halo exchanges for distributed memory) is left
to the dl_esm_inf library since that has the information on whether
field data is local or on a remote compute device.

7 changes: 7 additions & 0 deletions doc/references.bib
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,10 @@ @manual{nemo_code_conv
url = "https://forge.ipsl.jussieu.fr/nemo/attachment/wiki/Literature/NEMO_coding.conv_v3.pdf",
year = "2013"
}

@manual{opencl,
title = "The OpenCL 2.2 Reference Guide",
version = "2.2",
url = "https://www.khronos.org/files/opencl22-reference-guide.pdf",
year = "2018"
}
48 changes: 42 additions & 6 deletions doc/transformations.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,9 @@ The generic transformations currently available are listed in
alphabetical order below (a number of these have specialisations which
can be found in the API-specific sections).

.. note:: PSyclone currently only supports OpenACC transformations
for the GOcean 1.0 API. Attempts to apply these
transformations to (members of) Schedules from other
.. note:: PSyclone currently only supports OpenACC and OpenCL
rupertford marked this conversation as resolved.
Show resolved Hide resolved
transformations for the GOcean 1.0 API. Attempts to apply
these transformations to (members of) Schedules from other
APIs will be rejected.

####
Expand Down Expand Up @@ -90,9 +90,12 @@ can be found in the API-specific sections).

####

.. autoclass:: psyclone.transformations.ProfileRegionTrans
:members:
:noindex:
.. autoclass:: psyclone.transformations.OCLTrans
:members:
:noindex:

.. note:: OpenCL support is still under development. See
:ref:`opencl_dev` for more details.

####

Expand Down Expand Up @@ -122,6 +125,13 @@ can be found in the API-specific sections).
:ref:`MoveTrans <sec_move_trans>` transformation may be used
for this.

####

.. autoclass:: psyclone.transformations.ProfileRegionTrans
:members:
:noindex:


Kernels
-------

Expand Down Expand Up @@ -386,3 +396,29 @@ region for a set of nodes that includes halo swaps or global sums will
produce an error. In such cases it may be possible to re-order the
nodes in the Schedule using the :ref:`MoveTrans <sec_move_trans>`
transformation.

OpenCL
------

In common with OpenMP, the conversion of the generated code to use
OpenCL is performed by a transformation (``OCLTrans`` - see the
:ref:`sec_transformations_available` Section above). Currently this
transformation is only supported for the GOcean1.0 API and is applied
to the whole Schedule of an Invoke. This means that all kernels in
that Invoke will be executed on the OpenCL device. At present the
``OCLTrans`` transformation only alters the generated PSy-layer code. It
is currently the user's responsibility to convert the actual kernel code
from Fortran into OpenCL. Work is underway to extend PSyclone in
order to perform this translation automatically.

The OpenCL code generated by PSyclone is still Fortran and makes use
of the FortCL library (https://github.com/stfc/FortCL) to access
OpenCL functionality. It also relies upon the OpenCL support provided
by the dl_esm_inf library (https://github.com/stfc/dl_esm_inf).

The introduction of OpenCL code generation in PSyclone has been
largely motivated by the need to target Field Programmable Gate Array
(FPGA) accelerator devices. It is not currently designed to target the other
compute devices that OpenCL supports (such as GPUs and multi-core CPUs) but
this is a potentially fruitful area for future work.

59 changes: 59 additions & 0 deletions examples/gocean/eg3/README
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# -----------------------------------------------------------------------------
# BSD 3-Clause License
#
# Copyright (c) 2018, Science and Technology Facilities Council
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
# FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
# COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
# BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
# ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
#------------------------------------------------------------------------------
# Author A. R. Porter, STFC Daresbury Lab

The directory containing this file contains an example of the use of
PSyclone to generate OpenCL driver code with the GOcean 1.0 API.

In order to use PSyclone you must first install it, ideally with pip.
See ../../../README.md for more details.

PSyclone can be run in the directory containing this file by
executing, e.g.::

psyclone -api "gocean1.0" alg.f90

This will generate 'vanilla' PSy-layer code which is output to stdout.

In order to generate an OpenCL PSy layer instead, PSyclone must be
provided with a transformation script::

psyclone -api "gocean1.0" -s ./ocl_trans.py alg.f90

where ocl_trans.py simply applies the psyclone.transformations.OCLTrans
transformation to the Schedule of the Invoke.

Currently the (Fortran) kernels called by the Invoke must be manually
translated into OpenCL. This step will be automated in a future
release of PSyclone.
78 changes: 78 additions & 0 deletions examples/gocean/eg3/alg.f90
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
! -----------------------------------------------------------------------------
! BSD 3-Clause License
!
! Copyright (c) 2018, Science and Technology Facilities Council.
! All rights reserved.
!
! Redistribution and use in source and binary forms, with or without
! modification, are permitted provided that the following conditions are met:
!
! * Redistributions of source code must retain the above copyright notice, this
! list of conditions and the following disclaimer.
!
! * Redistributions in binary form must reproduce the above copyright notice,
! this list of conditions and the following disclaimer in the documentation
! and/or other materials provided with the distribution.
!
! * Neither the name of the copyright holder nor the names of its
! contributors may be used to endorse or promote products derived from
! this software without specific prior written permission.
!
! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
! AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
! IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
! DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
! FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
! DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
! SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
! CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
! OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
! OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
! -----------------------------------------------------------------------------
! Author: A. R. Porter, STFC Daresbury Lab.

! A simple, single Invoke example to demonstrate the generation of an
! OpenCL driver PSy layer.
program simple

use grid_mod
use field_mod
use compute_cu_mod, only: compute_cu
use compute_cv_mod, only: compute_cv
use compute_z_mod, only: compute_z
use compute_h_mod, only: compute_h
implicit none

type(grid_type), target :: model_grid

type(r2d_field) :: p_fld
type(r2d_field) :: u_fld, v_fld
type(r2d_field) :: cu_fld, cv_fld
type(r2d_field) :: z_fld
type(r2d_field) :: h_fld

integer :: ncycle

model_grid = grid_type(ARAKAWA_C, &
(/BC_PERIODIC,BC_PERIODIC,BC_NONE/), &
OFFSET_SW)

! Create fields on this grid
p_fld = r2d_field(model_grid, T_POINTS)
u_fld = r2d_field(model_grid, U_POINTS)
v_fld = r2d_field(model_grid, V_POINTS)
cu_fld = r2d_field(model_grid, U_POINTS)
cv_fld = r2d_field(model_grid, V_POINTS)
z_fld = r2d_field(model_grid, F_POINTS)
h_fld = r2d_field(model_grid, T_POINTS)

do ncycle=1,itmax

call invoke( compute_cu(CU_fld, p_fld, u_fld), &
compute_cv(CV_fld, p_fld, v_fld), &
compute_z(z_fld, p_fld, u_fld, v_fld), &
compute_h(h_fld, p_fld, u_fld, v_fld) )

end do

end program simple
Loading