Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nowait and required barriers to satisfy dependencies. #2551

Open
LonelyCat124 opened this issue Apr 18, 2024 · 12 comments
Open

Add nowait and required barriers to satisfy dependencies. #2551

LonelyCat124 opened this issue Apr 18, 2024 · 12 comments
Assignees
Labels
enhancement NG-ARCH Issues relevant to the GPU parallelisation of LFRic and other models expected to be used in NG-ARCH

Comments

@LonelyCat124
Copy link
Collaborator

Things to be careful of:
Control flow jumps, e.g.
Loops with break clauses - barriers are more complex
barriers inside if statements.

When adding a nowait look forward for dependency and add appropriate barrier if non-satisfiable.
Some similar requirements to #2499 in terms of dependency finding.
Some similar behaviours to OMPTaskwaitTrans.

@LonelyCat124 LonelyCat124 self-assigned this Apr 18, 2024
@LonelyCat124
Copy link
Collaborator Author

N.B. #2039 and #2040 were known issues with the previous implementation, we need to be smarter with control flow for this implementation (easy option is to give up if we aren't sure about the control flow).

@LonelyCat124 LonelyCat124 added enhancement NG-ARCH Issues relevant to the GPU parallelisation of LFRic and other models expected to be used in NG-ARCH labels May 3, 2024
@arporter arporter changed the title Add nowait to OMPTarget and compute required barriers to satisfy dependencies. Add nowait and required barriers to satisfy dependencies. Jun 19, 2024
JulienRemy pushed a commit that referenced this issue Jun 19, 2024
@sergisiso
Copy link
Collaborator

Summary of today discussion (and the OpenACC version tracked in #1664):
To make it resistant to further IR mutations, the nowait could be a flag attribute expressing the intent to make it asynchronous. Then, the lowering should introduce the necessary waits/barrier to avoid breaking any dependencies.

For OMP this means that

omp target loop nowait
   ...
    A = 1
omp target loop nowait
   ...
   B = 1
...
A 

the Directive lowering will use the dataflow graph connection to look forward and put a wait before "A". Note that this needs to be conservatively, e.g:

  • Wait before a CodeBlock
  • Wait at the end of the subroutine
  • Wait before a control flow jump (returns, breaks, calls - we can later refine this e.g. see if the call is pure, the async only operates on local symbols, ...)

The same logic can be applied for OpenACC to add wait/barriers, but in addition we need to look backwards to previous acc directive to chose the right queue-number to satisfy dependencies. For example:

acc parallel async(1)
  ...
   A = 1
acc parallel async(2)
  ...
  B = 1
wait
A 

Here the second "acc" has to look backwards the the previous "acc" and use queue 1 if there is a dependency to that directive body or a new queue number if there isn't or it finds a wait (restart all queues).

The task depends implementation currently needs "depend" clauses as nodes (not strings generated at the end) because the parent's "OMPSingle" lowering needs to analyse them to insert the proper taskwait. Currently this can not use the foward dataflow check to add taskwaits because in this implementation we mark sections of arrays as dependencies.

@MetBenjaminWent
Copy link

Whilst using the NEMO API on some UM source, it would be helpful to have an option to do a no wait for a OMP parrallel do loop transformation. Some of the UM source has existing OMP which means it maybe slightly more perfomant than psyclone generated clauses without it.
For example: https://code.metoffice.gov.uk/trac/um/browser/main/trunk/src/atmosphere/large_scale_precipitation/ls_ppn.F90
l625 to l665

@LonelyCat124
Copy link
Collaborator Author

Hi @MetBenjaminWent - at the moment this is focused on generating nowaits for omp target directives for NG-Arch - its something we could expand in the future potentially once that works.
#2384 plans to be able to support existing OpenMP directives but I have no idea when that will be completed.

@LonelyCat124
Copy link
Collaborator Author

@arporter are you working on async support for OpenACC (as discussed in WP1 meeting this morning)? We should maybe have a discussion as this might overlap with that work.

@arporter
Copy link
Member

That's a good point @LonelyCat124. I've picked up #2664 which is adding support for the necessary clauses - so "mechanism" rather than "policy". Maybe we should have a walk through it and consider whether and how to generalise it.

@MetBenjaminWent
Copy link

Thanks @LonelyCat124, understood. I'll add my comment context to the other ticket, is it definatly #2384 to support existing OpenMP (for CPU) directives for this, that looks like more ACC kernels?
Or shall I open a new issue for this?

@LonelyCat124
Copy link
Collaborator Author

Thanks @LonelyCat124, understood. I'll add my comment context to the other ticket, is it definatly #2384 to support existing OpenMP (for CPU) directives for this, that looks like more ACC kernels? Or shall I open a new issue for this?

#2384 does mention ACC directives - but the thing preventing it is the same blockage - maybe the best would be a new issue that references 2384 so we're tracking that we need to solve it for OpenMP ideally as well - not sure if Sergi/Andy agree.

@arporter
Copy link
Member

We do now have beta support for retaining comments (but it's off by default for now). I don't know whether we currently distinguish directives from general comments though @JulienRemy ?

I don't think we need a new issue - we have this one for adding support for asynch-related functionality and #2384 for handling existing directives (although it only proposes capturing them as CodeBlocks for now).

@MetBenjaminWent
Copy link

Okay, I'll note on #2384 for now, thanks all.

@LonelyCat124
Copy link
Collaborator Author

re Sergi's original comment: #2551 (comment)

Our basic strategy likely will use the new next/previous_accesses functionality.

When we lower an omp target loop directive with nowait enabled, we need to find all of the symbols it writes to and use next_accesses to find the first access to an output symbol after the end of the contained region.
If there is already a barrier between this loop and that point we can stop - this is a little tricky as the barrier could be inside control flow.
The trick is then finding the correct depth in the code to place the barrier - again the main trick is control flow. Initially we probably want to be over cautious with this.
Finally we need to add a barrier at the end of every Routine where a nowait is applied, which can be done by the first omp target loop to add it I think.

NB. I think if a target loop contains a call or codeblock then we have to disable nowait for that loop.

For OpenACC i think its a bit different - there's also the "which queue does this go in" which can replace the need for some barriers - but then uses previous_access instead (and we need to keep track of queues)

@JulienRemy
Copy link
Collaborator

@arporter Directives should be possible to retain as Node.preceding_comment now, if you set both ignore_comment and ignore_directives to False when creating a FortranReader.
However, I didn't add any specific support for them in the frontend, so they're just comments starting with $ for now. The frontend doesn't yet create nodes of the relevant OpenMP/ACC classes for them.
Also, be aware that the backend adds ! , i.e., with a whitespace as second character, at the beginning of comments so that directives will print out as ! $..., which is invalid. Stated differently, we can parse the directives but they will not make it through. I guess output_source_code_string.replace("! $", "!$") could be attempted... at your own risk :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement NG-ARCH Issues relevant to the GPU parallelisation of LFRic and other models expected to be used in NG-ARCH
Projects
Status: No status
Development

No branches or pull requests

5 participants