More RRTMGP performance work #6879

Open
wants to merge 16 commits into
base: master

jgfouca
Member

@jgfouca jgfouca commented Jan 8, 2025

Change list:

  1. Change the default RRTMGP backend to Kokkos
  2. Add new testmods for selecting the RRTMGP backend
  3. Allow all kernels in the rrtmgp interface to be timed
  4. Detranspose dimensions in kernels
  5. Use a faster approach for generating random cldx (NON BFB!)
  6. Update the rrtmgp submodule
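
For readers unfamiliar with per-kernel timing (item 3), here is a minimal host-only sketch of the idea. It is not the PR's `TIMED_KERNEL` macro: `kernel_times` and `timed_kernel` are hypothetical names, and the sketch assumes a plain callable rather than a Kokkos or YAKL kernel.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical registry: accumulated wall-clock seconds per kernel name.
inline std::map<std::string, double>& kernel_times() {
  static std::map<std::string, double> times;
  return times;
}

// Runs `kernel` and adds its elapsed time to the entry for `name`.
// A real device kernel would need a fence (e.g. Kokkos::fence()) before the
// second clock read; that is omitted here to keep the sketch host-only.
template <typename F>
void timed_kernel(const std::string& name, F&& kernel) {
  const auto start = std::chrono::steady_clock::now();
  kernel();
  const auto stop = std::chrono::steady_clock::now();
  kernel_times()[name] += std::chrono::duration<double>(stop - start).count();
}
```

Accumulating by name means a kernel launched many times per step shows up as a single total, which is the kind of figure a per-kernel comparison table needs.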

I've attached a screenshot of my custom kernel profiler:
[Screenshot of the kernel profiler output, 2025-01-08 at 3:04:10 PM]

Changepct compares the YAKL and Kokkos versions of each kernel: anything less than 100 means Kokkos is faster, and anything greater means YAKL is faster. Significant (>25%) speedups are highlighted green; significant slowdowns are highlighted red. The overall time spent in run_impl is about 20% worse with Kokkos. I cannot yet account for this ~3-4 second loss of performance, because the total time spent in kernels is already lower with Kokkos.
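
One way to read the Changepct column is sketched below. This is a guess at the arithmetic, not the profiler's actual code: it assumes Changepct is the Kokkos time expressed as a percentage of the YAKL time, and that the 25% threshold maps to cutoffs of 75 and 125. `change_pct` and `highlight` are hypothetical names.

```cpp
#include <string>

// Assumed definition: Kokkos time as a percentage of YAKL time,
// so values below 100 mean the Kokkos kernel is faster.
inline double change_pct(double yakl_seconds, double kokkos_seconds) {
  return 100.0 * kokkos_seconds / yakl_seconds;
}

// Assumed thresholds: more than 25% faster is green, more than 25% slower is red.
inline std::string highlight(double pct) {
  if (pct < 75.0) return "green";
  if (pct > 125.0) return "red";
  return "none";
}
```

Under these assumptions, a kernel that takes 2.0 s with YAKL and 1.0 s with Kokkos gets a Changepct of 50 and is highlighted green.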

NOTE: The screenshot and comment above are no longer true. We have parity with YAKL performance now.

[non-BFB] for anything that uses rrtmgp. The random number generator is different.

jgfouca added the EAMxx (PRs focused on capabilities for EAMxx) and BFB (PR leaves answers BFB) labels Jan 8, 2025
@@ -0,0 +1,3 @@
./xmlchange --append SCREAM_CMAKE_OPTIONS='SCREAM_RRTMGP_ENABLE_YAKL Off'
Contributor

Since Kokkos is the default, do we need this testmod?

Member Author

Probably not unless you want to be double-sure that Kokkos is on :)

@jgfouca jgfouca force-pushed the jgfouca/more_rrtmgp_perf branch 2 times, most recently from 81a1e70 to 3751726 Compare January 13, 2025 21:58
@AaronDonahue
Contributor

@jgfouca, is this ready for review?

@jgfouca
Member Author

jgfouca commented Jan 24, 2025

@AaronDonahue, yes. The current data:

[Screenshot, 2025-01-24 at 11:36:24 AM]

So, basically exact parity with YAKL. @ndkeen is checking a few things for me on pm.

Contributor

@mahf708 mahf708 left a comment

Looks good. I added two non-blocking comments (no need to address them).

@@ -527,6 +563,35 @@ static void rrtmgp_main(
extra_clnclrsky_diag, extra_clnsky_diag
);

pool_t::dealloc(sw_band2gpt_mem);
Contributor

I am a little uneasy about having to do a lot of memory management that is distinctly different from other processes, but we can worry about that later.

Comment on lines +1178 to +1187
// Kokkos::parallel_for(ncol, KOKKOS_LAMBDA(int icol) {
// conv::Random rand(seeds(icol));
// for (int igpt = 0; igpt < ngpt; igpt++) {
// for (int ilay = 0; ilay < nlay; ilay++) {
// cldx(icol,ilay,igpt) = rand.genFP<RealT>();
// }
// }
// });
TIMED_KERNEL(FLATTEN_MD_KERNEL3(ncol, nlay, ngpt, icol, ilay, igpt,
Contributor

Are you planning to move to the device-friendly generator, or have you already done so? I can't really tell. Either way, a comment would have been nice, though I assume you plan to revisit this anyway?

Member Author

Yes, the kernel here is the faster one. The issue is that this makes the PR non-BFB, so I left the old code commented out so that I could confirm the rest of the PR is BFB (with debug CPU).
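
The difference between the commented-out per-column loop and a flattened three-dimensional kernel can be sketched on the host. This is not the PR's code: `Index3`, `unflatten`, `mix64`, `uniform01`, and `fill_cldx` are hypothetical names, the SplitMix64-style hash stands in for whatever generator the PR actually uses, and the index order (igpt fastest) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Index3 { int icol, ilay, igpt; };

// Recover (icol, ilay, igpt) from a flat index, with igpt varying fastest.
inline Index3 unflatten(std::size_t idx, int nlay, int ngpt) {
  const std::size_t nl = static_cast<std::size_t>(nlay);
  const std::size_t ng = static_cast<std::size_t>(ngpt);
  return { static_cast<int>(idx / (nl * ng)),
           static_cast<int>((idx / ng) % nl),
           static_cast<int>(idx % ng) };
}

// SplitMix64-style mixing function used as a stateless hash.
inline std::uint64_t mix64(std::uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

// Map (seed, index) to a double in [0, 1) using the top 53 bits of the hash.
inline double uniform01(std::uint64_t seed, std::uint64_t idx) {
  return static_cast<double>(mix64(seed ^ mix64(idx)) >> 11) *
         (1.0 / 9007199254740992.0);  // 2^-53
}

// Fill a flat cldx array; each element depends only on (seed, idx), so every
// iteration is independent and could run as one thread of a flattened kernel.
inline std::vector<double> fill_cldx(int ncol, int nlay, int ngpt,
                                     std::uint64_t seed) {
  std::vector<double> cldx(static_cast<std::size_t>(ncol) * nlay * ngpt);
  for (std::size_t idx = 0; idx < cldx.size(); ++idx) {
    cldx[idx] = uniform01(seed, idx);
  }
  return cldx;
}
```

Because each value no longer comes from a per-column sequential stream, the numbers differ from the old loop's output even for the same seeds, which is why a change like this cannot be BFB.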

@jgfouca
Member Author

jgfouca commented Jan 27, 2025

@ndkeen says initial perlm testing shows we are now at parity with YAKL there as well. I think we are very close to being able to merge this.

@rljacob
Member

rljacob commented Jan 30, 2025

notes: waiting for dashboard to clear.

@rljacob rljacob assigned jgfouca and unassigned bartgol Jan 30, 2025
@ndkeen
Contributor

ndkeen commented Jan 30, 2025

We still want to be able to switch between YAKL and Kokkos radiation, correct?

@jgfouca
Member Author

jgfouca commented Jan 30, 2025

Yes, the YAKL cmake stuff is only turned off if SCREAM_RRTMGP_ENABLE_YAKL is Off.


github-actions bot commented Feb 5, 2025

PR Preview Action v1.6.0

🚀 View preview at
https://E3SM-Project.github.io/E3SM/pr-preview/pr-6879/

Built to branch gh-pages at 2025-02-06 22:59 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@jgfouca jgfouca force-pushed the jgfouca/more_rrtmgp_perf branch from 25b266e to 8001ef2 Compare February 5, 2025 21:15
@ambrad
Member

ambrad commented Feb 7, 2025

@bartgol note we want to rebase #6916 on master when this PR goes in.

@rljacob
Member

rljacob commented Feb 7, 2025

@jgfouca is this ready?

jgfouca added the non-BFB label (PR makes roundoff changes to answers) and removed the BFB label (PR leaves answers BFB) Feb 7, 2025
@jgfouca
Member Author

jgfouca commented Feb 7, 2025

@rljacob, yes. I think we are finally at the point where all the CI failures are just expected DIFFs. I will begin integrating.

jgfouca added a commit that referenced this pull request Feb 7, 2025
Labels: EAMxx (PRs focused on capabilities for EAMxx), non-BFB (PR makes roundoff changes to answers)