More RRTMGP performance work #6879
@@ -0,0 +1,3 @@
./xmlchange --append SCREAM_CMAKE_OPTIONS='SCREAM_RRTMGP_ENABLE_YAKL Off'
Since Kokkos is the default, do we need this testmod?
Probably not unless you want to be double-sure that Kokkos is on :)
81a1e70 to 3751726
@jgfouca, is this ready for review?
@AaronDonahue, yes. The current data: (image not preserved) So, basically exact parity with YAKL. @ndkeen is checking a few things for me on pm.
Looks good, I added two non-blocking comments (no need to address them)
@@ -527,6 +563,35 @@ static void rrtmgp_main(
    extra_clnclrsky_diag, extra_clnsky_diag
  );

  pool_t::dealloc(sw_band2gpt_mem);
I am a little uneasy about having to do a lot of memory management that is distinctly different from other processes, but we can worry about it later
// Kokkos::parallel_for(ncol, KOKKOS_LAMBDA(int icol) {
//   conv::Random rand(seeds(icol));
//   for (int igpt = 0; igpt < ngpt; igpt++) {
//     for (int ilay = 0; ilay < nlay; ilay++) {
//       cldx(icol,ilay,igpt) = rand.genFP<RealT>();
//     }
//   }
// });
TIMED_KERNEL(FLATTEN_MD_KERNEL3(ncol, nlay, ngpt, icol, ilay, igpt,
You're planning to move to the device-friendly generator, right? Or have you already done so? I can't really tell. Either way, a comment would have been nice, though I assume you plan to revisit this anyway?
Yes, the kernel here is the faster one. The issue is that this makes the PR non-BFB, so I left the old code commented out so that I could confirm the rest of the PR is BFB (with debug CPU).
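For readers unfamiliar with the pattern, here is a minimal sketch of what flattening a three-level loop nest into a single index range can look like. It is plain C++ rather than Kokkos, and it is not the PR's `FLATTEN_MD_KERNEL3` macro, whose definition is not shown in this thread. The names `Index3`, `unflatten3`, and `fill_flat`, and the choice of `icol` as the fastest-varying index, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the three loop indices recovered from one flat index.
struct Index3 { int icol, ilay, igpt; };

// Assumed ordering: icol varies fastest, then ilay, then igpt.
inline Index3 unflatten3(int idx, int ncol, int nlay) {
  Index3 out;
  out.icol = idx % ncol;
  out.ilay = (idx / ncol) % nlay;
  out.igpt = idx / (ncol * nlay);
  return out;
}

// Fill a flat ncol*nlay*ngpt array with a single loop over the flat index.
// On a device backend this single range would be the parallel dimension,
// exposing ncol*nlay*ngpt independent work items instead of only ncol.
template <class F>
std::vector<double> fill_flat(int ncol, int nlay, int ngpt, F value_at) {
  const int total = ncol * nlay * ngpt;
  std::vector<double> cldx(static_cast<std::size_t>(total));
  for (int idx = 0; idx < total; ++idx) {
    const Index3 i = unflatten3(idx, ncol, nlay);
    cldx[static_cast<std::size_t>(idx)] = value_at(i.icol, i.ilay, i.igpt);
  }
  return cldx;
}
```

In the commented-out version, each column draws its values from one sequential stream, so any kernel that assigns values per element rather than per stream will generally produce different numbers, which is consistent with the non-BFB note above.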
@ndkeen says initial perlm testing shows we are now at parity with YAKL there as well. I think we are very close to being able to merge this.
notes: waiting for dashboard to clear.
We still want to be able to switch between yakl/kokkos radiation, correct?
Yes, the YAKL cmake stuff is only turned off if SCREAM_RRTMGP_ENABLE_YAKL is Off.
Change list: 1) Changes default RRTMGP backend to Kokkos 2) Adds new testmods for selecting RRTMGP backend 3) All kernels in rrtmgp interface can now be timed 4) Detranspose dimensions in kernels 5) Use a faster approach for getting random cldx 6) Update rrtmgp submodule
25b266e to 8001ef2
@jgfouca is this ready?
@rljacob, yes. I think we are finally at the point where all the CI fails are just expected DIFFs. I will begin integrating.
Change list:
1. Changes default RRTMGP backend to Kokkos
2. Adds new testmods for selecting RRTMGP backend
3. All kernels in rrtmgp interface can now be timed
4. Detranspose dimensions in kernels
5. Use a faster approach for getting random cldx (NON BFB!)
6. Update rrtmgp submodule
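As a rough illustration of what per-kernel timing can look like, here is a minimal host-only sketch. It is not the PR's `TIMED_KERNEL` macro, whose implementation is not shown here. The names `KernelStats`, `kernel_stats`, and `timed_kernel` are hypothetical, and a device backend would additionally need a synchronization point before the clock is stopped.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical record: accumulated wall-clock seconds and call count per label.
struct KernelStats {
  double seconds = 0.0;
  long calls = 0;
};

// Hypothetical process-wide registry keyed by kernel label.
inline std::map<std::string, KernelStats>& kernel_stats() {
  static std::map<std::string, KernelStats> stats;
  return stats;
}

// Run `kernel` once and add its elapsed wall-clock time to the entry for `label`.
template <class F>
void timed_kernel(const std::string& label, F&& kernel) {
  const auto start = std::chrono::steady_clock::now();
  kernel();
  // Assumption: the work is synchronous. An asynchronous device launch would
  // need a fence here, otherwise only the launch overhead is measured.
  const auto stop = std::chrono::steady_clock::now();
  KernelStats& s = kernel_stats()[label];
  s.seconds += std::chrono::duration<double>(stop - start).count();
  s.calls += 1;
}
```

Accumulating by label, rather than printing on every call, keeps the overhead low and gives one row per kernel, which is the shape a side-by-side backend comparison needs.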
I've attached a screenshot of my custom kernel profiler:
![Screenshot 2025-01-08 at 3 04 10 PM](https://private-user-images.githubusercontent.com/7292036/401323214-4b81f9f7-5190-4dec-83fc-4f29079e07d5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMzYxODAsIm5iZiI6MTczOTIzNTg4MCwicGF0aCI6Ii83MjkyMDM2LzQwMTMyMzIxNC00YjgxZjlmNy01MTkwLTRkZWMtODNmYy00ZjI5MDc5ZTA3ZDUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMDEwNDQwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZDNmZWNjMzRhMmQ4NDAxOWMyY2Q3YTM2OGM3ZjlhYjRiOWY4MDM5MjFkYjdlOTZlYTA3ZTYyODFmZDYwNmJhNCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.bc7sAt-pTsX2rzxLfGjLgs-xKhdQFK6m3GgQjwo_f4c)
Changepct compares the YAKL and Kokkos versions of each kernel. Anything less than 100 means Kokkos is faster; anything greater than 100 means YAKL is faster. Significant (>25%) speedups are highlighted green; significant slowdowns are highlighted red. The overall time spent in run_impl is about 20% worse with Kokkos. I cannot yet account for this ~3-4 second loss of performance because the total time spent in kernels is already lower with Kokkos.
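To make the column easier to read, here is one plausible way such a value could be computed. The exact formula behind the profiler is not given in this PR, so both the definition (Kokkos time as a percentage of YAKL time) and the highlight thresholds (below 75 or above 125) are assumptions, and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumed definition: Kokkos time expressed as a percentage of YAKL time,
// so 100 means parity and values below 100 mean Kokkos is faster.
inline double changepct(double yakl_seconds, double kokkos_seconds) {
  return 100.0 * kokkos_seconds / yakl_seconds;
}

// Assumed highlight rule: flag rows more than 25 points away from parity.
inline std::string highlight(double pct) {
  if (pct < 75.0) return "green";   // significant speedup with Kokkos
  if (pct > 125.0) return "red";    // significant slowdown with Kokkos
  return "none";
}
```

Under these assumptions, a kernel that takes 2.0 s with YAKL and 1.0 s with Kokkos has a changepct of 50 and would be highlighted green.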
NOTE: The screenshot and comment above are no longer true. We have parity with YAKL performance now.
[non-BFB] for anything that uses rrtmgp. The random number generator is different.