-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ClimaTimeSteppers to 0.8.1 #3542
Conversation
Some jobs failed because of the |
Xref. I can open a new PR in ClimaCore, but we’ll need another release |
We need to upgrade to the latest ClimaCore, too. |
Sounds good, thanks! |
Actually, it turned out that we just weren't using the latest ClimaCore, which is required for the latest version of ClimaTimesteppers. I'm upgrading all of the dependencies in #3543. |
We could alternatively do that here, but it might be nice to confirm that nothing else is behavior changing, or at least merge the PRs separately. |
Yes, that’s still needed. |
f0847b6
to
de12906
Compare
Testing GPU longruns here: https://buildkite.com/clima/climaatmos-gpulongruns/builds/479 |
Any idea why only one reproducibility test fails? I would assume this changes MSE for all jobs? |
One of the longruns that was stable before breaks: https://buildkite.com/clima/climaatmos-gpulongruns/builds/479#0194a03f-a47b-4279-80ad-3b078615026d. I still want to get this in though because this is much better than what we currently have (with ClimaTimeSteppers 0.7.39). |
I'm still confused about the reproducibility test though. For example, It doesn't change the MSE for the moist baroclinic wave in the ci, but changes the stability of the moist baroclinic wave in the longrun, which is just a higher resolution version of the ci job. |
I'm not sure why this change would hurt longrun stability, but my impression is that it will only have a significant effect when the implicit tendency is large or there are many Newton iterations. If those conditions aren't satisfied, it might take a very long time for the conservation errors at element boundaries to build up? I think the only way to know for sure if something is wrong with the way we do DSS in v0.8.0 is to try it out with the analytic mountain test next week. For now, though, do you think it's safe to bump the ref counter and version number and merge this in? It's definitely more stable than with v0.7.39. |
Yes, I'm fine with merging this in. One GPU job keeps running out of memory though. |
I think that might be related to one of @charleskawczynski's recent ClimaCore changes? It doesn't seem to be too much of a memory increase, so for now I'll just double the memory allocation for that job. |
I don't think so, the ClimaCore updates were already merged in #3543, and that didn't require a ref counter increment. I'm fine with bumping the ref counter if CTS 0.8.0 is more stable.
This is definitely puzzling. I would expect changes for all the sphere cases:
I think I would need to review the CTS dss PR more carefully to understand how things are changing. |
Ah, I think one difference is that only Diagnostic EDMFX aquaplanet with TKE is run with ARS343. All others are run with SSPKnoth. Could that be the reason? |
That definitely seems relevant. I'd bet so. |
Hmm, if the issue isn't with ClimaCore, I'm a bit surprised that the ClimaTimeSteppers update would cause a memory increase. All I did in the switch to v0.8.0 was rename some things and add an extra call to DSS, and I changed the order of some computations. Maybe the modifications to the As for your observation, @szy21, you're right the lack of MSE changes is because Rosenbrock didn't change from v0.7.39 to v0.8.0. Do the current Rosenbrock results look correct to you? I don't think we need an extra DSS call there, but I also don't have a very good intuition for the Rosenbrock method. In addition, I've thought of a slight variation to the ARK algorithm from v0.8.0. I'm currently testing it out with the GPU longruns here: https://buildkite.com/clima/climaatmos-gpulongruns/builds/481. If it looks more stable than the current version, we can make one final update to ClimaTimeSteppers. |
They look good to me. But given we are running dycore simulations with ARS, I will change the ci to use ARS as well. I will do that after this PR is in.
The one job that failed before hasn't failed yet, so I guess this version is slightly more stable. |
de12906
to
40c737d
Compare
PR has been updated to use 0.8.1, whose GPU longruns were tested here: https://buildkite.com/clima/climaatmos-gpulongruns/builds/481 I'm going to wait until #3548 is done so that coupler work isn't delayed, and then I'll rebase and merge this in. |
40c737d
to
57534e2
Compare
24eb433
to
88168a3
Compare
Can this be merged now? |
Merging this manually (#3556 should fix the problem with the non responding test) |
Thank you! |
Purpose
This PR updates ClimaTimeSteppers to version 0.8.1, which differs from 0.7.40 in that the
post_explicit!
andpost_implicit!
callbacks are renamed tocache!
andcache_imp!
. In the future, we can explore splitting the cache in a similar manner to the autodiff debugging PR.