Improve timestep performance further: numba or not? (#57)
@sjordan29 did some performance profiling for a test case of ClearWater-modules TSM coupled with ClearWater-riverine transport. See https://github.com/EcohydrologyTeam/ClearWater-riverine/blob/43-integrate-tsm/examples/dev_sandbox/43_Realistic_TSM_Scenario.ipynb. She shared her cProfile results (profile.prof), which I explored in SnakeViz. It did a great job of identifying which functions in our code were the bottlenecks. From what I see, the top 3 were
Below that, the time is spent primarily in xarray merge, copy, and init (create) functions such as:
Some of these take so much time because they are called many times. So, the key progress will be made by reducing our use of xarray functions that call these merge/copy/init functions. @xaviernogueira, the profiler that Sarah shared with us yesterday was https://github.com/pyutils/line_profiler, which can tell you what is happening line by line within a function. So it complements the cProfile/SnakeViz perspectives that you have used. Please do some line profiling of these functions:
I also want to go over the approach of taking slices of a pre-existing array for reading/writing values. This is a combination of your bottom two checkboxes above, which should be combined into a single solution. Let's discuss this.
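For reference, a minimal profiling harness along these lines (the function names below are placeholders, not ClearWater code; the sketch uses stdlib cProfile, while line_profiler's `@profile` decorator run via `kernprof -l -v script.py` gives the complementary per-line view described above):

```python
import cProfile
import io
import pstats

def run_timestep(state):
    # placeholder for one TSM timestep; the real code calls process functions
    return [v * 2.0 for v in state]

def run_model(n_steps=100):
    state = [1.0] * 1_000
    for _ in range(n_steps):
        state = run_timestep(state)
    return state

profiler = cProfile.Profile()
profiler.enable()
run_model()
profiler.disable()

# sort by cumulative time to surface the biggest bottlenecks, as in SnakeViz
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()

# with line_profiler instead: decorate run_timestep with @profile and run
#   kernprof -l -v script.py
```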
Numba vs no-numba

Numba

I noticed that (as one might expect with an understanding of JIT) running one timestep is much slower (per timestep) than running many. This is because the just-in-time compilation done by numba only runs the first time through a loop (full TSM calculations).

Notice how the majority of the run-time is the first timestep / JIT compilation! The more timesteps being run, the smaller a share of total run-time the initial compilation becomes.

No-Numba

Perhaps due to the # of process functions that get JIT compiled in the first loop, the compilation stage does significantly slow things down!

Note that the speed per timestep flattens out at around 0.024 seconds. Therefore many timestep calculations are completed in a linear fashion, approx (first_timestep_time + (total_timesteps - 1) * 0.024) on my machine.

Discussion
Typical models have 10,000s to 100,000s of time steps.
@PeterKlaver got it, just added a 10k timestep test
It's not unusual to do water quality simulations over a full recreation season, say, to evaluate compliance with recreational use criteria. That's seven months - if the model time step is 30 seconds, the total N is ~ 600,000. If the time step is 10 seconds, N ~ 1,800,000. You can do the math. But if you see linear scaling up to around 10K it is probably okay to extrapolate from there.
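Working that arithmetic out, and extrapolating with the linear timing model from the comment above (a sketch: the 0.024 s/step steady-state cost comes from the test results in this thread, while `first_step_s` is a hypothetical JIT-compilation cost):

```python
SECONDS_PER_DAY = 86_400
season_seconds = 7 * 30 * SECONDS_PER_DAY  # ~seven-month recreation season

n_steps_30s = season_seconds // 30  # 604_800 steps, i.e. ~600k
n_steps_10s = season_seconds // 10  # 1_814_400 steps, i.e. ~1.8M

def estimated_runtime_s(n_steps, first_step_s=2.0, per_step_s=0.024):
    """Linear scaling model: one-time compile cost, then a steady per-step cost."""
    return first_step_s + (n_steps - 1) * per_step_s

# at 0.024 s/step, a 30-second-timestep season run is roughly 4 hours
season_runtime_h = estimated_runtime_s(n_steps_30s) / 3600
```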
Strategy of writing to a pre-initialized set of
Spinning off #57 (comment) into a new issue:
numba or not?
In the Numba vs no-numba tests described in #57 (comment), @xaviernogueira found that:
This surprising result is probably due to the built-in performance of the "vectorized" calculations enabled by numpy. The difference is negligible after about 10,000 time steps, so this doesn't deserve a huge effort to refactor away from numba. That said, this does provide evidence that we should slowly start moving away from numba.

NOTE: There is still a case for using numba.
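To illustrate the vectorization point (a toy sketch, not ClearWater code): a single numpy call runs the whole element-wise operation in compiled code, while a Python loop pays interpreter overhead per element, which is why vectorized xarray/numpy math can keep pace with numba-compiled loops.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_000)

# pure-Python loop: one interpreter round-trip per element
loop_result = sum(v * 2.0 for v in x)

# numpy vectorized: one call into compiled code, typically orders of
# magnitude faster for arrays this size
vec_result = float((x * 2.0).sum())
```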
Closing this issue, which was about exploring approaches to improving performance. With the results described above, we created the following issues for tracking fixes:
@sjordan29 found TSM to be too slow -> it accounted for nearly 80% of the run time, compared to 20% for ClearWater-riverine, when running over a grid of flow fields from HEC-RAS-2D.
Ideas to test:

- Pre-initializing an array of np.nan and writing to that may be faster than writing over existing values (current behavior)? It's worth checking, especially if we can pre-init many time steps.
- Pre-initialize in Model.__init__, and then the timestep you want to write onto can be passed as an argument into the run timestep function. This may also increase performance.
- xarray time coordinates and variables #68
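The first two ideas can be sketched together with plain numpy (hypothetical class, names, and shapes; the real model writes into xarray structures): allocate the full output array of np.nan once in `__init__`, then each step writes only its own pre-allocated slice instead of triggering merge/copy operations.

```python
import numpy as np

n_steps, n_cells = 1_000, 500

class Model:
    def __init__(self):
        # idea 1: pre-initialize the full output array with np.nan once
        self.out = np.full((n_steps, n_cells), np.nan)
        self.state = np.ones(n_cells)

    # idea 2: the target timestep index is passed in, so each call
    # writes one pre-allocated slice (no array creation per step)
    def run_timestep(self, t):
        self.state = self.state + 0.1  # placeholder process calculation
        self.out[t, :] = self.state

model = Model()
for t in range(n_steps):
    model.run_timestep(t)
```

Any np.nan left in `out` afterwards would flag a timestep that was never written, which doubles as a cheap correctness check.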