Benchmarking recipes (Lauer et al.) #3598
Conversation
Great work @schlunma and @axel-lauer !
Some minor suggestions and questions from me.
I note also that the readthedocs build is still failing (so I haven't fully reviewed the documentation), and that there are a couple of style complaints in run_tests.
@@ -27,6 +27,11 @@
  produce multi panel plots for data with `shape_id` or `region`
  coordinates of length > 1. Supported coordinates: `time`, `shape_id`
  (optional) and `region` (optional).
- Diurnal cycle (plot type ``diurnal_cycle``): Generate a diurnal cycle
  plot (timeseries like climatological from 0 to 24 hours). It will
- plot (timeseries like climatological from 0 to 24 hours). It will
+ plot (timeseries like climatological from 0 to 23 hours). It will
I presume it doesn't show the 0 (=24) hour twice, though that would be a valid choice if it did and was documented as such.
In fact the example diurnal plot in this PR only appears to plot hours 1 to 22 inclusive. Is that a data or code limitation?
That is a data limitation. We wrote 0-24 hours because, if data are available, one could plot 0:00:00 to 23:59:59.
> That is a data limitation.

Does MIROC not provide a full hourly timeseries? Or are some days missing some hours, and does the diurnal-cycle mean require them all?

> We wrote 0-24 hours because, if data are available, one could plot 0:00:00 to 23:59:59.

Thanks, that makes sense.
The 3-hourly CMIP6 output used in the example provides data at 01:30 (00:00-03:00), 04:30 (03:00-06:00), ..., 22:30 (21:00-24:00). The CMIP6 data are converted to full hours by the preprocessor `resample_hours` so that they can be compared with ERA5 data (provided at full hours). The preprocessor `distance_metric`, which is applied afterwards, requires all datasets to have the same time coordinate.
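A minimal sketch of that alignment step (not the PR's actual code, which uses the extended `resample_hours` preprocessor): 3-hourly fields valid at cell midpoints are interpolated onto full hours so that `distance_metric` sees identical time coordinates. The helper name and the assumption that the time coordinate uses hour-based units are mine.

```python
import numpy as np
from iris.analysis import Linear


def to_full_hours(cube):
    """Interpolate a cube's time axis onto full hours (illustrative only).

    Assumes the time coordinate has units of 'hours since ...', so that
    integer coordinate values correspond to full hours.
    """
    time = cube.coord('time')
    first = np.ceil(time.points.min())   # first full hour inside the range
    last = np.floor(time.points.max())   # last full hour inside the range
    targets = np.arange(first, last + 1.0)
    return cube.interpolate([('time', targets)], Linear())
```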
The hourly climatology is done inside the function so the users can
plot both the timeseries and the diurnal cycle in one go
"""
if 'diurnal_cycle' not in self.plots:
This seems a curious place to put the control statement, rather than where the method is called, e.g.

if 'diurnal_cycle' in self.plots:
    self.plot_diurnal_cycle(cube, var_info)

However, this seems to be an existing design decision in monitor.py prior to this PR, and arguably outside the scope of my science review, so feel free to ignore this comment.
That is right; I just followed the structure that was already there.
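For context, a sketch of the hourly climatology mentioned in the docstring above, using the standard iris approach (this is not the PR's implementation, just the underlying idea):

```python
import iris.coord_categorisation as icc
from iris.analysis import MEAN


def hourly_climatology(cube):
    """Average all time steps that share the same hour of day."""
    if not cube.coords('hour'):
        # Add an auxiliary 'hour' coordinate derived from 'time'.
        icc.add_hour(cube, 'time', name='hour')
    return cube.aggregated_by('hour', MEAN)
```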
return (plot_path, {netcdf_path: cube})


def _plot_benchmarking_boxplot(self, df, cubes, variables, datasets):
    """Plot benchmarking boxplot."""
Somewhere, either here or at a higher level, it would be good to specify that the boxplot methodology shows data quartiles and the full distribution without outliers, with a reference to the seaborn.boxplot documentation for further info. Conventions for boxplots vary, and seaborn has its own algorithm for deciding on outliers, so it would be best to make the specifics of the implementation discoverable for those who need the detail.
@LisaBock Would you be able to address this comment?
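To make the suggestion concrete, a hedged sketch of how the relevant seaborn defaults could be spelled out (the helper name is hypothetical; only `seaborn.boxplot` and its `whis`/`showfliers` options are real):

```python
import seaborn as sns


def benchmarking_boxplot(df, value_col, group_col):
    """Boxplot with the quartile and outlier conventions made explicit."""
    return sns.boxplot(
        data=df,
        x=group_col,
        y=value_col,
        whis=1.5,          # whiskers reach the most extreme points within
                           # 1.5 * IQR of the box (the default convention);
                           # whis=[0, 100] would span the full data range
        showfliers=False,  # hide points beyond the whiskers ("outliers")
    )
```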
# Make sure that the data has the correct dimensions
cube = dataset['cube']
Does this line ensure that the data has the right dimensions?
Also, I understand that _plot_benchmarking_zonal() expects the cube to have the zonal mean applied as a preprocessor, rather than being a function that computes and plots the zonal mean of 3D data. Also, zonal means are often 1D line plots (i.e. the zonal mean of a 2D lat-lon field), so there is scope for confusion there too. Is the required shape of the data documented?
"""Get dataset to be benchmarked.""" | ||
variable = datasets[0][self.cfg['group_variables_by']] | ||
benchmark_datasets = [d for d in datasets if | ||
d.get('benchmark_dataset', False)] |
Should this really be False? It looks like it's selecting datasets that specify benchmark_dataset: false. If correct, please add a comment to avoid confusion.
As far as I understand, the Python dictionary method dict.get(key, default) returns default if the specified key does not exist, so setting default to False just makes sure that datasets which do not have a 'benchmark_dataset' key are not selected.
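A quick illustration of that behaviour with mock dataset dictionaries (a recipe entry `benchmark_dataset: true` arrives in these dicts as a Python `True`):

```python
datasets = [
    {'dataset': 'MIROC6', 'benchmark_dataset': True},
    {'dataset': 'ERA5'},  # key absent -> .get(..., False) returns False
    {'dataset': 'CESM2', 'benchmark_dataset': False},
]
benchmark_datasets = [d for d in datasets if d.get('benchmark_dataset', False)]
# -> only the MIROC6 entry is selected
```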
f"'{variable}', got {len(ref_datasets)}") | ||
return None | ||
|
||
def _get_benchmark_datasets(self, datasets): |
Is there documentation (in code or docs) to indicate that the user should specify benchmark_dataset: true in the recipe to select the dataset to be benchmarked? This is clear from the example recipes, but should really be captured somewhere beyond those.
It is worth noting in that documentation that some plots will (I think) only plot a single benchmark dataset even if multiple are specified.
Good point. I'll add that to the documentation.
annual_cycle:
  annual_mean_kwargs: False
  plot_kwargs:
    'MIROC6':
Does the choice of model here need to be consistent with the selection of benchmark_dataset above?
It would make sense to be consistent if the benchmark_dataset is to be plotted with a special line color and style. I'll add this to the documentation.
Description
This PR implements a set of benchmarking recipes that compare different metrics (RMSE, bias, correlation, EMD), calculated for a given model simulation, with the results from an ensemble of (model) datasets.
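For reference, assuming the implementation follows the usual conventions (the authoritative definitions live in the new `distance_metric` preprocessor and may include area weighting), the metrics are:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(m_i - o_i)^2},\qquad
\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}(m_i - o_i),\qquad
r = \frac{\sum_i (m_i - \bar{m})(o_i - \bar{o})}{\sqrt{\sum_i (m_i - \bar{m})^2}\,\sqrt{\sum_i (o_i - \bar{o})^2}},
$$

where $m_i$ and $o_i$ are model and reference values; the EMD is the first Wasserstein distance between the two value distributions.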
For this, the existing monitoring diagnostics monitor/monitor.py and monitor/multi_datasets.py have been extended.
The new diurnal cycle plot has also been added to a number of existing recipes.
Documentation for the benchmarking recipes is available in recipes/recipe_benchmarking.rst, and the documentation for the monitoring and model evaluation recipes has been updated to include the diurnal cycle plots.
Note for testing
The benchmarking recipes require the new preprocessor functions `local_solar_time` and `distance_metric`, as well as the extended version of the preprocessor `resample_hours`.
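As background on `local_solar_time` (a sketch of the underlying idea, not the preprocessor's code): local solar time shifts UTC by one hour per 15° of longitude, which is what makes a meaningful diurnal cycle across many grid cells possible.

```python
def utc_to_local_solar_hour(utc_hour: float, lon_deg: float) -> float:
    """Shift a UTC hour of day to mean local solar time at a longitude.

    Illustrative helper (ignores the equation of time): the Sun crosses
    the local meridian one hour earlier for every 15 degrees eastward.
    """
    return (utc_hour + lon_deg / 15.0) % 24.0


# Example: 12:00 UTC at 90 degrees East is 18:00 local solar time.
assert utc_to_local_solar_hour(12.0, 90.0) == 18.0
```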
Checklist
It is the responsibility of the author to make sure the pull request is ready for review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
New or updated recipe/diagnostic