
Benchmarking recipes (Lauer et al.) #3598

Open · wants to merge 68 commits into base: main

Conversation

@axel-lauer (Contributor) commented May 16, 2024

Description

This PR implements a set of benchmarking recipes that compare different metrics (RMSE, bias, correlation, EMD), calculated for a given model simulation, with the results from an ensemble of (model) datasets:

For this, the existing monitoring diagnostics monitoring/monitor.py and monitor/multi_datasets.py have been extended.
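As background, the four metrics named in the description can be sketched in plain NumPy. This is only a minimal illustration on synthetic 1-D samples, not the recipes' actual implementation (which lives in the distance_metric preprocessor); the sample data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
model = rng.normal(0.5, 1.0, 1000)  # hypothetical model sample
ref = rng.normal(0.0, 1.0, 1000)    # hypothetical reference sample

bias = np.mean(model - ref)                   # mean error
rmse = np.sqrt(np.mean((model - ref) ** 2))   # root-mean-square error
corr = np.corrcoef(model, ref)[0, 1]          # Pearson correlation
# 1-D earth mover's distance (EMD) for equal-size, equal-weight samples
# reduces to the mean absolute difference of the sorted values:
emd = np.mean(np.abs(np.sort(model) - np.sort(ref)))
```

Note that RMSE can never be smaller than the absolute bias, which is a useful sanity check when comparing the two panels.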

The new diurnal cycle plot has also been added to the following existing recipes:

Documentation for the benchmarking recipes is available in recipes/recipe_benchmarking.rst; the documentation for monitoring and model evaluation has been updated to include the diurnal cycle plots.

Note for testing

The benchmarking recipes require the new preprocessor functions local_solar_time and distance_metric and the extended version of preprocessor resample_hours.

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

New or updated recipe/diagnostic

@alistairsellar (Contributor) left a comment:

Great work @schlunma and @axel-lauer !

Some minor suggestions and questions from me.

I also note that the readthedocs build is still failing (so I haven't fully reviewed the documentation), and that there are a couple of style complaints in run_tests.

@@ -27,6 +27,11 @@
produce multi panel plots for data with `shape_id` or `region`
coordinates of length > 1. Supported coordinates: `time`, `shape_id`
(optional) and `region` (optional).
- Diurnal cycle (plot type ``diurnal_cycle``): Generate a diurnal cycle
plot (timeseries like climatological from 0 to 24 hours). It will
Contributor:

Suggested change:
-    plot (timeseries like climatological from 0 to 24 hours). It will
+    plot (timeseries like climatological from 0 to 23 hours). It will

I presume it doesn't show the 0 (=24) hour twice, though that would be a valid choice if it did and was documented as such.

Contributor:

In fact the example diurnal plot in this PR only appears to plot hours 1 to 22 inclusive. Is that a data or code limitation?

Contributor (author):

That is a data limitation. We wrote 0-24 hours because, if data are available, one could plot from 0:00:00 to 23:59:59.

Contributor:

> That is a data limitation.

Does MIROC not provide a full hourly timeseries? Or are some days missing some hours, while the diurnal cycle mean requires them all?

> We wrote 0-24 hours because, if data are available, one could plot from 0:00:00 to 23:59:59.

Thanks, that makes sense.

Contributor (author):

The 3-hourly CMIP6 output used in the example provides data at 1:30 (0:00-3:00), 4:30 (3:00-6:00), ..., 22:30 (21:00-24:00). The CMIP6 data are converted to full hours by the preprocessor resample_hours so that they can be compared to the ERA5 data (provided at full hours). The preprocessor distance_metric, applied afterwards, requires all datasets to have the same time coordinate.
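The time-axis alignment described here can be illustrated with a small interpolation sketch. This is plain NumPy on a synthetic diurnal signal, only an assumption about the kind of resampling involved; the actual resample_hours preprocessor operates on iris cubes inside ESMValCore.

```python
import numpy as np

# 3-hourly values timestamped at interval midpoints: 1:30, 4:30, ..., 22:30
t_mid = np.arange(1.5, 24.0, 3.0)          # hours of day (8 values)
vals = np.sin(2 * np.pi * t_mid / 24.0)    # hypothetical diurnal signal

# Target grid: full hours 0..23, matching hourly ERA5 timestamps
t_full = np.arange(24.0)
hourly = np.interp(t_full, t_mid, vals)    # linear interpolation
# np.interp clamps to the edge values outside [1:30, 22:30], which is
# consistent with the example plot not covering the full 0-23 range
```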

The hourly climatology is done inside the function so the users can
plot both the timeseries and the diurnal cycle in one go
"""
if 'diurnal_cycle' not in self.plots:
Contributor:

This seems a curious place to put the control statement, rather than where the method is called, e.g.

if 'diurnal_cycle' in self.plots:
    self.plot_diurnal_cycle(cube, var_info)

However, this seems to be an existing design decision in monitor.py prior to this PR, and is arguably outside the scope of my science review, so feel free to ignore this comment.

Contributor (author):

That is right, I just followed the structure that was there already.

return (plot_path, {netcdf_path: cube})

def _plot_benchmarking_boxplot(self, df, cubes, variables, datasets):
"""Plot benchmarking boxplot."""
Contributor:

Somewhere, either here or at a higher level, it would be good to specify that the boxplot shows the data quartiles and the distribution without outliers, with a reference to the seaborn.boxplot documentation for further information. Conventions for boxplots vary, and seaborn has its own algorithm for deciding which points are outliers, so the specifics of the implementation should be made discoverable for those who need the detail.
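For reference, seaborn.boxplot defaults to Tukey-style whiskers (whis=1.5): points farther than 1.5 x IQR beyond the quartiles are drawn as outliers. A minimal NumPy sketch of that rule (the sample data are made up; this only mirrors the default convention, not seaborn's internals):

```python
import numpy as np

def tukey_outliers(x, whis=1.5):
    """Flag points outside [Q1 - whis*IQR, Q3 + whis*IQR] (seaborn's default)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - whis * iqr, q3 + whis * iqr
    return (x < lo) | (x > hi)

data = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 15.0])
mask = tukey_outliers(data)  # only the value 15.0 is flagged
```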

Contributor (author):

@LisaBock Would you be able to address this comment?

Comment on lines +2054 to +2055
# Make sure that the data has the correct dimensions
cube = dataset['cube']
Contributor:

Does this line ensure that the data has the right dimensions?

Contributor:

I also understand that _plot_benchmarking_zonal() expects the zonal mean to have been applied as a preprocessor, rather than being a function that computes and plots the zonal mean of 3D data. In addition, zonal means are often 1D line plots (i.e. the zonal mean of a 2D lat-lon field), so there is scope for confusion there too. Is the required shape of the data documented?

"""Get dataset to be benchmarked."""
variable = datasets[0][self.cfg['group_variables_by']]
benchmark_datasets = [d for d in datasets if
                      d.get('benchmark_dataset', False)]
Contributor:

Should this really be False? It looks like it's selecting datasets that specify benchmark_dataset: false. If correct, please add a comment to avoid confusion.

Contributor (author):

As far as I understand, Python's dict.get(key, default) returns default if the specified key does not exist, so passing False as the default just makes sure that datasets which do not have a 'benchmark_dataset' key at all are not selected.
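A minimal illustration of that dict.get behaviour, using the same filtering pattern as the diagnostic (the CESM2 entry is hypothetical, added only to show the missing-key case):

```python
datasets = [
    {'dataset': 'MIROC6', 'benchmark_dataset': True},   # explicitly selected
    {'dataset': 'ERA5', 'benchmark_dataset': False},    # explicitly excluded
    {'dataset': 'CESM2'},                               # key missing -> default False
]

# The False default means datasets without a 'benchmark_dataset'
# key are treated the same as benchmark_dataset: false.
benchmark = [d for d in datasets if d.get('benchmark_dataset', False)]
```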

f"'{variable}', got {len(ref_datasets)}")
return None

def _get_benchmark_datasets(self, datasets):
Contributor:

Is there documentation (in the code or docs) indicating that the user should specify benchmark_dataset: true in the recipe to select the dataset to be benchmarked? This is clear from the example recipes, but it should really be captured somewhere beyond those.

It is worth noting in that documentation that some plots will (I think) only plot a single benchmark dataset even if multiple are specified.

Contributor (author):

Good point. I'll add that to the documentation.

annual_cycle:
  annual_mean_kwargs: False
  plot_kwargs:
    'MIROC6':
Contributor:

Does the choice of model here need to be consistent with selection of benchmark_dataset above?

Contributor (author):

It would make sense to be consistent if the benchmark_dataset is to be plotted with a special line color and style. I'll add this to the documentation.

Labels: looking for technical reviewer, metric, new recipe

6 participants