Moving datasets from Tier 3 to Tier 2 when approved by providers #3331

Open
9 tasks
remi-kazeroni opened this issue Aug 17, 2023 · 3 comments · May be fixed by #3624

@remi-kazeroni
Contributor

There are a number of Tier 3 datasets (see the status table below) for which we (@hb326 and myself) have obtained approval from the data providers to grant our users access on shared machines (e.g. DKRZ, Jasmin, ...). At the moment, CMORized data stored on Levante are restricted to group members (i.e. core developers) if they are Tier 3, and readable by anyone if they are Tier 2. To ease data access for our users, we first need to check with data providers whether more Tier 3 datasets could be labelled Tier 2 instead. This would allow us to open up access at DKRZ and include more datasets in the synchronization pipeline between the main HPC centers where ESMValTool is developed (see #2630 for a separate discussion).

Changing the tier of a dataset may lead to various backward-incompatibility issues. Recipes using that dataset will need to be updated. Users who maintain their own local pool of CMORized data will also need to move the affected datasets to Tier 2 themselves. One way to circumvent this could be to make the tier key in recipes optional. See ESMValGroup/ESMValCore#2112 for a separate discussion.
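To illustrate what an optional tier key could mean in practice, here is a minimal sketch of a lookup that falls back to searching several tiers when none is given. It is not ESMValCore code: the function name `find_dataset_dir`, its parameters and the `<obs_root>/Tier<N>/<dataset>` layout are assumptions made for this example only.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_dataset_dir(
    obs_root: Path,
    dataset: str,
    tier: Optional[int] = None,
    tiers_to_search: Sequence[int] = (2, 3),
) -> Optional[Path]:
    """Return the first existing <obs_root>/Tier<N>/<dataset> directory.

    Hypothetical helper: if ``tier`` is given, only that tier is checked;
    otherwise every tier in ``tiers_to_search`` is tried in order.
    """
    candidates = (tier,) if tier is not None else tuple(tiers_to_search)
    for candidate in candidates:
        path = obs_root / f"Tier{candidate}" / dataset
        if path.is_dir():
            return path
    # Nothing found: the caller decides how to report missing OBS data.
    return None
```

With a lookup like this, a recipe that omits the tier would keep working after a dataset moves from Tier 3 to Tier 2, while a recipe that pins the old tier would still fail and need updating.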

On the Tool side, a number of things will need to be done when moving a dataset from Tier3 to Tier2. Here is a tentative checklist:

  • Open a GitHub issue about the affected dataset
  • Update the documentation and the CMORizer, then rerun the CMORizer (or at least change the tier attribute in the metadata).
  • Add an item to the review checklist to ask if recipe developers have used the right tier key for affected datasets
  • Move the CMORized data to OBS/Tier2 (specific to DKRZ)
  • Create a symlink in OBS/Tier3 pointing to the new location to avoid breaking recipes under development (specific to DKRZ)
  • Run the synchronization script to ship data to Jasmin and other connected systems where OBS/Tier2 data are mirrored (specific to DKRZ)
  • Remove the symlink after 2 releases (specific to DKRZ)
  • Move the corresponding raw data to RAWOBS/Tier2 (specific to DKRZ)
  • Add a message to the log files explaining that "Missing OBS data" could be due to the change of tier, and tell users what to do.
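
The move and symlink steps of the checklist above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `move_dataset_to_tier2` and `remove_compat_symlink` are hypothetical, the layout is taken to be `<obs_root>/Tier<N>/<dataset>`, and the filesystem is assumed to support symlinks. The synchronization script and the site-specific permissions at DKRZ are not covered.

```python
import shutil
from pathlib import Path


def move_dataset_to_tier2(obs_root: Path, dataset: str) -> Path:
    """Move <obs_root>/Tier3/<dataset> to Tier2 and leave a symlink behind.

    Hypothetical helper, not part of ESMValTool.
    """
    old = obs_root / "Tier3" / dataset
    new = obs_root / "Tier2" / dataset
    # Refuse to act on a dataset that is missing or was already migrated.
    if old.is_symlink() or not old.is_dir():
        raise FileNotFoundError(f"No Tier3 data directory to move: {old}")
    if new.exists():
        raise FileExistsError(f"Target already exists: {new}")
    new.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old), str(new))
    # Compatibility symlink so that recipes still using Tier3 keep working.
    old.symlink_to(new.resolve(), target_is_directory=True)
    return new


def remove_compat_symlink(obs_root: Path, dataset: str) -> bool:
    """Remove the Tier3 compatibility symlink; return True if one was removed."""
    link = obs_root / "Tier3" / dataset
    if link.is_symlink():
        link.unlink()
        return True
    return False
```

The second helper corresponds to the cleanup step after two releases; it only removes a symlink and never touches a real data directory.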
@remi-kazeroni
Contributor Author

Here is the status of our efforts to contact data providers of our Tier3 datasets to check if those could be made Tier2:

| Tier3 dataset | Provider contacted | Answer | Remarks | Moved to Tier2 |
|---|---|---|---|---|
| APHRO-MA | Yes | N/A | looks possible based on their website | - |
| AURA-TES | Yes | N/A | - | - |
| CALIPSO-ICECLOUD | Yes | Yes | - | - |
| CDS-SATELLITE-ALBEDO | No | - | Licenses to be checked for CDS datasets | - |
| CDS-SATELLITE-LAI-FAPAR | No | - | - | - |
| CDS-SATELLITE-SOIL-MOISTURE | No | - | - | - |
| CDS-UERRA | No | - | - | - |
| CDS-XCH4 | No | - | - | - |
| CDS-XCO2 | No | - | - | - |
| CERES-SYN1deg | No | - | - | - |
| CLARA-AVHRR | No | - | - | - |
| CLOUDSAT-L2 | No | - | - | - |
| ERA-Interim | No | - | Data license should allow it | - |
| ERA-Interim-Land | No | - | Data license should allow it | - |
| ERA5 | No | - | Data license should allow it (done by DKRZ, CEDA and other HPCs) | - |
| ESACCI-WATERVAPOUR | No | - | only preliminary version currently supported | - |
| FLUXCOM | Yes | Yes | - | - |
| GRACE | No | - | - | - |
| HWSD | No | - | - | - |
| JMA-TRANSCOM | No | - | - | - |
| LAI3g | No | - | - | - |
| LandFlux-EVAL | No | - | - | - |
| MAC-LWP | Yes | Yes | - | - |
| MERRA2 | Yes | Yes | - | - |
| MERRA | No | - | Same answer as for MERRA2? | - |
| MLS-AURA | Yes | Yes | - | - |
| MODIS | No | - | - | - |
| MTE | Yes | Yes | - | - |
| NDP | No | - | - | - |
| NIWA-BS | Yes | Yes | - | - |
| NSIDC-0116-* | No | - | - | - |
| UWisc | Yes | Yes | predecessor of MAC-LWP | - |

(in bold font: datasets that could be moved to Tier2 right away)

@remi-kazeroni
Contributor Author

Attention: @rswamina. This is related to our ongoing discussion on observational datasets in ESMValTool (ESMValGroup/Community#70)

@alistairsellar
Contributor

I was thinking about some of the redistribution implications of this. Two thoughts in particular:

  • Before the above is implemented, we should review / update the ESMValTool documentation to ensure that it doesn't imply that all Tier 2 datasets are freely available. Otherwise a reader may take this to mean that we are giving them permission to do something that they don't actually have permission for.
  • If a data pool owner obtains one of these new Tier 2 datasets via redistribution, it might be best to give them a copy of the dataset owner's permission statement/email, as a record that they have permission to obtain the data in this way.

@axel-lauer linked a pull request on May 28, 2024 that will close this issue
7 tasks