cp4cds/cmip6_qc

cmip6_qc

CF quality control of CMIP6 data at CEDA.

To get started you will need a list of dataset ids and a list of variables in JSON format.
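The exact layout of these two input files is not documented here. As a minimal sketch, assuming a plain-text file with one dataset id per line and a JSON file holding a list of variable names (both assumptions), they could be loaded and sanity-checked like this. `parse_dataset_ids`, `parse_variables` and `load_inputs` are hypothetical helpers, not part of this repo.

```python
import json
from pathlib import Path


def parse_dataset_ids(text: str) -> list[str]:
    """Return one dataset id per non-blank line, skipping '#' comment lines.

    Hypothetical format: the real id file produced for the C3S release may differ.
    """
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def parse_variables(text: str) -> list[str]:
    """Parse a JSON list of variable names, e.g. '["tas", "pr"]' (assumed format)."""
    variables = json.loads(text)
    if not isinstance(variables, list) or not all(isinstance(v, str) for v in variables):
        raise ValueError("expected a JSON list of variable names")
    return variables


def load_inputs(ids_file: str, variables_file: str) -> tuple[list[str], list[str]]:
    """Read both input files from disk and parse them."""
    ids = parse_dataset_ids(Path(ids_file).read_text())
    variables = parse_variables(Path(variables_file).read_text())
    return ids, variables
```

Failing early on a malformed variables file is cheaper than discovering the problem after jobs have been submitted.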

1. Run the QC over the identified datasets for the C3S release

Ag will generate a list of dataset ids that require CF checking. You can then run:

  • python cfchecker_run_all.py --file <dataset_ids-file> --qc_check cfchecker

Note: this script reports which datasets are 'missing', i.e. not in the CEDA Archive. You can run it multiple times; it will not repeat the QC for any dataset whose log files show that the QC has already been run.
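The 'missing' and 'already run' checks can be pictured roughly as follows. This is a hypothetical sketch, not the code in cfchecker_run_all.py: it assumes the CEDA archive root is /badc/cmip6/data, that each dot-separated facet of a dataset id maps to one directory level, and that a completed run leaves a `<dataset_id>.log` file in the log directory.

```python
from pathlib import Path

# Assumed archive root; the path actually used by cfchecker_run_all.py may differ.
ARCHIVE_ROOT = Path("/badc/cmip6/data")


def dataset_path(dataset_id: str, root: Path = ARCHIVE_ROOT) -> Path:
    """Map a dotted dataset id onto a directory tree, one facet per level.

    e.g. CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.Amon.tas.gn.v20190406
    -> <root>/CMIP6/CMIP/MOHC/UKESM1-0-LL/historical/r1i1p1f2/Amon/tas/gn/v20190406
    """
    return root.joinpath(*dataset_id.split("."))


def plan_qc(dataset_ids, log_dir: Path, root: Path = ARCHIVE_ROOT):
    """Split ids into (to_run, already_done, missing).

    Hypothetical log naming: '<log_dir>/<dataset_id>.log' marks a completed QC run.
    """
    to_run, done, missing = [], [], []
    for ds_id in dataset_ids:
        if (log_dir / f"{ds_id}.log").exists():
            done.append(ds_id)  # QC already run: skip it
        elif not dataset_path(ds_id, root).is_dir():
            missing.append(ds_id)  # not in the archive
        else:
            to_run.append(ds_id)
    return to_run, done, missing
```

Because skipping is driven purely by files on disk, rerunning the whole list is safe: only datasets without a log are resubmitted.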

With the --file option, cfchecker_run_all.py calls the following directly:

  • cfchecker_run_unit.py

Running without a file uses the ABC unit approach, which calls batch and chunk.

cfchecker_run_unit.py submits the jobs to LOTUS. Each dataset is sent to LOTUS as a separate job because some datasets are large. To see the jobs on LOTUS, run 'squeue -u '
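Submitting one job per dataset might look something like the sketch below. It only uses standard sbatch options (--job-name, --partition, --time, --output, --wrap); the partition name, time limit, output naming and the job command itself are placeholders, not the values used by cfchecker_run_unit.py.

```python
import shlex
import subprocess
from pathlib import Path


def sbatch_command(dataset_id: str, job_cmd: list[str], log_dir: Path,
                   partition: str = "short-serial", time: str = "04:00:00") -> list[str]:
    """Build an sbatch call that runs job_cmd as a single SLURM job for one dataset.

    partition and time are hypothetical defaults; adjust them to the LOTUS queues in use.
    """
    return [
        "sbatch",
        f"--job-name=cfqc_{dataset_id}",
        f"--partition={partition}",
        f"--time={time}",
        f"--output={log_dir / (dataset_id + '.%j.out')}",
        f"--wrap={shlex.join(job_cmd)}",
    ]


def submit_all(dataset_ids, make_job_cmd, log_dir: Path) -> None:
    """Submit one job per dataset, so a single large dataset cannot hold up the rest."""
    for ds_id in dataset_ids:
        subprocess.run(sbatch_command(ds_id, make_job_cmd(ds_id), log_dir), check=True)
```

`%j` in the --output pattern is expanded by SLURM to the job id, so a resubmitted dataset does not overwrite the output of an earlier attempt.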

The QC run produces a CF results file, in the form of a psv (pipe-separated values) file, in a directory called qc_logs.
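A psv file can be read with Python's standard csv module by setting the delimiter to '|'. The sketch below assumes each file starts with a header row; the column names in any example are illustrative, not the actual columns written by the CF checker run.

```python
import csv
import io
from pathlib import Path


def read_psv_text(text: str) -> list[dict[str, str]]:
    """Parse pipe-separated text whose first line is a header row (assumed)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def read_qc_logs(qc_dir: Path = Path("qc_logs")) -> list[dict[str, str]]:
    """Concatenate the rows of every *.psv file found under qc_dir."""
    rows = []
    for psv in sorted(qc_dir.rglob("*.psv")):
        rows.extend(read_psv_text(psv.read_text()))
    return rows
```

Using the csv module rather than a plain `split("|")` means quoted fields containing a '|' are still parsed correctly.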

2. Combine the CF results

Once you have QC results for all the datasets you want to check (none reported as 'missing'), run:

  • nohup ./create_expt_psvs.sh > /outdir/ofile_expt.out &
  • nohup ./create_model_psvs.sh > /outdir/ofile_model.out &

These can take a while to run.
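The two shell scripts are not described here, but the idea of regrouping per-dataset results by experiment or by model can be sketched in Python. This assumes each row has a `dataset_id` column holding a standard CMIP6 dataset id (model is the 4th facet, experiment the 5th); the column name and the one-file-per-group output layout are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Positions of facets in a CMIP6 dataset id such as
# CMIP6.CMIP.MOHC.UKESM1-0-LL.historical.r1i1p1f2.Amon.tas.gn.v20190406
FACET_INDEX = {"model": 3, "experiment": 4}


def group_rows(rows, facet: str, id_column: str = "dataset_id"):
    """Group QC rows by the model or experiment facet of their dataset id."""
    groups = defaultdict(list)
    for row in rows:
        key = row[id_column].split(".")[FACET_INDEX[facet]]
        groups[key].append(row)
    return dict(groups)


def write_groups(groups, out_dir: Path) -> None:
    """Write each group to <out_dir>/<key>.psv with a header row."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for key, rows in groups.items():
        with open(out_dir / f"{key}.psv", "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter="|")
            writer.writeheader()
            writer.writerows(rows)
```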

3. Generate a combined results file

Run:

  • python generate_c3s-34g_dataframe.py --create

TODO: rewrite this section so that it is clear what is happening.
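One way to think about building a combined results table is sketched below: keep a single row per dataset (a later result from a rerun replaces an earlier one) and count datasets per outcome. This is an illustration only, not what generate_c3s-34g_dataframe.py does; the `dataset_id` and `status` columns are assumptions.

```python
from collections import Counter


def combine_results(rows, id_column: str = "dataset_id"):
    """Keep one row per dataset id; a later row (e.g. from a rerun) replaces an earlier one."""
    combined = {}
    for row in rows:
        combined[row[id_column]] = row
    return combined


def summarise(combined, status_column: str = "status") -> Counter:
    """Count datasets per QC outcome (hypothetical 'status' column)."""
    return Counter(row[status_column] for row in combined.values())
```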

4. Complete the QC_template.json template

A QC_template.json file will be provided. Use the script complete_json_release_template.py to fill out this template with the QC results.
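The structure of QC_template.json is not described here. Purely as a hypothetical sketch, if the template maps each dataset id to an object with an empty `cfchecker` field, filling it from a dict of results could look like this. `fill_template` and `write_release_json` are illustrative names, not functions from complete_json_release_template.py.

```python
import copy
import json
from pathlib import Path


def fill_template(template: dict, results: dict, check: str = "cfchecker") -> dict:
    """Return a copy of template with each dataset's entry for `check` filled in.

    Assumed (hypothetical) layout: {"<dataset_id>": {"cfchecker": "", ...}, ...}.
    Datasets with no QC result are marked "missing" rather than left blank.
    """
    filled = copy.deepcopy(template)  # leave the provided template untouched
    for ds_id, entry in filled.items():
        entry[check] = results.get(ds_id, "missing")
    return filled


def write_release_json(template_file: Path, results: dict,
                       out_file: Path = Path("QC_cfchecker.json")) -> None:
    """Load the template, fill it and write the result under the expected file name."""
    template = json.loads(template_file.read_text())
    out_file.write_text(json.dumps(fill_template(template, results), indent=2))
```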

Once the template is complete, name the output file QC_cfchecker.json and upload it to the QC_Results directory on the appropriate GitHub branch of https://github.com/cp4cds/c3s_34g_qc_results/. Although a local copy of that repository already exists on the system, others may be working on it. You can add your results file via a pull request, or you can clone your own copy and push as others

Some other potentially useful files

  • check_for_missing_psvfiles.py: takes the dataset ids and checks that an output psv file exists for each dataset. If any are missing, you can rerun cfchecker_run_all.py.
  • _*py: older files that may contain useful information. Once you are confident you no longer need them, remove them from GitHub and from the local repo.
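The check done by check_for_missing_psvfiles.py can be approximated as below. This sketch assumes one `<dataset_id>.psv` file per dataset in qc_logs, which may not match the real naming, and writes the missing ids one per line to a file that could be passed back to cfchecker_run_all.py with --file (assuming that option accepts this format).

```python
from pathlib import Path


def missing_psv_files(dataset_ids, qc_dir: Path = Path("qc_logs")) -> list[str]:
    """Return the ids whose results file is absent.

    Hypothetical naming: one '<qc_dir>/<dataset_id>.psv' file per dataset.
    """
    return [ds_id for ds_id in dataset_ids if not (qc_dir / f"{ds_id}.psv").is_file()]


def write_rerun_list(dataset_ids, out_file: Path, qc_dir: Path = Path("qc_logs")) -> int:
    """Write the missing ids, one per line, and return how many there were."""
    missing = missing_psv_files(dataset_ids, qc_dir)
    out_file.write_text("".join(f"{ds_id}\n" for ds_id in missing))
    return len(missing)
```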
