Skip to content

This repository contains a command line tool to help create an inventory of physical systems observations for data assimilation. This inventory will be created from observation files located in both the BDP's aws s3 object store and Hera's hpss tape storage and will be stored in an sqlite3 database.

License

Notifications You must be signed in to change notification settings

cohen-seth/observation-inventory-utils

 
 

Repository files navigation

Tools and utilities to catalogue and evaluate observation files.

Installation and Environment Setup

Some of the observation inventory utils functionality relies on intel compilers, the nceplibs-bufr repo managed by EMC, and the UFS-RNR anaconda python module. As such, these tools/modules must also be installed.

  1. Clone the observation-inventory-utils repo.
$ git clone https://github.com/noaa-psd/observation-inventory-utils.git
  1. Install or setup the UFS-RNR anaconda3 python environment. If the UFS-RNR anaconda python environment is not already available, install it now using the instructions found in the UFS-RNR-stack repo. An example of the actual installation script can be found at scripts/build.UFS-RNR-stack.RDHPCS-Hera.anaconda3.sh. If you are installing this anaconda module at a different location than what is configured in the above script, you will need to make modifications such that the install location matches your desired location. You will also need to specify aws credentials for PSL's private s3 bucket.

  2. Load the anaconda3 environment

$ module purge
$ module use -a /contrib/home/builder/UFS-RNR-stack/modules
$ module load anaconda3
  1. Load the Intel 2018 compiler.
$ module load intel/18.0.5.274
$ module load impi/2018.4.274
  1. Point to where the forked version of NCEPLIBS-bufr repo is installed or build those executables by following the How to Build and Install instructions. In other words, add the location of the NCEPLIBS-bufr executables in your PATH environment variable. Examples of these utilities exist here: https://github.com/jack-woollen/NCEPLIBS-bufr/tree/develop/utils
$ export PATH=/contrib/home/builder/nceplibs-bufr/build/utils:$PATH

Table Schemas

The observation inventory utilities package primarily is capable of three functions, search PSL's observation archive, plot each file types file_size over the date range January 1st 1990 to January 1st 2021, and harvest the observation counts specified within each file type. The results from the two different queries are stored in an sqlite database for easy access at a later time. The results of each command given to determine the file meta of all observation files related to a given cycle is also stored in a table cmd_results.

cmd_results

CREATE TABLE cmd_results (
	cmd_result_id INTEGER NOT NULL, 
	command VARCHAR, 
	arg0 VARCHAR, 
	raw_output VARCHAR, 
	raw_error VARCHAR, 
	error_code VARCHAR, 
	obs_day DATETIME, 
	submitted_at DATETIME, 
	latency VARCHAR, 
	inserted_at DATETIME, 
	PRIMARY KEY (cmd_result_id)
obs_inventory

CREATE TABLE obs_inventory (
	cmd_result_id INTEGER NOT NULL, 
	obs_id INTEGER NOT NULL, 
	filename VARCHAR, 
	parent_dir VARCHAR, 
	platform VARCHAR, 
	s3_bucket VARCHAR, 
	prefix VARCHAR, 
	cycle_tag VARCHAR, 
	data_type VARCHAR, 
	cycle_time INTEGER, 
	obs_day DATETIME, 
	data_format VARCHAR, 
	suffix VARCHAR, 
	nr_tag BOOLEAN, 
	file_size INTEGER, 
	permissions VARCHAR, 
	last_modified DATETIME, 
	inserted_at DATETIME, 
	PRIMARY KEY (obs_id), 
	FOREIGN KEY(cmd_result_id) REFERENCES cmd_results (cmd_result_id)
obs_meta_nceplibs_bufr

CREATE TABLE obs_meta_nceplibs_bufr (
	meta_id INTEGER NOT NULL, 
	obs_id INTEGER NOT NULL, 
	cmd_result_id INTEGER NOT NULL, 
	cmd_str VARCHAR, 
	sat_id INTEGER, 
	sat_id_name VARCHAR, 
	obs_count INTEGER, 
	sat_inst_id INTEGER, 
	sat_inst_desc VARCHAR, 
	filename VARCHAR, 
	file_size INTEGER, 
	obs_day DATETIME, 
	inserted_at DATETIME, 
	PRIMARY KEY (meta_id), 
	FOREIGN KEY(obs_id) REFERENCES obs_inventory (obs_id), 
	FOREIGN KEY(cmd_result_id) REFERENCES cmd_results (cmd_result_id)
obs_meta_nceplibs_prepbufr

CREATE TABLE obs_meta_nceplibs_prepbufr (
	meta_id INTEGER NOT NULL, 
	obs_id INTEGER NOT NULL, 
	cmd_result_id INTEGER NOT NULL, 
	cmd_str VARCHAR, 
	typ INTEGER, 
  tot INTETER,
	qm0thru3 INTEGER,
  qm4thru7 INTEGER,
  qm8 INTEGER,
  qm9 INTEGER,
  qm10 INTEGER,
  qm11 INTEGER,
  qm12 INTEGER,
  qm13 INTEGER,
  qm14 INTEGER,
  qm15 INTEGER,
  cka INTEGER,
  ckb INTEGER, 
	filename VARCHAR, 
	file_size INTEGER, 
	obs_day DATETIME, 
	inserted_at DATETIME, 
	PRIMARY KEY (meta_id), 
	FOREIGN KEY(obs_id) REFERENCES obs_inventory (obs_id), 
	FOREIGN KEY(cmd_result_id) REFERENCES cmd_results (cmd_result_id)
obs_meta_nceplibs_prepbufr_aggregate

CREATE TABLE obs_meta_nceplibs_prepbufr (
	meta_id INTEGER NOT NULL, 
	obs_id INTEGER NOT NULL, 
	cmd_result_id INTEGER NOT NULL, 
	cmd_str VARCHAR, 
	typ INTEGER, 
  tot INTETER,
	qm0thru3 INTEGER,
  qm4thru7 INTEGER,
  qm8 INTEGER,
  qm9 INTEGER,
  qm10 INTEGER,
  qm11 INTEGER,
  qm12 INTEGER,
  qm13 INTEGER,
  qm14 INTEGER,
  qm15 INTEGER,
  cka INTEGER,
  ckb INTEGER, 
	filename VARCHAR, 
	file_size INTEGER, 
	obs_day DATETIME, 
	inserted_at DATETIME, 
	PRIMARY KEY (meta_id), 
	FOREIGN KEY(obs_id) REFERENCES obs_inventory (obs_id), 
	FOREIGN KEY(cmd_result_id) REFERENCES cmd_results (cmd_result_id)

Information regarding the values in the columns for the prepbufr tables can be found from the EMC documentation for typ and quality markers

Example row data from the cmd_results table. Note: an error_code with a negative value indicates that the command failed. The reason for the command failure can be found in the raw_error column.

$ sqlite3  observations_inventory.db

sqlite> select cmd_result_id, command, arg0, error_code, obs_day, latency, submitted_at from cmd_results order by inserted_at desc limit 1;

cmd_result_id  command     arg0                                                                                                                  error_code  obs_day                     latency     submitted_at              
-------------  ----------  ---------------------------------------------------------------------------------------------                         ----------  --------------------------  ----------  --------------------------
51             sinv        /lustre/work/1abffaa1-7ee1-4e8c-9f46-e15d33e04f8f/20150103/gdas.t06z.1bhrs4.tm00.bufr_d/gdas.t06z.1bhrs4.tm00.bufr_d  0           2015-01-03 06:00:00.000000  0.300152    2022-07-05 16:44:07.318004

Example row data from the obs_inventory table.

$ sqlite3 observations_inventory.db

sqlite> select obs_id, filename, platform, s3_bucket, cycle_tag, obs_day, file_size, last_modified, inserted_at from obs_inventory order by inserted_at desc limit 3;

obs_id      filename                         platform    s3_bucket            cycle_tag       obs_day                     file_size   last_modified               inserted_at               
----------  -------------------------------  ----------  -------------------  --------------  --------------------------  ----------  --------------------------  --------------------------
601         gefsv13_bufr.20150103120000.md5  aws_s3      noaa-reanalyses-pds  20150103120000  2015-01-03 12:00:00.000000  2518        2022-05-25 23:03:12.000000  2022-07-05 16:43:32.124680
600         gdas.t12z.vadwnd.tm00.bufr_d.nr  aws_s3      noaa-reanalyses-pds  t12z            2015-01-03 12:00:00.000000  378872      2022-05-25 23:03:12.000000  2022-07-05 16:43:32.124638
599         gdas.t12z.updated.status.tm00.b  aws_s3      noaa-reanalyses-pds  t12z            2015-01-03 12:00:00.000000  10287       2022-05-25 23:03:12.000000  2022-07-05 16:43:32.124602

Example row data from the obs_meta_nceplibs_bufr table.

$ sqlite3 observations_inventory.db

sqlite> select * from obs_meta_nceplibs_bufr order by inserted_at desc limit 5;

meta_id     obs_id      cmd_result_id  cmd_str     sat_id      sat_id_name  obs_count   sat_inst_id  sat_inst_desc  filename                      file_size   obs_day                     inserted_at               
----------  ----------  -------------  ----------  ----------  -----------  ----------  -----------  -------------  ----------------------------  ----------  --------------------------  --------------------------
90          493         51             sinv        223         NOAA 19      179480      607                         gdas.t06z.1bhrs4.tm00.bufr_d  29697936    2015-01-03 06:00:00.000000  2022-07-05 16:44:07.750504
89          493         51             sinv        209         NOAA 18      179592      607                         gdas.t06z.1bhrs4.tm00.bufr_d  29697936    2015-01-03 06:00:00.000000  2022-07-05 16:44:07.750465
88          493         51             sinv        4           METOP-2      179554      607                         gdas.t06z.1bhrs4.tm00.bufr_d  29697936    2015-01-03 06:00:00.000000  2022-07-05 16:44:07.750433
87          493         51             sinv        3           METOP-1      179744      607                         gdas.t06z.1bhrs4.tm00.bufr_d  29697936    2015-01-03 06:00:00.000000  2022-07-05 16:44:07.750343
86          492         49             sinv        223         NOAA 19      81000       570                         gdas.t06z.1bamua.tm00.bufr_d  13968480    2015-01-03 06:00:00.000000  2022-07-05 16:44:06.777232

Example Usage

The general syntax for executing an inventory search is as follows:

python3 src/obs_inv_utils/obs_inv_cli.py get-obs-inventory -c [search_config_file.yaml]

$ python3 src/obs_inv_utils/obs_inv_cli.py get-obs-inventory -c src/tests/configs/obs_inv_config__valid_s3.yaml

Example yaml config file contents:

cycling_interval: 21600
date_range:
  datestr: '%Y%m%dT%H%M%SZ'
  end: 20150103T180000Z
  start: 20150101T000000Z
search_info:
  -
    platform: aws_s3
    key: observations/atmos/gefsv13_reanalysis-md5/%Y%m%d%H%M%S/bufr/

Syntax for the NCEPLIBS-bufr utils sinv command

$ python3 src/obs_inv_utils/obs_inv_cli.py get-obs-count-meta-sinv -c src/tests/configs/obs_meta_sinv__valid_s3.yaml

Example yaml config file contents for sinv command to collect observation count file meta.

s3_bucket: noaa-reanalyses-pds
s3_prefix: observations/atmos/gefsv13_reanalysis-md5/%Y%m%d%H%M%S/bufr/
date_range:
  datestr: '%Y%m%dT%H%M%SZ'
  end: 20150103T060000Z
  start: 20150101T000000Z
bufr_files:
  - gdas.t%z.1bamua.tm00.bufr_d
  - gdas.t%z.1bhrs4.tm00.bufr_d
work_dir: '/lustre/work'
scrub_files: False

Syntax for the NCEPLIBS-bufr cmpbqm command after running the get-obs-inventory command on the necessary files.

$ python3 src/obs_inv_utils/obs_inv_cli.py get-obs-count-meta-cmpbqm -c src/tests/configs/obs_meta_cmpbqm__valid_s3.yaml

Example yaml config file contents for cmpbqm command to collect observation count file meta.

s3_bucket: noaa-reanalyses-pds
s3_prefix: observations/reanalysis/conv/prepbufr/%Y/%m/prepbufr/
date_range:
    datestr: '%Y%m%dT%H%M%SZ'
    start: 20200101T000000Z
    end: 20210101T000000Z
prepbufr_files:
    - gdas.%z.prepbufr.nr
work_dir: '/lustre/work'
scrub_files: False

Example of syntax used to plot filename file_size over the specified time range.

$ python3 src/obs_inv_utils/obs_inv_cli.py plot_groups_filesize_timeseries -c src/tests/configs/plot_config__data_type_groupings.yaml

Example yaml config file for plotting command.

---
plot_groupings:
- grouping_name: Hyperspectral Infrared
  data_type_families:
  - data_type_family_name: airs
    external_obs_intervals:
    - obs_src: era5
      intervals:
      - 
        start: 10-01-2002
        end: 01-01-2030
    - obs_src: mera20cr
      intervals:
      -
        start: 10-01-2002
        end: 01-01-2030
    family_members:
    - data_type: airs
      suffix: ".tm00.bufr_d"
    - data_type: airsev
      suffix: ".tm00.bufr_d"
    - data_type: airswm
      suffix: ".tm00.bufr_d"
  - data_type_family_name: cris
    external_obs_intervals:
    - obs_src: era5
      intervals:
      - 
        start: 01-01-2015
        end: 01-01-2030
    - obs_src: mera20cr
      intervals:
      -
        start: 01-01-2013
        end: 01-01-2030
    family_members:
    - data_type: cris
      suffix: ".tm00.bufr_d"
    - data_type: crisf4
      suffix: ".tm00.bufr_d"
    - data_type: crisdb
      suffix: ".tm00.bufr_d"
  - data_type_family_name: iasi
    external_obs_intervals:
    - obs_src: era5
      intervals:
      - 
        start: 02-01-2007
        end: 01-01-2030
    - obs_src: mera20cr
      intervals:
      -
        start: 10-01-2008
        end: 01-01-2030
    family_members:
    - data_type: iasidb
      suffix: ".tm00.bufr_d"
    - data_type: esiasi
      suffix: ".tm00.bufr_d"
    - data_type: mtiasi
      suffix: ".tm00.bufr_d"

Plot created by above yaml.

Hyperspectral Infrared Grouping

About

This repository contains a command line tool to help create an inventory of physical systems observations for data assimilation. This inventory will be created from observation files located in both the BDP's aws s3 object store and Hera's hpss tape storage and will be stored in an sqlite3 database.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 97.5%
  • Shell 2.5%