Integrate Model Variable Renaming Sprint changes into GDASApp yamls and templates #1362

Closed

RussTreadon-NOAA opened this issue Nov 5, 2024 · 71 comments · Fixed by #1355

@RussTreadon-NOAA
Contributor

Several JEDI repositories have been updated with changes from the Model Variable Renaming Sprint. Updating JEDI hashes in sorc/ requires changes in GDASApp and jcb-gdas yamls and templates. This issue is opened to document these changes.

@RussTreadon-NOAA
Contributor Author

Started from g-w PR #2992 with sorc/gdas.cd populated with GDASApp PR #1346. Used ush/submodules/update_develop.sh to update the hashes for the following JEDI repos:

        modified:   sorc/fv3-jedi (new commits)
        modified:   sorc/ioda (new commits)
        modified:   sorc/iodaconv (new commits)
        modified:   sorc/oops (new commits)
        modified:   sorc/saber (new commits)
        modified:   sorc/soca (new commits)
        modified:   sorc/ufo (new commits)
        modified:   sorc/vader (new commits)

Changes to yamls (templates) thus far include

parm/io/fv3jedi_fieldmetadata_history.yaml
parm/jcb-gdas/model/atmosphere/atmosphere_background.yaml.j2
parm/jcb-gdas/observations/atmosphere/sondes.yaml.j2

Using test_gdasapp_atm_jjob_var_init and test_gdasapp_atm_jjob_var_run to iteratively work through issues.

Puzzled by current failure in the variational analysis job

0: Variable 'virtual_temperature' calculated using Vader recipe AirVirtualTemperature_A
0: OOPS_TRACE[0] leaving Vader::executePlanNL
0: Requested variables Vader could not produce: 25 variables: water_area_fraction, land_area_fraction, ice_area_fraction, surface_snow_area_fraction, skin_temperature_at_surface_where_sea, skin_temperature_at_surface_where_land, skin_temperature_at_surface_where_ice, skin_temperature_at_surface_where_snow, vegetation_area_fraction, leaf_area_index, volume_fraction_of_condensed_water_in_soil, soil_temperature, surface_snow_thickness, vegetation_type_index, soil_type, water_vapor_mixing_ratio_wrt_dry_air, mole_fraction_of_ozone_in_air, mass_content_of_cloud_liquid_water_in_atmosphere_layer, effective_radius_of_cloud_liquid_water_particle, mass_content_of_cloud_ice_in_atmosphere_layer, effective_radius_of_cloud_ice_particle, wind_speed_at_surface, wind_from_direction_at_surface, average_surface_temperature_within_field_of_view, geopotential_height
0: OOPS_TRACE[0] leaving Vader::changeVar
0: OOPS_TRACE[0] State::State (from geom, vars and time) starting
0: OOPS_TRACE[0] State::State (from geom, vars and time) done
0: OOPS_TRACE[0] fv3jedi::VarChaModel2GeoVaLs changeVar start
5: Field_fail: Field water_area_fraction cannot be obtained from input fields.
5: Abort(1) on node 5 (rank 5 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 5
4: Field_fail: Field water_area_fraction cannot be obtained from input fields.
4: Abort(1) on node 4 (rank 4 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 4

A check of atmanlvar.yaml in the run directory does not find any references to water_area_fraction. None of the files in directory fv3jedi/ mention water_area_fraction. The background fields do not contain water_area_fraction. Not sure where, how, or why Vader is trying to produce water_area_fraction.

@RussTreadon-NOAA
Contributor Author

test_gdasapp_atm_jjob_var_run assimilates amsua_n19 and sondes. Removed amsua_n19 from the list of assimilated observations. The init job failed because g-w ush/python/pygfs/task/atm_analysis.py assumes bias correction files will always be staged. Modified atm_analysis.py as follows:

@@ -114,12 +114,15 @@ class AtmAnalysis(Task):
         # stage bias corrections
         logger.info(f"Staging list of bias correction files")
         bias_dict = self.jedi_dict['atmanlvar'].render_jcb(self.task_config, 'atm_bias_staging')
-        bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
-        FileHandler(bias_dict).sync()
-        logger.debug(f"Bias correction files:\n{pformat(bias_dict)}")
+        if bias_dict['copy'] is None:
+            logger.info(f"No bias correction files to stage")
+        else:
+            bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
+            FileHandler(bias_dict).sync()
+            logger.debug(f"Bias correction files:\n{pformat(bias_dict)}")

-        # extract bias corrections
-        Jedi.extract_tar_from_filehandler_dict(bias_dict)
+            # extract bias corrections
+            Jedi.extract_tar_from_filehandler_dict(bias_dict)

         # stage CRTM fix files
         logger.info(f"Staging CRTM fix files from {self.task_config.CRTM_FIX_YAML}")

With this local change in place, the init job ran to completion. The var job successfully ran 3dvar assimilating only sondes. The job failed the reference check since the reference state assimilates both amsua_n19 and sondes.

Note that test_gdasapp_atm_jjob_var_run runs the variational analysis using the identity matrix for the background error. This test should be rerun using GSIBEC and/or an ensemble.

Has the default behavior for radiance data assimilation changed? Do we now require numerous surface fields to be available? This makes sense if one wants to accurately compute surface emissivity. Surface conditions can also be used for data filtering and QC. This is a change from previous JEDI hashes; test_gdasapp_atm_jjob_var_run previously passed.

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA Is the failure in Jedi.remove_redundant()? Just so I know how I can fix #2992.

@RussTreadon-NOAA
Contributor Author

@DavidNew-NOAA Yes, the traceback mentions remove_redundant

2024-11-05 20:02:14,130 - INFO     - jedi        :   END: pygfs.jedi.jedi.render_jcb
2024-11-05 20:02:14,132 - DEBUG    - jedi        :  returning: {'mkdir': ['/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/RUNDIRS/gdas_test/gdasatmanl_18/bc/'], 'copy': None}
2024-11-05 20:02:14,132 - INFO     - jedi        : BEGIN: pygfs.jedi.jedi.remove_redundant
2024-11-05 20:02:14,132 - DEBUG    - jedi        : ( None )
Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/scripts/exglobal_atm_analysis_initialize.py", line 26, in <module>
    AtmAnl.initialize()
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/atm_analysis.py", line 117, in initialize
    bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/jedi/jedi.py", line 242, in remove_redundant
    for item in input_list:
TypeError: 'NoneType' object is not iterable
+ slurm_script[1]: postamble slurm_script 1730836856 1

If you can fix this in g-w PR #2992, great!
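For what it's worth, the traceback boils down to render_jcb returning 'copy': None when no bias correction files exist, while remove_redundant iterates its argument unconditionally. A minimal sketch of a None-tolerant variant, using a simplified, hypothetical signature rather than the actual pygfs.jedi.jedi implementation:

from typing import List, Optional


def remove_redundant(input_list: Optional[List]) -> List:
    """Drop duplicate entries, treating a missing (None) list as empty."""
    if input_list is None:
        # render_jcb returns 'copy': None when there is nothing to stage
        return []
    deduplicated = []
    for item in input_list:
        if item not in deduplicated:
            deduplicated.append(item)
    return deduplicated

The change actually applied above guards the call site in atm_analysis.py instead; this sketch only illustrates the failure mode.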

@RussTreadon-NOAA
Contributor Author

@DavidNew-NOAA : Updated working copy of feature/jcb-obsbias to e59e883. Reran test_gdasapp_atm_jjob_var_init without amsua_n19 in the list of assimilated observations. The ctest failed as before in remove_redundant

2024-11-06 11:22:21,613 - INFO     - jedi        :   END: pygfs.jedi.jedi.render_jcb
2024-11-06 11:22:21,613 - DEBUG    - jedi        :  returning: {'mkdir': ['/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/RUNDIRS/gdas_test/gdasatmanl_18/bc/'], 'copy': None}
2024-11-06 11:22:21,613 - INFO     - jedi        : BEGIN: pygfs.jedi.jedi.remove_redundant
2024-11-06 11:22:21,613 - DEBUG    - jedi        : ( None )
Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/scripts/exglobal_atm_analysis_initialize.py", line 26, in <module>
    AtmAnl.initialize()
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/atm_analysis.py", line 118, in initialize
    bias_dict['copy'] = Jedi.remove_redundant(bias_dict['copy'])
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/jedi/jedi.py", line 253, in remove_redundant
    for item in input_list:
TypeError: 'NoneType' object is not iterable

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA That newest commit didn't have a fix yet for this ob issue. I will work on it this morning.

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA Actually, I just committed the changes you suggested. There's really no reason to mess with remove_redundant for this problem.

@DavidNew-NOAA
Collaborator

DavidNew-NOAA commented Nov 6, 2024

Forgot a line. Make sure it's commit 7ac6ccb2bbf88b25fb533185c5d481cd328415ee (latest).

@RussTreadon-NOAA
Contributor Author

Thank you @DavidNew-NOAA . test_gdasapp_atm_jjob_var_init passes without amsua_n19!

@RussTreadon-NOAA
Contributor Author

@danholdaway , @ADCollard , and @emilyhcliu : When I update GDASApp JEDI hashes in develop, ctest test_gdasapp_atm_jjob_var_run fails when processing amsua_n19 with the error

0: Requested variables Vader could not produce: 25 variables: water_area_fraction, land_area_fraction, ice_area_fraction, surface_snow_area_fraction, skin_temperature_at_surface_where_sea, skin_temperature_at_surface_where_land, skin_temperature_at_surface_where_ice, skin_temperature_at_surface_where_snow, vegetation_area_fraction, leaf_area_index, volume_fraction_of_condensed_water_in_soil, soil_temperature, surface_snow_thickness, vegetation_type_index, soil_type, water_vapor_mixing_ratio_wrt_dry_air, mole_fraction_of_ozone_in_air, mass_content_of_cloud_liquid_water_in_atmosphere_layer, effective_radius_of_cloud_liquid_water_particle, mass_content_of_cloud_ice_in_atmosphere_layer, effective_radius_of_cloud_ice_particle, wind_speed_at_surface, wind_from_direction_at_surface, average_surface_temperature_within_field_of_view, geopotential_height
0: OOPS_TRACE[0] leaving Vader::changeVar
0: OOPS_TRACE[0] State::State (from geom, vars and time) starting
0: OOPS_TRACE[0] State::State (from geom, vars and time) done
0: OOPS_TRACE[0] fv3jedi::VarChaModel2GeoVaLs changeVar start
0: Field_fail: Field water_area_fraction cannot be obtained from input fields.
4: Field_fail: Field water_area_fraction cannot be obtained from input fields.
2: Field_fail: Field water_area_fraction cannot be obtained from input fields.
3: Field_fail: Field water_area_fraction cannot be obtained from input fields.
5: Field_fail: Field water_area_fraction cannot be obtained from input fields.
1: Field_fail: Field water_area_fraction cannot be obtained from input fields.
1: Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
2: Abort(1) on node 2 (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2

Updating the JEDI hashes brings in changes from the Model Variable Renaming Sprint. What changed in fv3-jedi, ufo, or vader that now requires the variables listed on the Vader line above? The input yaml does not mention these fields.

test_gdasapp_atm_jjob_var_run only assimilates amsua_n19 and sondes. The test passes if I remove amsua_n19.

@danholdaway
Contributor

This is failing because this if statement is not true when it should be. Likely because a variable is not being recognized as being present. Can you point me to your GDASapp and jcb-gdas code?

@RussTreadon-NOAA
Contributor Author

This is failing because this if statement is not true when it should be. Likely because a variable is not being recognized as being present. Can you point me to your GDASapp and jcb-gdas code?

@danholdaway : Here are the key directories and the job log file (all on Hercules):

  • GDASApp: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd
  • jcb-gdas: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/parm/jcb-gdas
  • atmanlvar run directory: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/RUNDIRS/gdas_test/gdasatmanl_18
  • failed job log file: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/atmanlvar-3103929.out

@RussTreadon-NOAA
Contributor Author

Prints added to src/fv3jedi/VariableChange/Model2GeoVaLs/fv3jedi_vc_model2geovals_mod.f90 show that have_ts is .false.. The input cube sphere history surface file contains

        double land(time, tile, grid_yt, grid_xt) ;
        double weasd(time, tile, grid_yt, grid_xt) ;
        double tmpsfc(time, tile, grid_yt, grid_xt) ;
        double vtype(time, tile, grid_yt, grid_xt) ;
        double sotyp(time, tile, grid_yt, grid_xt) ;
        double veg(time, tile, grid_yt, grid_xt) ;
        double soilt1(time, tile, grid_yt, grid_xt) ;
        double soilt2(time, tile, grid_yt, grid_xt) ;
        double soilt3(time, tile, grid_yt, grid_xt) ;
        double soilt4(time, tile, grid_yt, grid_xt) ;
        double soilw1(time, tile, grid_yt, grid_xt) ;
        double soilw2(time, tile, grid_yt, grid_xt) ;
        double soilw3(time, tile, grid_yt, grid_xt) ;
        double soilw4(time, tile, grid_yt, grid_xt) ;
        double snod(time, tile, grid_yt, grid_xt) ;
        double ugrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double vgrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double f10m(time, tile, grid_yt, grid_xt) ;

There is no ts or tsea field. The cube history surface file only contains tmpsfc. A check of the sfc_data tile files shows that these files contain several temperature fields

        double tsea(Time, yaxis_1, xaxis_1) ;
        double tisfc(Time, yaxis_1, xaxis_1) ;
        double tsfc(Time, yaxis_1, xaxis_1) ;
        double tsfcl(Time, yaxis_1, xaxis_1) ;
        double tiice(Time, zaxis_1, yaxis_1, xaxis_1) ;

Does the cube history file contain all the information we need to define surface characteristics for radiance assimilation?
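As a quick check of which skin temperature candidates each file actually carries, the two files can also be inspected programmatically. A minimal sketch, assuming the netCDF4 Python package and hypothetical local copies of the files:

from netCDF4 import Dataset

# Hypothetical local copies of the backgrounds being compared
history_file = "cubed_sphere_grid_sfcf006.nc"
restart_file = "sfc_data.tile1.nc"

# Candidate skin/surface temperature names seen in the dumps above
candidates = ["ts", "tsea", "tmpsfc", "tsfc", "tsfcl", "tisfc"]

for path in (history_file, restart_file):
    with Dataset(path) as nc:
        present = [name for name in candidates if name in nc.variables]
        print(f"{path}: {present}")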

@danholdaway
Contributor

In jcb-gdas you changed surface_geopotential_height to hgtsfc and tsea to tmpsfc. Perhaps try changing them as:

surface_geopotential_height -> geopotential_height_times_gravity_at_surface
tsea -> sst?

Switching from the old short name to the IO name may have resulted in crossed wires.

@danholdaway
Contributor

I think the sst change is because of https://github.com/JCSDA-internal/fv3-jedi/pull/1258 rather than variable naming conventions.

@RussTreadon-NOAA
Contributor Author

RussTreadon-NOAA commented Nov 8, 2024

Thank you @danholdaway for pointing me at fv3-jedi PR #1258. I see there was confusion over the name used for the skin temperature. This confusion remains when I run ncdump -hcs on the gfs history cubed_sphere_grid_sfcf006.nc and restart sfc_data.tile*.nc files.

Our cube sphere surface history files contain the following fields

        double land(time, tile, grid_yt, grid_xt) ;
        double weasd(time, tile, grid_yt, grid_xt) ;
        double tmpsfc(time, tile, grid_yt, grid_xt) ;
        double vtype(time, tile, grid_yt, grid_xt) ;
        double sotyp(time, tile, grid_yt, grid_xt) ;
        double veg(time, tile, grid_yt, grid_xt) ;
        double soilt1(time, tile, grid_yt, grid_xt) ;
        double soilt2(time, tile, grid_yt, grid_xt) ;
        double soilt3(time, tile, grid_yt, grid_xt) ;
        double soilt4(time, tile, grid_yt, grid_xt) ;
        double soilw1(time, tile, grid_yt, grid_xt) ;
        double soilw2(time, tile, grid_yt, grid_xt) ;
        double soilw3(time, tile, grid_yt, grid_xt) ;
        double soilw4(time, tile, grid_yt, grid_xt) ;
        double snod(time, tile, grid_yt, grid_xt) ;
        double ugrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double vgrd_hyblev1(time, tile, grid_yt, grid_xt) ;
        double f10m(time, tile, grid_yt, grid_xt) ;

There is neither sst nor tsea. The cube history surface file contains tmpsfc.

Our tiled surface restart files contain the following fields starting with t

        double tsea(Time, yaxis_1, xaxis_1) ;
        double tg3(Time, yaxis_1, xaxis_1) ;
        double t2m(Time, yaxis_1, xaxis_1) ;
        double tisfc(Time, yaxis_1, xaxis_1) ;
        double tprcp(Time, yaxis_1, xaxis_1) ;
        double tsfc(Time, yaxis_1, xaxis_1) ;
        double tsfcl(Time, yaxis_1, xaxis_1) ;
        double tref(Time, yaxis_1, xaxis_1) ;
        double tvxy(Time, yaxis_1, xaxis_1) ;
        double tgxy(Time, yaxis_1, xaxis_1) ;
        double tahxy(Time, yaxis_1, xaxis_1) ;
        double taussxy(Time, yaxis_1, xaxis_1) ;
        double tiice(Time, zaxis_1, yaxis_1, xaxis_1) ;
        double tsnoxy(Time, zaxis_3, yaxis_1, xaxis_1) ;

The restart surface tiles contain tsea. There is no tmpsfc.

Our atmospheric variational and local ensemble yamls now use filetype: cube sphere history for the backgrounds. The updated fv3-jedi code does not recognize tmpsfc. Is there a way via tables, parm files, or yamls to get the code to process tmpsfc?

The restart tiles have what appear to be fields for temperature over various surface types

  • tsea - temperature over sea surface?
  • tisfc - temperature over ice surface?
  • tsfc - temperature over all surfaces?
  • tsfcl - temperature over land surface?

Which temperature or combination of temperatures should we pass to CRTM?

I sidestepped this question and did a simple test. I renamed tmpsfc as ts in the cube sphere surface history file. With this change, ctest test_gdasapp_atm_jjob_var_run passed. This is good, but is tmpsfc, or its renamed variant ts, the correct temperature to pass to CRTM?

I can replace the variable name tmpsfc with ts in our canned ctest cube sphere surface history files, but is this the right approach?
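For reference, the rename experiment can be reproduced with a one-off script rather than editing files by hand. A minimal sketch, assuming the netCDF4 Python package and a writable local copy of the canned test file (hypothetical path):

from netCDF4 import Dataset

# Hypothetical writable copy of the canned ctest cube sphere surface history file
path = "cubed_sphere_grid_sfcf006.nc"

with Dataset(path, "r+") as nc:
    if "tmpsfc" in nc.variables and "ts" not in nc.variables:
        # Rename in place so fv3-jedi finds the name it currently expects
        nc.renameVariable("tmpsfc", "ts")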

Tagging @emilyhcliu , @ADCollard , @CoryMartin-NOAA , and @DavidNew-NOAA . Two questions

  1. What's the short term patch to keep this issue moving forward?
  2. What temperature should we be passing to the CRTM?

The response to question 1 can be captured in this issue. Resolution of question 2 likely needs a new issue.

@danholdaway
Contributor

@RussTreadon-NOAA the issue might be in the mapping between tmpsfc and the long name in the FieldMetadata file. Do you know where that is coming from? It might be a fix file I guess.

@RussTreadon-NOAA
Contributor Author

@danholdaway , you are right.

I spent the morning wading through code, yamls, parm files, & fix files. I found the spot to make the correct linkage between the fv3-jedi source code and our gfs cube sphere history files. With the change in place, the variational and local ensemble DA jobs passed. The increment jobs failed; I still need to update the yamls for these jobs.

(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build$ ctest -R test_gdasapp_atm_jjob
Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
      Start 2016: test_gdasapp_atm_jjob_var_init
 1/11 Test #2016: test_gdasapp_atm_jjob_var_init .........   Passed   45.77 sec
      Start 2017: test_gdasapp_atm_jjob_var_run
 2/11 Test #2017: test_gdasapp_atm_jjob_var_run ..........   Passed  106.22 sec
      Start 2018: test_gdasapp_atm_jjob_var_inc
 3/11 Test #2018: test_gdasapp_atm_jjob_var_inc ..........***Failed   42.28 sec
      Start 2019: test_gdasapp_atm_jjob_var_final
 4/11 Test #2019: test_gdasapp_atm_jjob_var_final ........***Failed   42.23 sec
      Start 2020: test_gdasapp_atm_jjob_ens_init
 5/11 Test #2020: test_gdasapp_atm_jjob_ens_init .........   Passed   45.67 sec
      Start 2021: test_gdasapp_atm_jjob_ens_letkf
 6/11 Test #2021: test_gdasapp_atm_jjob_ens_letkf ........   Passed  554.37 sec
      Start 2022: test_gdasapp_atm_jjob_ens_init_split
 7/11 Test #2022: test_gdasapp_atm_jjob_ens_init_split ...   Passed   45.85 sec
      Start 2023: test_gdasapp_atm_jjob_ens_obs
 8/11 Test #2023: test_gdasapp_atm_jjob_ens_obs ..........   Passed   42.27 sec
      Start 2024: test_gdasapp_atm_jjob_ens_sol
 9/11 Test #2024: test_gdasapp_atm_jjob_ens_sol ..........   Passed   42.28 sec
      Start 2025: test_gdasapp_atm_jjob_ens_inc
10/11 Test #2025: test_gdasapp_atm_jjob_ens_inc ..........***Failed   42.26 sec
      Start 2026: test_gdasapp_atm_jjob_ens_final
11/11 Test #2026: test_gdasapp_atm_jjob_ens_final ........***Failed   74.29 sec

64% tests passed, 4 tests failed out of 11

Total Test time (real) = 1083.98 sec

The file I modified is $HOMEgfs/fix/gdas/fv3jedi/fieldmetadata/gfs-history.yaml. I replaced

- long name: skin_temperature_at_surface
  io name: tsea

with

- long name: skin_temperature_at_surface
  io name: tmpsfc
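A quick way to confirm the mapping took effect is to load the fix file and print the io name recorded for the skin temperature entry. A minimal sketch, assuming PyYAML and the $HOMEgfs path expanded for the local installation; the file is assumed here to be either a flat list of entries or a list nested under a 'field metadata' key:

import yaml

# Fix file edited above (expand $HOMEgfs for your installation)
path = "fix/gdas/fv3jedi/fieldmetadata/gfs-history.yaml"

with open(path) as f:
    data = yaml.safe_load(f)

entries = data if isinstance(data, list) else data.get("field metadata", [])
for entry in entries:
    if entry.get("long name") == "skin_temperature_at_surface":
        print(entry.get("io name"))  # expect: tmpsfc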

@danholdaway
Contributor

Thanks @RussTreadon-NOAA, really nice work digging through. If that fix file came directly from fv3-jedi (and was used in the fv3-jedi tests) there wouldn't have been any work to do, so perhaps we should look into doing that.

@RussTreadon-NOAA
Contributor Author

Agreed! We've been bitten by this disconnect more than once.

@RussTreadon-NOAA
Contributor Author

Hercules test
Install g-w PR #2992 on Hercules. Use sorc/gdas.cd/ush/submodules/update_develop.sh to update JEDI hashes. Iteratively work through issues to get atm var and ensda ctests to pass. Run all test_gdasapp ctests with the following results

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
      Start 1582: test_gdasapp_util_coding_norms
 1/55 Test #1582: test_gdasapp_util_coding_norms ......................................   Passed    4.28 sec
      Start 1583: test_gdasapp_util_ioda_example
 2/55 Test #1583: test_gdasapp_util_ioda_example ......................................   Passed   12.28 sec
      Start 1584: test_gdasapp_util_prepdata
 3/55 Test #1584: test_gdasapp_util_prepdata ..........................................   Passed    5.11 sec
      Start 1585: test_gdasapp_util_rads2ioda
 4/55 Test #1585: test_gdasapp_util_rads2ioda .........................................   Passed    0.94 sec
      Start 1586: test_gdasapp_util_ghrsst2ioda
 5/55 Test #1586: test_gdasapp_util_ghrsst2ioda .......................................   Passed    0.11 sec
      Start 1587: test_gdasapp_util_rtofstmp
 6/55 Test #1587: test_gdasapp_util_rtofstmp ..........................................   Passed    1.41 sec
      Start 1588: test_gdasapp_util_rtofssal
 7/55 Test #1588: test_gdasapp_util_rtofssal ..........................................   Passed    0.45 sec
      Start 1589: test_gdasapp_util_smap2ioda
 8/55 Test #1589: test_gdasapp_util_smap2ioda .........................................   Passed    0.09 sec
      Start 1590: test_gdasapp_util_smos2ioda
 9/55 Test #1590: test_gdasapp_util_smos2ioda .........................................   Passed    0.13 sec
      Start 1591: test_gdasapp_util_viirsaod2ioda
10/55 Test #1591: test_gdasapp_util_viirsaod2ioda .....................................   Passed    0.09 sec
      Start 1592: test_gdasapp_util_icecabi2ioda
11/55 Test #1592: test_gdasapp_util_icecabi2ioda ......................................   Passed    0.12 sec
      Start 1593: test_gdasapp_util_icecamsr2ioda
12/55 Test #1593: test_gdasapp_util_icecamsr2ioda .....................................   Passed    0.11 sec
      Start 1594: test_gdasapp_util_icecmirs2ioda
13/55 Test #1594: test_gdasapp_util_icecmirs2ioda .....................................   Passed    0.09 sec
      Start 1595: test_gdasapp_util_icecjpssrr2ioda
14/55 Test #1595: test_gdasapp_util_icecjpssrr2ioda ...................................   Passed    0.09 sec
      Start 1951: test_gdasapp_check_python_norms
15/55 Test #1951: test_gdasapp_check_python_norms .....................................   Passed    3.95 sec
      Start 1952: test_gdasapp_check_yaml_keys
16/55 Test #1952: test_gdasapp_check_yaml_keys ........................................   Passed    1.28 sec
      Start 1953: test_gdasapp_jedi_increment_to_fv3
17/55 Test #1953: test_gdasapp_jedi_increment_to_fv3 ..................................   Passed    9.07 sec
      Start 1954: test_gdasapp_fv3jedi_fv3inc
18/55 Test #1954: test_gdasapp_fv3jedi_fv3inc .........................................   Passed   24.38 sec
      Start 1955: test_gdasapp_snow_create_ens
19/55 Test #1955: test_gdasapp_snow_create_ens ........................................   Passed    0.84 sec
      Start 1956: test_gdasapp_snow_imsproc
20/55 Test #1956: test_gdasapp_snow_imsproc ...........................................   Passed    3.40 sec
      Start 1957: test_gdasapp_snow_apply_jediincr
21/55 Test #1957: test_gdasapp_snow_apply_jediincr ....................................   Passed    2.35 sec
      Start 1958: test_gdasapp_snow_letkfoi_snowda
22/55 Test #1958: test_gdasapp_snow_letkfoi_snowda ....................................   Passed    7.13 sec
      Start 1959: test_gdasapp_convert_bufr_adpsfc_snow
23/55 Test #1959: test_gdasapp_convert_bufr_adpsfc_snow ...............................   Passed    3.40 sec
      Start 1960: test_gdasapp_WCDA-3DVAR-C48mx500
24/55 Test #1960: test_gdasapp_WCDA-3DVAR-C48mx500 ....................................   Passed   16.80 sec
      Start 1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200
25/55 Test #1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200 .........   Passed  1385.57 sec
      Start 1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_202103241200
26/55 Test #1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_202103241200 .............   Passed  643.56 sec
      Start 1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800
27/55 Test #1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800 .....   Passed  269.23 sec
      Start 1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800
28/55 Test #1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 .......   Passed  269.72 sec
      Start 1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800
29/55 Test #1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 ....   Passed  395.73 sec
      Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
30/55 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....***Failed  596.78 sec
      Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
31/55 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...***Failed  294.69 sec
      Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
32/55 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...***Failed  292.32 sec
      Start 1969: test_gdasapp_convert_bufr_adpsfc
33/55 Test #1969: test_gdasapp_convert_bufr_adpsfc ....................................   Passed   11.23 sec
      Start 1970: test_gdasapp_convert_gsi_satbias
34/55 Test #1970: test_gdasapp_convert_gsi_satbias ....................................   Passed    4.34 sec
      Start 1971: test_gdasapp_setup_atm_cycled_exp
35/55 Test #1971: test_gdasapp_setup_atm_cycled_exp ...................................   Passed    2.28 sec
      Start 1972: test_gdasapp_atm_jjob_var_init
36/55 Test #1972: test_gdasapp_atm_jjob_var_init ......................................   Passed   45.04 sec
      Start 1973: test_gdasapp_atm_jjob_var_run
37/55 Test #1973: test_gdasapp_atm_jjob_var_run .......................................   Passed  106.17 sec
      Start 1974: test_gdasapp_atm_jjob_var_inc
38/55 Test #1974: test_gdasapp_atm_jjob_var_inc .......................................   Passed   42.19 sec
      Start 1975: test_gdasapp_atm_jjob_var_final
39/55 Test #1975: test_gdasapp_atm_jjob_var_final .....................................   Passed   42.17 sec
      Start 1976: test_gdasapp_atm_jjob_ens_init
40/55 Test #1976: test_gdasapp_atm_jjob_ens_init ......................................   Passed   44.80 sec
      Start 1977: test_gdasapp_atm_jjob_ens_letkf
41/55 Test #1977: test_gdasapp_atm_jjob_ens_letkf .....................................   Passed  554.25 sec
      Start 1978: test_gdasapp_atm_jjob_ens_init_split
42/55 Test #1978: test_gdasapp_atm_jjob_ens_init_split ................................   Passed  140.93 sec
      Start 1979: test_gdasapp_atm_jjob_ens_obs
43/55 Test #1979: test_gdasapp_atm_jjob_ens_obs .......................................   Passed   74.18 sec
      Start 1980: test_gdasapp_atm_jjob_ens_sol
44/55 Test #1980: test_gdasapp_atm_jjob_ens_sol .......................................   Passed   42.19 sec
      Start 1981: test_gdasapp_atm_jjob_ens_inc
45/55 Test #1981: test_gdasapp_atm_jjob_ens_inc .......................................   Passed   42.17 sec
      Start 1982: test_gdasapp_atm_jjob_ens_final
46/55 Test #1982: test_gdasapp_atm_jjob_ens_final .....................................   Passed   42.19 sec
      Start 1983: test_gdasapp_aero_gen_3dvar_yaml
47/55 Test #1983: test_gdasapp_aero_gen_3dvar_yaml ....................................   Passed    0.46 sec
      Start 1984: test_gdasapp_bufr2ioda_insitu_profile_argo
48/55 Test #1984: test_gdasapp_bufr2ioda_insitu_profile_argo ..........................***Failed    5.33 sec
      Start 1985: test_gdasapp_bufr2ioda_insitu_profile_bathy
49/55 Test #1985: test_gdasapp_bufr2ioda_insitu_profile_bathy .........................***Failed    0.22 sec
      Start 1986: test_gdasapp_bufr2ioda_insitu_profile_glider
50/55 Test #1986: test_gdasapp_bufr2ioda_insitu_profile_glider ........................***Failed    0.22 sec
      Start 1987: test_gdasapp_bufr2ioda_insitu_profile_tesac
51/55 Test #1987: test_gdasapp_bufr2ioda_insitu_profile_tesac .........................***Failed    0.21 sec
      Start 1988: test_gdasapp_bufr2ioda_insitu_profile_tropical
52/55 Test #1988: test_gdasapp_bufr2ioda_insitu_profile_tropical ......................***Failed    0.22 sec
      Start 1989: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
53/55 Test #1989: test_gdasapp_bufr2ioda_insitu_profile_xbtctd ........................***Failed    0.22 sec
      Start 1990: test_gdasapp_bufr2ioda_insitu_surface_drifter
54/55 Test #1990: test_gdasapp_bufr2ioda_insitu_surface_drifter .......................***Failed    0.22 sec
      Start 1991: test_gdasapp_bufr2ioda_insitu_surface_trkob
55/55 Test #1991: test_gdasapp_bufr2ioda_insitu_surface_trkob .........................***Failed    0.22 sec

80% tests passed, 11 tests failed out of 55

Label Time Summary:
gdas-utils    =  25.30 sec*proc (14 tests)
manual        = 4164.41 sec*proc (9 tests)
script        =  25.30 sec*proc (14 tests)

Total Test time (real) = 5451.27 sec

The following tests FAILED:
        1966 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 (Failed)
        1967 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 (Failed)
        1968 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 (Failed)
        1984 - test_gdasapp_bufr2ioda_insitu_profile_argo (Failed)
        1985 - test_gdasapp_bufr2ioda_insitu_profile_bathy (Failed)
        1986 - test_gdasapp_bufr2ioda_insitu_profile_glider (Failed)
        1987 - test_gdasapp_bufr2ioda_insitu_profile_tesac (Failed)
        1988 - test_gdasapp_bufr2ioda_insitu_profile_tropical (Failed)
        1989 - test_gdasapp_bufr2ioda_insitu_profile_xbtctd (Failed)
        1990 - test_gdasapp_bufr2ioda_insitu_surface_drifter (Failed)
        1991 - test_gdasapp_bufr2ioda_insitu_surface_trkob (Failed)
Errors while running CTest
Output from these tests are in: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/Testing/Temporary/LastTest.log

The test_gdasapp_bufr2ioda_insitu failures are a known problem. Each of these jobs fails with the same message: ModuleNotFoundError: No module named 'pyiodaconv'. For example, here is the traceback from test_gdasapp_bufr2ioda_insitu_profile_argo

1984: Traceback (most recent call last):
1984:   File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/bundle/gdas/ush/ioda/bufr2ioda/marine/b2i/bufr2ioda_insitu_profile_argo.py", line 6, in <module>
1984:     from b2iconverter.bufr2ioda_converter import Bufr2ioda_Converter
1984:   File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/ush/ioda/bufr2ioda/marine/b2i/b2iconverter/bufr2ioda_converter.py", line 7, in <module>
1984:     from pyiodaconv import bufr
1984: ModuleNotFoundError: No module named 'pyiodaconv'
1/1 Test #1984: test_gdasapp_bufr2ioda_insitu_profile_argo ...***Failed   16.83 sec

@apchoiCMD , do you have a branch with changes that allow these tests to pass?
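Since the error points at the interpreter rather than the converter scripts themselves, a small check of which python the test is running and whether pyiodaconv is importable can narrow it down. A minimal sketch, to be run with the same interpreter the ctest uses:

import importlib.util
import sys

# Report the interpreter actually being used by the test environment
print("python:", sys.executable)

# find_spec returns None when the module is not visible on sys.path
spec = importlib.util.find_spec("pyiodaconv")
print("pyiodaconv:", spec.origin if spec else "not found on sys.path")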

The test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 failure appears to be related to the Model Variable Renaming Sprint. The job fails with the message

  0: insitu_profile_argo processed vars: 2 Variables: waterTemperature, salinity
 0: insitu_profile_argo assimilated vars: 2 Variables: waterTemperature, salinity
 1: Unable to find field metadata for: cicen
 9: Unable to find field metadata for: cicen
13: Unable to find field metadata for: cicen
15: Unable to find field metadata for: cicen
 1: Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
 7: Unable to find field metadata for: cicen
 9: Abort(1) on node 9 (rank 9 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 9
13: Abort(1) on node 13 (rank 13 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 13
14: Unable to find field metadata for: cicen
15: Abort(1) on node 15 (rank 15 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 15

@guillaumevernieres , do you know where or what needs to be changed in yamls or fix files to get the marineanlvar test to pass? The log file for the failed job is /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/logs/2021032418/gdas_marineanlvar.log
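One way to track down where cicen metadata is (or should be) defined is to scan the parm tree for the name. A minimal sketch, assuming a hypothetical local checkout path:

from pathlib import Path

# Hypothetical root of the GDASApp parm tree to scan for the field name
root = Path("sorc/gdas.cd/parm")

for path in sorted(root.rglob("*.yaml*")):
    if "cicen" in path.read_text(errors="ignore"):
        print(path)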

@RussTreadon-NOAA
Contributor Author

g-w CI for DA

Successfully run C96C48_ufs_hybatmDA g-w CI on Hercules.

C96C48_hybatmaerosnowDA and C48mx500_3DVarAOWCDA fail.

The C48mx500_3DVarAOWCDA failure is expected given ctest failures.

The C96C48_hybatmaerosnowDA failure is in the 20211220 18Z enkfgdas_esnowrecen.log. Executable fregrid.x aborts with the following message

NOTE: done calculating index and weight for conservative interpolation
Successfully running fregrid and the following output file are generated.
****./bkg/det_ensres//20211220.150000.sfc_data.tile1.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile2.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile3.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile4.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile5.nc
****./bkg/det_ensres//20211220.150000.sfc_data.tile6.nc
2024-11-11 11:56:28,496 - INFO     - snowens_analysis:   END: pygfs.task.snowens_analysis.regridDetBkg
2024-11-11 11:56:28,496 - DEBUG    - snowens_analysis:  returning: None
2024-11-11 11:56:28,496 - INFO     - snowens_analysis: BEGIN: pygfs.task.snowens_analysis.regridDetInc
2024-11-11 11:56:28,496 - DEBUG    - snowens_analysis: ( <pygfs.task.snowens_analysis.SnowEnsAnalysis object at 0x1460b0588f10> )
2024-11-11 11:56:28,496 - DEBUG    - snowens_analysis: Executing /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x
2024-11-11 11:56:28,678 - INFO     - root        : BEGIN: wxflow.exceptions.__init__
2024-11-11 11:56:28,678 - DEBUG    - root        : ( WorkflowException('An error occured during execution of /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x'), 'An error occured during execution of /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x' )

...

Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/snowens_analysis.py", line 230, in regridDetInc
    exec_cmd(*arg_list)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/executable.py", line 230, in __call__
    raise ProcessError(f"Command exited with status {proc.returncode}:", long_msg)
wxflow.executable.ProcessError: Command exited with status -11:
'/work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/praero_pr2992/enkfgdas.2021122018/esnowrecen.1577478/fregrid.x' '--input_mosaic' './orog/det/C96_mosaic.nc' '--input_dir' './inc/det/' '--input_file' 'snowinc.20211220.150000.sfc_data' '--scalar_field' 'snodl' '--output_dir' './inc/det_ensres/' '--output_file' 'snowinc.20211220.150000.sfc_data' '--output_mosaic' './orog/ens/C48_mosaic.nc' '--interp_method' 'conserve_order1' '--weight_file' './orog/det/C96.mx500_interp_weight' '--weight_field' 'lsm_frac' '--remap_file' './remap'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/scripts/exgdas_enkf_snow_recenter.py", line 27, in <module>
    anl.regridDetInc()
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/logger.py", line 266, in wrapper
    retval = func(*args, **kwargs)
  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/pygfs/task/snowens_analysis.py", line 234, in regridDetInc
    raise WorkflowException(f"An error occured during execution of {exec_cmd}")
wxflow.exceptions.WorkflowException
+ JGDAS_ENKF_SNOW_RECENTER[1]: postamble JGDAS_ENKF_SNOW_RECENTER 1731326133 1

It is not clear from the traceback what the actual error is. Since this installation of GDASApp includes JEDI hashes with changes from the Model Variable Renaming Sprint, one or more yaml keywords or fix file keywords most likely need to be updated.

@jiaruidong2017, @ClaraDraper-NOAA : Any ideas what we need to change in JEDI snow DA when moving to JEDI hashes which include changes from the Model Variable Renaming Sprint?

The log file for the failed job is /work/noaa/stmp/rtreadon/COMROOT/praero_pr2992/logs/2021122018/enkfgdas_esnowrecen.log on Hercules.

@RussTreadon-NOAA
Contributor Author

test_gdasapp update

Install g-w PR #2992 on Hercules. Specifically, g-w branch DavidNew-NOAA:feature/jcb-obsbias at a6fd65ad was installed. sorc/gdas.cd was replaced with GDASApp branch feature/resume_nightly at 4561ead.

test_gdasapp was run with the following results

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
      Start 1582: test_gdasapp_util_coding_norms
 1/64 Test #1582: test_gdasapp_util_coding_norms ......................................   Passed    8.98 sec
      Start 1583: test_gdasapp_util_ioda_example
 2/64 Test #1583: test_gdasapp_util_ioda_example ......................................   Passed   12.24 sec
      Start 1584: test_gdasapp_util_prepdata
 3/64 Test #1584: test_gdasapp_util_prepdata ..........................................   Passed    3.69 sec
      Start 1585: test_gdasapp_util_rads2ioda
 4/64 Test #1585: test_gdasapp_util_rads2ioda .........................................   Passed    0.62 sec
      Start 1586: test_gdasapp_util_ghrsst2ioda
 5/64 Test #1586: test_gdasapp_util_ghrsst2ioda .......................................   Passed    0.12 sec
      Start 1587: test_gdasapp_util_rtofstmp
 6/64 Test #1587: test_gdasapp_util_rtofstmp ..........................................   Passed    1.97 sec
      Start 1588: test_gdasapp_util_rtofssal
 7/64 Test #1588: test_gdasapp_util_rtofssal ..........................................   Passed    0.46 sec
      Start 1589: test_gdasapp_util_smap2ioda
 8/64 Test #1589: test_gdasapp_util_smap2ioda .........................................   Passed    0.12 sec
      Start 1590: test_gdasapp_util_smos2ioda
 9/64 Test #1590: test_gdasapp_util_smos2ioda .........................................   Passed    0.12 sec
      Start 1591: test_gdasapp_util_viirsaod2ioda
10/64 Test #1591: test_gdasapp_util_viirsaod2ioda .....................................   Passed    0.12 sec
      Start 1592: test_gdasapp_util_icecabi2ioda
11/64 Test #1592: test_gdasapp_util_icecabi2ioda ......................................   Passed    0.13 sec
      Start 1593: test_gdasapp_util_icecamsr2ioda
12/64 Test #1593: test_gdasapp_util_icecamsr2ioda .....................................   Passed    0.12 sec
      Start 1594: test_gdasapp_util_icecmirs2ioda
13/64 Test #1594: test_gdasapp_util_icecmirs2ioda .....................................   Passed    0.12 sec
      Start 1595: test_gdasapp_util_icecjpssrr2ioda
14/64 Test #1595: test_gdasapp_util_icecjpssrr2ioda ...................................   Passed    0.12 sec
      Start 1951: test_gdasapp_check_python_norms
15/64 Test #1951: test_gdasapp_check_python_norms .....................................   Passed    2.67 sec
      Start 1952: test_gdasapp_check_yaml_keys
16/64 Test #1952: test_gdasapp_check_yaml_keys ........................................   Passed    0.91 sec
      Start 1953: test_gdasapp_jedi_increment_to_fv3
17/64 Test #1953: test_gdasapp_jedi_increment_to_fv3 ..................................   Passed    8.40 sec
      Start 1954: test_gdasapp_fv3jedi_fv3inc
18/64 Test #1954: test_gdasapp_fv3jedi_fv3inc .........................................   Passed   19.85 sec
      Start 1955: test_gdasapp_snow_create_ens
19/64 Test #1955: test_gdasapp_snow_create_ens ........................................   Passed    3.57 sec
      Start 1956: test_gdasapp_snow_imsproc
20/64 Test #1956: test_gdasapp_snow_imsproc ...........................................   Passed    3.01 sec
      Start 1957: test_gdasapp_snow_apply_jediincr
21/64 Test #1957: test_gdasapp_snow_apply_jediincr ....................................   Passed    4.58 sec
      Start 1958: test_gdasapp_snow_letkfoi_snowda
22/64 Test #1958: test_gdasapp_snow_letkfoi_snowda ....................................   Passed    9.79 sec
      Start 1959: test_gdasapp_convert_bufr_adpsfc_snow
23/64 Test #1959: test_gdasapp_convert_bufr_adpsfc_snow ...............................   Passed    3.04 sec
      Start 1960: test_gdasapp_WCDA-3DVAR-C48mx500
24/64 Test #1960: test_gdasapp_WCDA-3DVAR-C48mx500 ....................................   Passed   28.92 sec
      Start 1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200
25/64 Test #1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200 .........   Passed   45.72 sec
      Start 1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200
26/64 Test #1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200 ........   Passed  320.42 sec
      Start 1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800
27/64 Test #1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800 .....   Passed  248.05 sec
      Start 1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800
28/64 Test #1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 .......***Failed  171.34 sec
      Start 1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800
29/64 Test #1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 ....***Failed   58.38 sec
      Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
30/64 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....***Failed   53.74 sec
      Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
31/64 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...***Failed   43.18 sec
      Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
32/64 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...***Failed   45.18 sec
      Start 1969: test_gdasapp_WCDA-hyb-C48mx500
33/64 Test #1969: test_gdasapp_WCDA-hyb-C48mx500 ......................................   Passed   34.23 sec
      Start 1970: test_gdasapp_WCDA-hyb-C48mx500_gdas_stage_ic_202103241200
34/64 Test #1970: test_gdasapp_WCDA-hyb-C48mx500_gdas_stage_ic_202103241200 ...........   Passed   58.17 sec
      Start 1971: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_stage_ic_202103241200
35/64 Test #1971: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_stage_ic_202103241200 .......   Passed   44.28 sec
      Start 1972: test_gdasapp_WCDA-hyb-C48mx500_gdas_fcst_seg0_202103241200
36/64 Test #1972: test_gdasapp_WCDA-hyb-C48mx500_gdas_fcst_seg0_202103241200 ..........   Passed  566.11 sec
      Start 1973: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem001_202103241200
37/64 Test #1973: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem001_202103241200 ....   Passed  426.34 sec
      Start 1974: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem002_202103241200
38/64 Test #1974: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem002_202103241200 ....   Passed  409.62 sec
      Start 1975: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem003_202103241200
39/64 Test #1975: test_gdasapp_WCDA-hyb-C48mx500_enkfgdas_fcst_mem003_202103241200 ....   Passed  423.69 sec
      Start 1976: test_gdasapp_WCDA-hyb-C48mx500_gdas_prepoceanobs_202103241800
40/64 Test #1976: test_gdasapp_WCDA-hyb-C48mx500_gdas_prepoceanobs_202103241800 .......   Passed  237.67 sec
      Start 1977: test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800
41/64 Test #1977: test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 .....***Failed  137.85 sec
      Start 1978: test_gdasapp_convert_bufr_adpsfc
42/64 Test #1978: test_gdasapp_convert_bufr_adpsfc ....................................   Passed   16.93 sec
      Start 1979: test_gdasapp_convert_gsi_satbias
43/64 Test #1979: test_gdasapp_convert_gsi_satbias ....................................   Passed    8.28 sec
      Start 1980: test_gdasapp_setup_atm_cycled_exp
44/64 Test #1980: test_gdasapp_setup_atm_cycled_exp ...................................   Passed    7.26 sec
      Start 1981: test_gdasapp_atm_jjob_var_init
45/64 Test #1981: test_gdasapp_atm_jjob_var_init ......................................   Passed   80.19 sec
      Start 1982: test_gdasapp_atm_jjob_var_run
46/64 Test #1982: test_gdasapp_atm_jjob_var_run .......................................   Passed  106.44 sec
      Start 1983: test_gdasapp_atm_jjob_var_inc
47/64 Test #1983: test_gdasapp_atm_jjob_var_inc .......................................   Passed   74.35 sec
      Start 1984: test_gdasapp_atm_jjob_var_final
48/64 Test #1984: test_gdasapp_atm_jjob_var_final .....................................   Passed   42.34 sec
      Start 1985: test_gdasapp_atm_jjob_ens_init
49/64 Test #1985: test_gdasapp_atm_jjob_ens_init ......................................   Passed   79.58 sec
      Start 1986: test_gdasapp_atm_jjob_ens_letkf
50/64 Test #1986: test_gdasapp_atm_jjob_ens_letkf .....................................   Passed  778.65 sec
      Start 1987: test_gdasapp_atm_jjob_ens_init_split
51/64 Test #1987: test_gdasapp_atm_jjob_ens_init_split ................................   Passed  112.04 sec
      Start 1988: test_gdasapp_atm_jjob_ens_obs
52/64 Test #1988: test_gdasapp_atm_jjob_ens_obs .......................................   Passed   42.32 sec
      Start 1989: test_gdasapp_atm_jjob_ens_sol
53/64 Test #1989: test_gdasapp_atm_jjob_ens_sol .......................................   Passed   42.33 sec
      Start 1990: test_gdasapp_atm_jjob_ens_inc
54/64 Test #1990: test_gdasapp_atm_jjob_ens_inc .......................................   Passed  106.33 sec
      Start 1991: test_gdasapp_atm_jjob_ens_final
55/64 Test #1991: test_gdasapp_atm_jjob_ens_final .....................................   Passed   42.38 sec
      Start 1992: test_gdasapp_aero_gen_3dvar_yaml
56/64 Test #1992: test_gdasapp_aero_gen_3dvar_yaml ....................................   Passed    5.77 sec
      Start 1993: test_gdasapp_bufr2ioda_insitu_profile_argo
57/64 Test #1993: test_gdasapp_bufr2ioda_insitu_profile_argo ..........................***Failed   10.11 sec
      Start 1994: test_gdasapp_bufr2ioda_insitu_profile_bathy
58/64 Test #1994: test_gdasapp_bufr2ioda_insitu_profile_bathy .........................***Failed    0.42 sec
      Start 1995: test_gdasapp_bufr2ioda_insitu_profile_glider
59/64 Test #1995: test_gdasapp_bufr2ioda_insitu_profile_glider ........................***Failed    0.40 sec
      Start 1996: test_gdasapp_bufr2ioda_insitu_profile_tesac
60/64 Test #1996: test_gdasapp_bufr2ioda_insitu_profile_tesac .........................***Failed    0.40 sec
      Start 1997: test_gdasapp_bufr2ioda_insitu_profile_tropical
61/64 Test #1997: test_gdasapp_bufr2ioda_insitu_profile_tropical ......................***Failed    0.40 sec
      Start 1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
62/64 Test #1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd ........................***Failed    0.41 sec
      Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
63/64 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter .......................***Failed    0.38 sec
      Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
64/64 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob .........................***Failed    0.38 sec

78% tests passed, 14 tests failed out of 64

Label Time Summary:
gdas-utils    =  28.91 sec*proc (14 tests)
manual        = 3352.89 sec*proc (18 tests)
script        =  28.91 sec*proc (14 tests)

Total Test time (real) = 4999.35 sec

The following tests FAILED:
        1964 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 (Failed)
        1965 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 (Failed)
        1966 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 (Failed)
        1967 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 (Failed)
        1968 - test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 (Failed)
        1977 - test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 (Failed)
        1993 - test_gdasapp_bufr2ioda_insitu_profile_argo (Failed)
        1994 - test_gdasapp_bufr2ioda_insitu_profile_bathy (Failed)
        1995 - test_gdasapp_bufr2ioda_insitu_profile_glider (Failed)
        1996 - test_gdasapp_bufr2ioda_insitu_profile_tesac (Failed)
        1997 - test_gdasapp_bufr2ioda_insitu_profile_tropical (Failed)
        1998 - test_gdasapp_bufr2ioda_insitu_profile_xbtctd (Failed)
        1999 - test_gdasapp_bufr2ioda_insitu_surface_drifter (Failed)
        2000 - test_gdasapp_bufr2ioda_insitu_surface_trkob (Failed)

Log files for failed marine 3DVar jobs are in /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/logs/2021032418. The marine bmat job log file contains the message

12: Unable to find field metadata for: tocn
14: Unable to find field metadata for: tocn
 0: Unable to find field metadata for: tocn
 0: OOPS Ending   2024-11-11 16:49:33 (UTC+0000)
 1: Unable to find field metadata for: tocn
 2: Abort(1) on node 2 (rank 2 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
 3: Abort(1) on node 3 (rank 3 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
 4: Unable to find field metadata for: tocn
 5: Abort(1) on node 5 (rank 5 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 5

This appears to be a model variable renaming issue. Correcting the bmat job may allow the subsequent marine jobs to successfully run to completion.

The log file for failed marine hyb job contains

  File "/work/noaa/da/rtreadon/git/global-workflow/pr2992/ush/python/wxflow/attrdict.py", line 84, in __missing__
    raise KeyError(name)
KeyError: 'APRUN_MARINEANLLETKF'
+ JGLOBAL_MARINE_ANALYSIS_LETKF[1]: postamble JGLOBAL_MARINE_ANALYSIS_LETKF 1731346303 1
+ preamble.sh[70]: set +x

This error may indicate that it is premature to run the marine letkf ctest. This test may need updates from g-w PR #3401. If true, this again highlights the problem we face with GDASApp getting several development cycles ahead of g-w.

Tagging @guillaumevernieres , @AndrewEichmann-NOAA , and @apchoiCMD for help in debugging the marine DA and bufr2ioda_insitu failures.

@RussTreadon-NOAA
Contributor Author

g-w CI update

Install g-w PR #2992 on Hercules. Specifically, g-w branch DavidNew-NOAA:feature/jcb-obsbias at a6fd65ad was installed. sorc/gdas.cd was replaced with GDASApp branch feature/resume_nightly at 4561ead.

The following g-w DA CI was configured and run

  1. C96C48_hybatmDA - GSI based atmospheric DA (prgsi)
  2. C96C48_ufs_hybatmDA - JEDI based atmospheric DA (prjedi)
  3. C96C48_hybatmaerosnowDA - GSI atmospheric DA, JEDI aerosol and snow DA (praero)
  4. C48mx500_3DVarAOWCDA - GSI atmospheric DA, JEDI marine DA (prwcda)

prgsi (1) and prjedi (2) successfully ran to completion

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/prgsi_pr2992
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202112201800        Done    Nov 11 2024 18:50:22    Nov 11 2024 19:20:03
202112210000        Done    Nov 11 2024 18:50:22    Nov 11 2024 21:50:03
202112210600        Done    Nov 11 2024 18:50:22    Nov 11 2024 22:10:03

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/prjedi_pr2992
   CYCLE         STATE           ACTIVATED              DEACTIVATED
202402231800        Done    Nov 11 2024 18:50:23    Nov 11 2024 19:20:04
202402240000        Done    Nov 11 2024 18:50:23    Nov 11 2024 22:40:05
202402240600        Done    Nov 11 2024 18:50:23    Nov 11 2024 23:05:04

praero (3) and prwcda (4) encountered DEAD jobs which halted each parallel

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/praero_pr2992
202112201800     enkfgdas_esnowrecen                     3152798                DEAD                   1         2          51.0

rocotostat /work/noaa/stmp/rtreadon/EXPDIR/prwcda_pr2992
202103241800         gdas_marinebmat                     3152661                DEAD                   1         2         114.0

The log files for the DEAD jobs are

  • praero: /work/noaa/stmp/rtreadon/COMROOT/praero_pr2992/logs/2021122018/enkfgdas_esnowrecen.log
  • prwcda: /work/noaa/stmp/rtreadon/COMROOT/prwcda_pr2992/logs/2021032418/gdas_marinebmat.log

@danholdaway
Contributor

Thank you for this effort, Russ.

@RussTreadon-NOAA
Contributor Author

@AndrewEichmann-NOAA , I updated feature/resume_nightly with GDASApp develop. This brought in changes from #1352. Now g-w gdas_marinefinal fails with

0: ========= Processing /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4          date: 2021032418
0: insitu_surface_trkob.2021032418.nc4: read database from /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4 (io pool size: 1)
0: insitu_surface_trkob.2021032418.nc4 processed vars: 2 Variables: seaSurfaceSalinity, seaSurfaceTemperature
0: insitu_surface_trkob.2021032418.nc4 assimilated vars: 1 Variables: seaSurfaceSalinity
0: nlocs =863
0: Exception:   Reason: An exception occurred inside ioda while opening a variable.
0:      name:   ombg/seaSurfaceSalinity
0:      source_column:  0
0:      source_filename:        /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/bundle/ioda/src/engines/ioda/src/ioda/Has_Variables.cpp

Is this failure possibly related to #1352?
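To confirm whether the diag file actually carries an ombg/seaSurfaceSalinity variable, the file can be opened directly. A minimal sketch, assuming the netCDF4 Python package and a local copy of the diag file named in the log:

from netCDF4 import Dataset

# Diag file from the failed gdas_marinefinal log (copy locally as needed)
path = "insitu_surface_trkob.2021032418.nc4"

with Dataset(path) as nc:
    ombg = nc.groups.get("ombg")
    if ombg is None:
        print("no ombg group in this file")
    else:
        print("ombg variables:", list(ombg.variables))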

@RussTreadon-NOAA
Contributor Author

GDASApp PR #1374 modifies test/marine/CMakeLists.txt such that the correct python version is set for test_gdasapp_bufr2ioda_insitu*. With this change in place, all test_gdasapp_bufr2ioda_insitu* tests pass

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1993: test_gdasapp_bufr2ioda_insitu_profile_argo
1/8 Test #1993: test_gdasapp_bufr2ioda_insitu_profile_argo .......   Passed   52.78 sec
    Start 1994: test_gdasapp_bufr2ioda_insitu_profile_bathy
2/8 Test #1994: test_gdasapp_bufr2ioda_insitu_profile_bathy ......   Passed    3.72 sec
    Start 1995: test_gdasapp_bufr2ioda_insitu_profile_glider
3/8 Test #1995: test_gdasapp_bufr2ioda_insitu_profile_glider .....   Passed    3.62 sec
    Start 1996: test_gdasapp_bufr2ioda_insitu_profile_tesac
4/8 Test #1996: test_gdasapp_bufr2ioda_insitu_profile_tesac ......   Passed    5.71 sec
    Start 1997: test_gdasapp_bufr2ioda_insitu_profile_tropical
5/8 Test #1997: test_gdasapp_bufr2ioda_insitu_profile_tropical ...   Passed    3.33 sec
    Start 1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd
6/8 Test #1998: test_gdasapp_bufr2ioda_insitu_profile_xbtctd .....   Passed    2.62 sec
    Start 1999: test_gdasapp_bufr2ioda_insitu_surface_drifter
7/8 Test #1999: test_gdasapp_bufr2ioda_insitu_surface_drifter ....   Passed    2.41 sec
    Start 2000: test_gdasapp_bufr2ioda_insitu_surface_trkob
8/8 Test #2000: test_gdasapp_bufr2ioda_insitu_surface_trkob ......   Passed    2.86 sec

100% tests passed, 0 tests failed out of 8

Total Test time (real) =  78.83 sec

@RussTreadon-NOAA
Contributor Author

@AndrewEichmann-NOAA , I rolled back the change to parm/soca/obs/obs_list.yaml from #1352 and reran the test_gdasapp_WCDA-3DVAR-C48mx500 suite of tests. All passed

Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1960: test_gdasapp_WCDA-3DVAR-C48mx500
1/9 Test #1960: test_gdasapp_WCDA-3DVAR-C48mx500 ....................................   Passed   32.22 sec
    Start 1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200
2/9 Test #1961: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_stage_ic_202103241200 .........   Passed   58.00 sec
    Start 1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200
3/9 Test #1962: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_fcst_seg0_202103241200 ........   Passed  408.83 sec
    Start 1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800
4/9 Test #1963: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_prepoceanobs_202103241800 .....   Passed  266.16 sec
    Start 1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800
5/9 Test #1964: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marinebmat_202103241800 .......   Passed  168.43 sec
    Start 1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800
6/9 Test #1965: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlinit_202103241800 ....   Passed  111.26 sec
    Start 1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800
7/9 Test #1966: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlvar_202103241800 .....   Passed  168.09 sec
    Start 1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800
8/9 Test #1967: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlchkpt_202103241800 ...   Passed  180.57 sec
    Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
9/9 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...   Passed   68.63 sec

100% tests passed, 0 tests failed out of 9

Label Time Summary:
manual    = 1462.19 sec*proc (9 tests)

Total Test time (real) = 1463.94 sec

Does the failure of test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 when using the PR #1352 version of parm/soca/obs/obs_list.yaml make sense?

@RussTreadon-NOAA
Contributor Author

RussTreadon-NOAA commented Nov 14, 2024

@guillaumevernieres and @AndrewEichmann-NOAA : test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 fails with the error

^[[38;5;39m2024-11-14 03:11:43,830 - DEBUG    - marine_da_utils: Executing srun -l --export=ALL --hint=nomultithread -n 16 /work/noaa/da/rtreadon/git/global-workflow/pr2992/exec/gdas_soca_gridgen.x /work/noaa/da/rtreadon/git/global-workflow/pr2992/parm/gdas/soca/gridgen/gridgen.yaml^[[0m
 2: Exception: Cannot open /work/noaa/da/rtreadon/git/global-workflow/pr2992/parm/gdas/soca/gridgen/gridgen.yaml  (No such file or directory)

There is no g-w directory parm/gdas/soca/gridgen. I checked g-w PR #3041. I do not see any change to sorc/link_workflow.sh to add this directory to parm/gdas/soca.

Should test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 successfully run in GDASApp develop with g-w develop? Does this test work when built and run inside g-w PR #3041?

guillaumevernieres pushed a commit that referenced this issue Nov 14, 2024
cmake now detects the python version;
previously hard-coded.
This came up in #1362
RussTreadon-NOAA added a commit that referenced this issue Nov 14, 2024
@RussTreadon-NOAA
Contributor Author

11/14 status

g-w DA CI testing is complete on Hercules, where 63 out of 64 test_gdasapp tests pass.

Two issues remain to be resolved:

  1. test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 fails when using parm/soca/obs/obs_list.yaml from GDASApp develop at 6bc2760. Reverting to the previous version of obs_list.yaml allows test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 to pass.

  2. test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 fails for at least two reasons

    • the marineanlletkf section is missing from g-w env/HERCULES.env
    • gdas_soca_gridgen.x fails because input yaml $HOMEgfs/parm/gdas/soca/gridgen/gridgen.yaml does not exist

We cannot resume nightly testing until all ctests pass. Given this, we need to answer two questions:

  1. Do we

    • fix GDASApp develop so that test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 passes when using obs_list.yaml from develop, or
    • revert to the previous version of obs_list.yaml, or
    • disable test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800?
  2. Do we

    • fix test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 by adding the missing parm/gdas/soca/gridgen to g-w PR #2992, or
    • disable test test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800?

Tagging @guillaumevernieres , @AndrewEichmann-NOAA , @CoryMartin-NOAA , @danholdaway , @DavidNew-NOAA

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA With regard to gridgen, I deleted that yaml in GDASApp when I refactored the marine bmat, not realizing other code would use it. It exists now as parm/jcb-gdas/algorithm/marine/gridgen.yaml, so you can just point to that file until I refactor the rest of the marine code using JCB.
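To illustrate the pointer change (a hypothetical sketch; PARMgfs and the layout under the g-w parm/gdas link are assumptions, not the actual global-workflow code), the letkf task would resolve gridgen.yaml from the jcb-gdas location rather than the removed parm/gdas/soca/gridgen path:

import os

# Hypothetical resolution of the relocated gridgen.yaml; the PARMgfs environment
# variable and the parm/gdas/jcb-gdas layout are assumptions for illustration only.
parm_gdas = os.path.join(os.environ.get("PARMgfs", "/path/to/global-workflow/parm"), "gdas")
old_path = os.path.join(parm_gdas, "soca", "gridgen", "gridgen.yaml")                   # removed location
new_path = os.path.join(parm_gdas, "jcb-gdas", "algorithm", "marine", "gridgen.yaml")   # current location

gridgen_yaml = new_path if os.path.isfile(new_path) else old_path
if not os.path.isfile(gridgen_yaml):
    raise FileNotFoundError(f"gridgen yaml not found: {gridgen_yaml}")

In g-w the actual change ended up as a config.marineanlletkf setting (see the later comments), so the snippet is only meant to show the direction of the pointer change.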

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA Just to answer your question, my suggestion is that I add the following to #2992:

  1. Point marineanlletkf to gridgen in jcb-gdas per my above comment
  2. Add marineanlletkf to env/HERCULES.env

And then we either revert the obs_list.yaml or fix it

@RussTreadon-NOAA
Contributor Author

gdas_marineanlletkf failure - RESOLVED

test_gdasapp_WCDA-hyb-C48mx500_gdas_marineanlletkf_202103241800 passes after making the following changes

  • update state variables in parm/soca/letkf/letkf.yaml.j2 to be consistent with soca PR #1082
  • add missing obs localizations blocks to insitu_profile_bathy.yaml, insitu_profile_tesac.yaml, and insitu_surface_trkob.yaml in parm/soca/obs/config/

@RussTreadon-NOAA
Contributor Author

gdas_marineanlfinal failure - UPDATE

gdas_marineanlfinal fails when gdassoca_obsstats.x attempts to extract seaSurfaceSalinity from the insitu_surface_trkob diagnostic file

0: insitu_surface_trkob.2021032418.nc4: read database from /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4 (io pool size: 1)
0: insitu_surface_trkob.2021032418.nc4 processed vars: 2 Variables: seaSurfaceSalinity, seaSurfaceTemperature
0: insitu_surface_trkob.2021032418.nc4 assimilated vars: 1 Variables: seaSurfaceSalinity
0: nlocs =863
0: Exception:   Reason: An exception occurred inside ioda while opening a variable.
0:      name:   ombg/seaSurfaceSalinity

diag_stats.yaml specifies variable seaSurfaceSalinity to be extracted from analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4.

      engine:
        type: H5File
        obsfile: /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build/gdas/test/gw-ci/../../test/gw-ci/WCDA-3DVAR-C48mx500/COMROOT/WCDA-3DVAR-C48mx500/gdas.20210324/18//analysis/ocean/diags/insitu_surface_trkob.2021032418.nc4
    simulated variables:
    - seaSurfaceSalinity
  variable: seaSurfaceSalinity

This is problematic. Variable seaSurfaceSalinity is not in the diagnostic file. var.yaml only specifies seaSurfaceTemperature to be written to the diagnostic file

        obsdataout:
          engine:
            type: H5File
            obsfile: /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/WCDA-3DVAR-C48mx500/gdas.2021032418/gdasmarineanalysis.2021032418/marinevariational/diags/insitu_surface_trkob.2021032418.nc4
        simulated variables:
        - seaSurfaceTemperature
        io pool:
          max pool size: 1

Method obs_space_stats in ush/python/pygfs/task/marine_analysis.py creates diag_stats.yaml. The yaml is populated by querying the diagnostic files in the run directory. Specifically, ObsValue is checked to determine the variable to add to diag_stats.yaml. The ObsValue group contains two variables

group: ObsValue {
  variables:
        float seaSurfaceSalinity(Location) ;
                seaSurfaceSalinity:_FillValue = -3.368795e+38f ;
                string seaSurfaceSalinity:long_name = "seaSurfaceSalinity" ;
                string seaSurfaceSalinity:units = "psu" ;
                seaSurfaceSalinity:valid_range = 0.f, 45.f ;
                seaSurfaceSalinity:_Storage = "chunked" ;
                seaSurfaceSalinity:_ChunkSizes = 863 ;
                seaSurfaceSalinity:_Endianness = "little" ;
        float seaSurfaceTemperature(Location) ;
                seaSurfaceTemperature:_FillValue = -3.368795e+38f ;
                string seaSurfaceTemperature:long_name = "seaSurfaceTemperature" ;
                string seaSurfaceTemperature:units = "degC" ;
                seaSurfaceTemperature:valid_range = -10.f, 50.f ;
                seaSurfaceTemperature:_Storage = "chunked" ;
                seaSurfaceTemperature:_ChunkSizes = 863 ;
                seaSurfaceTemperature:_Endianness = "little" ;

  // group attributes:
  } // group ObsValue

However, the ombg group only contains seaSurfaceTemperature

group: ombg {
  variables:
        float seaSurfaceTemperature(Location) ;
                seaSurfaceTemperature:_FillValue = -3.368795e+38f ;
                seaSurfaceTemperature:_Storage = "chunked" ;
                seaSurfaceTemperature:_ChunkSizes = 863 ;
                seaSurfaceTemperature:_Endianness = "little" ;

  // group attributes:
  } // group ombg

Do we need to change the logic in method obs_space_stats in marine_analysis.py to check ombg instead of ObsValue when populating diag_stats.yaml?

What do you think, @guillaumevernieres? Who on the Marine DA team should I discuss this issue with?

@RussTreadon-NOAA
Contributor Author

FYI, making the change suggested above to marine_analysis.py

            # get the variable name, assume 1 variable per file
            nc = netCDF4.Dataset(obsfile, 'r')
            ##variable = next(iter(nc.groups["ObsValue"].variables))
            variable = next(iter(nc.groups["ombg"].variables))
            print(f"variable {variable}")
            nc.close()

works. With this change gdas_marineanlfinal passes.

(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build$ ctest -R test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
Test project /work/noaa/da/rtreadon/git/global-workflow/pr2992/sorc/gdas.cd/build
    Start 1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800
1/1 Test #1968: test_gdasapp_WCDA-3DVAR-C48mx500_gdas_marineanlfinal_202103241800 ...   Passed   80.23 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary:
manual    =  80.23 sec*proc (1 test)

Total Test time (real) =  84.94 sec
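For completeness, a slightly more defensive version of the same idea (a hypothetical helper, not the actual obs_space_stats code) would fall back to ObsValue when a diag file has no ombg group:

import netCDF4

def stats_variable(obsfile):
    """Pick the variable to record in diag_stats.yaml for one diag file (assumes 1 variable per file)."""
    with netCDF4.Dataset(obsfile, 'r') as nc:
        # Prefer ombg, which only contains the variables the variational run
        # actually wrote; fall back to ObsValue if ombg is absent.
        group = nc.groups.get("ombg")
        if group is None:
            group = nc.groups.get("ObsValue")
        if group is None or len(group.variables) == 0:
            raise ValueError(f"no usable variable group found in {obsfile}")
        return next(iter(group.variables))

The call site in obs_space_stats would then be something like variable = stats_variable(obsfile).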

Another thought: Is the better solution to add seaSurfaceSalinity to the simulated variables for insitu_surface_trkob? That is, should var.yaml read

        obsdataout:
          engine:
            type: H5File
            obsfile: /work/noaa/stmp/rtreadon/HERCULES/RUNDIRS/WCDA-3DVAR-C48mx500/gdas.2021032418/gdasmarineanalysis.2021032418/marinevariational/diags/insitu_surface_trkob.2021032418.nc4
        simulated variables:
        - seaSurfaceTemperature
        - seaSurfaceSalinity
        io pool:
          max pool size: 1

@AndrewEichmann-NOAA
Collaborator

@RussTreadon-NOAA The letkf problems would be resolved with #1372, which adds back the original gridgen yaml under parm, and adds the localization blocks to the obs space config files.

@guillaumevernieres
Contributor

@RussTreadon-NOAA , reverting the obs_list.yaml is what we should do.

@RussTreadon-NOAA
Contributor Author

@AndrewEichmann-NOAA , thank you for pointing me at GDASApp PR #1372.

PR #1372 places gridgen.yaml in parm/soca/gridgen/gridgen.yaml. @DavidNew-NOAA mentioned above that gridgen.yaml is now in parm/jcb-gdas/algorithm/marine/gridgen.yaml

We don't need gridgen.yaml in two places. Which location do we go with?

It's good to see that #1372 addresses the missing obs localizations blocks mentioned above.

@DavidNew-NOAA
Collaborator

@RussTreadon-NOAA @AndrewEichmann-NOAA Let's just leave gridgen.yaml in jcb-gdas and point there for now

RussTreadon-NOAA added a commit that referenced this issue Nov 14, 2024
@AndrewEichmann-NOAA
Collaborator

AndrewEichmann-NOAA commented Nov 14, 2024

@RussTreadon-NOAA @DavidNew-NOAA While it does belong under jcb and the letkf task should be converted to use it, that will require a PR to global-workflow, and the letkf will be broken until that PR gets merged.

@DavidNew-NOAA
Collaborator

@AndrewEichmann-NOAA I put the jcb-gdas gridgen.yaml reference in config.marineanlletkf in this PR in my last commit this morning

@DavidNew-NOAA
Collaborator

@AndrewEichmann-NOAA Sorry, I meant in GW PR #2992

@AndrewEichmann-NOAA
Collaborator

@DavidNew-NOAA Ah, ok

RussTreadon-NOAA added a commit that referenced this issue Nov 14, 2024
@RussTreadon-NOAA
Contributor Author

> @RussTreadon-NOAA , reverting the obs_list.yaml is what we should do.

Thanks @guillaumevernieres for the guidance. parm/soca/obs/obs_list.yaml was reverted at 716dcdb

@RussTreadon-NOAA
Contributor Author

FYI

I am manually running g-w DA CI on Hera & Hercules using g-w branch feature/jcb-obsbias at 42904ba with sorc/gdas.cd populated with GDASApp branch feature/resume_nightly at 716dcdb.

test_gdasapp had 64 out of 64 tests pass on both machines.

test_gdasapp is currently running on Orion. Pending a 64/64 result I'll launch g-w DA CI on Orion.
