
Allow skipping tests that have a baseline already #4737

Merged
16 commits merged into ESMCI:master on Jan 23, 2025

Conversation


@samsrabin samsrabin commented Jan 17, 2025

Adds --skip-tests-with-existing-baselines, which skips any test that already has a baseline. Note that it only checks whether the baseline directory exists, not whether the baseline was successfully generated.

Based on cime6.1.59 for now.

I've tested only as part of our CTSM testing, plus a check with create_test for mutual exclusivity of the new option with -o/--allow-baseline-overwrite.
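
For reference, the mutual exclusivity mentioned above is the kind of check argparse's mutually exclusive groups provide. A minimal sketch, assuming a plain argparse setup; only the option names are taken from this PR, and this is not the actual create_test code:

import argparse

parser = argparse.ArgumentParser(prog="create_test")
group = parser.add_mutually_exclusive_group()
group.add_argument(
    "-o", "--allow-baseline-overwrite", action="store_true",
    help="Allow existing baseline directories to be overwritten.",
)
group.add_argument(
    "--skip-tests-with-existing-baselines", action="store_true",
    help="Skip any test whose baseline directory already exists.",
)

# Passing both options makes argparse exit with an error along the lines of:
#   "argument --skip-tests-with-existing-baselines: not allowed with argument -o/--allow-baseline-overwrite"
args = parser.parse_args(["--skip-tests-with-existing-baselines"])

One side effect of such a group is that argparse renders the pair as [-o | --skip-tests-with-existing-baselines] in the usage string, which is the form that shows up in the test output quoted later in this thread.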


ekluzek commented Jan 17, 2025

Nice. I find "--allow-baseline-overwrite" to be unusable because it might not copy the namelist files again when rerun. So instead of using it, I intentionally blow away the baselines I want to overwrite.

This, however, would be great because it would leave existing good baselines alone and only redo the ones that aren't there, saving time when you have to rerun test lists.

@ekluzek ekluzek left a comment


Personally I love this and hope others do as well. This is a nice solution to a situation I constantly run into and can't use the "-o" overwrite option for. I hadn't thought of this creative way to do something that will help, so I'm glad you did and have a proposal for it.

I'd like you to add a bit more to the messaging around this. So that's my main suggestion.

There is also regression testing that we should look into adding a test for this to.

See this directory in cime:

CIME/tests

And there's a README there about using the tests:

The main tester is this one:

scripts_regression_tests.py

The tests have either a "test_sys_" or "test_unit_" prefix and it looks like we don't have one for create_test for either class right now. So to test this we'd have to start a new one. And it's likely we'd both need some help to figure out how to do it.

I really would like some testing for create_test in CIME though, as one of the reasons I forgo using the "-o" overwrite option is that it isn't always used and without tests for it -- I can't have confidence that it will continue to work correctly. These specialized options need to have testing for them to make sure they don't break. The regression tests are autorun with commits -- so they do find problems.

Resolved review threads on: CIME/Tools/component_generate_baseline, CIME/scripts/create_test.py (two threads)

ekluzek commented Jan 17, 2025

@billsacks and @fischer-ncar can you chime in about this one especially while @jedwards4b is away? Also have either of you added unit or sys tests to the cime regression tests and could help us figure out doing that?

@billsacks billsacks left a comment


I agree with @ekluzek - this is a great addition that I can see being very useful. Thank you! I just have a couple of minor comments. I'll also respond to @ekluzek's questions separately.

CIME/hist_utils.py (resolved review thread)
Comment on lines 373 to 374
"\nan existing baseline directory will raise an error. WARNING: Specifying this "
"\noption will allow tests with existing baseline directories to be silently skipped.",
@billsacks (Member)

Based on my understanding of the code, I'd suggest changing this documentation. My understanding is that (1) the test itself is rerun, it's just the baseline generation phase that's skipped; and (2) it's not really silently skipped, because (if I'm reading the code right) it will lead to a FAIL result in the baseline generation stage. If my understanding is right about these things, can you change the wording of the documentation to make this more clear? (I'd be happy to talk this over with you if it would help.)

@samsrabin (Contributor Author)

Oops, good catch! The description is actually correct, but the code wasn't doing what I wanted. The idea is that I want tests with existing baselines to not even be set up, let alone run. Fixing this now (hopefully).

@billsacks (Member)

The idea is that I want tests with existing baselines to not even be set up, let alone run.

I can see arguments for either behavior (i.e., either what it said in the documentation or how the code is currently working). If you do change the behavior to be what you intended, then I think the name of this argument should change as well: to me, anyway, "--allow-baseline-skip" implies something more like what the code is currently doing; if you entirely skip setting up and running a test because the baseline already exists, I think this needs a name that implies this, like "--skip-tests-with-existing-baselines" (or maybe something a little less wordy?). The difference, in my mind, is that starting this with "allow" somehow suggests to me that not much will change and it's just bypassing an error condition, whereas the behavior you say you want feels a lot more active.

@samsrabin (Contributor Author)

Sure; changed to that with 3f28e48. I think I'm fine with the wordiness, but maybe it'd be good to add a synonym like -s?

@billsacks (Member)

Sure; changed to that with 3f28e48. I think I'm fine with the wordiness, but maybe it'd be good to add a synonym like -s?

Thanks! Given the potential for silently skipping tests, I lean towards requiring a long, explicitly named option so that the user is sure to know what they're doing when setting it (as opposed to thinking that something like -s does one thing when in fact it does something else that could be problematic and hard to notice).

@ekluzek (Contributor)

I agree with @billsacks that non-standard things that won't be used as often should NOT have a shorthand name (and -s is usually used for --silent, so would be confusing). I like the idea of having a more explicit name that accurately describes what's going on. In the initial proposal I like that the name was similar to the --allow-baseline-overwrite option, but with the error checking and the exclusive group and everything I don't think this is needed now.

So I like --skip-tests-with-existing-baselines as I think it's really clear on what it does. So this looks good to me other than possibly adding some testing around this.

@samsrabin do you want to look at adding tests together or would you like to work with @billsacks or someone else on that? I'd like to put some effort in there, even if we only do enough to decide it isn't worth it for this, right now.

Other than that I'm looking forward to seeing this get merged in!

@samsrabin (Contributor Author)

Thanks, Erik! It sounds like there aren't any existing tests for create_test, so I'd be starting from scratch. I think that's outside the scope of this PR and should probably be raised as a separate issue.

@samsrabin (Contributor Author)

Nope, I'm wrong: As Bill mentioned, there is some system-testing of create_test. However, there isn't any testing that involves generating a baseline, so I think it'd still require more effort than I'm able to put in for now.

@ekluzek (Contributor)

@samsrabin actually, according to @billsacks's comment here: #4737 (comment),
there are a few that could possibly be added to. I haven't looked at them, so I'm not sure how involved this would be. And I'm totally fine if you want to open an issue and have that part done later.

I expect that we'll be using this with CTSM testing, so we'll likely find any bugs that crop up with it. In principle, though, because I do expect to use it, that's also a reason for doing some testing. But in any case, I'm good either way here.

@ekluzek (Contributor)

Ahh, OK. My message got sent before I saw your latest update. Go ahead and create an issue and we'll have this as a task for a later PR. It would be good for us in the CTSM team to have more experience in adding cime testing, and that's another reason to do this. But, again we'll do that later.

@billsacks

Also have either of you added unit or sys tests to the cime regression tests and could help us figure out doing that?

I have some experience (now a few years old) adding unit tests... less or no experience adding higher-level system tests. I do see some coverage of create_test in CIME/tests/test_sys_cime_case.py, and there is at least one unit test of baseline generation in CIME/tests/test_unit_system_tests.py (test_generate_baseline). Perhaps most relevant, though, is CIME/tests/test_unit_hist_utils.py: since the main logic addition in this PR is in hist_utils.py, I could imagine adding a unit test of generate_baseline (which doesn't appear to currently be unit tested, from a quick search) in test_unit_hist_utils.py.
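
As a rough illustration of the shape such a unit test could take (the function under test is a stand-in defined inline, not the real generate_baseline or hist_utils code, and the test strings are made up):

import os
import unittest
from unittest import mock


def _has_existing_baseline(baseline_root, baseline_name, test_name):
    # Stand-in for the check this PR adds: only directory existence matters.
    return os.path.isdir(os.path.join(baseline_root, baseline_name, test_name))


class TestSkipExistingBaseline(unittest.TestCase):
    def test_reports_existing_baseline(self):
        with mock.patch("os.path.isdir", return_value=True):
            self.assertTrue(_has_existing_baseline("/baselines", "tag", "SMS.f10_f10.I2000Clm50"))

    def test_reports_missing_baseline(self):
        with mock.patch("os.path.isdir", return_value=False):
            self.assertFalse(_has_existing_baseline("/baselines", "tag", "SMS.f10_f10.I2000Clm50"))


if __name__ == "__main__":
    unittest.main()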


samsrabin commented Jan 18, 2025

Thanks, both! As I mentioned in a comment to @billsacks, it turns out the code wasn't doing what I wanted. That's now fixed.

Specifically: If there's a test with a baseline directory already, I want that test to be skipped entirely—not even set up, let alone run. Note that this just checks for the existence of the baseline directory, not whether the existing baseline represents a successful test. Does that make sense?
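
Conceptually, the behavior described above amounts to filtering the test list before any setup happens. A rough sketch under that assumption (illustrative names only, not the actual CIME implementation):

import os


def filter_tests_with_existing_baselines(test_names, baseline_root, baseline_name):
    # Return only the tests that do NOT yet have a baseline directory.
    # Only directory existence is checked, not whether the baseline inside it
    # was generated successfully.
    kept = []
    for test_name in test_names:
        baseline_dir = os.path.join(baseline_root, baseline_name, test_name)
        if os.path.isdir(baseline_dir):
            continue  # baseline directory already exists: skip this test entirely
        kept.append(test_name)
    return kept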

I tested this like so, in a checkout of ctsm5.3.020 on Derecho:

# Preamble
BASELINE_ROOT="/glade/campaign/cgd/tss/ctsm_baselines"
baseline_name="test_allow_baseline_skip_${USER}"

# Generate baseline directories (not filled!) for aux_clm suite
./run_sys_tests -s aux_clm --skip-compare --generate ${baseline_name} --extra-create-test-args " --no-build"
# Save for posterity
cp -a $BASELINE_ROOT/$baseline_name $BASELINE_ROOT/$baseline_name.aux_clm

# How many directories are in the baseline after aux_clm?
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}/*/ | wc -l) dirs in baseline after aux_clm"
# "There are 235 dirs in baseline after aux_clm"

# Try to run fates suite without --allow-baseline-skip
./run_sys_tests -s fates --skip-compare --generate ${baseline_name} --extra-create-test-args " --no-build"

# How many tests are listed as already having a baseline?
echo "$(grep -oE "'[^']*'" STDERR* | wc -l) tests already have a baseline"
# E.g.: SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_gnu.clm-FatesPRISM--clm-NEON-FATES-YELL
# "9 tests already have a baseline"

# What are all the tests in the fates suite, and how many of those are in aux_clm?
./run_sys_tests -s fates --skip-compare --generate ${baseline_name}.fates --extra-create-test-args " --no-build"
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}.fates/*/ | wc -l) tests in the fates suite"
# "There are 49 tests in the fates suite"
# So we expect 40 tests to be run when using --allow-baseline-skip

# Run fates suite WITH --allow-baseline-skip
./run_sys_tests -s fates --skip-compare --generate ${baseline_name} --skip-git-status --extra-create-test-args " --no-build --allow-baseline-skip"

# How many directories are in the baseline after fates suite?
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}/*/ | wc -l) dirs in baseline after fates suite"
# "There are 275 dirs in baseline after fates suite"
# 40 tests were set up, as expected.

@billsacks billsacks left a comment


This new version is a nice, clean implementation of the behavior you want. I'm happy with the implementation, but as I note in #4737 (comment), I feel like --allow-baseline-skip no longer gives a clear sense of what this option really does. I suggested --skip-tests-with-existing-baselines, but maybe there's a less wordy name?

@billsacks billsacks left a comment


Looks good to me now - thanks for your work on this @samsrabin !

(It looks like test_scheduler.py needs to be run through black to fix the failing pre-commit check.)

@jedwards4b jedwards4b left a comment


LGTM


samsrabin commented Jan 22, 2025

Thanks, all. I've added a test, test_skip_run_with_existing_baseline, that passes. Do y'all think I should run the rest of the test suite for good measure? Looks like it's being run automatically.

@ekluzek, I think I'll eventually need your approval in order for this to be able to merge.


ekluzek commented Jan 22, 2025

LGTM. If the testing passes, it looks like this should be able to merge. @jedwards4b can you push the merge button when the testing passes?

@samsrabin

Strange that my new test fails in the e3sm testing. Here's the output (note that [-o | --skip-tests-with-existing-baselines] incorrectly appears in the usage output):

FAILED CIME/tests/test_sys_cime_case.py::TestCimeCase::test_skip_run_with_existing_baseline - AssertionError: 2 != 0 : 
    COMMAND:  /src/E3SM/cime/scripts/create_test --output-root case0 --generate baseline --no-build TESTRUNPASS_P1.f19_g16_rx1.A -t fake_testing_only_20250122_191036 --baseline-root /storage/cases/scripts_regression_test.20250122_190730/baselines --machine docker --test-root=/storage/cases/scripts_regression_test.20250122_190730/case0 --output-root=/storage/cases/scripts_regression_test.20250122_190730/case0
    FROM_DIR: /src/E3SM/cime
    SHOULD HAVE WORKED, INSTEAD GOT STAT 2
    OUTPUT: 
    ERRPUT: usage: create_test [-h] [-d] [-v] [-s] [--no-run] [--no-build] [--no-setup]
                   [-u] [--save-timing] [--no-batch] [--single-exe]
                   [--single-submit] [-r TEST_ROOT]
                   [--output-root OUTPUT_ROOT] [--baseline-root BASELINE_ROOT]
                   [--clean] [-m MACHINE] [--mpilib MPILIB] [-b BASELINE_NAME]
                   [-c] [-g] [--driver DRIVER] [--compiler COMPILER] [-n]
                   [-p PROJECT] [-t TEST_ID] [-j PARALLEL_JOBS]
                   [--proc-pool PROC_POOL] [--walltime WALLTIME] [-q QUEUE]
                   [-f TESTFILE] [-o | --skip-tests-with-existing-baselines]
                   [--wait] [--allow-pnl] [--check-throughput]
                   [--check-memory] [--ignore-namelists] [--ignore-diffs]
                   [--ignore-memleak] [--force-procs FORCE_PROCS]
                   [--force-threads FORCE_THREADS] [-i INPUT_DIR]
                   [--pesfile PESFILE] [--retry RETRY] [-N]
                   [--workflow WORKFLOW] [--chksum] [--srcroot SRCROOT]
                   [--force-rebuild] [--mail-user MAIL_USER] [-M MAIL_TYPE]
                   testargs [testargs ...]
create_test: error: unrecognized arguments: TESTRUNPASS_P1.f19_g16_rx1.A

@samsrabin

Is there a way for me to test this in E3SM on Derecho? When I try, I get ERROR: Could not initialize machine object from /glade/work/samrabin/E3SM/cime_config/machines/config_machines.xml. This machine is not available for the target CIME_MODEL.


rljacob commented Jan 22, 2025

You'd have to add the derecho machine spec to E3SM/cime_config/machines/config_machines.xml

@jedwards4b

I would suggest we enlist @jasonb5 to help with the e3sm test failure.


@billsacks billsacks left a comment


Re-approving with the addition of the new test - thanks, @samsrabin , for adding this test!

@jedwards4b

@samsrabin I notice that I can reproduce the incorrect [-o | --skip-tests-with-existing-baselines] using an invalid option to create_test, e.g.:
./create_test ERS.ne30pg3_t232_wg37.BHIST --pesize L
results in:

usage: create_test [-h] [-d] [-v] [-s] [--no-run] [--no-build] [--no-setup] [-u] [--save-timing] [--no-batch] [--single-exe] [--single-submit] [-r TEST_ROOT]
                   [--output-root OUTPUT_ROOT] [--baseline-root BASELINE_ROOT] [--clean] [-m MACHINE] [--mpilib MPILIB] [-c COMPARE] [-g GENERATE] [--xml-machine XML_MACHINE]
                   [--xml-compiler XML_COMPILER] [--xml-category XML_CATEGORY] [--xml-testlist XML_TESTLIST] [--driver {nuopc}] [--compiler COMPILER] [-n] [-p PROJECT] [-t TEST_ID]
                   [-j PARALLEL_JOBS] [--proc-pool PROC_POOL] [--walltime WALLTIME] [-q QUEUE] [-f TESTFILE] [-o | --skip-tests-with-existing-baselines] [--wait] [--allow-pnl]
                   [--check-throughput] [--check-memory] [--ignore-namelists] [--ignore-diffs] [--ignore-memleak] [--force-procs FORCE_PROCS] [--force-threads FORCE_THREADS]
                   [-i INPUT_DIR] [--pesfile PESFILE] [--retry RETRY] [-N] [--workflow WORKFLOW] [--chksum] [--srcroot SRCROOT] [--force-rebuild] [--mail-user MAIL_USER]
                   [-M MAIL_TYPE]
                   [testargs ...]
create_test: error: unrecognized arguments: --pesize L

I think that this may help you in understanding the issue.


jasonb5 commented Jan 22, 2025

@samsrabin @jedwards4b The test is failing for E3SM because the two projects use different flags for creating and comparing baselines. CESM uses --generate <name> while E3SM uses --generate, omitting the name of the baseline. Relevant code is here. You'll want to add something like the following so the test will work with both models.

if self._config.test_mode == "cesm":
    # CESM-style create_test: --generate takes a baseline name
    create_test_extra_args = ["--generate", "baseline", "--no-build", test_name]
else:
    # E3SM-style create_test: -g/--generate takes no baseline name
    create_test_extra_args = ["-g", "--no-build", test_name]

@jedwards4b jedwards4b merged commit 5eff840 into ESMCI:master Jan 23, 2025
7 checks passed
@samsrabin

Nice, thanks y'all!
