
Allow skipping tests that have a baseline already #4737

Merged
16 commits merged into ESMCI:master on Jan 23, 2025

Conversation


@samsrabin samsrabin commented Jan 17, 2025

Adds --skip-tests-with-existing-baselines, which skips any test that already has a baseline. Note that it only checks whether the baseline directory exists, not whether the baseline was successfully generated.

Based on cime6.1.59 for now.

I've tested only as part of our CTSM testing, plus a check with create_test for mutual exclusivity of the new option with -o/--allow-baseline-overwrite.
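
For reference, the mutual exclusivity mentioned above is the kind of check argparse's mutually exclusive groups provide. A minimal sketch, assuming a plain argparse setup; only the option names are taken from this PR, and this is not the actual create_test code:

import argparse

parser = argparse.ArgumentParser(prog="create_test")
group = parser.add_mutually_exclusive_group()
group.add_argument(
    "-o", "--allow-baseline-overwrite", action="store_true",
    help="Allow existing baseline directories to be overwritten.",
)
group.add_argument(
    "--skip-tests-with-existing-baselines", action="store_true",
    help="Skip any test whose baseline directory already exists.",
)

# Passing both options makes argparse exit with an error along the lines of:
#   "argument --skip-tests-with-existing-baselines: not allowed with argument -o/--allow-baseline-overwrite"
args = parser.parse_args(["--skip-tests-with-existing-baselines"])

One side effect of such a group is that argparse renders the pair as [-o | --skip-tests-with-existing-baselines] in the usage string, which is the form that shows up in the test output quoted later in this thread.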


ekluzek commented Jan 17, 2025

Nice. I find "--allow-baseline-overwrite" to be unusable because it might not copy the namelist files again when rerun. So instead of using it, I intentionally blow away the baselines I want to overwrite.

This, however, would be great because it would leave existing good baselines alone and only redo the ones that aren't there, saving time when you have to rerun test lists.

@ekluzek ekluzek left a comment


Personally I love this and hope others do as well. This is a nice solution to a situation I constantly run into and can't use the "-o" overwrite option for. I hadn't thought of this creative way to do something that will help, so I'm glad you did and have a proposal for it.

I'd like you to add a bit more to the messaging around this. So that's my main suggestion.

There is also regression testing that we should look into adding a test for this to.

See this directory in cime:

CIME/tests

And there's a README there about using the tests:

The main tester is this one:

scripts_regression_tests.py

The tests have either a "test_sys_" or "test_unit_" prefix and it looks like we don't have one for create_test for either class right now. So to test this we'd have to start a new one. And it's likely we'd both need some help to figure out how to do it.

I really would like some testing for create_test in CIME though, as one of the reasons I forgo using the "-o" overwrite option is that it isn't always used and without tests for it -- I can't have confidence that it will continue to work correctly. These specialized options need to have testing for them to make sure they don't break. The regression tests are autorun with commits -- so they do find problems.

Resolved review threads on: CIME/Tools/component_generate_baseline, CIME/scripts/create_test.py (two threads)

ekluzek commented Jan 17, 2025

@billsacks and @fischer-ncar can you chime in about this one especially while @jedwards4b is away? Also have either of you added unit or sys tests to the cime regression tests and could help us figure out doing that?

@billsacks billsacks left a comment


I agree with @ekluzek - this is a great addition that I can see being very useful. Thank you! I just have a couple of minor comments. I'll also respond to @ekluzek's questions separately.

CIME/hist_utils.py (resolved review thread)
Comment on lines 373 to 374
"\nan existing baseline directory will raise an error. WARNING: Specifying this "
"\noption will allow tests with existing baseline directories to be silently skipped.",
@billsacks (Member)

Based on my understanding of the code, I'd suggest changing this documentation. My understanding is that (1) the test itself is rerun, it's just the baseline generation phase that's skipped; and (2) it's not really silently skipped, because (if I'm reading the code right) it will lead to a FAIL result in the baseline generation stage. If my understanding is right about these things, can you change the wording of the documentation to make this more clear? (I'd be happy to talk this over with you if it would help.)

@samsrabin (Contributor Author)

Oops, good catch! The description is actually correct, but the code wasn't doing what I wanted. The idea is that I want tests with existing baselines to not even be set up, let alone run. Fixing this now (hopefully).

@billsacks (Member)

The idea is that I want tests with existing baselines to not even be set up, let alone run.

I can see arguments for either behavior (i.e., either what it said in the documentation or how the code is currently working). If you do change the behavior to be what you intended, then I think the name of this argument should change as well: to me, anyway, "--allow-baseline-skip" implies something more like what the code is currently doing; if you entirely skip setting up and running a test because the baseline already exists, I think this needs a name that implies this, like "--skip-tests-with-existing-baselines" (or maybe something a little less wordy?). The difference, in my mind, is that starting this with "allow" somehow suggests to me that not much will change and it's just bypassing an error condition, whereas the behavior you say you want feels a lot more active.

@samsrabin (Contributor Author)

Sure; changed to that with 3f28e48. I think I'm fine with the wordiness, but maybe it'd be good to add a synonym like -s?

@billsacks (Member)

Sure; changed to that with 3f28e48. I think I'm fine with the wordiness, but maybe it'd be good to add a synonym like -s?

Thanks! Given the potential for silently skipping tests, I lean towards requiring a long, explicitly named option so that the user is sure to know what they're doing when setting it (as opposed to thinking that something like -s does one thing when in fact it does something else that could be problematic and hard to notice).

@ekluzek (Contributor)

I agree with @billsacks that non-standard things that won't be used as often should NOT have a shorthand name (and -s is usually used for --silent, so would be confusing). I like the idea of having a more explicit name that accurately describes what's going on. In the initial proposal I like that the name was similar to the --allow-baseline-overwrite option, but with the error checking and the exclusive group and everything I don't think this is needed now.

So I like --skip-tests-with-existing-baselines as I think it's really clear on what it does. So this looks good to me other than possibly adding some testing around this.

@samsrabin do you want to look at adding tests together or would you like to work with @billsacks or someone else on that? I'd like to put some effort in there, even if we only do enough to decide it isn't worth it for this, right now.

Other than that I'm looking forward to seeing this get merged in!

@samsrabin (Contributor Author)

Thanks, Erik! It sounds like there aren't any existing tests for create_test, so I'd be starting from scratch. I think that's outside the scope of this PR and should probably be raised as a separate issue.

@samsrabin (Contributor Author)

Nope, I'm wrong: As Bill mentioned, there is some system-testing of create_test. However, there isn't any testing that involves generating a baseline, so I think it'd still require more effort than I'm able to put in for now.

@ekluzek (Contributor)

@samsrabin actually, according to @billsacks's comment here: #4737 (comment),
there are a few that could possibly be added to. I haven't looked at them, so I'm not sure how involved this would be. And I'm totally fine if you want to open an issue and have that part done later.

I expect that we'll be using this with CTSM testing, so we'll likely find any bugs that crop up with it. In principle, though, because I do expect to use it, that's also a reason for doing some testing. But in any case, I'm good either way here.

@ekluzek (Contributor)

Ahh, OK. My message got sent before I saw your latest update. Go ahead and create an issue and we'll have this as a task for a later PR. It would be good for us in the CTSM team to have more experience in adding cime testing, and that's another reason to do this. But, again we'll do that later.

@billsacks

Also have either of you added unit or sys tests to the cime regression tests and could help us figure out doing that?

I have some experience (now a few years old) adding unit tests... less or no experience adding higher-level system tests. I do see some coverage of create_test in CIME/tests/test_sys_cime_case.py, and there is at least one unit test of baseline generation in CIME/tests/test_unit_system_tests.py (test_generate_baseline). Perhaps most relevant, though, is CIME/tests/test_unit_hist_utils.py: since the main logic addition in this PR is in hist_utils.py, I could imagine adding a unit test of generate_baseline (which doesn't appear to currently be unit tested, from a quick search) in test_unit_hist_utils.py.
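
As a rough illustration of the shape such a unit test could take (the function under test is a stand-in defined inline, not the real generate_baseline or hist_utils code, and the test strings are made up):

import os
import unittest
from unittest import mock


def _has_existing_baseline(baseline_root, baseline_name, test_name):
    # Stand-in for the check this PR adds: only directory existence matters.
    return os.path.isdir(os.path.join(baseline_root, baseline_name, test_name))


class TestSkipExistingBaseline(unittest.TestCase):
    def test_reports_existing_baseline(self):
        with mock.patch("os.path.isdir", return_value=True):
            self.assertTrue(_has_existing_baseline("/baselines", "tag", "SMS.f10_f10.I2000Clm50"))

    def test_reports_missing_baseline(self):
        with mock.patch("os.path.isdir", return_value=False):
            self.assertFalse(_has_existing_baseline("/baselines", "tag", "SMS.f10_f10.I2000Clm50"))


if __name__ == "__main__":
    unittest.main()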


samsrabin commented Jan 18, 2025

Thanks, both! As I mentioned in a comment to @billsacks, it turns out the code wasn't doing what I wanted. That's now fixed.

Specifically: If there's a test with a baseline directory already, I want that test to be skipped entirely—not even set up, let alone run. Note that this just checks for the existence of the baseline directory, not whether the existing baseline represents a successful test. Does that make sense?
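
Conceptually, the behavior described above amounts to filtering the test list before any setup happens. A rough sketch under that assumption (illustrative names only, not the actual CIME implementation):

import os


def filter_tests_with_existing_baselines(test_names, baseline_root, baseline_name):
    # Return only the tests that do NOT yet have a baseline directory.
    # Only directory existence is checked, not whether the baseline inside it
    # was generated successfully.
    kept = []
    for test_name in test_names:
        baseline_dir = os.path.join(baseline_root, baseline_name, test_name)
        if os.path.isdir(baseline_dir):
            continue  # baseline directory already exists: skip this test entirely
        kept.append(test_name)
    return kept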

I tested this like so, in a checkout of ctsm5.3.020 on Derecho:

# Preamble
BASELINE_ROOT="/glade/campaign/cgd/tss/ctsm_baselines"
baseline_name="test_allow_baseline_skip_${USER}"

# Generate baseline directories (not filled!) for aux_clm suite
./run_sys_tests -s aux_clm --skip-compare --generate ${baseline_name} --extra-create-test-args " --no-build"
# Save for posterity
cp -a $BASELINE_ROOT/$baseline_name $BASELINE_ROOT/$baseline_name.aux_clm

# How many directories are in the baseline after aux_clm?
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}/*/ | wc -l) dirs in baseline after aux_clm"
# "There are 235 dirs in baseline after aux_clm"

# Try to run fates suite without --allow-baseline-skip
./run_sys_tests -s fates --skip-compare --generate ${baseline_name} --extra-create-test-args " --no-build"

# How many tests are listed as already having a baseline?
echo "$(grep -oE "'[^']*'" STDERR* | wc -l) tests already have a baseline"
# E.g.: SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_gnu.clm-FatesPRISM--clm-NEON-FATES-YELL
# "9 tests already have a baseline"

# What are all the tests in the fates suite, and how many of those are in aux_clm?
./run_sys_tests -s fates --skip-compare --generate ${baseline_name}.fates --extra-create-test-args " --no-build"
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}.fates/*/ | wc -l) tests in the fates suite"
# "There are 49 tests in the fates suite"
# So we expect 40 tests to be run when using --allow-baseline-skip

# Run fates suite WITH --allow-baseline-skip
./run_sys_tests -s fates --skip-compare --generate ${baseline_name} --skip-git-status --extra-create-test-args " --no-build --allow-baseline-skip"

# How many directories are in the baseline after fates suite?
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}/*/ | wc -l) dirs in baseline after fates suite"
# "There are 275 dirs in baseline after fates suite"
# 40 tests were set up, as expected.

@billsacks billsacks left a comment


This new version is a nice, clean implementation of the behavior you want. I'm happy with the implementation, but as I note in #4737 (comment), I feel like --allow-baseline-skip no longer gives a clear sense of what this option really does. I suggested --skip-tests-with-existing-baselines, but maybe there's a less wordy name?

@billsacks billsacks left a comment


Looks good to me now - thanks for your work on this @samsrabin !

(It looks like test_scheduler.py needs to be run through black to fix the failing pre-commit check.)

@jedwards4b jedwards4b left a comment


LGTM


samsrabin commented Jan 22, 2025

Thanks, all. I've added a test, test_skip_run_with_existing_baseline, that passes. Do y'all think I should run the rest of the test suite for good measure? Looks like it's being run automatically.

@ekluzek, I think I'll eventually need your approval in order for this to be able to merge.


ekluzek commented Jan 22, 2025

LGTM. If the testing passes, it looks like this should be able to merge. @jedwards4b can you push the merge button when the testing passes?

@samsrabin

Strange that my new test fails in the e3sm testing. Here's the output (note that [-o | --skip-tests-with-existing-baselines] incorrectly appears in the usage output):

FAILED CIME/tests/test_sys_cime_case.py::TestCimeCase::test_skip_run_with_existing_baseline - AssertionError: 2 != 0 : 
    COMMAND:  /src/E3SM/cime/scripts/create_test --output-root case0 --generate baseline --no-build TESTRUNPASS_P1.f19_g16_rx1.A -t fake_testing_only_20250122_191036 --baseline-root /storage/cases/scripts_regression_test.20250122_190730/baselines --machine docker --test-root=/storage/cases/scripts_regression_test.20250122_190730/case0 --output-root=/storage/cases/scripts_regression_test.20250122_190730/case0
    FROM_DIR: /src/E3SM/cime
    SHOULD HAVE WORKED, INSTEAD GOT STAT 2
    OUTPUT: 
    ERRPUT: usage: create_test [-h] [-d] [-v] [-s] [--no-run] [--no-build] [--no-setup]
                   [-u] [--save-timing] [--no-batch] [--single-exe]
                   [--single-submit] [-r TEST_ROOT]
                   [--output-root OUTPUT_ROOT] [--baseline-root BASELINE_ROOT]
                   [--clean] [-m MACHINE] [--mpilib MPILIB] [-b BASELINE_NAME]
                   [-c] [-g] [--driver DRIVER] [--compiler COMPILER] [-n]
                   [-p PROJECT] [-t TEST_ID] [-j PARALLEL_JOBS]
                   [--proc-pool PROC_POOL] [--walltime WALLTIME] [-q QUEUE]
                   [-f TESTFILE] [-o | --skip-tests-with-existing-baselines]
                   [--wait] [--allow-pnl] [--check-throughput]
                   [--check-memory] [--ignore-namelists] [--ignore-diffs]
                   [--ignore-memleak] [--force-procs FORCE_PROCS]
                   [--force-threads FORCE_THREADS] [-i INPUT_DIR]
                   [--pesfile PESFILE] [--retry RETRY] [-N]
                   [--workflow WORKFLOW] [--chksum] [--srcroot SRCROOT]
                   [--force-rebuild] [--mail-user MAIL_USER] [-M MAIL_TYPE]
                   testargs [testargs ...]
create_test: error: unrecognized arguments: TESTRUNPASS_P1.f19_g16_rx1.A

@samsrabin

Is there a way for me to test this in E3SM on Derecho? When I try, I get ERROR: Could not initialize machine object from /glade/work/samrabin/E3SM/cime_config/machines/config_machines.xml. This machine is not available for the target CIME_MODEL.


rljacob commented Jan 22, 2025

You'd have to add the derecho machine spec to E3SM/cime_config/machines/config_machines.xml

@jedwards4b

I would suggest we enlist @jasonb5 to help with the e3sm test failure.


@billsacks billsacks left a comment


Re-approving with the addition of the new test - thanks, @samsrabin , for adding this test!

@jedwards4b

@samsrabin I notice that I can reproduce the incorrect [-o | --skip-tests-with-existing-baselines] using an invalid option to create_test, e.g.:
./create_test ERS.ne30pg3_t232_wg37.BHIST --pesize L
results in:

usage: create_test [-h] [-d] [-v] [-s] [--no-run] [--no-build] [--no-setup] [-u] [--save-timing] [--no-batch] [--single-exe] [--single-submit] [-r TEST_ROOT]
                   [--output-root OUTPUT_ROOT] [--baseline-root BASELINE_ROOT] [--clean] [-m MACHINE] [--mpilib MPILIB] [-c COMPARE] [-g GENERATE] [--xml-machine XML_MACHINE]
                   [--xml-compiler XML_COMPILER] [--xml-category XML_CATEGORY] [--xml-testlist XML_TESTLIST] [--driver {nuopc}] [--compiler COMPILER] [-n] [-p PROJECT] [-t TEST_ID]
                   [-j PARALLEL_JOBS] [--proc-pool PROC_POOL] [--walltime WALLTIME] [-q QUEUE] [-f TESTFILE] [-o | --skip-tests-with-existing-baselines] [--wait] [--allow-pnl]
                   [--check-throughput] [--check-memory] [--ignore-namelists] [--ignore-diffs] [--ignore-memleak] [--force-procs FORCE_PROCS] [--force-threads FORCE_THREADS]
                   [-i INPUT_DIR] [--pesfile PESFILE] [--retry RETRY] [-N] [--workflow WORKFLOW] [--chksum] [--srcroot SRCROOT] [--force-rebuild] [--mail-user MAIL_USER]
                   [-M MAIL_TYPE]
                   [testargs ...]
create_test: error: unrecognized arguments: --pesize L

I think that this may help you in understanding the issue.


jasonb5 commented Jan 22, 2025

@samsrabin @jedwards4b The test is failing for E3SM because the two projects use different flags for creating and comparing baselines. CESM uses --generate <name> while E3SM uses --generate, omitting the name of the baseline. Relevant code is here. You'll want to add something like the following so the test will work with both models.

if self._config.test_mode == "cesm":
    # CESM-style create_test: --generate takes a baseline name
    create_test_extra_args = ["--generate", "baseline", "--no-build", test_name]
else:
    # E3SM-style create_test: -g/--generate takes no baseline name
    create_test_extra_args = ["-g", "--no-build", test_name]

@jedwards4b jedwards4b merged commit 5eff840 into ESMCI:master Jan 23, 2025
7 checks passed
@samsrabin

Nice, thanks y'all!
