-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow skipping tests that have a baseline already #4737
Conversation
Nice, I find "--allow-baseline-overwrite" to be unusable -- because it might not copy the namelist files again when it's rerun. So instead of using it -- I intentionally blow away the baselines I want to overwrite. This however would be great because it would leave existing good baselines alone, and only redo the ones that aren't there saving time when you have to rerun testlists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I love this and hope others do as well. This is a nice solution to a situation I constantly run into and can't use the "-o" overwrite option for. I hadn't thought of this creative way to do something that will help, so I'm glad you did an have a proposal for it.
I'd like you to add a bit more to the messaging around this. So that's my main suggestion.
There are also regression testing that we should look into adding a test for this to.
See this directory in cime:
CIME/tests
And there's a README there about using the tests:
The main tester is this one:
scripts_regression_tests.py
The tests have either a "test_sys_" or "test_unit_" prefix and it looks like we don't have one for create_test for either class right now. So to test this we'd have to start a new one. And it's likely we'd both need some help to figure out how to do it.
I really would like some testing for create_test in CIME though, as one of the reasons I forgo using the "-o" overwrite option is that it isn't always used and without tests for it -- I can't have confidence that it will continue to work correctly. These specialized options need to have testing for them to make sure they don't break. The regression tests are autorun with commits -- so they do find problems.
@billsacks and @fischer-ncar can you chime in about this one especially while @jedwards4b is away? Also have either of you added unit or sys tests to the cime regression tests and could help us figure out doing that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CIME/scripts/create_test.py
Outdated
"\nan existing baseline directory will raise an error. WARNING: Specifying this " | ||
"\noption will allow tests with existing baseline directories to be silently skipped.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my understanding of the code, I'd suggest changing this documentation. My understanding is that (1) the test itself is rerun, it's just the baseline generation phase that's skipped; and (2) it's not really silently skipped, because (if I'm reading the code right) it will lead to a FAIL result in the baseline generation stage. If my understanding is right about these things, can you change the wording of the documentation to make this more clear? (I'd be happy to talk this over with you if it would help.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, good catch! The description is actually correct, but the code wasn't doing what I wanted. The idea is that I want tests with existing baselines to not even be set up, let alone run. Fixing this now (hopefully).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that I want tests with existing baselines to not even be set up, let alone run.
I can see arguments for either behavior (i.e., either what it said in the documentation or how the code is currently working). If you do change the behavior to be what you intended, then I think the name of this argument should change as well: to me, anyway, "--allow-baseline-skip" implies something more like what the code is currently doing; if you entirely skip setting up and running a test because the baseline already exists, I think this needs a name that implies this, like "--skip-tests-with-existing-baselines" (or maybe something a little less wordy?). The difference, in my mind, is that starting this with "allow" somehow suggests to me that not much will change and it's just bypassing an error condition, whereas the behavior you say you want feels a lot more active.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure; changed to that with 3f28e48. I think I'm fine with the wordiness, but maybe it'd be good to add a synonym like -s
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure; changed to that with 3f28e48. I think I'm fine with the wordiness, but maybe it'd be good to add a synonym like
-s
?
Thanks! Given the potential for silently skipping tests, I lean towards requiring a long, explicitly-named option so that the user is sure to know what they're doing when setting the option (as opposed to potentially thinking that something like -s
does one thing when in fact it does this other thing that could be problematic and hard to realize that it did it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @billsacks that non-standard things that won't be used as often should NOT have a shorthand name (and -s is usually used for --silent, so would be confusing). I like the idea of having a more explicit name that accurately describes what's going on. In the initial proposal I like that the name was similar to the --allow-baseline-overwrite option, but with the error checking and the exclusive group and everything I don't think this is needed now.
So I like --skip-tests-with-existing-baselines as I think it's really clear on what it does. So this looks good to me other than possibly adding some testing around this.
@samsrabin do you want to look at adding tests together or would you like to work with @billsacks or someone else on that? I'd like to put some effort in there, even if we only do enough to decide it isn't worth it for this, right now.
Other than that I'm looking forward to seeing this get merged in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Erik! It sounds like there aren't any pre-existing tests that already exist for create_test
, so I'd be starting from scratch. I think that's outside the scope of this PR and should probably be raised as a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, I'm wrong: As Bill mentioned, there is some system-testing of create_test
. However, there isn't any testing that involves generating a baseline, so I think it'd still require more effort than I'm able to put in for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samsrabin actually according to @billsacks comment here: #4737 (comment)
there are a few that could possibly be added to. I haven't looked at them so I'm not sure how involved this would be though. And I'm totally fine if you want to open an issue and have that part done later.
I expect that we'll be using this with CTSM testing so likely will find bugs that crop up with it. In principle though because I do expect to use it, that's also a reason for doing some testing. But, in any case, I'm good either way here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, OK. My message got sent before I saw your latest update. Go ahead and create an issue and we'll have this as a task for a later PR. It would be good for us in the CTSM team to have more experience in adding cime testing, and that's another reason to do this. But, again we'll do that later.
I have some experience (now a few years old) adding unit tests... less or no experience adding higher-level system tests. I do see some coverage of create_test in |
Thanks, both! As I mentioned in a comment to @billsacks, it turns out the code wasn't doing what I wanted. That's now fixed. Specifically: If there's a test with a baseline directory already, I want that test to be skipped entirely—not even set up, let alone run. Note that this just checks for the existence of the baseline directory, not whether the existing baseline represents a successful test. Does that make sense? I tested this like so, in a checkout of ctsm5.3.020 on Derecho: # Preamble
BASELINE_ROOT="/glade/campaign/cgd/tss/ctsm_baselines"
baseline_name="test_allow_baseline_skip_${USER}"
# Generate baseline directories (not filled!) for aux_clm suite
./run_sys_tests -s aux_clm --skip-compare --generate ${baseline_name} --extra-create-test-args " --no-build"
# Save for posterity
cp -a $BASELINE_ROOT/$baseline_name $BASELINE_ROOT/$baseline_name.aux_clm
# How many directories are in the baseline after aux_clm?
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}/*/ | wc -l) dirs in baseline after aux_clm"
# "There are 235 dirs in baseline after aux_clm"
# Try to run fates suite without --allow-baseline-skip
./run_sys_tests -s fates --skip-compare --generate ${baseline_name} --extra-create-test-args " --no-build"
# How many tests are listed as already having a baseline?
echo "$(grep -oE "'[^']*'" STDERR* | wc -l) tests already have a baseline"
# E.g.: SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Fates.derecho_gnu.clm-FatesPRISM--clm-NEON-FATES-YELL
# "9 tests already have a baseline"
# What are all the tests in the fates suite, and how many of those are in aux_clm?
./run_sys_tests -s fates --skip-compare --generate ${baseline_name}.fates --extra-create-test-args " --no-build"
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}.fates/*/ | wc -l) tests in the fates suite"
# "There are 49 tests in the fates suite"
# So we expect 40 tests to be run when using --allow-baseline-skip
# Run fates suite WITH --allow-baseline-skip
./run_sys_tests -s fates --skip-compare --generate ${baseline_name} --skip-git-status --extra-create-test-args " --no-build --allow-baseline-skip"
# How many directories are in the baseline after fates suite?
echo "There are $(ls -d $BASELINE_ROOT/${baseline_name}/*/ | wc -l) dirs in baseline after fates suite"
# "There are 275 dirs in baseline after fates suite"
# 40 tests were set up, as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new version is a nice, clean implementation of the behavior you want. I'm happy with the implementation, but as I note in #4737 (comment), I feel like --allow-baseline-skip
no longer gives a clear sense of what this option really does. I suggested --skip-tests-with-existing-baselines
, but maybe there's a less wordy name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now - thanks for your work on this @samsrabin !
(It looks like test_scheduler.py needs to be run through black to fix the failing pre-commit check.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks, all. I've added a test, @ekluzek, I think I'll eventually need your approval in order for this to be able to merge. |
LGTM if the testing passes looks like it should be able to merge. @jedwards4b can you push the merge button when the testing passes? |
Strange that my new test fails in the e3sm testing. Here's the output (note the incorrect presence in the
|
Is there a way for me to test this in E3SM on Derecho? When I try, I get |
You'd have to add the derecho machine spec to E3SM/cime_config/machines/config_machines.xml |
I would suggest we enlist @jasonb5 to help with the e3sm test failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-approving with the addition of the new test - thanks, @samsrabin , for adding this test!
@samsrabin I notice that I can reproduce the incorrect
I think that this may help you in understanding the issue. |
@samsrabin @jedwards4b The test is failing for E3SM because the two projects use different flags for creating and comparing baselines. CESM uses if self._config.test_mode == "cesm":
create_test_extra_args = ["--generate", "baseline", "--no-build", test_name]
else:
create_test_extra_args = ["-g", "--no-build", test_name] |
Nice, thanks y'all! |
Adds
--skip-tests-with-existing-baseline
that skips any tests with a baseline already. Note that it only checks whether the baseline directory exists, not whether it was successfully generated.Based on
cime6.1.59
for now.I've tested only as part of our CTSM testing, plus a check with
create_test
for mutual exclusivity of the new option with-o
/--allow-baseline-overwrite
.