Parallelize unit tests #902

Merged: 11 commits merged into tlaplus:master on Apr 4, 2024

Conversation

ahelwer
Contributor

@ahelwer ahelwer commented Apr 2, 2024

This was actually easier than expected (edit: it was not easier than expected). Somehow I missed that the JUnit task has a threads parameter; I guess it is a newer addition, which would explain why not a lot of documentation mentions it. The unit test target now runs with a number of threads equal to the number of cores available on the machine. Decent improvement: the unit tests finish in less than 10 minutes now, which puts them below the runtime of the tlaplus/examples integration tests! There were a few small changes I had to make to some tests so they don't interfere with each other when operating on the states metadir.

I also added a useful Python script to run in CI after the tests (a rough sketch of its approach appears after this list); it:

  • Prints the total number of tests and total number of failed tests
  • Prints out any tests that were run multiple times by mistake
  • Prints out the top-20 longest-running tests and their runtimes
  • If any tests failed, fetches all the error info from the xml file in the surefire runner directory and prints it to console
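
For illustration, here is a minimal sketch of the kind of log parsing such a script might do (this is not the actual parse-unit-test-output.py; the regular expression is inferred from the [junit] summary lines quoted later in this thread, and the surefire XML handling for failed tests is omitted):

import re
import sys
from collections import Counter

# Matches Ant's per-class summary lines, e.g.
# [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.685 sec, Thread: 8, Class: tlc2.Foo
SUMMARY = re.compile(
    r"\[junit\] Tests run: (\d+), Failures: (\d+), Errors: (\d+), "
    r"Skipped: (\d+), Time elapsed: ([\d.]+) sec.*Class: (\S+)"
)

def summarize(log_path):
    total = failed = 0
    runs = Counter()   # how many times each test class appeared
    elapsed = {}       # per-class runtime in seconds
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            match = SUMMARY.search(line)
            if match is None:
                continue
            tests, failures, errors, _skipped, seconds, cls = match.groups()
            total += int(tests)
            failed += int(failures) + int(errors)
            runs[cls] += 1
            elapsed[cls] = float(seconds)
    print(f"Total tests: {total}, failed or errored: {failed}")
    for cls, count in runs.items():
        if count > 1:
            print(f"Ran {count} times (probably by mistake): {cls}")
    print("Top 20 longest-running test classes:")
    for cls, seconds in sorted(elapsed.items(), key=lambda kv: -kv[1])[:20]:
        print(f"{seconds:9.1f}s  {cls}")

if __name__ == "__main__":
    summarize(sys.argv[1])

The real script additionally pulls the full error details for failed tests from the surefire-reports XML files.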

I made the following changes to the ant build:

  • Added a separate sequential unit test target that runs before the parallelized unit test target, for tests that cannot be parallelized for whatever reason (for example, because they exclusively bind to a system network port).
  • Ensured all unit tests run even if some fail by not passing the -Dtest.halt=true parameter; previously the tests would stop at the first failure, which IMO is less useful for debugging. This required adding a conditional fail target after the test targets, which fires when the junit task sets its failureproperty, so the CI still exits with a nonzero code upon any failure (see the sketch after this list).
  • Moved the longest-running tests to the front so their execution overlaps with the other tests instead of single-handedly adding another minute or so to the run.
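
A minimal sketch of the junit setup described above (this is not the actual customBuild.xml; target, property, and path names other than test.halt and the junit attributes themselves are assumptions):

<!-- Sketch only: target, property, and path names are illustrative. -->
<property name="threads" value="8"/>       <!-- can be overridden with -Dthreads=N -->
<property name="test.halt" value="false"/>

<target name="test-parallel" depends="compile-test">
  <junit fork="yes" forkmode="perTest" threads="${threads}"
         haltonfailure="${test.halt}" haltonerror="${test.halt}"
         failureproperty="test.failed" errorproperty="test.failed"
         printsummary="withOutAndErr">
    <classpath refid="test.classpath"/>
    <formatter type="xml"/>
    <batchtest todir="${test.reports}">
      <!-- Tests that cannot run in parallel (e.g. tlc2/tool/distributed, which
           bind network ports) would be excluded here and run in a separate,
           non-threaded target beforehand. -->
      <fileset dir="${test.classes}" includes="**/*Test.class"/>
    </batchtest>
  </junit>
  <!-- Let every test run, then fail the build (and hence the CI job) if anything failed. -->
  <fail if="test.failed" message="One or more unit tests failed."/>
</target>

The trailing <fail> is what keeps the build, and therefore the CI, exiting nonzero even though haltonfailure is off.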

I ran this workflow locally on my 8-thread workstation over 10 times, so I have fairly high confidence there aren't any other concurrency failures lurking.

Closes #901

ahelwer added 3 commits April 2, 2024 10:39
@ahelwer
Contributor Author

ahelwer commented Apr 2, 2024

Used inotifywait to debug which test was sneakily deleting all the metadirs:

directory=./states/tlc2.tool.liveness.CodePlexBug08EWD840FL2Test/

monitor_directory() {
    ant -f customBuild.xml compile compile-test dist test &

    # Get the PID of the program
    pid=$!
    echo $pid

    while [ ! -d "./states" ]; do
      inotifywait -e create "."
    done
    echo "\e[31Directory states created\e[0m"

    while [ ! -d "$directory" ]; do
      inotifywait -e create "./states"
    done
    echo "\e[31Directory '$directory' created\e[0m"

    # Monitor the directory for delete events
    while true; do
        inotifywait -e delete_self "$directory"
        echo "\e[31Directory '$directory' deleted, pausing program\e[0m"
        kill -STOP $pid
        echo "\e[31Program paused, press Enter to resume or Ctrl+C to terminate\e[0m"
        read -r
        kill -CONT $pid
    done
}

monitor_directory

Turns out it was tlc2.tool.simulation.SimulationWorkerTest. Never would have figured that out on my own. It deletes the entire states dir lol.

@lemmy
Member

lemmy commented Apr 2, 2024

CommonTestCase should probably set TLCGlobals.metaRoot to a directory whose name is the concatenation of the test name and a timestamp.
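
A hedged sketch of that idea (assuming JUnit 4's TestName rule and that TLCGlobals.metaRoot can simply be reassigned during test setup; this is not the actual CommonTestCase code):

import java.io.File;

import org.junit.Before;
import org.junit.Rule;
import org.junit.rules.TestName;

import tlc2.TLCGlobals;

public abstract class CommonTestCase {

    @Rule
    public TestName testName = new TestName();

    @Before
    public void isolateMetaDir() {
        // Give every test its own metadata directory so concurrently running
        // tests cannot delete or overwrite each other's states directory.
        TLCGlobals.metaRoot = "states" + File.separator
                + getClass().getSimpleName() + "-"
                + testName.getMethodName() + "-"
                + System.currentTimeMillis();
    }
}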

@ahelwer
Contributor Author

ahelwer commented Apr 2, 2024

I discovered that all the tlc2/tool/distributed tests exclusively bind to a network port, so I moved those out into a separate sequential unit test target that goes first.

@ahelwer ahelwer marked this pull request as ready for review April 3, 2024 00:20
@ahelwer
Contributor Author

ahelwer commented Apr 3, 2024

@lemmy @Calvin-L this is good to be merged AFAIK. There might be some nondeterministic failures lurking in there that surface under a weird execution order but I think I've caught most of them.

@ahelwer
Contributor Author

ahelwer commented Apr 3, 2024

I have now successfully run this workflow on my 8-threaded local workstation over 10 times, giving reasonable confidence that there are no additional concurrency failures lurking in the tests.

@lemmy lemmy added the enhancement (Lets change things for the better), refactoring (Steadily improving code base), and DevEnvironment (Everything related to the Toolbox development environment) labels Apr 3, 2024
Member

@lemmy lemmy left a comment

LGTM. @Calvin-L Feel free to merge as you see fit.

Resolved review threads on tlatools/org.lamport.tlatools/customBuild.xml and .github/scripts/parse-unit-test-output.py
@lemmy lemmy added this to the 1.8.0 milestone Apr 4, 2024
@Calvin-L Calvin-L merged commit 704bd4d into tlaplus:master Apr 4, 2024
5 checks passed
@lemmy
Member

lemmy commented Apr 5, 2024

Dang, some tests fail when executed in parallel on my macbook m1:

[...]
    [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.685 sec, Thread: 8, Class: tlc2.TraceExpressionSpecSafetySimTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.956 sec, Thread: 3, Class: tlc2.TraceExpressionSpecRuntimeTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.069 sec, Thread: 6, Class: tlc2.TraceExpressionSpecDeadlockTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.875 sec, Thread: 7, Class: tlc2.TraceExpressionSpecSafetyBFSTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.42 sec, Thread: 9, Class: tlc2.debug.DebugTLCVariableTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.443 sec, Thread: 1, Class: tlc2.REPLTest
    [junit] Test tlc2.TraceExpressionSpecSafetySimTest FAILED
[...]
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.121 sec, Thread: 9, Class: tlc2.tool.suite.ETest1
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.153 sec, Thread: 7, Class: tlc2.tool.suite.ETest13
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.952 sec, Thread: 8, Class: tlc2.tool.suite.ETest14
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.399 sec, Thread: 5, Class: tlc2.tool.suite.ETest11
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.339 sec, Thread: 6, Class: tlc2.tool.suite.ETest12
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.492 sec, Thread: 3, Class: tlc2.tool.suite.ETest10
    [junit] Running tlc2.tool.suite.ETest15 in thread 9
    [junit] Test tlc2.tool.suite.ETest13 FAILED
    [junit] Test tlc2.tool.suite.ETest14 FAILED
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.714 sec, Thread: 10, Class: tlc2.tool.simulation.NQSpecTest
    [junit] Running tlc2.tool.suite.ETest16 in thread 7
    [junit] Running tlc2.tool.suite.ETest2 in thread 8
    [junit] Running tlc2.tool.suite.ETest3 in thread 6
    [junit] Running tlc2.tool.suite.ETest4 in thread 3
    [junit] Running tlc2.tool.suite.ETest5 in thread 5
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.476 sec, Thread: 9, Class: tlc2.tool.suite.ETest15
    [junit] Test tlc2.tool.suite.ETest15 FAILED
    [junit] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.784 sec, Thread: 4, Class: tlc2.tool.simulation.SimulatorTest
    [junit] Running tlc2.tool.suite.ETest6 in thread 10
    [junit] Running tlc2.tool.suite.ETest7 in thread 9
    [junit] Running tlc2.tool.suite.ETest8 in thread 4
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.175 sec, Thread: 7, Class: tlc2.tool.suite.ETest16
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.2 sec, Thread: 8, Class: tlc2.tool.suite.ETest2
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.125 sec, Thread: 5, Class: tlc2.tool.suite.ETest5
    [junit] Running tlc2.tool.suite.Test1 in thread 5
    [junit] Running tlc2.tool.suite.ETest9 in thread 7
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.956 sec, Thread: 9, Class: tlc2.tool.suite.ETest7
    [junit] Running tlc2.tool.suite.Test10 in thread 8
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.655 sec, Thread: 6, Class: tlc2.tool.suite.ETest3
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.651 sec, Thread: 3, Class: tlc2.tool.suite.ETest4
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.338 sec, Thread: 10, Class: tlc2.tool.suite.ETest6
    [junit] Test tlc2.tool.suite.ETest3 FAILED
    [junit] Test tlc2.tool.suite.ETest4 FAILED
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.208 sec, Thread: 4, Class: tlc2.tool.suite.ETest8
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.744 sec, Thread: 7, Class: tlc2.tool.suite.ETest9
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.749 sec, Thread: 5, Class: tlc2.tool.suite.Test1
    [junit] Running tlc2.tool.suite.Test11 in thread 9
    [junit] Running tlc2.tool.suite.Test12 in thread 6
    [junit] Running tlc2.tool.suite.Test13 in thread 3
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.705 sec, Thread: 8, Class: tlc2.tool.suite.Test10
    [junit] Running tlc2.tool.suite.Test14 in thread 10
    [junit] Test tlc2.tool.suite.ETest9 FAILED
    [junit] Test tlc2.tool.suite.Test1 FAILED
    [junit] Test tlc2.tool.suite.Test10 FAILED
    [junit] Running tlc2.tool.suite.Test15 in thread 4
    [junit] Running tlc2.tool.suite.Test16 in thread 7
    [junit] Running tlc2.tool.suite.Test17 in thread 5
    [junit] Running tlc2.tool.suite.Test18 in thread 8
    [junit] Tests run: 10, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 6.34 sec, Thread: 2, Class: tlc2.tool.simulation.SimulatorMultiThreadTest
    [junit] Test tlc2.tool.simulation.SimulatorMultiThreadTest FAILED
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.48 sec, Thread: 7, Class: tlc2.tool.suite.Test16
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 2.015 sec, Thread: 9, Class: tlc2.tool.suite.Test11
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 2.016 sec, Thread: 6, Class: tlc2.tool.suite.Test12
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.793 sec, Thread: 10, Class: tlc2.tool.suite.Test14
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 2.012 sec, Thread: 3, Class: tlc2.tool.suite.Test13
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.6 sec, Thread: 4, Class: tlc2.tool.suite.Test15
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.385 sec, Thread: 8, Class: tlc2.tool.suite.Test18
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.486 sec, Thread: 5, Class: tlc2.tool.suite.Test17
    [junit] Running tlc2.tool.suite.Test2 in thread 2
    [junit] Tests run: 12, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 7.522 sec, Thread: 1, Class: tlc2.tool.simulation.SimulationWorkerTest
    [junit] Test tlc2.tool.simulation.SimulationWorkerTest FAILED
    [junit] Test tlc2.tool.suite.Test17 FAILED
    [junit] Test tlc2.tool.suite.Test15 FAILED
    [junit] Test tlc2.tool.suite.Test12 FAILED
    [junit] Test tlc2.tool.suite.Test13 FAILED
    [junit] Test tlc2.tool.suite.Test18 FAILED
    [junit] Test tlc2.tool.suite.Test14 FAILED
    [junit] Test tlc2.tool.suite.Test11 FAILED
    [junit] Test tlc2.tool.suite.Test16 FAILED
    [junit] Running tlc2.tool.suite.Test20 in thread 1
    [junit] Running tlc2.tool.suite.Test207 in thread 9
    [junit] Tests run: 1, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.53 sec, Thread: 2, Class: tlc2.tool.suite.Test2
    [junit] Running tlc2.tool.suite.Test201 in thread 5
    [junit] Running tlc2.tool.suite.Test202 in thread 4
    [junit] Running tlc2.tool.suite.Test208 in thread 7
    [junit] Running tlc2.tool.suite.Test205 in thread 8
    [junit] Running tlc2.tool.suite.Test203 in thread 6
    [junit] Running tlc2.tool.suite.Test204 in thread 3
    [junit] Running tlc2.tool.suite.Test206 in thread 10
    [junit] Test tlc2.tool.suite.Test2 FAILED
[...]

@lemmy
Member

lemmy commented Apr 5, 2024

FWIW: Execution time is down from ~12 minutes to ~3 minutes.

@ahelwer
Contributor Author

ahelwer commented Apr 5, 2024

Drats! Impressive execution time, though. Okay, I'll try running them on my M1 MacBook tomorrow.

@lemmy
Member

lemmy commented Apr 5, 2024

If you touch this again, it would also be useful to support limiting the parallelism (down to a single thread) so the tests don't consume all system resources. For example, I typically launch the test suite and then tend to other work.
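
A sketch of what that could look like from the command line, assuming the build exposes the parallelism as an overridable threads property (the -Dthreads=1 flag mentioned further down in this thread suggests it does):

# Run the whole suite on a single thread so the machine stays responsive for other work.
ant -f tlatools/org.lamport.tlatools/customBuild.xml -Dthreads=1 compile compile-test dist test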

@ahelwer
Contributor Author

ahelwer commented Apr 5, 2024

@lemmy can't reproduce on my M1 MacBook, unfortunately. It runs a bit slower than yours, completing in about 4 minutes. I ran it about five times without failures.

Would you be able to run the following on your mini and then paste the output here? The Python script's summary should give a good head start on debugging this:

ant -f tlatools/org.lamport.tlatools/customBuild.xml compile compile-test dist test 2>&1 | tee test-output.txt
python .github/scripts/parse-unit-test-output.py test-output.txt tlatools/org.lamport.tlatools/target/surefire-reports/

One weird thing I noticed: tlc2.tool.fp.OffHeapDiskFPSetTest ran much slower on my M1 MacBook than on my 10-year-old workstation, taking 141.3 seconds on the MacBook vs. 70 seconds on the workstation and in the GitHub CI.

@lemmy
Member

lemmy commented Apr 5, 2024

All tests pass if there is no other activity on the machine. It could be that I ran TLC from VSCode yesterday while running the suite in the background.

What's the OS of your workstation?

@ahelwer
Contributor Author

ahelwer commented Apr 5, 2024

The workstation is running Linux.

@Calvin-L
Collaborator

Calvin-L commented Apr 5, 2024

Just to add another data point: the tests run successfully on my 10-core M1 laptop (macOS 14.4) in 2m 16s. I haven't tried running them with significant background activity.

@ahelwer
Contributor Author

ahelwer commented Apr 5, 2024

How long did it take to run the tlc2.tool.fp.OffHeapDiskFPSetTest test on your laptop?

@Calvin-L
Collaborator

Calvin-L commented Apr 5, 2024

Looks like 1m 22s:

    [junit] Tests run: 21, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 82.953 sec, Thread: 4, Class: tlc2.tool.fp.OffHeapDiskFPSetTest

@lemmy
Member

lemmy commented Jun 17, 2024

It is a bad idea to parse Ant's output. parse-unit-test-output.py failed to identify the test failure in https://github.com/tlaplus/tlaplus/actions/runs/9554885211/job/26336900210?pr=953.

@ahelwer
Contributor Author

ahelwer commented Jun 17, 2024

I don't necessarily agree that it's a bad idea; the summary is very useful even if it is not always correct. The summary is a nice-to-have: we still have the original output and the original exit code, and we can fix the summary to handle these cases as they arise.

@ahelwer
Contributor Author

ahelwer commented Jun 17, 2024

In this case, here is the failure (which I had to find by downloading the full text of the CI log and searching it for the keyword "FAILED", which is significantly harder than looking at a summary):

2024-06-17T21:23:10.4138636Z [junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec, Thread: 4, Class: tlc2.tool.liveness.SymmetryTableauLiveCheckTest
2024-06-17T21:23:10.4400421Z [junit] Test tlc2.tool.liveness.SymmetryTableauLiveCheckTest FAILED

@lemmy
Member

lemmy commented Jun 17, 2024

"This script parses the Ant unit test output and identifies tests which have failed. It then retrieves their error messages from the surefire runner log directory."

In other words, this comment is factually incorrect. More importantly, it raises concerns about our reliance on a script that may or may not detect test failures.

@ahelwer
Contributor Author

ahelwer commented Jun 17, 2024

Markus I feel like you're being unnecessarily unfriendly today and would kindly request you show more grace.

@Calvin-L
Collaborator

parse-unit-test-output.py failed to identify the test failure in https://github.com/tlaplus/tlaplus/actions/runs/9554885211/job/26336900210?pr=953. [...] it raises concerns about our reliance on a script that may or may not detect test failures.

This is an unfairly harsh assessment of something that is, in my estimation, a strict upgrade to the CI job.

parse-unit-test-output.py is not responsible for identifying or detecting test failures, only showing the output of failing tests and summarizing some statistics.

It's true that parse-unit-test-output.py has a bug, so it did not show the output of the failing test in the linked run---but ant ... test alone doesn't show test output either, so the script didn't lose any information.

Furthermore, the exit code of ant ... test is preserved unaltered, so the key safety property

test_failure ~> ci_failure

holds with or without parse-unit-test-output.py as one of the build steps. In fact, that property would hold even if we replace the script with the worst possible implementation (print(random_text()); exit(random_int());). In other words, that script can only improve the safety of the CI job.

@lemmy
Member

lemmy commented Jun 19, 2024

@Calvin-L Thank you for taking the time to check that the interception of exit values with pipefail and passing it between steps worked reliably. I didn't have time to check it myself.

While parse-unit-test-output.py did offer new features beyond the basic CI, it added complexity by parsing potentially unstable output, introducing Python as a second language, and not using GitHub Actions workflow best practices, i.e., a (portable) if: always() that ensures a step runs even after a previous one fails. #954 demonstrates a simpler, more robust, and standard way to achieve similar if not better functionality. In general, I'd like to encourage reuse of 3rd-party libraries to keep PRs small and our codebase manageable.

@ahelwer
Contributor Author

ahelwer commented Jun 19, 2024

That is appreciated, and I am glad we have found a more robust alternative (I probably would have happily worked on it myself), but please do not rewrite history: until Calvin mentioned that he appreciated the features of the script, the linked PR was simply a deletion of the script from the repo.

@lemmy
Member

lemmy commented Jul 10, 2024

Before this change, you could insert System.exit(-1) anywhere in the codebase and see which tests failed (when passing the argument -Dtest.halt=false to ant). However, with this PR, the test suite will now abort immediately after the first test failure, displaying the following message:

BUILD FAILED
.../tlatools/org.lamport.tlatools/customBuild.xml:535: Context loader has not been reset

@lemmy
Member

lemmy commented Jul 11, 2024

@ahelwer I suggest bringing back the previous sequential test execution if restoring the original behavior is not possible.

@ahelwer
Contributor Author

ahelwer commented Jul 11, 2024

Will take a look

@ahelwer
Contributor Author

ahelwer commented Jul 12, 2024

Findings from running on Arch Linux with OpenJDK 22 and Ant 1.10.14:

Execution mode | test.halt | Behavior when injecting System.exit(-1)
-------------- | --------- | ----------------------------------------
Serial         | undefined | Tests register error (not failure) and continue
Serial         | false     | Tests register error (not failure) and continue
Serial         | true      | Tests register error (not failure), then halt
Parallel       | undefined | Tests register error (not failure) and continue
Parallel       | false     | Tests register error (not failure) and continue
Parallel       | true      | Tests register error (not failure), then halt once all concurrently-running tests have completed

So unfortunately I cannot reproduce this behavior. Are you running on macOS? What are your OpenJDK and Ant versions?

@lemmy
Member

lemmy commented Jul 20, 2024

-> % git diff src/tlc2/tool/liveness/LiveCheck.java
diff --git a/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java b/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java
index 28660a2ff..348636518 100644
--- a/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java
+++ b/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java
@@ -668,6 +668,9 @@ public class LiveCheck implements ILiveCheck {
                                        return;
                                }

+                               if (nodes.length == 2) {
+                                       System.exit(-1);
+                               }
                                final int alen = oos.getCheckAction().length;

                                // See node0.addTransition(..) of previous case for what the
-> % ant -d -f customBuild.xml info test  > out.txt
    [junit] Test pcal.CCallReturn1Test FAILED (crashed)
    [junit] Test pcal.CallReturn1Test FAILED (crashed)
    [junit] Test pcal.CMultiprocDefineTest FAILED (crashed)

BUILD FAILED
/Users/markus/src/TLA/tla/tlatools/org.lamport.tlatools/customBuild.xml:535: Context loader has not been reset
	at org.apache.tools.ant.AntClassLoader.setThreadContextLoader(AntClassLoader.java:446)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.logVmExit(JUnitTask.java:1898)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.logVmCrash(JUnitTask.java:1870)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1302)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:1037)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.oneJunitThread(JUnitTask.java:941)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.access$000(JUnitTask.java:137)
	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask$JunitTestThread.run(JUnitTask.java:895)
	at java.base/java.lang.Thread.run(Thread.java:1570)

Total time: 49 seconds

out.txt

@ahelwer
Contributor Author

ahelwer commented Jul 20, 2024

Unfortunately I still can't reproduce this. Running on arm64 macOS with:

openjdk 21.0.2 2024-01-16 LTS
OpenJDK Runtime Environment Zulu21.32+17-CA (build 21.0.2+13-LTS)
OpenJDK 64-Bit Server VM Zulu21.32+17-CA (build 21.0.2+13-LTS, mixed mode, sharing)

and

Apache Ant(TM) version 1.10.14 compiled on August 16 2023

which seems to mostly match the versions you are using.

When I run the tests I see a number of crashes:

[junit] Running pcal.CallReturn1Test in thread 2
[junit] Running pcal.CEither1Test in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec, Thread: 1, Class: pcal.CEither1Test
[junit] Test pcal.CEither1Test FAILED (crashed)
[junit] Running pcal.CallReturn2Test in thread 7
[junit] Running pcal.DetlefSpecTest in thread 1
[junit] Running pcal.CMultiprocDefineTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec, Thread: 5, Class: pcal.CMultiprocDefineTest
[junit] Test pcal.CMultiprocDefineTest FAILED (crashed)
[junit] Running pcal.Dijkstra1Test in thread 5
[junit] Running pcal.CallReturn1Test in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec, Thread: 2, Class: pcal.CallReturn1Test
[junit] Test pcal.CallReturn1Test FAILED (crashed)

but the test execution continues without the exception you saw.

Searching "Context loader has not been reset" only brings up the place in the source code where the exception is thrown, nobody else has hit this issue. Does the issue happen every time on your computer? Do you have access to another computer you can try to reproduce the issue on? I could certainly believe this is a weird concurrency bug in Ant.

Probably most importantly, does the problem occur when you run with -Dthreads=1? If not, that could be an adequate workaround for the occasions where you want to run tests with a system crash injected.

@Calvin-L
Collaborator

Calvin-L commented Jul 20, 2024

Ant 1.10.14 has some changes related to System.exit() and SecurityManager:

When using Java 18 or higher, Ant will no longer use Java SecurityManager [...] Furthermore, when using Java 18 or higher, if the build executes tasks that call "java.lang.System.exit()" and if those tasks aren't running in a forked VM of their own, then such tasks will now kill the entire Ant build process.

The release notes are a little poorly worded, but I bet that Ant < 1.10.14 + Java >= 18 + System.exit() was not a supported combination (forked task or not). Starting with Ant 1.10.14, forked tasks like the TLA+ unit tests should work properly even if they invoke System.exit().
