Parallelize unit tests #902
Used:

```bash
# Watch for the test's states directory, pausing the build when it is deleted
# so the situation can be inspected.
directory=./states/tlc2.tool.liveness.CodePlexBug08EWD840FL2Test/

monitor_directory() {
    ant -f customBuild.xml compile compile-test dist test &
    # Get the PID of the ant process
    pid=$!
    echo $pid

    while [ ! -d "./states" ]; do
        inotifywait -e create "."
    done
    echo -e "\e[31mDirectory states created\e[0m"

    while [ ! -d "$directory" ]; do
        inotifywait -e create "./states"
    done
    echo -e "\e[31mDirectory '$directory' created\e[0m"

    # Monitor the directory for delete events
    while true; do
        inotifywait -e delete_self "$directory"
        echo -e "\e[31mDirectory '$directory' deleted, pausing program\e[0m"
        kill -STOP $pid
        echo -e "\e[31mProgram paused, press Enter to resume or Ctrl+C to terminate\e[0m"
        read -r
        kill -CONT $pid
    done
}

monitor_directory
```

Turns out it was ...
CommonTestCase should probably set TLCGlobals.metaRoot to a directory whose name is the concatenation of the test name and a timestamp.
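Purely as an illustration of that naming scheme (not code from this PR), using the test name from the script above and a Unix timestamp, the resulting per-test metadata directory would look something like this:

```bash
# Illustrative only: a per-test metadata directory named <test name>_<timestamp>,
# so concurrently running tests never share the same ./states/ subdirectory.
test_name="tlc2.tool.liveness.CodePlexBug08EWD840FL2Test"
meta_root="./states/${test_name}_$(date +%s)"
mkdir -p "$meta_root"
echo "per-test metadata root: $meta_root"
```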
I discovered that all the ...
I have now successfully run this workflow on my 8-threaded local workstation over 10 times, giving reasonable confidence that there are no additional concurrency failures lurking in the tests.
tlatools/org.lamport.tlatools/test/tlc2/tool/liveness/ModelCheckerTestCase.java
LGTM. @Calvin-L Feel free to merge as you see fit.
Dang, some tests fail when executed in parallel on my MacBook M1: ...
FWIW: Execution time is down from ~12 minutes to ~3 minutes.
Drats! Impressive execution time though. Okay, I'll try executing them on my M1 MacBook tomorrow.
If you touch this again, it would also be useful to support limiting the parallelism (down to a single thread) so the suite does not consume all system resources. For example, I typically launch the test suite and then tend to other work.
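One hypothetical way to expose this would be a command-line override along the lines of the sketch below; note that the `test.threads` property is illustrative only (it does not currently exist in customBuild.xml) and would need to be wired up to the `junit` task's `threads` attribute:

```bash
# Hypothetical: cap test parallelism from the command line. The test.threads
# property is illustrative and is not currently defined in customBuild.xml.
ant -f tlatools/org.lamport.tlatools/customBuild.xml \
    compile compile-test dist test -Dtest.threads=1
```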
@lemmy can't reproduce on my M1 MacBook unfortunately. It runs a bit slower than yours, completing in about 4 minutes. Ran it about five times without failures. Would you be able to run the following on your mini and then paste the output here? The Python script summary should give a good head start on debugging this:

```bash
ant -f tlatools/org.lamport.tlatools/customBuild.xml compile compile-test dist test 2>&1 | tee test-output.txt
python .github/scripts/parse-unit-test-output.py test-output.txt tlatools/org.lamport.tlatools/target/surefire-reports/
```

One weird thing I noticed, ...
All tests pass if there is no other activity on the machine. Could be that I ran TLC from VSCode yesterday while running the suite in the background. What's the OS of your workstation?
Workstation is running Linux.
Just to add another point of data: the tests run successfully on my 10-core M1 laptop (macOS 14.4) in 2m 16s. I haven't tried running them with significant background activity.
How long did it take to run the ...
Looks like 1m 22s: ...
It is a bad idea to parse Ant's output.
I don't necessarily agree it's a bad idea; the summary is very useful even if it is not always correct. The summary is a nice-to-have; we still have the original output and the original exit code. We can fix the summary to handle these cases as they arise.
In this case, here is the failure (which I had to find by downloading the full text of the CI log and searching through it for the keyword "FAILED", significantly less easy than looking at a summary):

...
In other words, this comment is factually incorrect. More importantly, it raises concerns about our reliance on a script that may or may not detect test failures.
Markus, I feel like you're being unnecessarily unfriendly today and would kindly request you show more grace.
This is an unfairly harsh assessment of something that is, in my estimation, a strict upgrade to the CI job.
It's true that ...

Furthermore, the exit code of ... test_failure ~> ci_failure holds with or without ...
@Calvin-L Thank you for taking the time to check that the interception of exit values with pipefail and passing it between steps worked reliably. I didn't have time to check it myself. While ...
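(For readers following along, here is a minimal sketch of the exit-code handling being discussed, reusing the commands from the earlier comment; this is not the actual CI configuration.)

```bash
# Sketch only: with pipefail, the tee that captures output for the summary
# script no longer masks Ant's exit status.
set -o pipefail

ant -f tlatools/org.lamport.tlatools/customBuild.xml compile compile-test dist test 2>&1 \
    | tee test-output.txt
ant_status=$?   # ant's exit code, not tee's, thanks to pipefail

# The summary is a nice-to-have on top of the raw log and exit code.
python .github/scripts/parse-unit-test-output.py \
    test-output.txt tlatools/org.lamport.tlatools/target/surefire-reports/ || true

exit "$ant_status"   # test_failure ~> ci_failure regardless of the summary script
```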
That is appreciated, and I am glad we have found a more robust alternative (and probably would have happily worked on it myself), but please do not rewrite history: until Calvin mentioned he appreciated the features of the script, the linked PR was a simple deletion of the script from the repo.
Before this change, you could insert ...

```
BUILD FAILED
.../tlatools/org.lamport.tlatools/customBuild.xml:535: Context loader has not been reset
```
@ahelwer I suggest bringing back the previous sequential test execution if restoring the original behavior is not possible.
Will take a look.
Findings from running on Arch Linux with OpenJDK 22 and Ant 1.10.14: ...

So unfortunately I cannot reproduce this behavior. Are you running on macOS? What are your OpenJDK and Ant versions?
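(For reference, the version information being asked about here can be printed with the commands below.)

```bash
# Print the JDK and Ant versions in use.
java -version
ant -version
```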
```
-> % git diff src/tlc2/tool/liveness/LiveCheck.java
diff --git a/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java b/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java
index 28660a2ff..348636518 100644
--- a/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java
+++ b/tlatools/org.lamport.tlatools/src/tlc2/tool/liveness/LiveCheck.java
@@ -668,6 +668,9 @@ public class LiveCheck implements ILiveCheck {
 			return;
 		}
+		if (nodes.length == 2) {
+			System.exit(-1);
+		}
 		final int alen = oos.getCheckAction().length;
 		// See node0.addTransition(..) of previous case for what the
```

```
-> % ant -d -f customBuild.xml info test > out.txt
[junit] Test pcal.CCallReturn1Test FAILED (crashed)
[junit] Test pcal.CallReturn1Test FAILED (crashed)
[junit] Test pcal.CMultiprocDefineTest FAILED (crashed)

BUILD FAILED
/Users/markus/src/TLA/tla/tlatools/org.lamport.tlatools/customBuild.xml:535: Context loader has not been reset
        at org.apache.tools.ant.AntClassLoader.setThreadContextLoader(AntClassLoader.java:446)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.logVmExit(JUnitTask.java:1898)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.logVmCrash(JUnitTask.java:1870)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.executeAsForked(JUnitTask.java:1302)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.execute(JUnitTask.java:1037)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.oneJunitThread(JUnitTask.java:941)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask.access$000(JUnitTask.java:137)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTask$JunitTestThread.run(JUnitTask.java:895)
        at java.base/java.lang.Thread.run(Thread.java:1570)

Total time: 49 seconds
```
Unfortunately I still can't reproduce this. Running on arm64 macOS with:

...

and

...

which seems to mostly match the versions you are using. When I run the tests I see a number of crashes:

...

but the test execution continues without the exception you saw. Searching "Context loader has not been reset" only brings up the place in the source code where the exception is thrown; nobody else seems to have hit this issue. Does the issue happen every time on your computer? Do you have access to another computer you can try to reproduce the issue on? I could certainly believe this is a weird concurrency bug in Ant. Probably most importantly, does the problem occur when you run with ...
Ant 1.10.14 has some changes related to ...

The release notes are a little poorly worded, but I bet that ...
This was actually easier than expected (edit: it was not easier than expected). Somehow I missed that the JUnit task has a `threads` parameter. I guess this is a newer addition, which is why not a lot of documentation mentions it. Now the unit test target will run with a number of threads equal to the available cores on the machine. Decent improvement: unit tests finish in less than 10 minutes now, which puts them below the time of the tlaplus/examples integration tests! There were a few small changes I had to make to some tests to ensure they don't interfere with each other via operations on the `states` metadir.

I also added a useful Python script to run during the CI after the tests; it parses the test output and the surefire reports and prints a summary of any failures.

I made the following changes to the Ant build: the `-Dtest.halt=true` parameter is no longer used; previously the tests would stop at the first failure, which IMO is not as useful for debugging. This necessitated adding a conditional `fail` target under the test targets, which activates when `failureproperty` is set in the `junit` target. This ensures the CI still fails with a nonzero exit code upon any failures.

I ran this workflow locally on my 8-thread workstation over 10 times, so I have fairly high confidence there aren't any other concurrency failures lurking.
Closes #901