-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Erik's mods to EWD998 from class on 2022-07-12 #7
base: 072022
Are you sure you want to change the base?
Conversation
EWD998.tla
Outdated
\* /\ WF_vars(Next) | ||
/\ \A i \in Node: WF_vars(PassToken(i) \/ InitiateToken(i)) | ||
|
||
/\ (\A i \in Node: WF_vars(PassToken(i) \/ InitiateToken(i) \/ Terminate(i))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering if we should enforce Terminate
here, as our termination algorithm should ideally be safe and live even for a system that never terminates? But realistically a valid computation task should terminate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also thinking if WF termination might be too strong as it might ask the state chain to terminate sooner?
Primarily closely read the paper's description of the algorithm and made changes to match the spec to the text description. One of the biggest changes was that nodes get marked as tainted when they RECEIVE a message, not when they SEND a message. Whoops! There were also minor changes necessary to match the token's initial state (I think we previously marked it tainted at the start if the initiator is tainted, but that's not right, the token always starts untained and the initiator's taint status gets checked when the token makes its way back around). I also had to make small adjustments to the ATD spec, most notably to change the WF criteria to only check for Terminate, instead of Next. Not sure if that's "cheating", but I think it makes sense based on my understanding: as with EWD998, we don't want to force nodes to send/rcv messages indefinitely, we just want to ensure that we detect termination. I validated this up to 4 nodes. Note that it took about 2-3 hours to validate on a 2019 MBP for 4 nodes, generate 10M states and exploring over 350M. Below is the output for the following config file: ``` SPECIFICATION Spec INVARIANT TypeOK CONSTANT N = 4 CONSTRAINT Constraint CONSTANT Init <- MCInit ALIAS Alias INVARIANT IInv PROPERTY ATDSpec ``` ``` /Library/Java/JavaVirtualMachines/jdk11.0.13.8-msft.jdk/Contents/Home/bin/java -XX:+UseParallelGC -cp /Users/ekrogen/.vscode/extensions/alygin.vscode-tlaplus-nightly-2022.5.2815/tools/tla2tools.jar -Dtlc2.TLC.ide=vscode tlc2.TLC EWD998.tla -tool -modelcheck -config EWD998.cfg -deadlock -noTE TLC2 Version 2.18 of Day Month 20?? (rev: 6ee6e9f) Running breadth-first search Model-Checking with fp 57 and seed 3965354632801637035 with 1 worker on 16 cores with 7282MB heap and 64MB offheap memory [pid: 78154] (Mac OS X 12.4 x86_64, Microsoft 11.0.13 x86_64, MSBDiskFPSet, DiskStateQueue). Starting SANY... Parsing file /Users/ekrogen/dev/tlaplus-workshop/EWD998.tla ... Starting... (2022-07-12 18:06:50) Implied-temporal checking--satisfiability problem has 1 branches. .... Finished computing initial states: 16 distinct states generated at 2022-07-12 18:06:50. Checking temporal properties for the current state space with 2834 total distinct states at (2022-07-12 18:06:53) Finished checking temporal properties in 00s at 2022-07-12 18:06:53 ... Progress(5) at 2022-07-12 18:06:53: 90,980 states generated (90,980 s/min), 9,131 distinct states found (9,131 ds/min), 6,296 states left on queue. Checking temporal properties for the current state space with 96397 total distinct states at (2022-07-12 18:07:53) Finished checking temporal properties in 01s at 2022-07-12 18:07:55 .... Progress(26) at 2022-07-13 09:01:07: 346,168,712 states generated (4,791,239 s/min), 10,338,601 distinct states found (105,372 ds/min), 163,790 states left on queue. Progress(27) at 2022-07-13 09:02:07: 351,116,470 states generated (4,947,758 s/min), 10,434,714 distinct states found (96,113 ds/min), 110,216 states left on queue. Progress(29) at 2022-07-13 09:03:07: 355,834,008 states generated (4,717,538 s/min), 10,532,678 distinct states found (97,964 ds/min), 65,050 states left on queue. Checkpointing of run states/22-07-12-18-06-49.957 Checkpointing completed at (2022-07-13 09:04:08) Progress(32) at 2022-07-13 09:04:08: 360,659,672 states generated (4,825,664 s/min), 10,629,235 distinct states found (96,557 ds/min), 17,243 states left on queue. Progress(49) at 2022-07-13 09:04:25: 362,113,272 states generated, 10,656,918 distinct states found, 0 states left on queue. Checking temporal properties for the complete state space with 10656918 total distinct states at (2022-07-13 09:04:25) Finished checking temporal properties in 02min 36s at 2022-07-13 09:07:01 Model checking completed. No error has been found. Estimates of the probability that TLC did not check all reachable states because two distinct states had the same fingerprint: calculated (optimistic): val = 2.0E-4 based on the actual fingerprints: val = 6.6E-9 Progress(49) at 2022-07-13 09:07:02: 362,113,272 states generated (402,253 s/min), 10,656,918 distinct states found (11,838 ds/min), 0 states left on queue. 362113272 states generated, 10656918 distinct states found, 0 states left on queue. The depth of the complete state graph search is 49. The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 18 and the 95th percentile is 3). Finished in 54013024ms at (2022-07-13 09:07:02) ```
a70049b
to
4a8fa47
Compare
…Termination action and apply fairness on that
Primarily closely read the paper's description of the algorithm and made changes to match the spec to the text description. One of the biggest changes was that nodes get marked as tainted when they RECEIVE a message, not when they SEND a message. Whoops! There were also minor changes necessary to match the token's initial state (I think we previously marked it tainted at the start if the initiator is tainted, but that's not right, the token always starts untained and the initiator's taint status gets checked when the token makes its way back around).
I also had to make small adjustments to the ATD spec, most notably to change the WF criteria to only check for Terminate, instead of Next. Not sure if that's "cheating", but I think it makes sense based on my understanding: as with EWD998, we don't want to force nodes to send/rcv messages indefinitely, we just want to ensure that we detect termination.
I validated this up to 3 nodes, and am trying on 4, though it has been running for multiple hours and generated 10M+ states without finishing...