-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtermination.tex
1021 lines (905 loc) · 66.8 KB
/
termination.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\srcnote{Text of this chapter is based on~\mcite{SB2019}.
The benchmarks are unmodified from the original paper.}
\bigskip\noindent
One of the critical problems with parallel programs is ensuring that they do not hang or wait indefinitely -- i.e., there are no deadlocks, livelocks and the program proceeds towards its goals.
In this work, we present a practical approach to detection of nonterminating sections of programs written in C or C++ and its implementation into the \divine model checker.
This complements the existing techniques for finding safety violations such as assertion failures and memory errors.
Our approach makes it possible to detect partial deadlocks and livelocks, i.e., those situations in which some of the threads are progressing normally while the others are waiting indefinitely.
The approach is also applicable to proving nontermination of components in programs that do not terminate themselves, but the components should eventually finish their work.
Such programs include, for example, server-like applications that have infinite event loops, but each event should be handled in a finite time.
The termination criteria can be user-provided; however, \divine comes with the set of built-in termination criteria suited for the analysis of programs with mutexes and other common synchronisation primitives.
\section{Motivation and Introduction}
A significant limitation of many existing tools for analysis of parallel
programs in programming languages such as C and C++ is that they are only concerned with safety checking -- they check that a bad state of the program is unreachable.
Most common examples of bad states include assertion failures and memory errors (such as invalid memory accesses and memory leaks).
Unfortunately, this is far from being sufficient in practice.
See, for example, the code given in~\autoref{fig:livelock0}.
That piece of code easily passes any safety checks; however, when executed in reality, it often hangs and does not terminate.
\begin{figure}[t]
\begin{minipage}[t]{0.49\textwidth}
\begin{cppcode}
// can be used for
// synchronisation
std::atomic< int > x = 0;
void worker() {
while ( x != 0 ) { } // wait
do_work();
}
\end{cppcode}
\end{minipage}
\hfill%
\begin{minipage}[t]{0.49\textwidth}
\begin{cppcode}
int main() {
// start thread
// running worker
std::thread t( worker );
x = 42; // let worker run
// ...
t.join();
}
\end{cppcode}
\end{minipage}
\caption{
A simple C++ code snippet with two threads, it uses C++ standard threading support and atomic variables.
A programmer's intention was that the \texttt{worker} function first waits until \texttt{x}
becomes non-zero, and then proceeds with \texttt{do\_work}. However, the waiting
condition (at the first line of the \texttt{worker} function) is incorrectly just the
opposite. Therefore, if \texttt{main} executes \cpp{x = 42} before waiting in \texttt{worker} starts, the wait will never end (assuming \texttt{x} is never set to 0 again).
Note that none of safety checks is able to detect that the program might hang.
For the rest of the paper, we will omit the \cpp{std::} namespace to
simplify the notation.
}\label{fig:livelock0}
\end{figure}
In this work, we report about our new technique for checking nontermination for parallel programs written in C and C++ that may be applied to programs with arbitrary synchronisation primitives.
In particular, we can check that \emph{a specified part} of a program finishes
whenever its execution has been started, which in turn enables us to check for problems such as partial deadlocks or local nontermination.
Note that our technique does not require the program under analysis to terminate at all.
Therefore, it is also applicable to programs that do not terminate but have some
parts that are supposed to finish.
It does; however, require that the program has a finite state space because our
technique is built on top of a state space exploration. Note that even for a finite state space, a program may exhibit infinite behaviour.\mnote{A finite state space can contain cyclical infinite behaviour -- a loop in the state space.
\medskip\noindent\centering
\begin{tikzpicture}[>=latex,>=stealth',auto,node distance=2cm,semithick,initial text=]
\node[state,initial] (t) {$x$};
\node[state] (f) [right of = t] {$\lnot x$};
\path[->, shorten >=1pt]
(t) edge[bend left] (f)
(f) edge[bend left] (t)
;
\end{tikzpicture}
}
The main observation is that a program often has sections which once entered
should also be left: for example critical sections, certain function calls (such
as pop from a queue, which can wait for an element to become available; or a
thread join, etc.), or parts of code which wait for a resource or an action (waiting for a mutex, waiting on a barrier, waiting until a variable is set to a given value).
If the analysis of the program focuses on such sections, it is possible to
detect when these sections are started but do not terminate.
This covers partial deadlock and partial livelock detection in which such sections participate.
We also provide a global nontermination detection mode that decides if the
program as a whole terminates, nevertheless this is not the primary goal of our approach.
Our technique is built on top of explicit-state model checking. We believe that while explicit-state model checking is prone to state space explosion, it is well suited for the detection of problems related to infinite runs of parallel programs which cannot be handled by techniques such as bounded model checking or stateless model checking.
While our approach is closely related to checking for properties written in temporal logic such as LTL or CTL*, our \emph{local nontermination} technique cannot be substituted equivalently with CTL* model checking.
One of the reasons is that these logics are unable to relate to entities which are dynamically created during the execution of the program, and there is no bound to their number.
For example, there is no way to express in CTL* that for all mutexes it holds that if they are locked, they are also eventually unlocked unless all the mutexes are enumerated beforehand.
This is an essential concern for realistic programs where mutexes and other synchronisation primitives can be created dynamically at runtime, and their number can depend on the computation of the program itself.
Furthermore, to avoid counterexamples which are unrealistic with practical
thread schedulers, we need a form of \emph{fairness} of process scheduling
different from the fairness constraints used typically with LTL model checking.
The approach described in this work is implemented in a modified version of the \divine model checker~\mcite{DIVINEToolPaper2017,RSCB2018}.
The implementation, as well as all the examples, can be found on the paper webpage\mnote{\url{https://divine.fi.muni.cz/2019/lnterm/}}.
% The rest of the work is structured as follows: \autoref{sec:related} gives a short
% overview of related work and \autoref{sec:prelim} gives definitions and preliminaries needed for the rest of the work.
% In \autoref{sec:local-nontermination} we define our \emph{local nontermination} property, in \autoref{sec:detection} we discuss how it can be checked and the implementation in \divine, and it \autoref{sec:evaluation} we evaluate it.
% Finally, \autoref{sec:conclusion} concludes this work.
\section{Resource Sections}
A \emph{resource section} of a program is a block of code with an identifier of a resource and type of the resource section.
Each resource section is delimited in the source code by section start and section end annotations.
Examples of such sections are a mutex-waiting section that denotes a block of code in which a thread is waiting for the acquisition of a mutex.
Mutex-waiting section is identified by a mutex and the thread which waits for it.
Another example can be a critical section, which is identified by a mutex (there is no need to use a thread for the identification, as a mutex can be owned by at most one thread at any point in time).
Resource section can also be bound to a function -- in this case, it is identified by the stack frame of the function and by the program counter of its beginning.
Regardless of the identification, the idea for a resource section is that once it is entered, it should also be exited.
As a resource section can be entered repeatedly (for example when it is on a cycle or in a function which is called multiple times), we will define a \emph{resource section instance} to be a particular execution of a resource section with the given identifier.
The author of annotations which define resource sections should ensure that the same resource section is not entered again before it is left.
This does not limit the usage of function-associated resource sections to non-recursive functions -- each such section is also identified by the stack frame, and therefore resource sections corresponding to different recursion depths are different resource sections.
Similarly, a program can be in multiple resource sections which wait for the same mutex at the same time, each of them corresponding to a different waiting thread.
We propose to introduce resource sections in two ways -- some synchronisation primitives (mutexes, condition variable, threads) have built-in resource sections in DIVINE, and the user can introduce their own resource sections.
Examples of resource sections can be seen in \autoref{fig:lnterm:secs:a} and \autoref{fig:lnterm:secs:b}.
\begin{figure}[t]
\begin{cppcodeln}
mutex m;
void worker() {
unique_lock lock(m); // wait resource section,
// start of critical section
do_work();
} // unlock = end of critical section
int main() {
thread w(worker);
do_work();
w.join(); // wait resource section
}
\end{cppcodeln}
\vspace{-\bigskipamount}
\caption{
Examples of predefined resource sections.
There is a resource section in the mutex lock on line 4 (active while the
thread waits for the mutex to become available), a resource sections
which corresponds to the critical section on lines 4--6, and a resource
section in the join of thread \texttt{w} on line 12 (active while main
waits for \texttt{worker} to finish).
}\label{fig:lnterm:secs:a}
\end{figure}
\begin{widefigure}
\begin{cppcodeln}
#include <atomic>
#include <rst/termsec.h> // for user-defined
// resource sections
struct SpinLock {
void lock() {
{ // define an explicit scope
termsec::CheckWait check( &_flag );
while ( _flag.test_and_set() ) { }
} // end of scope -- end of 'check' resource section
termsec_begin_exclusive( &_flag );
}
void unlock() {
termsec_end_exclusive( &_flag );
_flag.clear();
}
private:
std::atomic_flag _flag = ATOMIC_FLAG_INIT;
};
\end{cppcodeln}
\vspace{-\bigskipamount}
\caption[UNUSED]{
User defined resource sections in a spinlock implementation.
The spinlock is implemented using a C++ atomic flag which is an atomic
variable which has two operations -- \texttt{clear} which resets its
value to \cpp{false}, and \texttt{test\_and\_set} which sets its value to
\cpp{true} and returns the original value.
In this spinlock the value \cpp{true} indicates that the spinlock is
locked.
One resource section guards the wait for \texttt{\_flag} to change at
lines 7--8.
The other resource section guards the critical section of the spinlock,
it starts in \texttt{lock} at line 10 and ends in \texttt{unlock} at line
14.
Both resource sections are identified by the address of the
\texttt{\_flag} variable which uniquely identifies a particular instance
of a \texttt{SpinLock} class.
Furthermore, the waiting resource section (which uses \texttt{CheckWait}
helper) is automatically also identified by the thread which executes it
-- this allows more than one thread to wait for the same spin lock.
}\label{fig:lnterm:secs:b}
\end{widefigure}
\section{Local Nontermination} \label{sec:local-nontermination}
With our local nontermination property, we aim at detection of resource section instances which are entered but are never left -- \emph{nonterminating resource section instances}.
We will first use examples of terminating and nonterminating resource section instances, and then we will define them precisely.
\begin{widefigure}
\begin{subfigure}[t]{0.475\textwidth}
\begin{cppcode}
mutex m;
void thread0() {
// Error:
unique_lock lock(m);
while (true) {
do_work();
}
} // unlock
void thread1() {
while (true) {
unique_lock lock(m);
do_other_work();
} // unlock
}
\end{cppcode}
\vspace{-\medskipamount}
\caption{
A program with a nonterminating critical section (in \texttt{thread0}) and a deadlock (if \texttt{thread0} enters its critical section, \texttt{thread1} will wait infinitely).
In C++ it is possible to use scope-based locks: the critical section belonging to mutex \cpp{m} is entered when \cpp{unique_lock lock(m)} is executed and left at the end of the scope in which the \cpp{lock} variable was defined (at the matching curly brace; also marked with comment \cpp{// unlock}).
}\label{fig:ntermsec0}
\end{subfigure}
%
\hfill
%
\begin{subfigure}[t]{0.475\textwidth}
\begin{cppcode}
mutex m;
void thread0() {
// Fixed:
while (true) {
unique_lock lock(m);
do_work();
} // unlock
}
void thread1() {
while (true) {
unique_lock lock(m);
do_other_work();
} // unlock
}
\end{cppcode}
\vspace{-\medskipamount}
\caption{
A fixed version of the program from \autoref{fig:ntermsec0} (the start of the critical section was moved from the position \cpp{// Error} in the left code to \cpp{// Fixed} and therefore the critical section can end now).
Intuitively, each critical section in this program terminates.
However, as we can see in \autoref{fig:ntermsec1ce}, it is possible to find an infinite path in the state space of this program that infinitely waits for one of the critical sections.
To make matters worse, this path can respect weak fairness.
} \label{fig:ntermsec1}
\end{subfigure}
\caption{Example programs with nonterminating resource section (\subref{fig:ntermsec0}) and terminating resource sections (\subref{fig:ntermsec1}).
}
\end{widefigure}
\begin{widefigure}
\center
\begin{tikzpicture}[node distance=2.3cm,>=stealth',bend angle=45,auto, baseline=(i.base), line width = 2pt]
\tikzstyle{state}=[circle, thick, draw, minimum size=6mm]
\tikzstyle{endcheck}=[circle, thick, draw, cross out, minimum size=3mm]
\node (i) {};
\node [state, right = of i] (lt0) {}
edge [pre, draw=red] node[yshift=-1mm] {\texttt{0: lock(m)}} (i);
\node [state, right = of lt0] (lt0wt1) {}
edge [pre, draw=blue, dash pattern=on 5pt off 5pt] node[yshift=-1mm] {\texttt{1: wait(m)}} (lt0);
\node [state, right = of lt0wt1] (lt0wt1ut0) {}
edge [pre, draw=red] node[yshift=-1mm] {\texttt{0: unlock(m)}} (lt0wt1);
\node [state, above = 1cm of lt0wt1] (lt0wt1ut0work) {}
edge [pre, draw=red, out=0, in=90] node {\texttt{0: do\_work()}} (lt0wt1ut0)
edge [post, draw=red, out=180, in=90] node[above left] {\texttt{0: lock(m)}} (lt0);
\node [right = of lt0wt1ut0] (out) {}
edge [pre, draw=blue, dashed, line width = 0.8pt] node[yshift=-1mm] {\texttt{1: lock(m)}} (lt0wt1ut0);
\end{tikzpicture}
\caption{
A fragment of state space of program in \autoref{fig:ntermsec1} with starving lasso marked with bold edges.
Each edge is marked by the thread it belongs to and the action of this thread.
Furthermore, to ease the orientation, actions belonging to \texttt{thread0} are marked with continuous \textcolor{red}{red} edges while actions belonging to \texttt{thread1} are marked with dashed \textcolor{blue}{blue} edges.
We can see that both threads participate in the repeated part of the counterexample and \texttt{thread0} is denied the possibility (starves) to execute after \texttt{0:~unlock(m)} (the thin blue dashed edge).
} \label{fig:ntermsec1ce}
\end{widefigure}
A simple example can be seen in \autoref{fig:ntermsec0}. There we have a mutex which
is locked, but never unlocked as the corresponding critical section contains an
infinite loop. We have four different resource sections in this example. Two
of them corresponds to the critical sections guarded by the mutex, and two of
them are hidden inside \texttt{unique\_lock}, where they implement
waiting until the mutex is unlocked.
Nonterminating resource section instances are the instances corresponding to the critical
section in \texttt{thread0} and any instances corresponding to waiting for the mutex
in \texttt{thread1} that is executed after the critical section in
\texttt{thread0} is entered. We can fix this example by putting the critical
section in \texttt{thread0} inside the infinite loop, as shown in \autoref{fig:ntermsec1}.
Suppose that we have defined nonterminating section as one in which it is possible to stay indefinitely (i.e., for the specific case of waiting for \texttt{m} in \texttt{thread1}, termination could be expressed by LTL formula $\mathbf{G}(\textit{wait-m-t1-start} \implies \mathbf{F}\,\textit{wait-m-t1-end})$).
We can witness the existence of such nonterminating section in a program with a finite state
space by a lasso-shaped path. Such the nontermination witness can also be found
for the program in \autoref{fig:ntermsec1}, even though the code might intuitively seem to
terminate.
First, \texttt{thread0} executes its \texttt{lock} action, then \texttt{thread1} starts waiting.
If \texttt{thread0} always executes \texttt{unlock} and \texttt{lock} before
\texttt{thread1} is allowed to run, \texttt{thread1} will never be able to finish waiting.
The counterexample is illustrated in \autoref{fig:ntermsec1ce} and is valid also under weak fairness assumptions.
% We call this behaviour -- where some of the threads are only allowed to run in specific conditions -- \emph{starving}.
% Starving is of interest mostly for checking lock-free and wait-free algorithms and we will not focus on it here.
In general, if a thread waits for some condition which is both infinitely often true and infinitely often false, there can be a run in which the waiting thread is only allowed to run at those moments when the condition is false. This type of runs is present in any program that uses busy waiting, which is very common in practice.
For this practical reason, we
cannot rely on the definition of nontermination as expressed with the LTL
formula above, and we need a different way to describe nonterminating sections.
%will use a different definition which better matches the realities of
%nonterminating parts of parallel programs.
\begin{definition}[Nonterminating Resource Section Instance]\label{def:lnterm}
A~resource section instance is nonterminating if and only if it can reach a point from which it is not possible to reach its end.
\end{definition}
For a particular resource section (e.g., waiting for \texttt{m} in \texttt{thread1}), checking for the absence of nonterminating resource section instances can be expressed using a CTL* property \[ \mathbf{AG}\left(\textit{wait-m-t1-start} \implies \mathbf{A}[(\mathbf{EF}\,\textit{wait-m-t1-end}) \mathbin{\mathbf{W}} \textit{wait-m-t1-end}]\right) \] (where $\mathbf{W}$ is the weak until operator).
In general, the CTL* approach cannot be used, as it requires the set of resource sections to be known before the analysis starts, so that the formula can be created as a conjunction of formulas for each resource section.
This is hard to do if resource sections can be created at runtime, which is often the case when dealing with programs in languages such as C and C++ -- the number of objects such as threads, mutexes, or function invocations which are used to identify resource sections might be hard to determine without exploration of all the runs of the program.
\section{Detection of Nontermination}\label{sec:detection}
The detection of nonterminating resource section instances in the context of
explicit-state model checking proceeds as follows. The basic idea behind the
detection of nonterminating resource section instances is that the model checker
focuses on them one at a time. Every time a resource section instance is about
to be entered during the state space exploration, the algorithm introduces a
nondeterministic branching to the state space graph. In one branch, the resource
section instance remains inactive, in which case the state space exploration
proceeds as usual to discover other resource sections. However, in the other
branch, the instance becomes active. Under this branch, the resource section
instance is checked for being nonterminating -- it becomes \emph{active resource section instance} (ARSI).
Note that the nondeterministic
branching happens only outside of active resource sections, which means the
ARSIs cannot be nested. Once the
state space graph in the active branch reaches a state that is out of the scope of an ARSI, the state
space exploration within this branch is stopped (a state with no successors is
generated outside the ARSI). Active resource section instances cannot be
nested, but for any instance of a resource section nested in an active section
instance, there is also an instance which is nested in an inactive section
instance, and therefore can become active elsewhere in the state space. As a
result of this construction, for every nonterminating resource section in the
original program, there is a corresponding ARSI in the augmented state space
graph. To let the exploration algorithm know that it is exploring a part of the
state space that is within an ARSI, we mark all edges within ARSIs as accepting.\mnote{The accepting label of an edge is a notation borrowed from (transition-based) Büchy automata.
In DIVINE this information can be used by the exploration algorithm to behave differently for accepting edges.}
\begin{widefigure}
\begin{cppcode}
mutex m1, m2;
{
unique_lock l1(m1);
do_work_1();
{
unique_lock l2(m2);
do_work_2();
} // unlock(m2)
} // unlock(m1)
\end{cppcode}
\centering
\begin{tikzpicture}[node distance=0.6cm,>=stealth',bend angle=45,auto, baseline=(i.base)]
\tikzstyle{state}=[circle, thick, draw, minimum size=5mm]
\tikzstyle{endcheck}=[circle, thick, draw, cross out, minimum size=3mm]
\tikzstyle{acc}=[line width=2pt]
\node [state, initial, initial text=, initial above] (i) {};
\node [state, below left = 0.7cm and 3cm of i] (cs1) {}
edge [pre] node {\texttt{lock(m1)}} (i);
\node [state, below right = 0.7cm and 1.5cm of i] (cs1a) {}
edge [pre] node[above right] {\texttt{lock(m1)}} (i);
\node [state, below = of cs1a] (cs1aw) {}
edge [pre, acc] node[right] {\texttt{do\_work\_1}} (cs1a);
\node [state, below = 0.6cm of cs1aw] (cs1acs2) {}
edge [pre, acc] node[right] {\texttt{lock(m2)}} (cs1aw);
\node [state, below = of cs1acs2] (cs1acs2w) {}
edge [pre, acc] node[right, name=cs1acs2wl] {\texttt{do\_work\_2}} (cs1acs2);
\node [state, below = 0.6cm of cs1acs2w] (cs1acs2u) {}
edge [pre, acc] node[right, name=cs1aul] {\texttt{unlock(m2)}} (cs1acs2w);
\node [endcheck, below = 0.6cm of cs1acs2u] (cs1au) {}
edge [pre] node[right] {\texttt{unlock(m1)}} (cs1acs2u);
\node [state, below = of cs1] (cs1w) {}
edge [pre] node[right] {\texttt{do\_work\_1}} (cs1);
\node [state, below left = 0.5cm and 1.1cm of cs1w] (cs1cs2) {}
edge [pre] node[above left, name=cs1wl] {\texttt{lock(m2)}} (cs1w);
\node [state, below = of cs1cs2] (cs1cs2w) {}
edge [pre] node[right, name=cs1cs2wl] {\texttt{do\_work\_2}} (cs1cs2);
\node [state, below = 0.6cm of cs1cs2w] (cs1cs2u) {}
edge [pre] node[right] {\texttt{unlock(m2)}} (cs1cs2w);
\node [state, below = 0.6cm of cs1cs2u] (cs1u) {}
edge [pre] node[right] {\texttt{unlock(m1)}} (cs1cs2u);
\node [state, below = of cs1u] (end) {}
edge [pre] node[right] {\texttt{end}} (cs1u);
\node [state, below right = 0.5cm and 1.1cm of cs1w] (cs1cs2a) {}
edge [pre] node[above right] {\texttt{lock(m2)}} (cs1w);
\node [state, below = of cs1cs2a] (cs1cs2aw) {}
edge [pre, acc] node[right, name=cs1cs2awl] {\texttt{do\_work\_2}} (cs1cs2a);
\node [endcheck, below = 0.6cm of cs1cs2aw] (cs1cs2au) {}
edge [pre] node[right, name=cs1cs2aul] {\texttt{unlock(m2)}} (cs1cs2aw);
\begin{pgfonlayer}{background}
\tikzstyle{dashed}=[dash pattern=on 5pt off 5pt]
\node[draw, fit=(cs1a) (cs1acs2u) (cs1aul), inner xsep=0.2cm, inner ysep=0.1cm, fill = yellow!70, xshift=-0.1cm, line width = 2pt] (cs1ains) {};
\node[below left = 1mm of cs1ains.north east] {ARSI};
\node[draw, dashed, fit=(cs1cs2a) (cs1cs2aw) (cs1cs2awl), inner sep=0.1cm, fill = yellow!70, line width = 2pt] (cs1cs2ains) {};
\node[below left = 1mm of cs1cs2ains.north east] {ARSI};
\node[draw, fit=(cs1) (cs1wl) (cs1cs2u) (cs1cs2aul), inner ysep=0.1cm, inner xsep=0.1cm] (cs1ins) {};
\node[draw, dashed, fit=(cs1cs2) (cs1cs2w) (cs1cs2wl), inner sep=0.1cm] (cs1cs2ins) {};
\node[draw, dashed, fit=(cs1acs2) (cs1acs2w) (cs1acs2wl), inner sep=0.1cm] (cs1acs2ins) {};
\end{pgfonlayer}
\end{tikzpicture}
\smallskip
\caption{
A small example of a program with two resource section instances (on the top) and its state space, which shows active resource section instances (ARSIs; on the bottom).
In order to keep the state space simple, this example program is sequential and deterministic; the nondeterminism is caused only by the construction which gives rise to ARSIs.
The resource section instances belonging to the critical section of mutex \texttt{m1} are wrapped in a solid rectangle in the image, while resource section instances belonging to \texttt{m2} are wrapped in a dashed rectangle.
ARSIs are denoted by thick frame and yellow background and accepting edges in the state space are marked by thick arcs.
Recall that active resource section instances cannot be nested.
Crosses at the end of edges denote points where exploration of the state space was terminated due to reaching the end of an active resource section instance.
} \label{fig:active}
\end{widefigure}
An illustration of a state space graph augmented with nondeterministic choices
and accepting edges is given in \autoref{fig:active}. This augmentation of the state
space can be performed by program instrumentation. Now to discover ARSIs which
are nonterminating according to \autoref{def:lnterm}, it is enough to
detect terminal strongly connected components made of accepting edges only.
\subsection{Detection Algorithm}
Henceforward, we assume the state space graph is finite, and if a run of the
program to be verified terminates then this fact is reflected by a state with
no successors in the underlying state space graph. Note that the program may
terminate even within a resource section instance.
An ARSI terminates either by
reaching the end of the section instance or by the termination of the whole
underlying program. In both cases, this means a state with no successors is
generated and reachable from the ARSI entrance point.
Finally, we assume that any waiting is implemented in a nonblocking way; in
particular, we require that waiting operations give rise to cycles in the state
space of the waiting thread.\mnote{This is not a problem in practice as any
blocking synchronisation (such as waiting for a mutex) can be simulated by a
busy waiting loop.}
As a result, the detection
of nonterminating ARSIs can be performed as a search for an accepting terminal
strongly connected component in the state space graph.
\begin{definition}[Terminal Strongly Connected Component]
A strongly connected component $S$ is \emph{terminal}\mnote{Also sometimes
called bottom strongly connected components, or closed communicating classes,
especially in the area of probabilistic system analysis~\cite{norris}.} if for
each state $v$ in $S$ all successors of $v$ are in $S$ (there are no edges out
of $S$).
\end{definition}
\begin{definition}[Fully Accepting Terminal SCC (FATSCC)]
A terminal strongly connected component of the state space is \emph{fully
accepting} (\emph{fully accepting terminal SCC, or FATSCC}) if and only if it is nontrivial and all its edges are accepting.
\end{definition}
% \begin{lemma}\label{lem:active:atscc}
% A FATSCC has to be part of exactly one active resource section instance (ARSI).
% \end{lemma}
% \begin{proof}
% This follows directly from the construction of ARSIs in the state space: any edge which enters or leaves an ARSI is not accepting and therefore accepting edges occur only in ARSIs.
% FATSCC has to be nontrivial, therefore it contains at least one edge which must be accepting and therefore the FATSCC has to be part of an ARSI, as there are no accepting edges outside ARSIs.
% \end{proof}
\begin{theorem}\label{thm:lnterm}
A program contains a nonterminating resource section instance if and only if
its state space graph contains a fully accepting terminal strongly connected component.
\end{theorem}
\begin{proof}
Assume the program contains a nonterminating ARSI $\mathcal{A}$.
Then there must exist a set of states in $\mathcal{A}$ from which neither program end nor the corresponding resource section end can be reached.
Among these states, there must be a subset which can be repeated
indefinitely and cannot be left -- a nontrivial terminal SCC which is part
of an ARSI and therefore it is fully accepting -- a FATSCC in the state space.
For the other direction, let us assume that there is a FATSCC in the state
space graph. Since any edge which enters or leaves an ARSI is not accepting
(which follows directly from the construction of the state space graph), all
states that are part of the FATSCC must be states within a single
ARSI. Since the component is terminal and nontrivial, it cannot be left.
Furthermore, a program termination point cannot be part of the FATSCC as it
has no successors and an ARSI end cannot be part the FATSCC as the edges going to
it are not accepting. Therefore, it is impossible to reach either a program
termination point or a state that would be outside of the resource section
instance from the FATSCC, therefore, the FATSCC witnesses a resource section
instance that does not terminate.
\end{proof}
To detect the presence of a FATSCC in the state space graph, we employ the
standard Tarjan's algorithm for finding strongly connected components.
To decide if an SCC is terminal, it suffices to check that there are no
edges going from it to any different SCC. Finally, to detect if a terminal
component is nontrivial and fully accepting it is enough to check that the
component contains at least one state with some successors (it is nontrivial) and
that all states of the component have only accepting outgoing edges (it is fully
accepting). These are minor modifications of the algorithm.
Furthermore, it is possible to extend the algorithm to also perform safety
checking while checking for nontermination -- when a new edge with an error label
is traversed, the exploration can be terminated immediately with a safety
counterexample.
This way, any need for separate safety checking is eliminated.
% \paragraph{Size of State Space}
%
% The additional nondeterminism introduced by active resource section instances is bounded by the number of resource section instance starts in the state space of the original program.
% In the worst case, this can be proportional to the size of the original state space (giving us quadratic bound for the size of state space for nontermination checking with respect to the size of the original state space), but in reality the overhead is expected to be smaller.
%\paragraph{Global Nontermination}
Note that it is also possible to define \emph{global nontermination} using
\autoref{def:lnterm}. In this case, we only need to treat the whole
program as a single active resource section instance.
% If we do so, we can even
% avoid the introduction of extra nondeterminism, but global nontermination cannot
% be applied to checking that annotated parts of nonterminating programs
% terminate.
\subsection{Scheduling and Fairness} \label{sec:fairness}
% An idea for finding such resource section instances can be taken from considerations about fairness in parallel systems.
To provide further context, we also want to discuss the relation of our
nontermination property to LTL model checking with fairness. Fairness
constraints~\mcite[Chapter 3.5]{Baier2008} are needed in the analysis of temporal
properties of parallel systems to avoid reporting of unrealistic
counterexamples, such as those in which an enabled thread never gets the chance
to make an action. However, even if we use LTL formula to
describe nontermination and allow for LTL model checking under weak fairness, we
still may obtain counterexamples that are totally unrealistic. This is because a
weakly-fair scheduler\mnote{For our purposes, a weakly-fair
scheduler is a scheduler which ensures that on every accepting cycle in the
state space all threads which existed (and were not blocked) during the
execution of this cycle were also executed at least once on the cycle.}
admits runs in which the context switches that happen among participating
threads are very regular, hence unrealistic.
The nontermination as defined in \autoref{def:lnterm} can be seen as a
manifestation of an additional assumption about the thread scheduler. It
claims that the scheduler is in essence somehow irregular, i.e.,
it will not allow for a context switch always after a fixed number of
instructions or at a specific location in the code. Another way of looking at
this is to assume that the scheduler is probabilistic and assigns some non-zero
probability to interruption between any two instructions. % \footnotemark.
With a probabilistic scheduler, we can equivalently define nonterminating
resource section instance as a section instance which can get to the point when
there is zero probability of reaching its end.
Under the probabilistic view, we can
also say that programs we denote as correct, i.e., without nonterminating
sections, have zero probability of looping forever.
% In practice, this would
% manifest as a program that runs for quite some time seemingly hanging and then
% terminates. % Nevertheless, we believe our definition of nontermination matches
% well with real-world applications and allows easy addition of nontermination
% checking to programs which are already tested for safety.
% \footnotetext{
% In \divine it is possible to execute a sequence of instructions atomically using an atomic block.
% These atomic blocks are used only in the implementation of synchronisation primitives in such a way that they do not need to be interleaved.
% Therefore, from the point of scheduling, it is safe to treat atomic blocks in the same way as simple instructions.
% }
% The probabilistic view also gives us additional validation of our approach which
% uses terminal strongly connected components -- in probabilistic systems, it
% holds that the system will eventually end in one of its terminal strongly
% connected components.
\subsection{Implementation and Usage}
We have implemented our nontermination detection approach in a branch of the \divine model checker.
Resource sections can be specified by annotations in the source code
of the program to be analysed by the user of the tool.
Furthermore, \divine provides predefined resource sections for various POSIX
thread (\texttt{pthread}) synchronisation primitives, namely for
mutexes (including recursive and reader-writer mutexes), %; both for critical sections and for waiting to enter them),
condition variables, barriers, and joining of threads.
Since C++ threading support in \divine uses the libc++ library which uses POSIX threads, these resource sections are also used for native C++ threading.
User-defined annotations can be given in one of the following categories: exclusive section,
waiting for an event, and
waiting for function end.
For user-defined resource sections, \divine provides C and C++ interface which can be found on the web page accompanying this work.\mnote{\url{https://divine.fi.muni.cz/2019/lnterm}}
To make it possible to specify which resource section types should be considered
for analysis, we use program instrumentation, which enables resource sections
based on command line arguments (for more details see the accompanying web page).
The instrumentation also ensures that edges which are part of an ARSI are accepting.
The detection of nonterminating resource sections in \divine uses Tarjan's algorithm for finding strongly connected components.
The algorithm runs on-the-fly, which means that it generates the state space
graph as needed, and therefore, it can terminate before the entire state space graph is explored.
The algorithm finishes if it finds a fully accepting terminal strongly connected
component, if it discovers a safety error (to avoid the need for a separate
safety verification), or once the entire state space is explored.
\subsection{Interaction with Other Features of \divine}
Since \divine is a research tool, not all the features implemented within the
tool are expected to run together. In this case, there are some
features of \divine which interfere with local nontermination detection
in a not so obvious way.
\paragraph{Counterexamples}
%
When an error is found, \divine has support to show a counterexample and walk through it using an interactive simulator~\mcite{DIVINEToolPaper2017}.
For safety properties, this counterexample is a sequence of states which ends with an error.
For verification of properties described by LTL or Büchi automata (which are partially supported by \divine), the counterexample is a lasso-shaped trace.
For nontermination, the part of the state space to be reported consists of a fully
accepting terminal strongly connected component and a path that leads to it.
However, it is not practical to output the information about the whole SCC, as it can be large.
For this reason, \divine gives only a trace to the first state of the FATSCC (i.e., the first state from which end of the given resource section instance is not reachable).
\paragraph{Spurious Wakeups}
%
Condition variables are often used in parallel programs to block threads until some event occurs (e.g., a shared queue becomes non-empty).
They provide a function which blocks the current thread (\texttt{wait}) and a function which signals the condition variable and causes waiting threads to proceed (\texttt{signal}).
In most implementations, including C++ standard APIs and platform-specific APIs on Windows and Linux, \texttt{wait} is allowed to return before it is signalled: this behaviour is called \emph{spurious wakeup} and programmers must take it into account when using condition variables.
To help with the discovery of bugs caused by spurious wakeup, \divine simulates
spurious wakeup using nondeterministic choice.
For nontermination detection, it is necessary to ensure that any spurious wakeup does not hide nontermination -- we want to report resource section instances which can be only left by spurious wakeup as nonterminating.
This can be done by careful implementation of the \texttt{wait} function in
\divine{} -- it first nondeterministically decides if a spurious wakeup will
happen, and then, if it is not happening, it enters resource section which waits
for \texttt{signal} and cannot be woken up spuriously.
If the spurious wakeup is simulated, it behaves as if the thread was blocked and allows other threads to run.
Once the waiting thread is used again for generation of successor states, it is unblocked and \texttt{wait} returns spuriously.
The exhaustive enumeration of possible thread interleavings ensures that other threads can run arbitrarily long.
\paragraph{Data Nondeterminism and Symbolic Data}
%
To make it possible to verify programs that depend on input data, \divine has support for symbolic values~\mcite{LRB2018}.
In an analysis of programs with symbolic values, the computation can be split when a branch depends on a symbolic value.
This splitting can cause problems for nontermination detection if leaving some resource section instance requires a particular value of an input variable.
Therefore, in the presence of symbolic data, nontermination checking might miss some instances of nontermination.
We defer this problem to future work.
\paragraph{Relaxed Memory Models}
%
\divine has support for analysis of parallel programs under the x86-TSO memory model of Intel and AMD CPUs (\mcite{SB2018x86tso} and \autoref{chap:mm}), which allows the program to exhibit behaviour not present under the interleaving semantics of threads.
One of the main problems in interaction between nontermination and relaxed memory is that relaxed memory models over-approximate the possible behaviours of the system to cover all possibilities of contemporary and presumably also future processors of a given architecture.
As nontermination is checking for \emph{absence of termination}, it can spuriously hide nontermination if the state space of the program is over-approximated.
Again, we defer this problem to future work.
\section{Evaluation} \label{sec:evaluation}
\pgfplotsset{ylabel style = {yshift = -2mm}}
\begin{figure}[t]
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={safety [s]}, ylabel={local nonterm. [s]}, width=0.49\textwidth, height=0.49\textwidth,
xmin = 0.8, ymin=0.8, xmax=14400, ymax=14400,
title = Wall Time (seconds)]
\addplot [ scatter, only marks,mark size=2pt ]
table [x=safety, y=local, meta=sl] {wall.dat};
\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\hfill%
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={safety [MB]}, ylabel={local nonterm. [MB]}, width=0.49\textwidth, height=0.49\textwidth,
xmin = 385, ymin=385, xmax=16384, ymax=16384,
title = Memory Used (megabytes)]
\addplot [ scatter, only marks,mark size=2pt ]
table [x=safety, y=local, meta=sl] {memory.dat};
\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 0.3) (10e6, 30e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.3, 0.1) (30e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\caption{
Scatter plots which compare local nontermination detection with safety checking as implemented in \divine.
Both axes use a logarithmic scale.
The dashed and dotted lines in wall time graphs signify $10 \times$ and $100 \times$ difference respectively.
For graphs of memory usage, the dotted lines signify $3 \times$ difference and the dashed $10 \times$ difference.
Green squares correspond to benchmarks which were error-less in both modes and blue circles correspond to benchmarks which contained errors in both cases.
Red triangles correspond to benchmarks which contained a nonterminating section.
The crosses on the outer edge of the plot correspond to timeouts and out-of-memory errors.
All the failures for local/global nontermination were due to timeouts, benchmarks which failed with out-of-memory did so in all cases.
}
\label{fig:eval}
\end{figure}
\begin{figure}[tp]
\centering
% \newcommand{\ct}{\multicolumn{1}{c}}
% \newcommand{\ctv}{\multicolumn{1}{c|}}
% \newcolumntype{R}{>{\raggedleft\arraybackslash}X}
% \caption{}\label{tab:eval}
% \begin{tabularx}{\textwidth}{ll|R|R|R|R}
% & & \ctv{Count} & \ctv{Wall Time} & \ctv{Memory Used} & \ct{State Space} \\
% & & \ctv{} & \ctv{[s]} & \ctv{[MB]} & \ct{} \\
% \hline
% \multirow{3}{*}{Boost Thread} & safety & 32 & 0 & 0 & 0 \\
% & local & 29 & & & \\
% & global & 32 & & & \\ \hline
% \multirow{3}{*}{LockFree} & safety & 6 & & & \\
% & local & 6 & & & \\
% & global & 6 & & & \\ \hline
% \multirow{3}{*}{\divine} & safety & 16 & & & \\
% & local & 7 & & & \\
% & global & 16 & & & \\ \hline
% \multirow{3}{*}{all} & safety & 54 & & & \\
% & local & 42 & & & \\
% & global & 54 & & &
% \end{tabularx}
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={safety [\# of states]}, ylabel={local nonterm. [\# of states]}, width=0.49\textwidth, height=0.49\textwidth,
xmin = 1, ymin=1, xmax=3200000, ymax=3200000,
title = Number of states]
\addplot [ scatter, only marks,mark size=2pt ]
table [x=safety, y=local, meta=sl] {states.dat};
\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\caption{
A comparison of state space sizes for local nontermination and safety.
The dashed and dotted lines signify $10 \times$ and $100 \times$ difference respectively.
The meaning of the marks in the graph is the same as in \autoref{fig:eval}.
} \label{fig:evalst}
%
% \begin{tikzpicture}
% \begin{loglogaxis}[xlabel={safety [\# of states]}, ylabel={local nonterm. [\# of states]}, width=0.49\textwidth, height=0.49\textwidth,
% xmin = 1, ymin=1, xmax=3200000, ymax=3200000,
% title = Number of states]
% \addplot [ scatter, only marks,mark size=2pt ]
% table [x=global, y=safety, meta=sl] {states.dat};
%
% \addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
% \end{loglogaxis}
% \end{tikzpicture}
\end{figure}
To our best knowledge, there is no suitable benchmark set that would cover
termination in parallel programs.
Therefore, we had to develop a suitable
benchmark on our own. We naturally wanted to analyse the performance of our
verification method on real-world data structures. Unfortunately, it is hard to
reuse any existing real-world test cases of parallel data structures for
verification, as these tests are usually developed as stress tests. Stress
tests use large amounts of data and are supposed to be run for a long time in
order to maximise a chance that a parallelism-related bug is found during the
testing period. For the purpose of application of a formal verification tool such
as \divine, the mentioned approach to testing of parallel programs is
inappropriate. Since a model checker explores all interleavings of the program
systematically within a single execution, further repeated executions, such as
the ones within a stress test, are useless and only add to the complexity of
the verification task. For these reasons, the tests we included in our benchmark
are tests we created or adapted and modified specifically for the purpose of
nontermination detection we wanted to evaluate.
To preserve some diversity at least, we opted for the following tests to be
included in our benchmark. First, to cover some real-world scenarios, we
created some tests for the Thread library from widely used C++
Boost\mnote{\url{https://www.boost.org/doc/libs/1_69_0/doc/html/thread.html}}
(35 test cases). Second, we used some tests from \divine project itself (8 test
cases), and finally, we developed a couple of specific tests for small programs
demonstrating the behaviour of local nontermination with various synchronisation
primitives (16 test cases). Overall, the benchmark covered usage of lock-free and
mutex guarded parallel data structures (e.g. parallel queues), synchronised
variables, less-used synchronisation primitives such as reader-writer locks, or
a single-producer-single-consumer queue and the parallel hashset from
\mcite{BRSW15HS}.
To evaluate our verification approach, we let each test run with a 4 hours
timeout and 16 GB memory limit. We measured runtime and memory requirements for
the three following configurations of our tool:
\begin{description}
\item[safety] A baseline configuration, in which the tool merely generates the
state space of the program and checks for the standard safety issues, such as
assertion violation, invalid memory access, etc. In this mode, no
nontermination can be detected.
\item[local nontermination] The configuration in which the nonterminating
resource section detection is used. Under this configuration, the state
space of the original program is expanded with every entrance to the
resource section as described in \autoref{sec:local-nontermination}.
\item[global nontermination] The configuration that treats the whole program
as a single resource section and detects if it terminates according to
\autoref{def:lnterm}. Since this configuration does not introduce
additional nondeterminism, the state space of the program is roughly the
same size as for \emph{safety}.
\end{description}
The difference between local and global nontermination configurations is
in the shape of the state space; both use the same algorithm (Tarjan's
algorithm for SCC decomposition). Thanks to this difference, local nontermination
can be applied to programs which should not terminate, to check if each of its
resource sections terminate.
Comparison of safety and local nontermination can be seen in \autoref{fig:eval}. We
evaluate wall time and memory consumption -- in practice, heavy-duty tools like
\divine are likely to be used in long-running overnight tests (preferably only
if anything relevant for the test changed since the last run), therefore longer
runtimes might not be a big problem up to some point, but it is important to
test that the verification tasks fit in some reasonable amount of memory. As we
can see, the time overhead of local nontermination configuration is quite
significant (up to $59 \times$) especially for larger programs which are
correct. As for memory consumption, we can see that total overhead is less than
threefold, which is mostly due to the state space compression employed by
\divine.
The wall time blow-up is due to extra nondeterminism introduced by active
resource sections -- the state space can grow by a factor that is related to the
number of resource section instances encountered in the original state
space. Note that many resource sections are likely to be very short. For
programs that were invalid, i.e., contained some nonterminating resource
sections, the verification usually exited faster under the local
nontermination configuration than under the safety configuration, which
means that once a nonterminating section is encountered, it is checked
relatively quickly. Further insight into the comparison of safety and local
nontermination can be seen in \autoref{fig:evalst}, which compares sizes of state
spaces for these two configurations. Here, we can see that the overhead in the
size of the state space is lower than the time overhead (less than $10 \times$).
The extra time overhead is likely caused by inefficiencies in \divine{}. For
example, when \divine nondeterministically chooses from $N$ values, it will
re-execute instructions between the last remembered state and the point of the
nondeterministic choice $N$ times.
\begin{figure}[tbp]
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={global [s]}, ylabel={local nonterm. [s]}, width=0.49\textwidth, height=0.49\textwidth,
xmin = 0.8, ymin=0.8, xmax=14400, ymax=14400,
title = Wall Time (seconds)]
\addplot [ scatter, only marks,mark=*,mark size=2pt ]
table [x=global, y=local, meta=gl] {wall.dat};
\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\hfill %
% \begin{tikzpicture}
% \begin{loglogaxis}[xlabel={global [MB]}, ylabel={local nonterm. [MB]}, width=0.49\textwidth, height=0.49\textwidth,
% xmin = 385, ymin=385, xmax=16384, ymax=16384,
% title = Memory Used (in megabytes)]
% \addplot [ scatter, only marks,mark=*,mark size=2pt ]
% table [x=global, y=local, meta=gl] {memory.dat};
%
% \addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 0.3) (10e6, 30e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.3, 0.1) (30e6, 10e6) };
% \end{loglogaxis}
% \end{tikzpicture}
% %
% \medskip\\
\begin{tikzpicture}
\begin{loglogaxis}[ xlabel={safety [s]}, ylabel={global nonterm. [s]}, width=0.49\textwidth, height=0.49\textwidth,
xmin = 0.8, ymin=0.8, xmax=14400, ymax=14400,
title = Wall Time (seconds)]
\addplot [ scatter, only marks,mark=*,mark size=2pt ]
table [x=safety, y=global, meta=sg] {wall.dat};
\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
% \hfill %
% \begin{tikzpicture}
% \begin{loglogaxis}[xlabel={safety [MB]}, ylabel={global nonterm. [MB]}, width=0.49\textwidth, height=0.49\textwidth,
% xmin = 385, ymin=385, xmax=16384, ymax=16384]
% \addplot [ scatter, only marks,mark=*,mark size=2pt ]
% table [x=safety, y=global, meta=sg] {memory.dat};
%
% \addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 0.3) (10e6, 30e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.3, 0.1) (30e6, 10e6) };
% \end{loglogaxis}
% \end{tikzpicture}
\caption{
The first scatter plot compares local nontermination checking with checking if the whole program terminates (global nontermination).
In this comparison, the red triangles correspond to benchmarks which did not end, but for which all resource sections terminated.
Finally, in the second graph, we compare global nontermination checking with safety.
Here, the red triangles correspond to benchmarks which did not terminate but were safe.
See \autoref{fig:eval} for the general description of the plot layouts.
} \label{fig:eval2}
\end{figure}
\autoref{fig:eval2} shows a comparison of local nontermination with global nontermination and safety with global nontermination.
Here, we can see that global nontermination behaves similarly to safety, with some time overhead caused by the somewhat more involved algorithm.
This is well in line with our expectations, as global nontermination does not introduce any extra nondeterminism compared to safety and Tarjan's algorithm runs in linear time with respect to the size of the state space, and so does reachability.
This further highlights that the overwhelming part of the time overhead of local
nontermination is in the increase of the state space size.
It is important to note that local nontermination can be applied to programs
which are intended to run infinitely (but have finite state space) -- it can detect if there is a nonterminating resource section in such a program.
As state space sizes and memory consumption are almost the same for safety and for global nontermination, we omit memory and state space size comparisons for the later two pairs of configurations.
\paragraph{Errors Found}
%
No errors were found in the C++ Boost tests. On the other hand, all the errors
we artificially implanted in the test cases were found. As for the errors which
were not deliberately introduced in the tests, we have found one error in a test
of a lock-free queue from an older version of \divine. The test was part of
\divine's test suite for a long time and was used to test that the queue works
when it is continuously fed with elements while keeping its size bounded. This
means that the test was deliberately nonterminating and the intention was that
all the operations executed by main loops of the test's two threads terminate,
which was not the case -- a variable which was supposed to keep track of the
size of the queue was not maintained properly, and therefore it could have happened
that the reader thread would wait indefinitely, attempting to read from an empty
queue which would never fill up. So far, the test case was run under \divine with
the safety algorithm only, therefore, the error did not manifest and remained
undetected.
\section{Related Work} \label{sec:related}
For the related work, we consider only results which go beyond safety checking.
There are many approaches to find problems such as assertion violations or memory safety violations, but they are often fundamentally limited to properties concerning finite runs of the program, and we are focusing here on an infinite behaviour, namely on the absence of termination.
Similarly, we do not explore in depth techniques which specialise on checking sequential programs and have no support for parallelism, as well as techniques which are tailored to a specific modelling language and cannot be applied in general.
Several techniques for checking properties other than safety exist -- indeed
usage of various temporal logics, such as Linear Temporal Logic (LTL)
\mcite[Chapter 5]{Baier2008} and Computation Tree Logic (CTL) \mcite[Chapter 6]{Baier2008} in the context of model checking dates way back to the beginning of the research of formal methods.
Unfortunately, these techniques are not often applied to programs written in real-world programming languages such as C and C++.
As for techniques which detect nontermination, both static and dynamic
techniques exist for the detection of deadlocks caused by circular waiting for
mutexes~\mcite{CC14,agarwal2010detection,bensalem2005scalable}.
These techniques specialise on mutexes and do not allow general nontermination detection, and it is unlikely that they could be naturally extended to cover it.
Some methods detect deadlocks of the whole program (i.e., a program state from which the program cannot move) \mcite{Chaki2005,Demartini99}, but these techniques cannot find cases in which only some threads of the program are making progress, while other threads are blocked forever.
Also, these global deadlock detection techniques are inadequate in the presence of synchronisation mechanisms which causes busy waiting instead of blocking (for example spin locks) or in the cases when normally blocking operations are implemented using busy waiting (which can be easier to handle for the verifier in some cases).
A somewhat different approach based on communicating channels is proposed in \mcite{Ng2016}, but this approach is aiming at the Go programming language which primarily uses shared channels for communication between threads.
Overall, neither of these techniques is applicable in general for the detection of nontermination in programs which use a combination of synchronisation primitives in shared memory.
There are also techniques for checking termination of sequential programs \mcite{Cook2006,Berdine2006,Dietsch2015,Brockschmidt2016,Hensel2017,Giesl2017} and techniques for termination analysis in parallel programs that target modelling languages \mcite{Kupriyanov2014,Albert2017}.
A more realistically-targeted termination checker for parallel programs is presented in \mcite{Cook2007thr}.
Its primary goal is to show a given thread terminates.
It abstracts other threads into a model of the environment which is incrementally refined and attempts to prove termination of the thread locally.
It appears to support C programs, but it assumes the number of threads is fixed and it does not support synchronisation with mutexes and similar primitives.
Another tool for termination-checking parallel C programs is presented in \mcite{Popeea2012}, it is similar to \mcite{Cook2007thr} but more general.
% rely-guarantee reasoning -- keeps track of communication of each thread with the rest of the program.
It uses transition invariants and well-foundedness to prove termination of whole programs or individual threads and uses predicate abstraction and refinement.
The downside of this approach is that it can report unfair paths as termination counterexamples.
Probably the closes relative to our approach is shown in \mcite{Atig2012term}.
It presents a tool MUTANT that can detect fair nontermination.
It aims to identify ultimately periodic executions (i.e., executions in which the same sequence of states is repeated, this sequence can be preceded by a finite stem) and therefore is complete only in programs with finite state space.
Furthermore, the number of context switches in the stem and each of the periods is bounded.
The tool uses sequentialisation -- it reduces the problem of (context-bounded) termination checking of a parallel program to assertion checking in a sequential program.
It does not target any realistic programming language; however, it targets the Boogie modelling language for which translators from C and other languages exist.
Therefore, it might be possible to use this tool even for C and other programming languages.
The main difference between our approach and the one in \mcite{Atig2012term} is that our approach uses a different notion of fairness, targets primarily C and C++, does not impose any limits on the number of context switches, and can detect local nontermination.
\section{Conclusion} \label{sec:conclusion}
We have presented a novel approach to detection of parts of real-world programs written in C and C++ which do not terminate.
Our method allows for detection of partial deadlocks (and livelocks) caused by misuse of synchronisation, but it is not limited to any particular mode of parallel programming (such as lock-based synchronisation, or programs with communication channels) and indeed allows any combination of synchronisation allowed by C++ itself.
To achieve this, it is necessary to provide simple annotations for parts of the code which are to be checked for termination.