termination.tex

\srcnote{Text of this chapter is based on~\mcite{SB2019}.
The benchmarks are unmodified from the original paper.}

\bigskip\noindent
One of the critical problems with parallel programs is ensuring that they do not hang or wait indefinitely -- i.e., there are no deadlocks, livelocks and the program proceeds towards its goals.
In this work, we present a practical approach to detection of nonterminating sections of programs written in C or C++ and its implementation into the \divine model checker.
This complements the existing techniques for finding safety violations such as assertion failures and memory errors.
Our approach makes it possible to detect partial deadlocks and livelocks, i.e., those situations in which some of the threads are progressing normally while the others are waiting indefinitely.
The approach is also applicable to proving nontermination of components in programs that do not terminate themselves, but the components should eventually finish their work.
Such programs include, for example, server-like applications that have infinite event loops, but each event should be handled in a finite time.
The termination criteria can be user-provided; however, \divine comes with the set of built-in termination criteria suited for the analysis of programs with mutexes and other common synchronisation primitives.

\section{Motivation and Introduction}

A significant limitation of many existing tools for analysis of parallel
programs in programming languages such as C and C++ is that they are only concerned with safety checking -- they check that a bad state of the program is unreachable.
Most common examples of bad states include assertion failures and memory errors (such as invalid memory accesses and memory leaks).
Unfortunately, this is far from being sufficient in practice.
See, for example, the code given in~\autoref{fig:livelock0}.
That piece of code easily passes any safety checks; however, when executed in reality, it often hangs and does not terminate.

\begin{figure}[t]
\begin{minipage}[t]{0.49\textwidth}
\begin{cppcode}
    // can be used for
    // synchronisation
    std::atomic< int > x = 0;

    void worker() {
      while ( x != 0 ) { } // wait
      do_work();
    }
\end{cppcode}
\end{minipage}
\hfill%
\begin{minipage}[t]{0.49\textwidth}
\begin{cppcode}
    int main() {
      // start thread
      // running worker
      std::thread t( worker );
      x = 42; // let worker run
      // ...
      t.join();
    }
\end{cppcode}
\end{minipage}
\caption{
A simple C++ code snippet with two threads, it uses C++ standard threading support and atomic variables.
A programmer's intention was that the \texttt{worker} function first waits until \texttt{x}
becomes non-zero, and then proceeds with \texttt{do\_work}. However, the waiting
condition (at the first line of the \texttt{worker} function) is incorrectly just the
opposite. Therefore, if \texttt{main} executes \cpp{x = 42} before waiting in \texttt{worker} starts, the wait will never end (assuming \texttt{x} is never set to 0 again).
Note that none of safety checks is able to detect that the program might hang.
For the rest of the paper, we will omit the \cpp{std::} namespace to
simplify the notation.
}\label{fig:livelock0}
\end{figure}

In this work, we report about our new technique for checking nontermination for parallel programs written in C and C++ that may be applied to programs with arbitrary synchronisation primitives.
In particular, we can check that \emph{a specified part} of a program finishes
whenever its execution has been started, which in turn enables us to check for problems such as partial deadlocks or local nontermination.
Note that our technique does not require the program under analysis to terminate at all.
Therefore, it is also applicable to programs that do not terminate but have some
parts that are supposed to finish.
It does; however, require that the program has a finite state space because our
technique is built on top of a state space exploration. Note that even for a finite state space, a program may exhibit infinite behaviour.\mnote{A finite state space can contain cyclical infinite behaviour -- a loop in the state space.

\medskip\noindent\centering
\begin{tikzpicture}[>=latex,>=stealth',auto,node distance=2cm,semithick,initial text=]
  \node[state,initial] (t) {$x$};
  \node[state] (f) [right of = t] {$\lnot x$};

  \path[->, shorten >=1pt]
    (t) edge[bend left] (f)
    (f) edge[bend left] (t)
  ;
\end{tikzpicture}
}

The main observation is that a program often has sections which once entered
should also be left: for example critical sections, certain function calls (such
as pop from a queue, which can wait for an element to become available; or a
thread join, etc.), or parts of code which wait for a resource or an action (waiting for a mutex, waiting on a barrier, waiting until a variable is set to a given value).
If the analysis of the program focuses on such sections, it is possible to
detect when these sections are started but do not terminate.
This covers partial deadlock and partial livelock detection in which such sections participate.
We also provide a global nontermination detection mode that decides if the
program as a whole terminates, nevertheless this is not the primary goal of our approach.

Our technique is built on top of explicit-state model checking.  We believe that while explicit-state model checking is prone to state space explosion, it is well suited for the detection of problems related to infinite runs of parallel programs which cannot be handled by techniques such as bounded model checking or stateless model checking.
While our approach is closely related to checking for properties written in temporal logic such as LTL or CTL*, our \emph{local nontermination} technique cannot be substituted equivalently with CTL* model checking.
One of the reasons is that these logics are unable to relate to entities which are dynamically created during the execution of the program, and there is no bound to their number.
For example, there is no way to express in CTL* that for all mutexes it holds that if they are locked, they are also eventually unlocked unless all the mutexes are enumerated beforehand.
This is an essential concern for realistic programs where mutexes and other synchronisation primitives can be created dynamically at runtime, and their number can depend on the computation of the program itself.
Furthermore, to avoid counterexamples which are unrealistic with practical
thread schedulers, we need a form of \emph{fairness} of process scheduling
different from the fairness constraints used typically with LTL model checking.


The approach described in this work is implemented in a modified version of the \divine model checker~\mcite{DIVINEToolPaper2017,RSCB2018}.
The implementation, as well as all the examples, can be found on the paper webpage\mnote{\url{https://divine.fi.muni.cz/2019/lnterm/}}.

% The rest of the work is structured as follows: \autoref{sec:related} gives a short
% overview of related work and \autoref{sec:prelim} gives definitions and preliminaries needed for the rest of the work.
% In \autoref{sec:local-nontermination} we define our \emph{local nontermination} property, in \autoref{sec:detection} we discuss how it can be checked and the implementation in \divine, and it \autoref{sec:evaluation} we evaluate it.
% Finally, \autoref{sec:conclusion} concludes this work.

\section{Resource Sections}

A \emph{resource section} of a program is a block of code with an identifier of a resource and type of the resource section.
Each resource section is delimited in the source code by section start and section end annotations.
Examples of such sections are a mutex-waiting section that denotes a block of code in which a thread is waiting for the acquisition of a mutex.
Mutex-waiting section is identified by a mutex and the thread which waits for it.
Another example can be a critical section, which is identified by a mutex (there is no need to use a thread for the identification, as a mutex can be owned by at most one thread at any point in time).
Resource section can also be bound to a function -- in this case, it is identified by the stack frame of the function and by the program counter of its beginning.
Regardless of the identification, the idea for a resource section is that once it is entered, it should also be exited.

As a resource section can be entered repeatedly (for example when it is on a cycle or in a function which is called multiple times), we will define a \emph{resource section instance} to be a particular execution of a resource section with the given identifier.
The author of annotations which define resource sections should ensure that the same resource section is not entered again before it is left.
This does not limit the usage of function-associated resource sections to non-recursive functions -- each such section is also identified by the stack frame, and therefore resource sections corresponding to different recursion depths are different resource sections.
Similarly, a program can be in multiple resource sections which wait for the same mutex at the same time, each of them corresponding to a different waiting thread.

We propose to introduce resource sections in two ways -- some synchronisation primitives (mutexes, condition variable, threads) have built-in resource sections in DIVINE, and the user can introduce their own resource sections.
Examples of resource sections can be seen in \autoref{fig:lnterm:secs:a} and \autoref{fig:lnterm:secs:b}.

\begin{figure}[t]
    \begin{cppcodeln}
      mutex m;

      void worker() {
        unique_lock lock(m); // wait resource section,
                             // start of critical section
        do_work();
      } // unlock = end of critical section

      int main() {
        thread w(worker);
        do_work();
        w.join(); // wait resource section
      }
    \end{cppcodeln}
    \vspace{-\bigskipamount}
    \caption{
      Examples of predefined resource sections.
      There is a resource section in the mutex lock on line 4 (active while the
      thread waits for the mutex to become available), a resource sections
      which corresponds to the critical section on lines 4--6, and a resource
      section in the join of thread \texttt{w} on line 12 (active while main
      waits for \texttt{worker} to finish).
    }\label{fig:lnterm:secs:a}
\end{figure}
\begin{widefigure}
    \begin{cppcodeln}
      #include <atomic>
      #include <rst/termsec.h> // for user-defined
                               // resource sections
      struct SpinLock {
        void lock() {
          { // define an explicit scope
            termsec::CheckWait check( &_flag );
            while ( _flag.test_and_set() ) { }
          } // end of scope -- end of 'check' resource section
          termsec_begin_exclusive( &_flag );
        }

        void unlock() {
          termsec_end_exclusive( &_flag );
          _flag.clear();
        }

        private:
        std::atomic_flag _flag = ATOMIC_FLAG_INIT;
      }; 
    \end{cppcodeln}
    \vspace{-\bigskipamount}
    \caption[UNUSED]{
      User defined resource sections in a spinlock implementation.
      The spinlock is implemented using a C++ atomic flag which is an atomic
      variable which has two operations -- \texttt{clear} which resets its
      value to \cpp{false}, and \texttt{test\_and\_set} which sets its value to
      \cpp{true} and returns the original value.
      In this spinlock the value \cpp{true} indicates that the spinlock is
      locked.

      One resource section guards the wait for \texttt{\_flag} to change at
      lines 7--8.
      The other resource section guards the critical section of the spinlock,
      it starts in \texttt{lock} at line 10 and ends in \texttt{unlock} at line
      14.
      Both resource sections are identified by the address of the
      \texttt{\_flag} variable which uniquely identifies a particular instance
      of a \texttt{SpinLock} class.
      Furthermore, the waiting resource section (which uses \texttt{CheckWait}
      helper) is automatically also identified by the thread which executes it
      -- this allows more than one thread to wait for the same spin lock.
    }\label{fig:lnterm:secs:b}
\end{widefigure}

\section{Local Nontermination} \label{sec:local-nontermination}

With our local nontermination property, we aim at detection of resource section instances which are entered but are never left -- \emph{nonterminating resource section instances}.
We will first use examples of terminating and nonterminating resource section instances, and then we will define them precisely.

\begin{widefigure}
\begin{subfigure}[t]{0.475\textwidth}
\begin{cppcode}
    mutex m;

    void thread0() {
      // Error:
      unique_lock lock(m);
      while (true) {
        do_work();
      }
    } // unlock

    void thread1() {
      while (true) {
        unique_lock lock(m);
        do_other_work();
      } // unlock
    }
\end{cppcode}
\vspace{-\medskipamount}
\caption{
    A program with a nonterminating critical section (in \texttt{thread0}) and a deadlock (if \texttt{thread0} enters its critical section, \texttt{thread1} will wait infinitely).
    In C++ it is possible to use scope-based locks: the critical section belonging to mutex \cpp{m} is entered when \cpp{unique_lock lock(m)} is executed and left at the end of the scope in which the \cpp{lock} variable was defined (at the matching curly brace; also marked with comment \cpp{// unlock}).
}\label{fig:ntermsec0}
\end{subfigure}
%
\hfill
%
\begin{subfigure}[t]{0.475\textwidth}
\begin{cppcode}
    mutex m;

    void thread0() {
      // Fixed:
      while (true) {
        unique_lock lock(m);
        do_work();
      } // unlock
    }

    void thread1() {
      while (true) {
        unique_lock lock(m);
        do_other_work();
      } // unlock
    }
\end{cppcode}
\vspace{-\medskipamount}
\caption{
    A fixed version of the program from \autoref{fig:ntermsec0} (the start of the critical section was moved from the position \cpp{// Error} in the left code to \cpp{// Fixed} and therefore the critical section can end now).
    Intuitively, each critical section in this program terminates.
    However, as we can see in \autoref{fig:ntermsec1ce}, it is possible to find an infinite path in the state space of this program that infinitely waits for one of the critical sections.
    To make matters worse, this path can respect weak fairness.
} \label{fig:ntermsec1}
\end{subfigure}
\caption{Example programs with nonterminating resource section (\subref{fig:ntermsec0}) and terminating resource sections (\subref{fig:ntermsec1}).
}
\end{widefigure}
\begin{widefigure}
\center
\begin{tikzpicture}[node distance=2.3cm,>=stealth',bend angle=45,auto, baseline=(i.base), line width = 2pt]

    \tikzstyle{state}=[circle, thick, draw, minimum size=6mm]
    \tikzstyle{endcheck}=[circle, thick, draw, cross out, minimum size=3mm]

    \node (i) {};
    \node [state, right = of i] (lt0) {}
        edge [pre, draw=red] node[yshift=-1mm] {\texttt{0: lock(m)}} (i);
    \node [state, right = of lt0] (lt0wt1) {}
        edge [pre, draw=blue, dash pattern=on 5pt off 5pt] node[yshift=-1mm] {\texttt{1: wait(m)}} (lt0);
    \node [state, right = of lt0wt1] (lt0wt1ut0) {}
        edge [pre, draw=red] node[yshift=-1mm] {\texttt{0: unlock(m)}} (lt0wt1);
    \node [state, above = 1cm of lt0wt1] (lt0wt1ut0work) {}
        edge [pre, draw=red, out=0, in=90] node {\texttt{0: do\_work()}} (lt0wt1ut0)
        edge [post, draw=red, out=180, in=90] node[above left] {\texttt{0: lock(m)}} (lt0);
    \node [right = of lt0wt1ut0] (out) {}
        edge [pre, draw=blue, dashed, line width = 0.8pt] node[yshift=-1mm] {\texttt{1: lock(m)}} (lt0wt1ut0);
\end{tikzpicture}
\caption{
A fragment of state space of program in \autoref{fig:ntermsec1} with starving lasso marked with bold edges.
Each edge is marked by the thread it belongs to and the action of this thread.
Furthermore, to ease the orientation, actions belonging to \texttt{thread0} are marked with continuous \textcolor{red}{red} edges while actions belonging to \texttt{thread1} are marked with dashed \textcolor{blue}{blue} edges.
We can see that both threads participate in the repeated part of the counterexample and \texttt{thread0} is denied the possibility (starves) to execute after \texttt{0:~unlock(m)} (the thin blue dashed edge).
} \label{fig:ntermsec1ce}
\end{widefigure}

A simple example can be seen in \autoref{fig:ntermsec0}. There we have a mutex which
is locked, but never unlocked as the corresponding critical section contains an
infinite loop. We have four different resource sections in this example. Two
of them corresponds to the critical sections guarded by the mutex, and two of
them are hidden inside  \texttt{unique\_lock}, where they implement
waiting until the mutex is unlocked.
 Nonterminating resource section instances are the instances corresponding to the critical
section in \texttt{thread0} and any instances corresponding to waiting for the mutex
in \texttt{thread1} that is executed after the critical section in
\texttt{thread0} is entered.  We can fix this example by putting the critical
section in \texttt{thread0} inside the infinite loop, as shown in \autoref{fig:ntermsec1}.

Suppose that we have defined nonterminating section as one in which it is possible to stay indefinitely (i.e., for the specific case of waiting for \texttt{m} in \texttt{thread1}, termination could be expressed by LTL formula $\mathbf{G}(\textit{wait-m-t1-start} \implies \mathbf{F}\,\textit{wait-m-t1-end})$).
We can witness the existence of such nonterminating section in a program with a finite state
space by a lasso-shaped path. Such the nontermination witness can also be found
for the program in \autoref{fig:ntermsec1}, even though the code might intuitively seem to
terminate.
First, \texttt{thread0} executes its \texttt{lock} action, then \texttt{thread1} starts waiting.
If \texttt{thread0} always executes \texttt{unlock} and \texttt{lock} before
\texttt{thread1} is allowed to run, \texttt{thread1} will never be able to finish waiting.
The counterexample is illustrated in \autoref{fig:ntermsec1ce} and is valid also under weak fairness assumptions.
% We call this behaviour -- where some of the threads are only allowed to run in specific conditions -- \emph{starving}.
% Starving is of interest mostly for checking lock-free and wait-free algorithms and we will not focus on it here.

In general, if a thread waits for some condition which is both infinitely often true and infinitely often false, there can be a run in which the waiting thread is only allowed to run at those moments when the condition is false. This type of runs is present in any program that uses busy waiting, which is very common in practice.
For this practical reason, we
cannot rely on the definition of nontermination as expressed with the LTL
formula above, and we need a different way to describe nonterminating sections.
%will use a different definition which better matches the realities of
%nonterminating parts of parallel programs.
\begin{definition}[Nonterminating Resource Section Instance]\label{def:lnterm}
A~resource section instance is nonterminating if and only if it can reach a point from which it is not possible to reach its end.
\end{definition}

For a particular resource section (e.g., waiting for \texttt{m} in \texttt{thread1}), checking for the absence of nonterminating resource section instances can be expressed using a CTL* property \[ \mathbf{AG}\left(\textit{wait-m-t1-start} \implies \mathbf{A}[(\mathbf{EF}\,\textit{wait-m-t1-end}) \mathbin{\mathbf{W}} \textit{wait-m-t1-end}]\right) \] (where $\mathbf{W}$ is the weak until operator).

In general, the CTL* approach cannot be used, as it requires the set of resource sections to be known before the analysis starts, so that the formula can be created as a conjunction of formulas for each resource section.
This is hard to do if resource sections can be created at runtime, which is often the case when dealing with programs in languages such as C and C++ -- the number of objects such as threads, mutexes, or function invocations which are used to identify resource sections might be hard to determine without exploration of all the runs of the program.

\section{Detection of Nontermination}\label{sec:detection}

The detection of nonterminating resource section instances in the context of
explicit-state model checking proceeds as follows. The basic idea behind the
detection of nonterminating resource section instances is that the model checker
focuses on them one at a time. Every time a resource section instance is about
to be entered during the state space exploration, the algorithm introduces a
nondeterministic branching to the state space graph. In one branch, the resource
section instance remains inactive, in which case the state space exploration
proceeds as usual to discover other resource sections. However, in the other
branch, the instance becomes active. Under this branch, the resource section
instance is checked for being nonterminating -- it becomes \emph{active resource section instance} (ARSI).
Note that the nondeterministic
branching happens only outside of active resource sections, which means the
ARSIs cannot be nested.  Once the
state space graph in the active branch reaches a state that is out of the scope of an ARSI, the state
space exploration within this branch is stopped (a state with no successors is
generated outside the ARSI). Active resource section instances cannot be
nested, but for any instance of a resource section nested in an active section
instance, there is also an instance which is nested in an inactive section
instance, and therefore can become active elsewhere in the state space. As a
result of this construction, for every nonterminating resource section in the
original program, there is a corresponding ARSI in the augmented state space
graph. To let the exploration algorithm know that it is exploring a part of the
state space that is within an ARSI, we mark all edges within ARSIs as accepting.\mnote{The accepting label of an edge is a notation borrowed from (transition-based) Büchy automata.
In DIVINE this information can be used by the exploration algorithm to behave differently for accepting edges.}

\begin{widefigure}
\begin{cppcode}
    mutex m1, m2;
    {
      unique_lock l1(m1);
      do_work_1();
      {
        unique_lock l2(m2);
        do_work_2();
      } // unlock(m2)
    } // unlock(m1)
\end{cppcode}
\centering
\begin{tikzpicture}[node distance=0.6cm,>=stealth',bend angle=45,auto, baseline=(i.base)]

        \tikzstyle{state}=[circle, thick, draw, minimum size=5mm]
    \tikzstyle{endcheck}=[circle, thick, draw, cross out, minimum size=3mm]
    \tikzstyle{acc}=[line width=2pt]

    \node [state, initial, initial text=, initial above] (i) {};
    \node [state, below left = 0.7cm and 3cm of i] (cs1) {}
        edge [pre] node {\texttt{lock(m1)}} (i);

    \node [state, below right = 0.7cm and 1.5cm of i] (cs1a) {}
        edge [pre] node[above right] {\texttt{lock(m1)}} (i);

    \node [state, below = of cs1a] (cs1aw) {}
        edge [pre, acc] node[right] {\texttt{do\_work\_1}} (cs1a);
    \node [state, below = 0.6cm of cs1aw] (cs1acs2) {}
        edge [pre, acc] node[right] {\texttt{lock(m2)}} (cs1aw);
    \node [state, below = of cs1acs2] (cs1acs2w) {}
        edge [pre, acc] node[right, name=cs1acs2wl] {\texttt{do\_work\_2}} (cs1acs2);
    \node [state, below = 0.6cm of cs1acs2w] (cs1acs2u) {}
        edge [pre, acc] node[right, name=cs1aul] {\texttt{unlock(m2)}} (cs1acs2w);
    \node [endcheck, below = 0.6cm of cs1acs2u] (cs1au) {}
        edge [pre] node[right] {\texttt{unlock(m1)}} (cs1acs2u);

    \node [state, below = of cs1] (cs1w) {}
        edge [pre] node[right] {\texttt{do\_work\_1}} (cs1);

    \node [state, below left = 0.5cm and 1.1cm of cs1w] (cs1cs2) {}
        edge [pre] node[above left, name=cs1wl] {\texttt{lock(m2)}} (cs1w);
    \node [state, below = of cs1cs2] (cs1cs2w) {}
        edge [pre] node[right, name=cs1cs2wl] {\texttt{do\_work\_2}} (cs1cs2);
    \node [state, below = 0.6cm of cs1cs2w] (cs1cs2u) {}
        edge [pre] node[right] {\texttt{unlock(m2)}} (cs1cs2w);
    \node [state, below = 0.6cm of cs1cs2u] (cs1u) {}
        edge [pre] node[right] {\texttt{unlock(m1)}} (cs1cs2u);
    \node [state, below = of cs1u] (end) {}
        edge [pre] node[right] {\texttt{end}} (cs1u);

    \node [state, below right = 0.5cm and 1.1cm of cs1w] (cs1cs2a) {}
        edge [pre] node[above right] {\texttt{lock(m2)}} (cs1w);
    \node [state, below = of cs1cs2a] (cs1cs2aw) {}
        edge [pre, acc] node[right, name=cs1cs2awl] {\texttt{do\_work\_2}} (cs1cs2a);
    \node [endcheck, below = 0.6cm of cs1cs2aw] (cs1cs2au) {}
        edge [pre] node[right, name=cs1cs2aul] {\texttt{unlock(m2)}} (cs1cs2aw);


  \begin{pgfonlayer}{background}
    \tikzstyle{dashed}=[dash pattern=on 5pt off 5pt]
    \node[draw, fit=(cs1a) (cs1acs2u) (cs1aul), inner xsep=0.2cm, inner ysep=0.1cm, fill = yellow!70, xshift=-0.1cm, line width = 2pt] (cs1ains) {};
    \node[below left = 1mm of cs1ains.north east] {ARSI};
    \node[draw, dashed, fit=(cs1cs2a) (cs1cs2aw) (cs1cs2awl), inner sep=0.1cm, fill = yellow!70, line width = 2pt] (cs1cs2ains) {};
    \node[below left = 1mm of cs1cs2ains.north east] {ARSI};

    \node[draw, fit=(cs1) (cs1wl) (cs1cs2u) (cs1cs2aul), inner ysep=0.1cm, inner xsep=0.1cm] (cs1ins) {};
    \node[draw, dashed, fit=(cs1cs2) (cs1cs2w) (cs1cs2wl), inner sep=0.1cm] (cs1cs2ins) {};
    \node[draw, dashed, fit=(cs1acs2) (cs1acs2w) (cs1acs2wl), inner sep=0.1cm] (cs1acs2ins) {};
  \end{pgfonlayer}
\end{tikzpicture}
\smallskip
\caption{
    A small example of a program with two resource section instances (on the top) and its state space, which shows active resource section instances (ARSIs; on the bottom).
    In order to keep the state space simple, this example program is sequential and deterministic; the nondeterminism is caused only by the construction which gives rise to ARSIs.
    The resource section instances belonging to the critical section of mutex \texttt{m1} are wrapped in a solid rectangle in the image, while resource section instances belonging to \texttt{m2} are wrapped in a dashed rectangle.
    ARSIs are denoted by thick frame and yellow background and accepting edges in the state space are marked by thick arcs.
    Recall that active resource section instances cannot be nested.
    Crosses at the end of edges denote points where exploration of the state space was terminated due to reaching the end of an active resource section instance.
} \label{fig:active}
\end{widefigure}

An illustration of a state space graph augmented with nondeterministic choices
and accepting edges is given in \autoref{fig:active}. This augmentation of the state
space can be performed by program instrumentation.  Now to discover ARSIs which
are nonterminating according to \autoref{def:lnterm}, it is enough to
detect terminal strongly connected components made of accepting edges only.

\subsection{Detection Algorithm}

Henceforward, we assume the state space graph is finite, and if a run of the
program to be verified terminates then this fact is reflected by a state with
no successors in the underlying state space graph. Note that the program may
terminate even within a resource section instance.
An ARSI terminates either by
reaching the end of the section instance or by the termination of the whole
underlying program. In both cases, this means a state with no successors is
generated and reachable from the ARSI entrance point.
Finally, we assume that any waiting is implemented in a nonblocking way; in
particular, we require that waiting operations give rise to cycles in the state
space of the waiting thread.\mnote{This is not a problem in practice as any
    blocking synchronisation (such as waiting for a mutex) can be simulated by a
    busy waiting loop.}
As a result, the detection
of nonterminating ARSIs can be performed as a search for an accepting terminal
strongly connected component in the state space graph.


\begin{definition}[Terminal Strongly Connected Component]
A strongly connected component $S$ is \emph{terminal}\mnote{Also sometimes
called bottom strongly connected components, or closed communicating classes,
especially in the area of probabilistic system analysis~\cite{norris}.} if for
each state $v$ in $S$ all successors of $v$ are in $S$ (there are no edges out
of $S$).

\end{definition}

\begin{definition}[Fully Accepting Terminal SCC (FATSCC)]
A terminal strongly connected component of the state space is \emph{fully
  accepting} (\emph{fully accepting terminal SCC, or FATSCC}) if and only if it is nontrivial and all its edges are accepting.
\end{definition}

% \begin{lemma}\label{lem:active:atscc}
%     A FATSCC has to be part of exactly one active resource section instance (ARSI).
% \end{lemma}
% \begin{proof}
%     This follows directly from the construction of ARSIs in the state space: any edge which enters or leaves an ARSI is not accepting and therefore accepting edges occur only in ARSIs.
%     FATSCC has to be nontrivial, therefore it contains at least one edge which must be accepting and therefore the FATSCC has to be part of an ARSI, as there are no accepting edges outside ARSIs.
% \end{proof}

\begin{theorem}\label{thm:lnterm}
    A program contains a nonterminating resource section instance if and only if
    its state space graph contains a fully accepting terminal strongly connected component.
\end{theorem}
\begin{proof}
    Assume the program contains a nonterminating ARSI $\mathcal{A}$.
    Then there must exist a set of states in $\mathcal{A}$ from which neither program end nor the corresponding resource section end can be reached.
    Among these states, there must be a subset which can be repeated
    indefinitely and cannot be left -- a nontrivial terminal SCC which is part
    of an ARSI and therefore it is fully accepting -- a FATSCC in the state space.

    For the other direction, let us assume that there is a FATSCC in the state
    space graph. Since any edge which enters or leaves an ARSI is not accepting
    (which follows directly from the construction of the state space graph), all
    states that are part of the FATSCC must be states within a single
    ARSI. Since the component is terminal and nontrivial, it cannot be left.
    Furthermore, a program termination point cannot be part of the FATSCC as it
    has no successors and an ARSI end cannot be part the FATSCC as the edges going to
    it are not accepting. Therefore, it is impossible to reach either a program
    termination point or a state that would be outside of the resource section
    instance from the FATSCC, therefore, the FATSCC witnesses a resource section
    instance that does not terminate.
\end{proof}

To detect the presence of a FATSCC in the state space graph, we employ the
standard Tarjan's algorithm for finding strongly connected components.
To decide if an SCC is terminal, it suffices to check that there are no
edges going from it to any different SCC. Finally, to detect if a terminal
component is nontrivial and fully accepting it is enough to check that the
component contains at least one state with some successors (it is nontrivial) and
that all states of the component have only accepting outgoing edges (it is fully
accepting). These are minor modifications of the algorithm.
Furthermore, it is possible to extend the algorithm to also perform safety
checking while checking for nontermination -- when a new edge with an error label
is traversed, the exploration can be terminated immediately with a safety
counterexample.
This way, any need for separate safety checking is eliminated.

% \paragraph{Size of State Space}
% 
% The additional nondeterminism introduced by active resource section instances is bounded by the number of resource section instance starts in the state space of the original program.
% In the worst case, this can be proportional to the size of the original state space (giving us quadratic bound for the size of state space for nontermination checking with respect to the size of the original state space), but in reality the overhead is expected to be smaller.

%\paragraph{Global Nontermination}

Note that it is also possible to define \emph{global nontermination} using
\autoref{def:lnterm}. In this case, we only need to treat the whole
program as a single active resource section instance.

% If we do so, we can even
% avoid the introduction of extra nondeterminism, but global nontermination cannot
% be applied to checking that annotated parts of nonterminating programs
% terminate.


\subsection{Scheduling and Fairness} \label{sec:fairness}

% An idea for finding such resource section instances can be taken from considerations about fairness in parallel systems.
To provide further context, we also want to discuss the relation of our
nontermination property to LTL model checking with fairness.  Fairness
constraints~\mcite[Chapter 3.5]{Baier2008} are needed in the analysis of temporal
properties of parallel systems to avoid reporting of unrealistic
counterexamples, such as those in which an enabled thread never gets the chance
to make an action. However, even if we use LTL formula to
describe nontermination and allow for LTL model checking under weak fairness, we
still may obtain counterexamples that are totally unrealistic. This is because a
weakly-fair scheduler\mnote{For our purposes, a weakly-fair
  scheduler is a scheduler which ensures that on every accepting cycle in the
  state space all threads which existed (and were not blocked) during the
  execution of this cycle were also executed at least once on the cycle.}
admits runs in which the context switches that happen among participating
threads are very regular, hence unrealistic.

The nontermination as defined in \autoref{def:lnterm} can be seen as a
manifestation of an additional assumption about the thread scheduler. It
claims that the scheduler is in essence somehow irregular, i.e.,
it will not allow for a context switch always after a fixed number of
instructions or at a specific location in the code.  Another way of looking at
this is to assume that the scheduler is probabilistic and assigns some non-zero
probability to interruption between any two instructions. % \footnotemark.
With a probabilistic scheduler, we can equivalently define nonterminating
resource section instance as a section instance which can get to the point when
there is zero probability of reaching its end.
Under the probabilistic view, we can
also say that programs we denote as correct, i.e., without nonterminating
sections, have zero probability of looping forever.
% In practice, this would
% manifest as a program that runs for quite some time seemingly hanging and then
% terminates. % Nevertheless, we believe our definition of nontermination matches
% well with real-world applications and allows easy addition of nontermination
% checking to programs which are already tested for safety.

% \footnotetext{
%     In \divine it is possible to execute a sequence of instructions atomically using an atomic block.
%     These atomic blocks are used only in the implementation of synchronisation primitives in such a way that they do not need to be interleaved.
%     Therefore, from the point of scheduling, it is safe to treat atomic blocks in the same way as simple instructions.
% }

% The probabilistic view also gives us additional validation of our approach which
% uses terminal strongly connected components -- in probabilistic systems, it
% holds that the system will eventually end in one of its terminal strongly
% connected components.

\subsection{Implementation and Usage}

We have implemented our nontermination detection approach in a branch of the \divine model checker.
Resource sections can be specified by annotations in the source code
of the program to be analysed by the user of the tool.
Furthermore, \divine provides predefined resource sections for various POSIX
thread (\texttt{pthread}) synchronisation primitives, namely for
 mutexes (including recursive and reader-writer mutexes), %; both for critical sections and for waiting to enter them),
 condition variables, barriers, and joining of threads.
Since C++ threading support in \divine uses the libc++ library which uses POSIX threads, these resource sections are also used for native C++ threading.

User-defined annotations can be given in one of the following categories: exclusive section,
 waiting for an event, and
 waiting for function end.
For user-defined resource sections, \divine provides C and C++ interface which can be found on the web page accompanying this work.\mnote{\url{https://divine.fi.muni.cz/2019/lnterm}}
To make it possible to specify which resource section types should be considered
for analysis, we use program instrumentation, which enables resource sections
based on command line arguments (for more details see the accompanying web page).
The instrumentation also ensures that edges which are part of an ARSI are accepting.

The detection of nonterminating resource sections in \divine uses Tarjan's algorithm for finding strongly connected components.
The algorithm runs on-the-fly, which means that it generates the state space
graph as needed, and therefore, it can terminate before the entire state space graph is explored.
The algorithm finishes if it finds a fully accepting terminal strongly connected
component, if it discovers a safety error (to avoid the need for a separate
safety verification), or once the entire state space is explored.


\subsection{Interaction with Other Features of \divine}

Since \divine is a research tool, not all the features implemented within the
tool are expected to run together. In this case, there are some
features of \divine which interfere with local nontermination detection
in a not so obvious way.


\paragraph{Counterexamples}
%
When an error is found, \divine has support to show a counterexample and walk through it using an interactive simulator~\mcite{DIVINEToolPaper2017}.
For safety properties, this counterexample is a sequence of states which ends with an error.
For verification of properties described by LTL or Büchi automata (which are partially supported by \divine), the counterexample is a lasso-shaped trace.
For nontermination, the  part of the state space to be reported consists of a fully
accepting terminal strongly connected component and a path that leads to it.
However, it is not practical to output the information about the whole SCC, as it can be large.
For this reason, \divine gives only a trace to the first state of the FATSCC (i.e., the first state from which end of the given resource section instance is not reachable).

\paragraph{Spurious Wakeups}
%
Condition variables are often used in parallel programs to block threads until some event occurs (e.g., a shared queue becomes non-empty).
They provide a function which blocks the current thread (\texttt{wait}) and a function which signals the condition variable and causes waiting threads to proceed (\texttt{signal}).
In most implementations, including C++ standard APIs and platform-specific APIs on Windows and Linux, \texttt{wait} is allowed to return before it is signalled: this behaviour is called \emph{spurious wakeup} and programmers must take it into account when using condition variables.

To help with the discovery of bugs caused by spurious wakeup, \divine simulates
spurious wakeup using nondeterministic choice.
For nontermination detection, it is necessary to ensure that any spurious wakeup does not hide nontermination -- we want to report resource section instances which can be only left by spurious wakeup as nonterminating.
This can be done by careful implementation of the \texttt{wait} function in
\divine{} -- it first nondeterministically decides if a spurious wakeup will
happen, and then, if it is not happening, it enters resource section which waits
for \texttt{signal} and cannot be woken up spuriously.
If the spurious wakeup is simulated, it behaves as if the thread was blocked and allows other threads to run.
Once the waiting thread is used again for generation of successor states, it is unblocked and \texttt{wait} returns spuriously.
The exhaustive enumeration of possible thread interleavings ensures that other threads can run arbitrarily long.

\paragraph{Data Nondeterminism and Symbolic Data}
%
To make it possible to verify programs that depend on input data, \divine has support for symbolic values~\mcite{LRB2018}.
In an analysis of programs with symbolic values, the computation can be split when a branch depends on a symbolic value.
This splitting can cause problems for nontermination detection if leaving some resource section instance requires a particular value of an input variable.
Therefore, in the presence of symbolic data, nontermination checking might miss some instances of nontermination.
We defer this problem to future work.

\paragraph{Relaxed Memory Models}
%
\divine has support for analysis of parallel programs under the x86-TSO memory model of Intel and AMD CPUs (\mcite{SB2018x86tso} and \autoref{chap:mm}), which allows the program to exhibit behaviour not present under the interleaving semantics of threads.
One of the main problems in interaction between nontermination and relaxed memory is that relaxed memory models over-approximate the possible behaviours of the system to cover all possibilities of contemporary and presumably also future processors of a given architecture.
As nontermination is checking for \emph{absence of termination}, it can spuriously hide nontermination if the state space of the program is over-approximated.
Again, we defer this problem to future work.

\section{Evaluation} \label{sec:evaluation}

\pgfplotsset{ylabel style = {yshift = -2mm}}
\begin{figure}[t]
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={safety [s]}, ylabel={local nonterm. [s]}, width=0.49\textwidth, height=0.49\textwidth,
                   xmin = 0.8, ymin=0.8, xmax=14400, ymax=14400,
                   title = Wall Time (seconds)]
\addplot [ scatter, only marks,mark size=2pt ]
         table [x=safety, y=local, meta=sl] {wall.dat};

\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\hfill%
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={safety [MB]}, ylabel={local nonterm. [MB]}, width=0.49\textwidth, height=0.49\textwidth,
                   xmin = 385, ymin=385, xmax=16384, ymax=16384,
                   title = Memory Used (megabytes)]
\addplot [ scatter, only marks,mark size=2pt ]
         table [x=safety, y=local, meta=sl] {memory.dat};

\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 0.3) (10e6, 30e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.3, 0.1) (30e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\caption{
Scatter plots which compare local nontermination detection with safety checking as implemented in \divine.
Both axes use a logarithmic scale.
The dashed and dotted lines in wall time graphs signify $10 \times$ and $100 \times$ difference respectively.
For graphs of memory usage, the dotted lines signify $3 \times$ difference and the dashed $10 \times$ difference.
Green squares correspond to benchmarks which were error-less in both modes and blue circles correspond to benchmarks which contained errors in both cases.
Red triangles correspond to benchmarks which contained a nonterminating section.
The crosses on the outer edge of the plot correspond to timeouts and out-of-memory errors.
All the failures for local/global nontermination were due to timeouts, benchmarks which failed with out-of-memory did so in all cases.
}
\label{fig:eval}
\end{figure}

\begin{figure}[tp]
\centering
% \newcommand{\ct}{\multicolumn{1}{c}}
% \newcommand{\ctv}{\multicolumn{1}{c|}}
% \newcolumntype{R}{>{\raggedleft\arraybackslash}X}
% \caption{}\label{tab:eval}
% \begin{tabularx}{\textwidth}{ll|R|R|R|R}
% & & \ctv{Count} & \ctv{Wall Time} & \ctv{Memory Used} & \ct{State Space} \\
% & & \ctv{}      & \ctv{[s]}       & \ctv{[MB]}        & \ct{} \\
% \hline
% \multirow{3}{*}{Boost Thread} & safety & 32    & 0      & 0      & 0     \\
%                               & local  & 29    &       &       &      \\
%                               & global & 32    &       &       &      \\ \hline
% \multirow{3}{*}{LockFree}     & safety &  6    &       &       &      \\
%                               & local  &  6    &       &       &      \\
%                               & global &  6    &       &       &      \\ \hline
% \multirow{3}{*}{\divine}      & safety & 16    &       &       &      \\
%                               & local  &  7    &       &       &      \\
%                               & global & 16    &       &       &      \\ \hline
% \multirow{3}{*}{all}          & safety & 54    &       &       &      \\
%                               & local  & 42    &       &       &      \\
%                               & global & 54    &       &       &
% \end{tabularx}
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={safety [\# of states]}, ylabel={local nonterm. [\# of states]}, width=0.49\textwidth, height=0.49\textwidth,
                   xmin = 1, ymin=1, xmax=3200000, ymax=3200000,
                   title = Number of states]
\addplot [ scatter, only marks,mark size=2pt ]
         table [x=safety, y=local, meta=sl] {states.dat};

\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\caption{
    A comparison of state space sizes for local nontermination and safety.
    The dashed and dotted lines signify $10 \times$ and $100 \times$ difference respectively.
    The meaning of the marks in the graph is the same as in \autoref{fig:eval}.
} \label{fig:evalst}
% 
% \begin{tikzpicture}
% \begin{loglogaxis}[xlabel={safety [\# of states]}, ylabel={local nonterm. [\# of states]}, width=0.49\textwidth, height=0.49\textwidth,
%                    xmin = 1, ymin=1, xmax=3200000, ymax=3200000,
%                    title = Number of states]
% \addplot [ scatter, only marks,mark size=2pt ]
%          table [x=global, y=safety, meta=sl] {states.dat};
% 
% \addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
% \end{loglogaxis}
% \end{tikzpicture}
\end{figure}


To our best knowledge, there is no suitable benchmark set that would cover
termination in parallel programs.
Therefore, we had to develop a suitable
benchmark on our own. We naturally wanted to analyse the performance of our
verification method on real-world data structures. Unfortunately, it is hard to
reuse any existing real-world test cases of parallel data structures for
verification, as these tests are usually developed as stress tests. Stress
tests use large amounts of data and are supposed to be run for a long time in
order to maximise a chance that a parallelism-related bug is found during the
testing period. For the purpose of application of a formal verification tool such
as \divine, the mentioned approach to testing of parallel programs is
inappropriate. Since a model checker explores all interleavings of the program
systematically within a single execution, further repeated executions, such as
the ones within a stress test, are useless and only add to the complexity of
the verification task. For these reasons, the tests we included in our benchmark
are tests we created or adapted and modified specifically for the purpose of
nontermination detection we wanted to evaluate.

To preserve some diversity at least, we opted for the following tests to be
included in our benchmark.  First, to cover some real-world scenarios, we
created some tests for the Thread library from widely used C++
Boost\mnote{\url{https://www.boost.org/doc/libs/1_69_0/doc/html/thread.html}}
(35 test cases). Second, we used some tests from \divine project itself (8 test
cases), and finally, we developed a couple of specific tests for small programs
demonstrating the behaviour of local nontermination with various synchronisation
primitives (16 test cases). Overall, the benchmark covered usage of lock-free and
mutex guarded parallel data structures (e.g. parallel queues), synchronised
variables, less-used synchronisation primitives such as reader-writer locks, or
a single-producer-single-consumer queue and the parallel hashset from
\mcite{BRSW15HS}.

To evaluate our verification approach, we let each test run with a 4 hours
timeout and 16 GB memory limit. We measured runtime and memory requirements for
the three following configurations of our tool:
\begin{description}
\item[safety] A baseline configuration, in which the tool merely generates the
  state space of the program and checks for the standard safety issues, such as
  assertion violation, invalid memory access, etc. In this mode, no
  nontermination can be detected.
    \item[local nontermination] The configuration in which the nonterminating
      resource section detection is used. Under this configuration, the state
      space of the original program is expanded with every entrance to the
      resource section as described in \autoref{sec:local-nontermination}.
    \item[global nontermination] The configuration that treats the whole program
      as a single resource section and detects if it terminates according to
      \autoref{def:lnterm}. Since this configuration does not introduce
      additional nondeterminism, the state space of the program is roughly the
      same size as for \emph{safety}.
\end{description}
The difference between local and global nontermination configurations is
in the shape of the state space; both use the same algorithm (Tarjan's
algorithm for SCC decomposition). Thanks to this difference, local nontermination
can be applied to programs which should not terminate, to check if each of its
resource sections terminate.

Comparison of safety and local nontermination can be seen in \autoref{fig:eval}.  We
evaluate wall time and memory consumption -- in practice, heavy-duty tools like
\divine are likely to be used in long-running overnight tests (preferably only
if anything relevant for the test changed since the last run), therefore longer
runtimes might not be a big problem up to some point, but it is important to
test that the verification tasks fit in some reasonable amount of memory.  As we
can see, the time overhead of local nontermination configuration is quite
significant (up to $59 \times$) especially for larger programs which are
correct. As for memory consumption, we can see that total overhead is less than
threefold, which is mostly due to the state space compression employed by
\divine.

The wall time blow-up is due to extra nondeterminism introduced by active
resource sections -- the state space can grow by a factor that is related to the
number of resource section instances encountered in the original state
space. Note that many resource sections are likely to be very short. For
programs that were invalid, i.e., contained some nonterminating resource
sections, the verification usually exited faster under the local
nontermination configuration than under the safety configuration, which
means that once a nonterminating section is encountered, it is checked
relatively quickly. Further insight into the comparison of safety and local
nontermination can be seen in \autoref{fig:evalst}, which compares sizes of state
spaces for these two configurations. Here, we can see that the overhead in the
size of the state space is lower than the time overhead (less than $10 \times$).
The extra time overhead is likely caused by inefficiencies in \divine{}.  For
example, when \divine nondeterministically chooses from $N$ values, it will
re-execute instructions between the last remembered state and the point of the
nondeterministic choice $N$ times.

\begin{figure}[tbp]
\begin{tikzpicture}
\begin{loglogaxis}[xlabel={global [s]}, ylabel={local nonterm. [s]}, width=0.49\textwidth, height=0.49\textwidth,
                   xmin = 0.8, ymin=0.8, xmax=14400, ymax=14400,
                   title = Wall Time (seconds)]
\addplot [ scatter, only marks,mark=*,mark size=2pt ]
         table [x=global, y=local, meta=gl] {wall.dat};

\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
\hfill %
% \begin{tikzpicture}
% \begin{loglogaxis}[xlabel={global [MB]}, ylabel={local nonterm. [MB]}, width=0.49\textwidth, height=0.49\textwidth,
%                    xmin = 385, ymin=385, xmax=16384, ymax=16384,
%                    title = Memory Used (in megabytes)]
% \addplot [ scatter, only marks,mark=*,mark size=2pt ]
%          table [x=global, y=local, meta=gl] {memory.dat};
% 
% \addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 0.3) (10e6, 30e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.3, 0.1) (30e6, 10e6) };
% \end{loglogaxis}
% \end{tikzpicture}
% %
% \medskip\\
\begin{tikzpicture}
\begin{loglogaxis}[ xlabel={safety [s]}, ylabel={global nonterm. [s]}, width=0.49\textwidth, height=0.49\textwidth,
                   xmin = 0.8, ymin=0.8, xmax=14400, ymax=14400,
                   title = Wall Time (seconds)]
\addplot [ scatter, only marks,mark=*,mark size=2pt ]
         table [x=safety, y=global, meta=sg] {wall.dat};

\addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 10) (10e6, 1000e6) };
\addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
\addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (10, 0.1) (1000e6, 10e6) };
\end{loglogaxis}
\end{tikzpicture}
% \hfill %
% \begin{tikzpicture}
% \begin{loglogaxis}[xlabel={safety [MB]}, ylabel={global nonterm. [MB]}, width=0.49\textwidth, height=0.49\textwidth,
%                    xmin = 385, ymin=385, xmax=16384, ymax=16384]
% \addplot [ scatter, only marks,mark=*,mark size=2pt ]
%          table [x=safety, y=global, meta=sg] {memory.dat};
% 
% \addplot [ black, sharp plot, thin, update limits=false ] coordinates { (0.1, 0.1) (10e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (0.1, 1) (10e6, 100e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.1, 0.3) (10e6, 30e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dashed ] coordinates { (1, 0.1) (100e6, 10e6) };
% \addplot [ black, sharp plot, thin, update limits=false, dotted ] coordinates { (0.3, 0.1) (30e6, 10e6) };
% \end{loglogaxis}
% \end{tikzpicture}
\caption{
The first scatter plot compares local nontermination checking with checking if the whole program terminates (global nontermination).
In this comparison, the red triangles correspond to benchmarks which did not end, but for which all resource sections terminated.
Finally, in the second graph, we compare global nontermination checking with safety.
Here, the red triangles correspond to benchmarks which did not terminate but were safe.
See \autoref{fig:eval} for the general description of the plot layouts.
} \label{fig:eval2}
\end{figure}

\autoref{fig:eval2} shows a comparison of local nontermination with global nontermination and safety with global nontermination.
Here, we can see that global nontermination behaves similarly to safety, with some time overhead caused by the somewhat more involved algorithm.
This is well in line with our expectations, as global nontermination does not introduce any extra nondeterminism compared to safety and Tarjan's algorithm runs in linear time with respect to the size of the state space, and so does reachability.
This further highlights that the overwhelming part of the time overhead of local
nontermination is in the increase of the state space size.
It is important to note that local nontermination can be applied to programs
which are intended to run infinitely (but have finite state space) -- it can detect if there is a nonterminating resource section in such a program.
As state space sizes and memory consumption are almost the same for safety and for global nontermination, we omit memory and state space size comparisons for the later two pairs of configurations.

\paragraph{Errors Found}
%
No errors were found in the C++ Boost tests. On the other hand, all the errors
we artificially implanted in the test cases were found.  As for the errors which
were not deliberately introduced in the tests, we have found one error in a test
of a lock-free queue from an older version of \divine.  The test was part of
\divine's test suite for a long time and was used to test that the queue works
when it is continuously fed with elements while keeping its size bounded.  This
means that the test was deliberately nonterminating and the intention was that
all the operations executed by main loops of the test's two threads terminate,
which was not the case -- a variable which was supposed to keep track of the
size of the queue was not maintained properly, and therefore it could have happened
that the reader thread would wait indefinitely, attempting to read from an empty
queue which would never fill up. So far, the test case was run under \divine with
the safety algorithm only, therefore, the error did not manifest and remained
undetected.


\section{Related Work} \label{sec:related}

For the related work, we consider only results which go beyond safety checking.
There are many approaches to find problems such as assertion violations or memory safety violations, but they are often fundamentally limited to properties concerning finite runs of the program, and we are focusing here on an infinite behaviour, namely on the absence of termination.
Similarly, we do not explore in depth techniques which specialise on checking sequential programs and have no support for parallelism, as well as techniques which are tailored to a specific modelling language and cannot be applied in general.

Several techniques for checking properties other than safety exist -- indeed
usage of various temporal logics, such as Linear Temporal Logic (LTL)
\mcite[Chapter 5]{Baier2008} and Computation Tree Logic (CTL) \mcite[Chapter 6]{Baier2008} in the context of model checking dates way back to the beginning of the research of formal methods.
Unfortunately, these techniques are not often applied to programs written in real-world programming languages such as C and C++.

As for techniques which detect nontermination, both static and dynamic
techniques exist for the detection of deadlocks caused by circular waiting for
mutexes~\mcite{CC14,agarwal2010detection,bensalem2005scalable}.
These techniques specialise on mutexes and do not allow general nontermination detection, and it is unlikely that they could be naturally extended to cover it.
Some methods detect deadlocks of the whole program (i.e., a program state from which the program cannot move) \mcite{Chaki2005,Demartini99}, but these techniques cannot find cases in which only some threads of the program are making progress, while other threads are blocked forever.
Also, these global deadlock detection techniques are inadequate in the presence of synchronisation mechanisms which causes busy waiting instead of blocking (for example spin locks) or in the cases when normally blocking operations are implemented using busy waiting (which can be easier to handle for the verifier in some cases).
A somewhat different approach based on communicating channels is proposed in \mcite{Ng2016}, but this approach is aiming at the Go programming language which primarily uses shared channels for communication between threads.
Overall, neither of these techniques is applicable in general for the detection of nontermination in programs which use a combination of synchronisation primitives in shared memory.
There are also techniques for checking termination of sequential programs \mcite{Cook2006,Berdine2006,Dietsch2015,Brockschmidt2016,Hensel2017,Giesl2017} and techniques for termination analysis in parallel programs that target modelling languages \mcite{Kupriyanov2014,Albert2017}.

A more realistically-targeted termination checker for parallel programs is presented in \mcite{Cook2007thr}.
Its primary goal is to show a given thread terminates.
It abstracts other threads into a model of the environment which is incrementally refined and attempts to prove termination of the thread locally.
It appears to support C programs, but it assumes the number of threads is fixed and it does not support synchronisation with mutexes and similar primitives.
Another tool for termination-checking parallel C programs is presented in \mcite{Popeea2012}, it is similar to \mcite{Cook2007thr} but more general.
% rely-guarantee reasoning -- keeps track of communication of each thread with the rest of the program.
It uses transition invariants and well-foundedness to prove termination of whole programs or individual threads and uses predicate abstraction and refinement.
The downside of this approach is that it can report unfair paths as termination counterexamples.

Probably the closes relative to our approach is shown in \mcite{Atig2012term}.
It presents a tool MUTANT that can detect fair nontermination.
It aims to identify ultimately periodic executions (i.e., executions in which the same sequence of states is repeated, this sequence can be preceded by a finite stem) and therefore is complete only in programs with finite state space.
Furthermore, the number of context switches in the stem and each of the periods is bounded.
The tool uses sequentialisation -- it reduces the problem of (context-bounded) termination checking of a parallel program to assertion checking in a sequential program.
It does not target any realistic programming language; however, it targets the Boogie modelling language for which translators from C and other languages exist.
Therefore, it might be possible to use this tool even for C and other programming languages.
The main difference between our approach and the one in \mcite{Atig2012term} is that our approach uses a different notion of fairness, targets primarily C and C++, does not impose any limits on the number of context switches, and can detect local nontermination.

\section{Conclusion} \label{sec:conclusion}

We have presented a novel approach to detection of parts of real-world programs written in C and C++ which do not terminate.
Our method allows for detection of partial deadlocks (and livelocks) caused by misuse of synchronisation, but it is not limited to any particular mode of parallel programming (such as lock-based synchronisation, or programs with communication channels) and indeed allows any combination of synchronisation allowed by C++ itself.
To achieve this, it is necessary to provide simple annotations for parts of the code which are to be checked for termination.
Our implementation in the \divine model checker ships with these annotations already prepared for verification of programs which use C++ blocking synchronisation primitives (mutexes, condition variables), or similar synchronisation primitives from the POSIX threads library (\texttt{pthreads}).
Due to the universality of these synchronisation primitives, our annotations allow for checking of most programs which use blocking synchronisation out of the box.
For lock-free programs, or custom synchronization primitives, users have to annotate functions or blocks of code which are required to be exited once they were entered.

We have implemented our technique in an open-source model checker \divine and evaluated it on a set of benchmarks including our tests of the Thread library from widely used C++ Boost.
The evaluation shows that while the time overhead of local nontermination checking can be quite significant (up to $59\times$ compared to safety checking on our benchmarks), the memory overhead is quite modest (under $3\times$).
During the evaluation, we have discovered a hidden bug that remained in the code for a couple of years, even though the code was subject to intensive safety checking.

Our technique enables checking nontermination in parallel programs, including
detection of partial deadlocks and livelocks.
It also supports detection of cases when infinitely-running programs contain
sections which are supposed to terminate but do not terminate.
We believe that even the overhead shown in our evaluation is worth paying for the
additional guarantees over safety checking.
While related to verification of properties written in temporal logics such as CTL*, our technique cannot be subsumed into CTL* verification, as CTL* cannot quantify over objects which can be created while the program runs.

For future work, it is crucial to further investigate interactions between nontermination checking and relaxed memory, and nontermination and symbolic data representation, as the presence of either of these features can lead to programs being reported as terminating even if they are not in the current situation.
Nevertheless, even in the presence of relaxed memory or symbolic data, any reported nonterminating section of the program is indeed a case when the program cannot proceed past the given point.
We would also like to investigate better algorithms for detection of local nontermination that might avoid adding nondeterminism to the program under analysis.

% vim: colorcolumn=80 expandtab sw=4 ts=4 spell spelllang=en