Alias Analysis.tex

\newpage

\section{Alias Analysis\footnote{based on \cite{ShengHsi98:online}}}

The purpose of alias analysis is to determine all possible ways a program may access some given
memory locations. A set of pointers are said to be in an alias group if they all point to the same
memory locations.  Alias analysis is very important in compiler theory. Some of its most notable applications
include code optimization and security. Compiler level optimization needs pointer aliasing information to perform dead code elimination (removing code that does not affect the program’s
result), redundant load/store instruction elimination, instruction scheduling (rearranging instructions) and more. Program security enforcement at compiler level uses alias analysis to help detect
memory leaks and memory related security holes.


\subsection{Clients of Alias Analysis}

Alias Analysis is not an optimization in itself, i.e., once alias analysis is run, it does not make the
program run faster. A Program transformation called Client need to query alias information from
alias analysis to create an optimized version of the program. Also, there are some other analysis
that don’t alter the code, but make use of alias analysis to compute better dataflow information
which can be used by other transformation. Examples of transformation clients include, Automated
bug correction, Parallelization, Common subexpression elimination. Examples of analysis clients
include, Slicing, Shape analysis.

\subsection{Types of Alias analysis}

There are many varieties of alias analysis. They are often categorized by properties such as
field-sensitivity, inter-procedural v.s. intra-procedural, context-sensitivity and flow-sensitivity.


\subsubsection{Field-Sensitivity}

Field-sensitivity is the strategy that governs the way alias analysis models fields in built-in or
user defined data structures. There are three approaches to field-sensitivity – field-sensitive,
field-insensitive and field-based. Consider the following code:

\lstinline[language=C]|struct { int a , b ; } x , y ;|


\begin{itemize}
\item Field-sensitive approach models each field of each struct variable, hence creating four nodes
(we use node to denote a pointer, variable or memory location)  \lstinline[language=C]|x.a, x.b, y.a and y.b.|

\item Field-insensitive approach models each struct variable, but does not model their fields. This
example is modeled by two nodes  \lstinline[language=C]|x.* and y.*.|

\item Field-based approach models each field without modeling the struct variables. This example
is modeled by two nodes  \lstinline[language=C]|*.a and *.b.|


\end{itemize}    

The same principle applies when dealing with arrays. Consider a C integer array int \lstinline[language=C]|a[10]|
Field-insensitive approach models this with only one node: a[*], while field-sensitive approach
creates ten nodes: \lstinline[language=C]|a[0], a[1], ..., a[10].|
Clearly a field-sensitive approach provides a more fine grained model and hence better precision. However, the number of nodes increases rapidly when there are nested structs and/or
arrays.


\subsubsection{Intra-Procedural v.s. Inter-Procedural}

An intra-procedural alias analysis analyzes the bodies of each functions. It does not consider
how each function interact with other functions. Specifically, intra-procedural alias analysis does
not handle pointer parameter passing or functions that return pointers. On the contrary, interprocedural alias analysis deals with the pointer behaviors due to function calls.

A pseudo-code where pointer parameter passing is involved:

\begin{lstlisting}[language=C]
void fn1(int * p) { p = ...}
void fn2 () { int * q ; fn1 ( q ) ; }
\end{lstlisting}


A pseudo-code where function is returning pointers:
\begin{lstlisting}[language=C]
    int * id ( int * p ) { return p ; }
    void fn () {
        int * q ;
        q = id ( q ) ;
    }
\end{lstlisting}

Intra-procedural is less expansive to perform, but has lower precision. It is often easier to
implement an intra-procedural alias analysis before extending to inter-procedural alias analysis.
Intra/inter-procedural property is highly related to context-sensitivity since a context-sensitive
analysis has to be an inter-procedural analysis.


\subsubsection{Context-Sensitivity}
Context-sensitivity governs how function calls are analyzed. This property yields two types of
alias analyses ? context-sensitive and context-insensitive alias analyses. Context-sensitive analysis
considers the calling context (caller) when analyzing the target of a function call (callee). Consider
the following code:


\begin{lstlisting}[language=C]
     int a , b ;
     int * x ;
    
     void f ( void ) { * x ++; }
    
     void main () {
        x = & a ;
        f () ;
        
        x = & b ;
        f () ;
     }
\end{lstlisting}


In this code, function-f is called twice. It increases the value of variable-a the first time it
was called, and increases the value of variable-b the second time it was called. A context-sensitive
alias analyzer needs to have a way to create an abstract description for function-f so that every
time it is called, the analyzer can apply the calling context to the abstract description.
Context-sensitive provides a finer grain model of the static code hence results in higher precision. However, it increases the complexity of the analysis.


\subsubsection{Flow-Sensitivity}


Flow-sensitivity is the principle that governs wether or not an analysis takes the order of code
into account. There are flow-sensitive and flow-insensitive analyses.
A flow-insensitive analysis produces one set of alias result for the entire program it analyzes.
This result is the sets of memory locations that pointers may points to at any point of the program.
It does not consider the order of the code. A flow-sensitive analysis computes alias information
at every point of the program. Consider the following code:


\begin{lstlisting}[language=C,numbers=left,
    numberstyle=\small\color{gray}]
    int a , b ;
    int * p ;
    p = & a ;
    p = & b ;
\end{lstlisting}


The result of a flow-insensitive analysis would be: pointer-p may points to variable-a or
variable-b. A flow-sensitive analysis is capable to determine that between line 3 and line 4,
pointer-p points to variable-a, and after line 4, pointer-p points to variable-b.
Notice that the complexity of flow-sensitive analysis increases tremendously when a program
has many conditional statements, loops or recursive functions. A complete control flow graph
is required in order to perform flow-sensitive analysis. Therefore flow-sensitive analysis is much
more precise, but is too expansive for most cases to perform on a whole program.


\subsection{Alias Analysis Algorithms}


\subsection{Alias Analysis in LLVM}


The LLVM AliasAnalysis class  is the primary interface used by clients and implementations
of alias analyses in the LLVM system. This class is the common interface between clients of alias
analysis information and the implementations providing it, and is designed to support a wide range
of implementations and clients (but currently all clients are assumed to be flow-insensitive). In
addition to simple alias analysis information, this class exposes Mod/Ref (Modified/Referenced)
information from those implementations which can provide it, allowing for powerful analyses and
transformations to work well together.

\subsubsection{Alias Analysis Class Overview in LLVM}

The AliasAnalysis class defines the interface that the various alias analysis implementations should
support. This class exports two important enums: AliasResult and ModeRefResult which represent
the result of an alias query or a mod/ref query, respectively


\vspace{1\baselineskip}
\textbf{\large Methods in AliasAnalysis Class:}


\vspace{1\baselineskip}


\textbf{\large The alias method}


\vspace{1\baselineskip}

The alias method is the primary interface used to determine whether or not two memory objects alias
each other. It takes two memory objects as input and returns MustAlias, PartialAlias, MayAlias,
or NoAlias as appropriate.

\vspace{1\baselineskip}

\textbf{\large The getModRefInfo methods}

\vspace{1\baselineskip}


The getModRefInfo methods return information about whether the execution of an instruction can
read or modify a memory location. Mod/Ref information is always conservative: if an instruction might read or write a location, ModRef is returned. The AliasAnalysis class also provides a
getModRefInfo method for testing dependencies between function calls.


\textbf{\large Other useful AliasAnalysis methods}


% \vspace{1\baselineskip}

\begin{itemize}


 \item The pointsToConstantMemory method: This method returns true if pointer only points to
unchanging memory locations (functions, constant global variables, null pointer).
 \item The doesNotAccessMemory method: This method check whether function reads or writes to
memory. If it never read/write to memory then it returns true.
 \item The onlyReadsMemory method: This method returns true for a function if analysis can prove
that the function only reads from non-volatile memory.

\end{itemize}


\subsubsection{Existing alias analysis implementations and clients in LLVM}


\textbf{\large Available AliasAnalysis}

\begin{itemize}
 \item The -no-aa pass: An alias analysis that never returns any useful information. this pass can
 be useful if you think that alias analysis is doing something wrong and are trying to narrow
 down a problem
 \item The -basicaa pass: This pass is an aggressive local analysis that knows many important facts.
 such as different fields of a structure do not alias. This alias analysis is local per function and
 depends on a series of heuristics to determine which pointers alias. For this analysis,
 distinct global variables, local variable declarations, and heap memory can never
 alias. Additionally, such values never alias the null pointer. Similarly, differing
 structure fields and array references that are statically different do not alias. Some
 C standard library functions are assumed to either never access program memory, or
 only access read-only memory. Pointers that refer to constant global values, such as
 strings, are said to point to constant memory. Finally, function calls cannot access
 local variables that never escape from the function that allocates them.\cite{AnEmpiri85:online}
 \item The -globalsmodref-aa pass: This pass implements a simple context-sensitive mod/ref and
 alias analysis for global variables.
 \item  The -scev-aa pass: This pass implements AliasAnalysis queries by translating them into
 ScalarEvolution queries.
    
\end{itemize}    


% \subsubsection{}
% -basic-aa is the Alias analysis in LLVM. Basic-aa is a rule based alias analysis. It uses the following
% simple but important rules to compute alias information:
% \begin{itemize}
%      \item Distinct globals, stack allocations, and heap allocations can never alias.
%      \item Globals, stack allocations, and heap allocations never alias the null pointer.
%      \item Different fields of a structure do not alias.
%      \item Indexes into arrays with statically differing subscripts cannot alias.
%      \item Many common standard C library functions never access memory or only read memory.
%      \item Pointers that obviously point to constant globals “pointToConstantMemory”.
%      \item Function calls can not modify or references stack allocations if they never escape from the
%     function that allocates them (a common case for automatic arrays).
% \end{itemize}