Analyzing Programs with SMT Solvers
acronyms are fun
SAT : Boolean satisfiability problem
classic NP-complete problem
SMT : SAT modulo theories
SBV : SMT-based verification
Haskell library by Levent Erkok
SAT problem: find satisfying assignment for Boolean formula
Boolean formula : a bunch of true/false variables with $∧$ ,
$∨$ and $\lnot$ :
$$ (a ∨ \lnot b) ∧ (\lnot a ∨ c ∨ d ∨ \lnot e)
∧ (b ∨ \lnot d) $$
satisfying assignment : values for variables such that the
whole formula is true
NP-complete
naïve solution: backtracking
best solution: backtracking
huh?
solvers are fast for real world instances
recent improvements $⇒$ fast enough for some practical purposes
SMT: extend SAT with theories
theory : new type of variable—not Boolean
bitvectors (0001010101)
unbounded integers (ℤ)
unbounded arrays
algebraic numbers (almost like ℝ)
floating-point numbers (not entirely unlike ℝ)
formula still has to be a Boolean
new types $⇒$ new constraints, operations
pretty much what you’d expect:
$$ (x^2 + y^2 = 25) ∧ (3x + 4y = 0) $$
solves to: $$x = 4, y = -3$$
not just arithmetic
“flow” control:
different theories have:
bitwise operations
signed and unsigned operations
array indexing
sometimes, we cannot solve a formula
$$ (x^2 + y^2 = 42) ∧ (3x + 4y = 0) $$
unsatifiable
this works even over unbounded integers !
however, the solver can also
take too long
return Unknown
more explicit equation from before:
$$ ∃ x. ∃ y. (x^2 + y^2 = 25) ∧ (3x + 4y = 0) $$
compare to:
$$ ∀ x. ∀ y. (x^2 + y^2 = 25) ∧ (3x + 4y = 0) $$
Unsatisfiable; can find counterexample :
$$x = 23, y = 0$$
Again but in English Haskell
SBV: Haskell DSL for specifying formulas
mostly ASCII-ifies what we’ve seen before:
x^ 2 + y^ 2 .== 25 &&& 3 * x + 4 * y .== 0
reuses Num
and Bits
but not Eq
or Ord
has own class for Boolean-like things
Everything is better with types
symbolic versions of normal Haskell types
SInteger
for Integer
SWord32
for Word32
; unsigned bitvectors
SInt32
for Int32
; signed bitvectors
SBool
for Bool
SArray
for unbounded arrays
formulas as Haskell functions:
formula :: SInteger -> SInteger -> SBool
formula x y =
x^ 2 + y^ 2 .== 25 &&& 3 * x + 4 * y .== 0
run with quantified variables:
λ> sat . forSome [" x" , " y" ] formula
Everything is better with types monads
handy monad for composing formulas
better way to manage variables
formula =
do x :: SInteger <- exists " x"
y :: SInteger <- exists " y"
constrain $ x^ 2 + y^ 2 .== 25
return $ 3 * x + 4 * y .== 0
note: ScopedTypeVariables
extend to three sets of variables:
$$ φ(input, program, output) $$
fix $input, program$ : interpreter
fix $program, output$ : reverse interpreter
fix $input, output$ : synthesizer
also: check arbitrary invariants
encode program state with single static assignment SSA
transform x = x + y
to:
constrain $ x_2 .== x_1 + y_1
x_1
, y_1
…etc: existentially quantified
operational semantics $⇒$ constraints
interpreter $⇒$ formula compiler
let’s consider very simple language: IMP
all variables 32 bit ints
small number of imperative constructs
arithmetic just gets encoded as arithmetic!
x + 1
becomes… x_n + 1
keep track of the “current” step # n
operations are signed , so we use SInt32
remember: this is the theory of bitvectors
for assignment (x = x + y
), we just use SSA
constrain $ x_< n + 1 > .== x_n + y_n
note how expression x + y
got compiled
IMP just has if-then-else
gets compiled to ite
if x > 5 then y = 1 else y = 2
:
constrain $ ite (x_n .> 5 )
(y_< n + 1 > .== 1 )
(y_< n + 1 > .== 2 )
one statement follows another
only trick: remember to increment n
x = x + y; x = x * 2
constrain $ x_2 .== x_1 + y_1
constrain $ x_3 .== x_2 * 2
basically works out to >>
this is the trickiest part
SMT solver cannot handle recursion
must finitize loops
unroll some arbitrary number of times
fail if that is not enough
later: rerun formula multiple times, unrolling more and more
while x < 5 do x = x + 1
unrolls to:
if x < 5 then x = x + 1; <while> else skip
after we reach our max depth, <while>
becomes false
convert nested if-statements as before
result: long, ugly formula full of ite
’s
we can run our code (slowly)
we can run our code backwards
we can check invariants over our program
we can verify two functions are equivalent
we can synthesize programs
whatever else you can imagine
forwards; solve for outputs:
constrain $ x_0 .== 10
constrain $ y_0 .== 20
backwards; solve for inputs:
constrain $ x_20 .== 10
constrain $ y_20 .== 20
easy to encode invariants
for example “x is always positive”:
constrain $ x_1 .> 0
&&& x_2 .> 0
&&& x_3 .> 0 ...
universally quantify inputs:
x :: SWord32 <- forall " x"
search for inputs where programs disagree
assert outputs are not equal
constrain $ p1_x_20 ./= p2_x_20
constrain $ p1_y_20 ./= p2_y_20
existentially quantify inputs
unsatisfiable = verified
satisfiable = counterexample
add variables for each part of the program
select what to do with ite
fix inputs and outputs
also solve for literals (1
, 2
, 3
…)
this gives you a program correct only over your fixed inputs and outputs
start with random input/output pair
synthesize
verify result
if correct: we’re done
if wrong: verifier returns new input
each new input is a new corner case!
good fit for low-level code (ie assembly)
use techniques to optimize Haskell
maybe take advantage of algebraic laws?
implement simple verified language
check out ImProve on Hackage
design interactive tools
intelligent debugging
maybe use for education?
sbv package by Levent Erkok
synthesis ideas mostly from:
Ras Bodik at the Berkeley ParLab
especially slides from his class on synthesis
z3 SMT Solver from Microsoft Research
most popular SMT solver
very versatile
commercially unfriendly license (not open source)