Analyzing Programs with SMT Solvers

SAT, SMT, SBV… oh my

acronyms are fun
- nested ones doubly so
SAT: Boolean satisfiability problem
- classic NP-complete problem
SMT: SAT modulo theories
SBV: SMT-based verification
- Haskell library by Levent Erkok

I can’t get no…

SAT problem: find satisfying assignment for Boolean formula
Boolean formula: a bunch of true/false variables with $∧$, $∨$ and $\lnot$: $$ (a ∨ \lnot b) ∧ (\lnot a ∨ c ∨ d ∨ \lnot e) ∧ (b ∨ \lnot d) $$
satisfying assignment: values for variables such that the whole formula is true

It’s not easy

NP-complete
naïve solution: backtracking
best solution: backtracking
huh?
solvers are fast for real world instances
- recent improvements $⇒$ fast enough for some practical purposes

A theory of everything

SMT: extend SAT with theories
theory: new type of variable—not Boolean
- bitvectors (0001010101)
- unbounded integers (ℤ)
- unbounded arrays
- algebraic numbers (almost like ℝ)
- floating-point numbers (not entirely unlike ℝ)

What can I do with…?

formula still has to be a Boolean
new types $⇒$ new constraints, operations
pretty much what you’d expect: $$ (x^2 + y^2 = 25) ∧ (3x + 4y = 0) $$
solves to: $$x = 4, y = -3$$

Bit by bit

not just arithmetic
“flow” control:
- ite: if-then-else
different theories have:
- bitwise operations
- signed and unsigned operations
- array indexing

Still not satisfied

sometimes, we cannot solve a formula

$$ (x^2 + y^2 = 42) ∧ (3x + 4y = 0) $$

unsatifiable
this works even over unbounded integers!
however, the solver can also
- take too long
- return Unknown

Quantification

more explicit equation from before: $$ ∃ x. ∃ y. (x^2 + y^2 = 25) ∧ (3x + 4y = 0) $$
compare to: $$ ∀ x. ∀ y. (x^2 + y^2 = 25) ∧ (3x + 4y = 0) $$
Unsatisfiable; can find counterexample: $$x = 23, y = 0$$

Again but in English Haskell

SBV: Haskell DSL for specifying formulas
mostly ASCII-ifies what we’ve seen before:

x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0

reuses Num and Bits but not Eq or Ord
has own class for Boolean-like things

Everything is better with types

symbolic versions of normal Haskell types
- SInteger for Integer
- SWord32 for Word32; unsigned bitvectors
- SInt32 for Int32; signed bitvectors
- SBool for Bool
- SArray for unbounded arrays

Running

formulas as Haskell functions:

formula :: SInteger -> SInteger -> SBool
formula x y = 
  x^2 + y^2 .== 25 &&& 3 * x + 4 * y .== 0

run with quantified variables:

λ> sat . forSome ["x", "y"] formula

Everything is better with types monads

handy monad for composing formulas

better way to manage variables

formula = 
  do x :: SInteger <- exists "x"
     y :: SInteger <- exists "y"
     constrain $ x^2 + y^2 .== 25
     return $ 3 * x + 4 * y .== 0

note: ScopedTypeVariables

Modelling programs

extend to three sets of variables: $$ φ(input, program, output) $$
fix $input, program$: interpreter
fix $program, output$: reverse interpreter
fix $input, output$: synthesizer
also: check arbitrary invariants

Step by step

encode program state with single static assignment SSA
transform x = x + y to:
```
constrain $ x_2 .== x_1 + y_1
    
```
x_1, y_1…etc: existentially quantified

Operational Semantics

operational semantics $⇒$ constraints
interpreter $⇒$ formula compiler
let’s consider very simple language: IMP
- all variables 32 bit ints
- small number of imperative constructs

Expressions

arithmetic just gets encoded as arithmetic!
x + 1 becomes… x_n + 1
- keep track of the “current” step # n
operations are signed, so we use SInt32
remember: this is the theory of bitvectors

Assignment

for assignment (x = x + y), we just use SSA

constrain $ x_<n + 1> .== x_n + y_n

note how expression x + y got compiled

Conditions

IMP just has if-then-else
gets compiled to ite

if x > 5 then y = 1 else y = 2:

constrain $ ite (x_n .> 5) 
                (y_<n + 1> .== 1)
                (y_<n + 1> .== 2)

Sequencing

one statement follows another
only trick: remember to increment n

x = x + y; x = x * 2

constrain $ x_2 .== x_1 + y_1
constrain $ x_3 .== x_2 * 2

basically works out to >>
- but remember n!

Loops

this is the trickiest part
SMT solver cannot handle recursion
must finitize loops
- unroll some arbitrary number of times
- fail if that is not enough
later: rerun formula multiple times, unrolling more and more

Loops

while x < 5 do x = x + 1 unrolls to:
if x < 5 then x = x + 1; <while> else skip
after we reach our max depth, <while> becomes false
convert nested if-statements as before
result: long, ugly formula full of ite’s

Now what?

we can run our code (slowly)
we can run our code backwards
we can check invariants over our program
we can verify two functions are equivalent
we can synthesize programs
whatever else you can imagine

Running

forwards; solve for outputs:

constrain $ x_0 .== 10
constrain $ y_0 .== 20

backwards; solve for inputs:

constrain $ x_20 .== 10
constrain $ y_20 .== 20

Invariants

easy to encode invariants

for example “x is always positive”:

constrain $ x_1 .> 0 
        &&& x_2 .> 0 
        &&& x_3 .> 0 ...

universally quantify inputs:
```
x :: SWord32 <- forall "x"
    
```

Verification

search for inputs where programs disagree

assert outputs are not equal

constrain $ p1_x_20 ./= p2_x_20
constrain $ p1_y_20 ./= p2_y_20

existentially quantify inputs
unsatisfiable = verified
satisfiable = counterexample

Synthesis

add variables for each part of the program
select what to do with ite
fix inputs and outputs
also solve for literals (1, 2, 3…)
this gives you a program correct only over your fixed inputs and outputs

CEGIS

start with random input/output pair
synthesize
verify result
- if correct: we’re done
- if wrong: verifier returns new input
  - repeat
each new input is a new corner case!

Sketching

some things are easy for synthesizer
some things are easy for humans

programmer specifies code with holes:

while ?? do
  x = ??
  y = ??
  z = x + y
return z

Fun ideas

good fit for low-level code (ie assembly)
use techniques to optimize Haskell
- maybe take advantage of algebraic laws?
implement simple verified language
- check out ImProve on Hackage
design interactive tools
- intelligent debugging
- maybe use for education?

Credit

sbv package by Levent Erkok
synthesis ideas mostly from:
- Ras Bodik at the Berkeley ParLab
- especially slides from his class on synthesis
z3 SMT Solver from Microsoft Research
- most popular SMT solver
- very versatile
- commercially unfriendly license (not open source)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

analyzing-programs-with-smt.org

analyzing-programs-with-smt.org

Analyzing Programs with SMT Solvers

SAT, SMT, SBV… oh my

I can’t get no…

It’s not easy

A theory of everything

What can I do with…?

Bit by bit

Still not satisfied

Quantification

Again but in English Haskell

Everything is better with types

Running

Everything is better with types monads

Modelling programs

Step by step

Operational Semantics

Expressions

Assignment

Conditions

Sequencing

Loops

Loops

Now what?

Running

Invariants

Verification

Synthesis

CEGIS

Sketching

Fun ideas

Credit

Files

analyzing-programs-with-smt.org

Latest commit

History

analyzing-programs-with-smt.org

File metadata and controls

Analyzing Programs with SMT Solvers

SAT, SMT, SBV… oh my

I can’t get no…

It’s not easy

A theory of everything

What can I do with…?

Bit by bit

Still not satisfied

Quantification

Again but in English Haskell

Everything is better with types

Running

Everything is better with types monads

Modelling programs

Step by step

Operational Semantics

Expressions

Assignment

Conditions

Sequencing

Loops

Loops

Now what?

Running

Invariants

Verification

Synthesis

CEGIS

Sketching

Fun ideas

Credit