Trusted Binary Decision Diagrams (Proof Generation) #435

SSoelvsten opened this issue Dec 19, 2022 · 2 comments
Labels: ✨ feature (New operation or other feature) · 🎓 student project (Work, work... but academic!)

SSoelvsten commented Dec 19, 2022

A problem in SAT solving is trusting the solver when it claims a formula to be unsatisfiable. To this end, the solver can create an LRAT (or just a DRAT) proof. But doing so is not a trivial task. Yet, BDDs naturally give rise to these very proofs, as shown in [TACAS 21, arXiv 21]. The underlying idea is to store within each node the associated extension variable u and the four proof steps that encode the formula 'u <-> ITE(x,u1,u0)'.

This was added to a fork of the BuDDy BDD library, called TBuDDy. We want to add the same features to Adiar. The files tbdd.h and tbdd.cpp of TBuDDy might be useful reference material.

API

Proof Output

In TBuDDy, the user specifies the desired proof format and output file as part of a tbdd_init function (which also initialises BuDDy itself). We can do the same. But, I would prefer if we somehow could support multiple output files at once (thinking of later concurrent/thread-safe usage of Adiar). For thread-safety we might want to take a note or two from the exec_policy design (#548).

  • Add a tbdd_ostream class to handle outputting the proof steps (LRAT or DRAT), creating fresh variables, and creating the clause identifiers (a hypothetical sketch follows this list).
    • Each basic TBDD creation (tbdd_clause and tbdd_validate below) has the output stream as its last argument (default: the global proof_writer).
    • For thread-safety, the tbdd_ostream in question needs to hold a write-lock (using RAII).
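
As a concrete starting point, here is a minimal sketch of what such a class could look like. All names (tbdd_ostream, proof_format, fresh_var, write_clause) and the clause/hint representation are assumptions for illustration, not Adiar's actual API; the RAII write-lock is a plain std::lock_guard over an internal mutex.

```cpp
#include <cstdint>
#include <mutex>
#include <ostream>
#include <vector>

// Hypothetical sketch -- not Adiar's actual API.
enum class proof_format { lrat, drat };

class tbdd_ostream
{
  std::ostream &_os;        // where proof lines are written
  proof_format _format;     // LRAT or DRAT
  std::mutex _mutex;        // guards all mutable state below
  uint64_t _next_var;       // next fresh extension variable
  uint64_t _next_clause_id; // next clause identifier (LRAT only)

public:
  tbdd_ostream(std::ostream &os, proof_format f, uint64_t first_var)
    : _os(os), _format(f), _next_var(first_var), _next_clause_id(1)
  {}

  // Create a fresh extension variable (e.g. for a new BDD node).
  uint64_t fresh_var()
  {
    const std::lock_guard<std::mutex> lock(_mutex);
    return _next_var++;
  }

  // Write a single clause (with its LRAT hints) and return its id.
  uint64_t write_clause(const std::vector<int64_t> &lits,
                        const std::vector<uint64_t> &hints)
  {
    const std::lock_guard<std::mutex> lock(_mutex); // RAII write-lock
    const uint64_t id = _next_clause_id++;
    if (_format == proof_format::lrat) { _os << id << ' '; }
    for (const int64_t l : lits) { _os << l << ' '; }
    _os << '0';
    if (_format == proof_format::lrat) {
      for (const uint64_t h : hints) { _os << ' ' << h; }
      _os << " 0";
    }
    _os << '\n';
    return id;
  }
};
```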

TBDD Operations

  • The TBDD u for the input CNF clause with the given id.

    tbdd tbdd_clause(const generator<int> &xs, const size_t clause_id);

    This is similar to bdd_or (with Extend bdd_and and bdd_or with negation flags #556) and adds the associated proof of Cid |= u. In TBuDDy, this is the tbdd_create(bdd f, clause_id i) function. That is, we may consider just using bdd_or(xs) and then converting it into a tbdd afterwards.

    This may need an (optional?) last argument for a shared proof_writer.

  • The TBDD w for u∧v with the associated proof that u∧v → w.

    tbdd tbdd_and(tbdd u, tbdd v);

    Also add & and &= overloads

  • Upgrades f to a TBDD u' by proving that u → u'.

    tbdd tbdd_validate(tbdd u, bdd f);

    This may need an (optional?) last argument for a shared proof_writer.
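
To illustrate how the operations above could fit together, here is a hedged usage sketch. The adiar namespace, the umbrella header, the exact bdd_exists signature, and the implicit tbdd-to-bdd conversion are assumptions based on the architecture notes further below.

```cpp
#include <adiar/adiar.h>

// Hedged sketch: leave the trusted world for one untrusted BDD
// operation and return to it via tbdd_validate.
adiar::tbdd quantify_trusted(const adiar::tbdd &u)
{
  // Untrusted: quantify variable 0 on the underlying BDD (the
  // tbdd-to-bdd conversion merely forgets the proof bookkeeping).
  const adiar::bdd f = adiar::bdd_exists(u, 0);

  // Trusted again: upgrade f to a TBDD by proving u -> u'.
  return adiar::tbdd_validate(u, f);
}
```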

TBDD Information

Next to tbdd_clause(xs, id) to create a clause, we may also want the following basic building blocks:

  • The tautological TBDD (reuse bdd_top)

    tbdd tbdd_top();

    This may need an (optional?) last argument for a shared proof_writer.

We should also bring over the following predicates and accessors from bdd (together with #536).

  • Whether a TBDD is a tautology (reuse bdd_istrue)

    bool tbdd_istop(tbdd u);
  • Whether a TBDD is a contradiction (reuse bdd_isfalse)

    bool tbdd_isbot(tbdd u);
  • Whether a TBDD is a single literal (reuse bdd_isvar)

    bool tbdd_isliteral(tbdd u);
  • Whether a TBDD is a positive literal (reuse bdd_isithvar)

    bool tbdd_ispositive(tbdd u);
  • Whether a TBDD is a negative literal (reuse bdd_isnithvar)

    bool tbdd_isnegative(tbdd u);

Finally, for debugging and statistics we also need

  • The number of BDD nodes in the TBDD (reuse bdd_nodecount)

    size_t tbdd_nodecount(tbdd u);
  • The number of Literals (i.e. the number of levels) present in the TBDD (reuse bdd_varcount)

    size_t tbdd_varcount(tbdd u);
  • Print the underlying BDD to a file (reuse bdd_printdot but maybe the output should also include the proof variables? If so, then this may not be trivial)

    void tbdd_printdot(tbdd u, std::ostream &out);

Other TBDD Operations (not mandatory)

In [CADE 21], more operations are described. There, the proof system is used both to prove a formula true and to prove it false (since that is necessary for QBF problems). What we have noted above for tbdd_and(u, v) is only the truth-preserving version. See [CADE 21] for more details.

  • The TBDD w for u∨v with the associated (truth-preserving) proof that u → w and v → w.

    tbdd tbdd_or(tbdd u, tbdd v);

    Also add | and |= overloads

  • The TBDD w for u∧l, where l is a single literal. This creates the associated (truth-preserving) proof of w∧l → u.

    tbdd tbdd_restrict(tbdd u, int l);
  • The TBDD w for ∃x : u.

     tbdd tbdd_exists(tbdd u, int x);

    In [CADE 21] this is just the combination of tbdd_restrict and tbdd_or (a sketch of this combination follows this list). Yet, one should be able to (1) combine it into a single quantification sweep for each literal and then maybe (2) incorporate multiple literals at once.

  • The TBDD w for ∀x : u.

    tbdd tbdd_forall(tbdd u, int x);

    Use tbdd_restrict and tbdd_and together. Similar to tbdd_exists, future work would be to improve performance with a dedicated quantification algorithm.
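
Following the note above that these are just combinations of the earlier primitives [CADE 21], here is a minimal sketch, assuming the signatures above (with a negative integer -x denoting the negative literal of variable x):

```cpp
// Hedged sketch: quantification via restrict + or/and (Shannon
// expansion). Dedicated quantification sweeps would be the later
// performance improvement.
tbdd tbdd_exists(const tbdd &u, int x)
{
  return tbdd_or(tbdd_restrict(u, x), tbdd_restrict(u, -x));
}

tbdd tbdd_forall(const tbdd &u, int x)
{
  return tbdd_and(tbdd_restrict(u, x), tbdd_restrict(u, -x));
}
```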

tbdd_and (Manipulation Algorithms)

I'll focus on tbdd_and for now...

Since the proof steps are built bottom-up, most of the work happens in the Reduce sweep; the preceding top-down sweep only needs to be extended such that it also stores the relevant association between nodes and proof variables.

Top-Down Sweeps

During the top-down sweep, forward a mapping between the nodes of the unreduced intermediate output and the original nodes (and/or their variables).

Algorithmic Extension

  1. Make the prod2_policy a stateful policy (similar to nested_sweep).
  2. Add hooks to prod2 (see the sketch after this list):
    • Start processing of pair (u,v). Here, the policy knows how far to go forward in its list of proof clauses.
    • Forwarding (u,v) across a level. Here, the policy knows to also forward proof clauses in its own priority queue. Yet, unlike the product construction, it only needs to forward information for the node and not for all of its incoming arcs. Hence, the maximum size of this priority queue is the product of the inputs' widths.
    • Resolved (u,v) to w. Here, the policy can output all of the relevant proof clauses for later.
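
The hook interface could look roughly as follows. Everything here (names, argument types) is a hypothetical illustration of where prod2 would call into the policy; node and tbdd_ostream are stand-ins for the real internal types.

```cpp
// Hypothetical sketch of a stateful prod2 policy with proof hooks.
class and_proof_policy
{
  tbdd_ostream &_proof; // shared proof output (see 'Proof Output')

public:
  and_proof_policy(tbdd_ostream &proof) : _proof(proof) {}

  // Hook 1: prod2 starts resolving the pair (u,v). Advance to the
  // defining clauses of u and v in the inputs' clause files.
  void begin_pair(const node &u, const node &v);

  // Hook 2: (u,v) is forwarded across a level. Forward its proof
  // clauses in the policy's own priority queue as well; only per-node
  // (not per-arc) information is needed, so this queue is bounded by
  // the product of the inputs' widths.
  void forward_pair(const node &u, const node &v);

  // Hook 3: (u,v) has been resolved into the (unreduced) node w.
  // Record the relevant proof clauses for the Reduce sweep.
  void resolve_pair(const node &u, const node &v, const node &w);
};
```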

Reduce Sweep

Preliminaries

The end-goal is to describe each BDD node u = (x, u[0], u[1]) as the following four clauses that collectively encode u↔(x ? u[1] : u[0]).

| Notation | Clause     |
|----------|------------|
| HD(u)    | -x -u u[1] |
| LD(u)    | x -u u[0]  |
| HU(u)    | -x -u[1] u |
| LU(u)    | x -u[0] u  |

Here, the L and H are abbreviations for the Low and High edges, respectively; the U is Up and the D is Down (with respect to the direction of the implication). Notice that this perfectly fits the information flow of the Reduce algorithm.

After having defined the BDD node in the proof, we also want to prove that it is the correct result. To this end, we need to combine the defining clauses of original BDD nodes u, v, and the resulting BDD node w.

| BDD Node | Clauses      |
|----------|--------------|
| u        | HD(u), LD(u) |
| v        | HD(v), LD(v) |
| w        | HU(w), LU(w) |

Next to these six clauses, we also need the following two clauses that claim the children of w correspond to the conjunction of the children of u and v.

| Notation | Clause           |
|----------|------------------|
| ANDH     | -u[1] -v[1] w[1] |
| ANDL     | -u[0] -v[0] w[0] |

Knowing the clause ids of all eight of these, we can combine them into two RUP resolution chains, which in turn combine to prove -u -v w, i.e. u∧v → w.
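
For concreteness, here is a sketch of that argument using the clause tables above. First derive the low-branch intermediate clause x -u -v w by RUP: assuming -x, u, v, and -w, the clauses LD(u) and LD(v) yield the units u[0] and v[0], then ANDL yields w[0], and finally LU(w) yields w, a contradiction. The high-branch clause -x -u -v w follows symmetrically from HD(u), HD(v), ANDH, and HU(w). Resolving these two intermediate clauses on x then gives the goal clause -u -v w.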

Algorithmic Extension

Again, we should be able to implement the above by adding hooks to a stateful reduce_policy. Alternatively, if this becomes too messy or expensive, we may want to just implement a separate tbdd_reduce. The four places the algorithm needs to be extended are:

  • Node w is suppressed:
    Store this mapping, similar to the red1_mapping.

  • Outputting new node w as w':
    Push LD(w'), LU(w'), HD(w'), and HU(w') to the proof and push their ids to the clause id file. Furthermore, push these clause ids into a red2_mapping sorter that puts them back in order with w.

  • Outputting duplicate node w as w':
    Also place this mapping in the red2_mapping clause sorter.

With all of the defining clauses of w' available, we now only need to combine them with the input's clauses. Yet, to do so, these clauses need to be back in the order of the unreduced product graph. Hence, we need the clause variants of red1_mapping and red2_mapping. Finally, knowing all of the relevant clauses and having identified the proof case in question, we can output the proofs during the last hook:

  • Forwarding w' to parents of w
    Here, we need to provide the identifiers for the defining and correctness proving clauses of w'. These then need to be forwarded to their parents.

For all of this, the priority queue needs to be extended with w[_] and the clauses proving their correctness (this is akin to #412).
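
As a data-structure sketch, the extended priority-queue element could carry the resolved child together with its clause ids; all names below are hypothetical:

```cpp
#include <cstdint>

// Hypothetical sketch: what the Reduce priority queue would forward
// per arc when proof generation is enabled (akin to #412).
struct reduce_proof_request
{
  // As in the plain Reduce sweep: the arc's source and the resolved
  // child w[_].
  uint64_t source;
  uint64_t child;

  // Clause ids of the child's four defining clauses ...
  uint64_t hd, ld, hu, lu;

  // ... and of the clause proving its correctness (e.g. -u -v w).
  uint64_t correctness;
};
```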

tbdd_validate (Validation Algorithm)

This follows the idea described above for tbdd_and:

  1. In a top-down sweep, create the clause mapping of the product graph between u and f (based on the implication operator).
  2. In the bottom-up sweep, instead of ANDL and ANDH create the relevant correctness proving clauses.

To make this faster, we would want the proof to relate to the BDD nodes of f rather than to new TBDD nodes of u' (since we can then reuse the node file for f). To do so, we should create a custom tbdd_reduce which does not output any new nodes and skips all of the computations to apply reduction rules. Instead, it traverses the transposed version of f, declares new variables for u', and proves it correct with the clauses of u.

  • Sort the product graph clauses of step 1 to match the transposition of f.
  • Forward relevant clauses bottom-up with a (levelized) priority queue.
  • For each node of f, combine the clauses from the priority queue to create new extension clauses for the nodes of f and use the product graph clauses to prove the implication.

Architecture

  • tbdd class

    • Conversion from tbdd to bdd should be trivial (since we may want to do some BDD operations, e.g. bdd_exists, and then get back to TBDDs with tbdd_validate). That is, it should merely forget the associated variables. Hence, extension variables should not be included in the node or levelized_file<node> class but be placed in a separate file with the associated proof steps. In fact, we would want tbdd to inherit from the bdd class (see the sketch after this list).
    • To keep the number of active clauses low, all freed TBDD nodes need to be deleted in the proof. Since Adiar does not share subtrees, this merely amounts to making the tbdd destructor output a deletion of all its extension variables' clauses. Hence, each tbdd needs to carry around a reference to its proof writer (assuming there is not only one).
      • Since deletions should not happen before the final result is computed, we do not need a __tbdd class (at least in the public API). To save a bit of disk space, we can also move the clause file(s) from the input into __tbdd.
      • The proof consists of (a) four RAT clauses that define each BDD node and (b) three RUP steps to prove correctness. We need the defining clauses of (a) later to prove the correctness of the next algorithm. The clauses of (b) need to be stored so they can be deleted later.
  • tbdd_reduce

    Add the hooks and logic described above.

  • tbdd_and

    Add the hooks and logic described above.

  • tbdd_validate

    Implement tbdd_validate__down and tbdd_validate__up as described above.
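
A minimal sketch of this class layout under the assumptions above; clause_file and the tbdd_ostream reference are placeholders, not existing Adiar types:

```cpp
#include <memory>

// Hypothetical sketch: a tbdd is a bdd plus its proof bookkeeping.
class tbdd : public bdd
{
  // Extension variables and clause ids live in their own file(s),
  // separate from the levelized_file<node> of the underlying bdd.
  std::shared_ptr<clause_file> _clauses;

  // The proof writer to emit deletions to (assuming more than one).
  std::shared_ptr<tbdd_ostream> _proof;

public:
  // Conversion to bdd is trivial: slicing to the base class simply
  // forgets the extension variables and clause ids.

  ~tbdd()
  {
    // Keep the set of active clauses small: emit a deletion of the
    // correctness (RUP) clauses tied to this TBDD. The defining (RAT)
    // clauses may still be needed by a later algorithm.
  }
};
```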

References

  • [TACAS 21] Randal E. Bryant and Marijn J. H. Heule. “Generating extended resolution proofs with a BDD-based SAT solver”. In: Tools and Algorithms for the Construction and Analysis of Systems (2021)

  • [arXiv 21] Randal E. Bryant and Marijn J. H. Heule. “Generating extended resolution proofs with a BDD-based SAT solver (extended version)”. In: arXiv Preprint (2021)

  • [CADE 21] Randal E. Bryant and Marijn J. H. Heule. “Dual Proof Generation for Quantified Boolean Formulas with a BDD-based Solver”. In: International Conference on Automated Deduction (2021)

SSoelvsten commented Nov 12, 2023

Variable Sharing → Proof Compression

The implementation described above makes no use of node sharing. In part, the proof has to retraverse the entire output anyway, since it has to create the correctness clauses (all the way from the bottom). Yet, if we can share proof variables between TBDDs, then we can mitigate some of the heavy blowup in the proof size.

Yet, solving this seems quite difficult: since we cannot know anything about the lifetime of a TBDD, we cannot merely transfer ownership from one TBDD to another, as the latter may be destructed before the former.

SSoelvsten commented Nov 30, 2023

While reading [FMCAD 22] on the TBuDDy implementation, it does not seem like the memory demands of the tbdds are the hurdle to overcome:

  1. The experiments do not show TBuDDy running out of memory; rather, it would need to partially switch from 32-bit to 64-bit integers:

    • In 900 seconds, it was able to create almost 1 billion clauses. This is half of what fits into an int, which is what TBuDDy currently uses to store the indices. Hence, the BDD package runs out of LRAT clause ids before it runs out of BDD nodes.
    • Some of the benchmarks had the BDD package run out of BDD variables (BuDDy only supports 2^21 variables) rather than actual BDD nodes. This is related to Change nodetable and cache indexing from int to ssize_t buddy#3.
  2. The LRAT-CHECK tool has been modified to decrease its memory usage when the set of active clauses is sparse. It does so with a two-level indexation scheme: an array holds pointers to blocks of consecutive clauses; when all clauses in a block are deleted, the block can be freed (a sketch of this scheme follows this list).

    • Each BDD node occupies 4 clauses, each of which consists of up to 3 variables. Since BuDDy only uses 32-bit integers, drat-trim should be compiled to use the same (LONGTYPE in drat-trim/lrat-check.c). This means each BDD node requires 48 bytes of clause literals, while each BDD node requires 24 bytes in BuDDy; so, the memory usage is (roughly speaking) at most 2:1.
    • One may want to investigate alternative data structures to hold onto the clauses inside of the proof checker; data structures that are better designed for sparse data.
    • An I/O-efficient LRAT proof checker would be the catch-all solution.
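
A minimal sketch of such a two-level clause index, assuming fixed-size blocks; all names are illustrative, not LRAT-CHECK's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical sketch of the two-level indexation scheme: a clause id
// maps to a block and an offset; a block whose clauses have all been
// deleted is freed as a whole.
class clause_index
{
  static constexpr size_t block_size = 1 << 16; // clauses per block

  struct block
  {
    std::vector<const int32_t *> clauses; // clause pointers (or null)
    size_t live = 0;                      // number of live clauses
    block() : clauses(block_size, nullptr) {}
  };

  std::vector<std::unique_ptr<block>> _blocks; // first level

public:
  void insert(uint64_t id, const int32_t *clause)
  {
    const size_t b = id / block_size;
    if (b >= _blocks.size()) { _blocks.resize(b + 1); }
    if (!_blocks[b]) { _blocks[b] = std::make_unique<block>(); }
    _blocks[b]->clauses[id % block_size] = clause;
    _blocks[b]->live++;
  }

  void erase(uint64_t id)
  {
    block *b = _blocks[id / block_size].get();
    b->clauses[id % block_size] = nullptr;
    if (--b->live == 0) { _blocks[id / block_size].reset(); } // free block
  }

  const int32_t *find(uint64_t id) const
  {
    const size_t b = id / block_size;
    if (b >= _blocks.size() || !_blocks[b]) { return nullptr; }
    return _blocks[b]->clauses[id % block_size];
  }
};
```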

References

  • [FMCAD 22] Randal E. Bryant. “TBUDDY: A Proof-Generating BDD Package”. In: Formal Methods in Computer-Aided Design (2022)
