New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[Blog - Final Project] Array IR for Mixed Cryptography #431

Merged

sampsyo merged 3 commits into sampsyo:2023fa from willwng:2023fa

Dec 23, 2023

Contributor

willwng commented Dec 12, 2023

Closes #401

Final Project with @vivianyyd

willwng added 2 commits

December 11, 2023 21:39


          [Project] add Array IR

f085c2a


          Update index.md

1c2b080

sampsyo requested changes

View reviewed changes

Owner

sampsyo left a comment

Sounds very cool! I liked the detailed description of the optimization problem and your heuristic solution to it. Y'all have done a great job coming up with a creative way to characterize the optimality of this solution empirically, in the absence of theoretical guarantees.

One thing that would have been useful to know is something about the prevalence of these issues in real code. Maybe that would be good to explore in the future!

content/blog/2023-12-11-array-ir/index.md Outdated

		@@ -0,0 +1,109 @@
		+++

Owner

sampsyo Dec 17, 2023

Suggested change

content/blog/2023-12-11-array-ir/index.md Outdated


		Implementing control flow required a slight change to how “blocks” (sequences of code delimited by curly braces) are represented in our AST. Previously, blocks were only used to represent function bodies, and therefore required that the last statement was always a `return`. We decided to introduce distinct notions of blocks for function bodies and control flow branches. This is because the invariant that each function contains a single unique `return` statement makes code generation far more straightforward, and doesn’t sacrifice expressivity.

		We evaluated this implementation using handwritten tests to ensure all branches through control flow are properly executed.

Owner

sampsyo Dec 17, 2023

I presume you also checked that everything still worked for the "old" use of blocks (i.e., on all test cases you have that don't use control flow)?

content/blog/2023-12-11-array-ir/index.md Outdated

+              This reordering is subject to data and security dependencies. Node `x` is a dependency of node `y` when:
+              - `x` defines data that is used by `y`,
+              - `x` is an `input` statement, and `y` an output,
+              - `x` is an `endorse` statement, and `y` an declassify, or

Owner

sampsyo Dec 17, 2023

Suggested change

      
            - `x` is an `endorse` statement, and `y` an declassify, or
          
            - `x` is an `endorse` statement, and `y` a declassify, or

content/blog/2023-12-11-array-ir/index.md Outdated

+              - `x` is an `endorse` statement, and `y` an declassify, or
+              - `x` is an `output` statement, and `y` an input from the same host.
+              The first rule corresponds to data dependencies. The second and third rules hold because moving a statement which reveals information before one that commits to data violates security. The last rule preserves the user interface.

Owner

sampsyo Dec 17, 2023

Maybe this is silly, but what about an output followed by an output? Do those have to have their order preserved? (This would be true for memory writes, for example, because you want to preserve "overwrites" of the same memory location. But maybe this is not possible with AIRduct outputs… if so, maybe the blog post needs a little more background on what "inputs" and "outputs" are in this setting?)

content/blog/2023-12-11-array-ir/index.md

		The grouping procedure constructs a dependency graph for each block (a sequence of code enclosed with curly braces), where nodes are statements. For instance, given the above code, the procedure constructs a dependency graph containing `A` and the `if` statement, and another for `B` and `C`. Reordering occurs only within these two graphs. Notice that these graphs do not need to maintain control-flow information, so statements can be reordered strictly based on data and security dependencies.


		We formalize the reordering task as the following problem. Given a directed acyclic graph $G = (V,E)$ where vertices are statements and edges are dependencies, and an assignment of protocols to nodes $p : V \to \mathbb{N}$, we aim to find a sorting $\pi$ of the nodes ($v_{\pi_1}, v_{\pi_2}, \dots, v_{\pi_n}$) such that:

Owner

sampsyo Dec 17, 2023

I think you need to add latex = true to the front matter to make the LaTeX math render.

content/blog/2023-12-11-array-ir/index.md

+              We formalize the reordering task as the following problem. Given a directed acyclic graph $G = (V,E)$ where vertices are statements and edges are dependencies, and an assignment of protocols to nodes $p : V \to \mathbb{N}$, we aim to find a sorting $\pi$ of the nodes ($v_{\pi_1}, v_{\pi_2}, \dots, v_{\pi_n}$) such that:
+              - For all $e = (v_1, v_2) \in E$ (the execution of $v_2$ depends on $v_1$), $v_1$ is ordered before $v_2$
+              - The number of protocol changes within the sequence is minimized. For each consecutive $v_{\pi_i}, v_{\pi_j}$ in the sort, there is a change if $p(v_{\pi_i}) \neq p(v_{\pi_j})$.

Owner

sampsyo Dec 17, 2023

Great! This is a very clear description of the optimization objective.

content/blog/2023-12-11-array-ir/index.md Outdated

+              - For all $e = (v_1, v_2) \in E$ (the execution of $v_2$ depends on $v_1$), $v_1$ is ordered before $v_2$
+              - The number of protocol changes within the sequence is minimized. For each consecutive $v_{\pi_i}, v_{\pi_j}$ in the sort, there is a change if $p(v_{\pi_i}) \neq p(v_{\pi_j})$.
+              We are unsure of the hardness of this problem. We (unsuccessfully) tried several reductions from NP-Complete problems, and have not yet proven that our polynomial-time algorithm produces an optimum result. This would be an interesting task to pursue in the future. We think the problem might be in P.

Owner

sampsyo Dec 17, 2023

I also don't see an immediate NP-hardness reduction. The fact that the scheduling constraints are just expressed as a dag rather than, like, integer deadlines makes it hard to see how to do a reduction from any "classic" scheduling problem. I guess I agree with you that it seems likely that it is in P?

content/blog/2023-12-11-array-ir/index.md Outdated


		Here, the error is defined by the number of swaps computed by our algorithm compared to the optimal solution. `p` is defined to be the probability of a DAG edge existing.

		From these results, we see that our heuristic gives an optimal solution nearly 90% (at least for smaller DAG’s). We also estimate that the smallest approximation factor that describes our algorithm is `ɑ=2`, where our algorithm gives twice as many swaps as the optimal solution (number is obtained by a small adversarial example).

Owner

sampsyo Dec 17, 2023

"nearly 90%": add "…of the time"? Or make it "for nearly 90% of the randomly generated graphs"?

content/blog/2023-12-11-array-ir/index.md Outdated


		These results seem to make sense; sparse graphs are rather uncomplicated and most solutions will already be optimal, and highly-connected graphs have a smaller space of possible topological sorts. It would be interesting to perform a more rigorous analysis of the algorithm to explain the empirical results.

		We also tested our implementation on handwritten programs in the pre-circuit language, both based on existing benchmarks for cryptographic compilers and programs which mix multiple protocols in suboptimal orders. For each example, maximally large groupings were constructed. One important task for future work is to test on more representative benchmarks, as many of the existing benchmarks are small and do not use more than two cryptographic protocols.

Owner

sampsyo Dec 17, 2023

While it is also useful to see the general characterization of the heuristic on randomly generated data, it would be really interesting to know it performs on "real" programs. One thing I'm not quite clear on is whether a frontend exists for your pre-circuit language… is this something you can obtain from other benchmarks that have already been written somewhere? Or do you need to manually translate these benchmarks into your system?


          Respond to changes requested

willwng requested a review from sampsyo

December 19, 2023 23:19

Contributor Author

willwng commented Dec 19, 2023 •

edited

Loading

Thanks for the suggestions! I couldn't get the comments to line up with the revised file, but I believe all the comments should hopefully be answered in the latest commit. Most notably, we added the statement about how benchmarks for typical cryptographic compilers support only one protocol, so we had to hand-write our own. Hopefully that will change in the future!

sampsyo approved these changes

View reviewed changes

Owner

sampsyo left a comment

Got it; looks great! Nice work on this.

sampsyo merged commit 99fdddc into sampsyo:2023fa

2 checks passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet