-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iterating over sets in compiled code is slow #2062
Comments
I think the situation isn't that dire :)
Can't we do
But we already do for our current implementation of sets, so that's not a downside of that approach. Coming back to the question of If that's what we're going to do in the long run, then we can do the same thing with sets. In fact, to make this concrete, consider building general iterators rather than plain sequence comprehensions. Suppose further that you're using an efficient set (because, I suspect this is more or less the way you would implement a generic iterator over tree-like datastructures in Dafny. In the binary search tree example, you have If we want to make the current iteration pattern as efficient as possible, then we could make any selection (
Was this example written pre |
Oh I wouldn't even call the worst possible interpretation of this situation "dire", thankfully :)
I'm sure it's technically possible to choose a compilation/runtime strategy that makes the loop as written run in O(n) time. The extreme case would be using linked lists to represent If I'm following the rest of your description correctly, you're imagining that we could have the equivalent of an Re: your point about wanting to have a general iterator concept, I completely agree! I'm really just proposing adding higher-level concepts that can be defined in terms of iterators first, because they will be useful for Dafny users by themselves on builtin datatypes right away, and provide much more succinct and easier-to-verify concepts than working with raw iterators. After all, it's generally a lot simpler to write a
That's an excellent point, and I hadn't fully read through your implementation until now (dafny-lang/rfcs#9 (comment)). That's another really nice feature of function-by-methods: not only does it enable more efficient implementations, it also allows you to use local non-determinism (via |
Right. I don't have strong opinions on the high-level interface :)
That, and it sounds like a lot of work, too ^^
I suspect that we'll have to do something like that eventually (once/when we implement iterators). IOW the message above isn't anything about what interface we expose to users — more that whatever we have to do to support iterators down the line will require us to implement something that also gives fast iteration on sets, without having to make Dafny the language more expressive.
👍, will do!
And even better: that's asymptotically optimal in this case, since we're actually sorting the output :) |
Cool, I think we're on the same page. The features I'm proposing in the RFC will be able to use the existing target language datatype support for iterators as is, as they are already used in translating various uses of
Right! :) I've added an edit since your observation changes the story a little, and thanks again! |
I imagine that it only needs to be deterministic within a single program execution. Would an ordering based on hash codes, based on pointer values, that changes across executions, be okay? |
This is part of the motivation for the foreach loops and sequence comprehensions RFC, but since the issue is a bit subtle and largely orthogonal, I'm outlining it here separately. The goal here is to explain why this issue is not feasible to fix just by improving the efficiency of the compiler or runtimes, but is actually a more fundamental limitation on how the features of the language interact.
As explained in the reference manual, it is easy in a method context to iterate over the values in a set:
With the current runtime implementation of sets, the expression
ss - {i}
takes O(|ss|
) time to evaluate, since it creates an entirely fresh copy of the set with one element removed. This means the whole loop executes in quadratic time.As @cpitclaudel correctly pointed out in the RFC, more efficient implementations of immutable sets exist that share subvalues and can implement removal in O(
log |ss|
) time, which would bring the runtime of the loop down to O(|s| log |s|
). This may be a worthwhile optimization, although the downside is depending on such implementations in each target language, or (preferably) providing our own implementation in Dafny user code. Even then, log-linear time is disappointing and surprising for what should be a simple iteration over a collection.The situation is much worse when trying to iterate over a set in an expression, though. We can define a ghost function that creates a sequence from a set:
Changing this to a function method, however, generates the error message "to be compilable, the value of a let-such-that expression must be uniquely determined." This example is borrowed from this manuscript, which explains that the let-such-that expression
var x :| x in s; ...
must be deterministic, and hence needs extra constraints to specify a unique value. The final compilable version (with comments removed from the lemma as it isn't the focus here) ends up being:The extra conjunct on the let-such-that expression is very expensive in this case: the compiled implementation of this statement is equivalent to:
The call to
Elements.All
is a simplification of how theforall y :: y in s ==> x <= y
expression is compiled, but is an equivalent search through all elements. Because selecting the minimum element is a naive, literal execution of the specification, it takes quadratic time, and soSetToSequence(s)
evaluates in O(|s|
3) time! It is difficult to imagine a practical compilation optimization that would recognize this pattern and produce better code, which would need to sort all the elements at once, possibly via a temporary ordered data structure such as a tree set, instead of attempting to pick each successive minimum one at a time.Edit: @cpitclaudel provided a solution involving
function by method
which brings the runtime down to O(|s|
2) with the current runtimes, which does improve the situation somewhat!The features proposed in the RFC are one possible solution to these challenges, as they express ordering of results directly in their semantics. An expression like
seq i: int | i in s
would be interpreted as "the ordered domain of integers, restricted to those in the set s", which would be implemented at runtime by extracting and sorting the elements of s. This is not the only possible solution, but hopefully it is clear that any solution is better off proposing how to increase the expressivity of the Dafny language, rather than focussing on better compilation strategies.The text was updated successfully, but these errors were encountered: