From e8125a3a3fff0fc4261643135e11b8dfa42ed4d9 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Wed, 11 May 2022 11:28:07 -0700 Subject: [PATCH 01/13] Initial partial rewrite --- 0005-sequence-comprehensions.md | 555 ++++++++++++++++++++++++++++++++ 1 file changed, 555 insertions(+) create mode 100644 0005-sequence-comprehensions.md diff --git a/0005-sequence-comprehensions.md b/0005-sequence-comprehensions.md new file mode 100644 index 0000000..e1e7010 --- /dev/null +++ b/0005-sequence-comprehensions.md @@ -0,0 +1,555 @@ +- Feature Name: sequence_comprehensions +- Start Date: 2022-05-10 +- RFC PR: [dafny-lang/rfcs#9](https://github.com/dafny-lang/rfcs/pull/9) +- Dafny Issue: [dafny-lang/dafny#1753](https://github.com/dafny-lang/dafny/issues/1753) + +# Summary +[summary]: #summary + +This RFC proposes adding generalized sequence comprehension expressions, similar to the existing `set` and `map` comprehension expressions and more flexible than the current `seq(5, i => i*i)` form. + +# Motivation +[motivation]: #motivation + +Loops are notoriously difficult for Dafny users to verify, and coming up with the correct loop invariants to prove a loop correct +is particularly challenging. The recent addition of `for` loops was a good step to abstract away from raw `while` loops, +but only supports consecutive integer indices. They still fall short of the expressiveness and economy of `for` or `foreach` loops in +many mainstream programming languages, which generally iterate through the contents of a collection of some kind. +It is a well-established best practice to avoid manual loop indexes wherever possible, as they force the assumption that datatypes +support efficient random access, and are more likely to be used incorrectly by accident: + +```dafny +// Before: +method AllPairs(s: seq) returns (result: seq<(nat, nat)>) { + for i := 0 to |s| { + var left := s[i]; + for j := 0 to |s| { + var right := s[i]; // Whoops! + result := result + (left, right); + } + } +} + +// After: +method AllPairs(s: seq) returns (result: seq<(nat, nat)>) { + foreach left <- s { + foreach right <- s { + result := result + (left, right); + } + } +} + +// Or even better, as an expression: +function method AllPairs(s: seq): seq<(nat, nat)> { + seq left <- s, right <- s :: (left, right) +} +``` + +A more serious shortcoming is that there is currently no efficient way to iterate over the elements of a `set` at all. +The best alternative is to use the assign-such-that operator and set subtraction, illustrated in the reference manual as follows +([sec 10.5.2](https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#1052-sets)): + +```dafny +// s is a set +var ss := s; +while ss != {} +{ + var i: int :| i in ss; + ss := ss - {i}; + print i, "\n"; +} +``` + +This is functionally correct, but because set subtraction requires creating a fresh copy, +when compiled this loop requires quadratic time and memory (or at least garbage-collection work) in the size of the set. +The only way to avoid this cost is to write additional external code in the target language to interface directly with the +internal runtime implementation of `set` values. + +By comparison, the equivalent `foreach` loop is: + +```dafny +foreach i <- s { + print i, "\n"; +} +``` + +The runtime implementation of this loop can use the native iteration features of the target language. + +Going further, it is better to avoid writing any imperative loop at all where possible. Sticking to immutable data and +expressions rather than statements allows logic to be used in specifications as well as implementation, +and reduces the burden on the Dafny verifier. The proposed sequence comprehension expression allows more +logic that ultimately produces a sequence of values to be expressed as a value rather than a statement. +Just producing a sequence of the values in a set is simple using a sequence comprehension expression: `seq i <- s`. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Although sequence comprehensions are the main feature this RFC proposes, they depend on another new and more +more fundamental concept: quantifier domain ordering. Therefore, allow me to outline this concept first before then +describing sequence comprehensions. + +## Quantifier domain ordering + +TODO + +There will be two cases for determining this ordering, and others could potentially be added in the future: + +1. + +Note that this concept and the new `<-` syntax applies to any use of quantifiers, even though ordering and multiplicity +is irrelevant for all current uses: the semantics of `set` and `map` comprehensions, `forall` and `exists` expressions, +and `forall` statements all do not depend on the order of quantification. This syntax does offer a more efficient +expression of common uses of these features, however: `set x <- s :: x + 2` is equivalent to +`set x | x in s :: x + 2`, for example. + +## foreach Loops + +A `foreach` loop in Dafny resembles `for` or `foreach` loops in many other mainstream languages, +but is substantially more general and expressive. A very simple example only looks slightly different than expected: + +```dafny +foreach x <- [1, 2, 3, 4, 5] { + print x, "\n"; +} +``` + +The feature supports much more sophisticated uses as well, however, +including binding multiple variables at once and filtering: + +```dafny +var myDoubleDeckerSeq: seq> := ...; +foreach x <- myDoubleDeckerSeq, y <- x | y != 0 { + Process(y); +} +``` + +The high-level syntax for a `foreach` loop reuses several existing concepts: + +``` +ForeachLoop ::= + "foreach" + QuantifierDomain + LoopSpec + Body +``` + +More specifically, this expands to: + +``` +ForeachLoop ::= + "foreach" + Ident [":" Type] { "," Ident [":" Type] } [ "|" Expression ] + { InvariantDecl | ModifiesDecl | DecreasesDecl } + [ Body ] +``` + +A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall` +loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop +executes its body once for each tuple of quantified values in sequence, one at a time. + +Similarly to set comprehensions or assign-such-that statements, the domain of a `foreach` loop +must be enumerable. The following loop would produce the error `"the domain of a foreach loop must be enumerable, +but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`. + +```dafny +foreach x: real | 0.0 <= x < 1.0 { // Error: ... + ... +} +``` + +The domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided. +`decreases *` is permitted, in which the `foreach` loop may never terminate. Any other `decreases` clause can be provided +to ensure the loop terminates even if the domain is potentially infinite. The following (slightly bizarre) example is legal: + +```dafny +foreach x: nat + invariant x <= 10 + decreases 10 - x +{ + print x, "\n"; + if x == 10 { break; } +} +``` + +TODO: Not correct any more, C has to be a deterministically ordered collection for now (?). +Most `foreach` loops will take the form `foreach x <- C { ... }`, where `C` is a `seq`, `set` or `multiset`. +This includes expressions like `m.Keys`, `m.Values` and `m.Items` when `m` is a `map`. +Looping over the keys and values of a map is particularly readable and aesthetically pleasing with this syntax, +which also makes it easier for the compiled program to avoid unnecessary intermediate tuple values: + +```dafny +foreach (k, v) <- m.Items { + print "m[", k, "] == ", v; +} +``` + +Note that the range expression is optional, and if omitted the loop will iterate over all +possible values of the types of the bound variables. This is only permitted if all such types +are enumerable, which is not true of the `real` type, for example. This supports an elegant +pattern for mapping over simple datatypes and other types with few values, including subset types. + +``` +datatype Suit = Clubs | Diamonds | Hearts | Spades + +foreach s: Suit { + ... +} + +foreach b1: bool, b2: bool { + expect TestMyFunctionWithOptions(b1, b2) == true; +} + +type SmallIndex = x: nat | 0 <= x < 8 + +foreach i: SmallIndex { + ... +} +``` + +Similarly, a helper function can be provided to maintain a running index of the enumerated values: + +```dafny +foreach (x, i) in WithIndexes(s) { + ... +} + +// Alternatively, if s is a sequence: +foreach i, x | 0 <= i < |s| && s[i] == x { + ... +} +``` + +## Sequence comprehensions + +A sequence comprehension has identical syntax to a set comprehension, except that it begins with +`seq` rather than `set`. Therefore its syntax is: + +``` +SeqComprehension ::= + "seq" + Ident [":" Type] { "," Ident [":" Type] } + "|" Expression + [ :: Expression ] +``` + +Sequence comprehensions are likely to be easier to work with than `foreach` loops, and Dafny users should be +encouraged to use them instead where possible. Many high-order operations on sequences can be expressed directly +using sequence comprehensions: + +```dafny +// Filtering to optional values +var s: seq> := ...; +var filtered: seq := seq Some(x) <- s; + +// Zipping two lists together +var a: seq := ...; +var b: seq := ...; +assert |a| == |b|; +var zipped := seq i | 0 <= i < |a| :: (a[i], b[i]) + +// Map and then flatten (a.k.a. "FlatMap") +var c: seq := ...; +var f: S -> seq := ...; +var flatMapped: seq := seq x <- c, y <- f(x) :: y; + +// Sorted sequence from a set +var setOfReals := {3.141, 2.718, 1.618}; +var sortedList := seq x | x in setOfReals; +assert sortedList == {1.618, 2.718, 3.141}; +``` + +Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual, +but we refer to it here as a "sequence construction" expression, to disambiguate it from the proposed sequence comprehension feature. +Sequence comprehensions are strictly more expressive, as any construction `seq(size, f)` can be rewritten as +`seq i | 0 <= i < size :: f(i)`. They also avoid the common trap of forgetting to explicitly attach the `requires 0 <= i < size` +pre-condition to a lambda expression, as this issue highlights: https://github.com/dafny-lang/dafny/issues/1332 + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +As mentioned in the guide-level explanation, `foreach` loops and sequence comprehensions are both able to +borrow concepts and implementation substantially from other features. Parsing, resolving, verifying, and compiling +quantifier domains is already a mature aspect of the Dafny implementation. The most significant implementation burden +is ensuring that enumeration ordering is deterministic. The existing compilation logic for enumerating a domain +treats this ordering as pure implementation detail, and applies heuristic optimizations to try to make the search as short +as possible. Ensuring consistent ordering is an additional constraint on this logic, applied only when the surrounding context +is a sequence comprehension or `foreach` loop. + +Here is a sketch of the rules that define the enumeration ordering of a quantifier domain. +These rules are chosen as generalizations of the existing semantics for set comprehensions, +such that building a set from all elements in the ordered enumeration results in the same value. + + * Every boolean expression defines an ordered enumeration of free variable bindings that satisfy it. This enumeration may include duplicate bindings, and the ordering is significant. + * The most common base case is an expression of the form `x in C`, where `C` is an expression with a sequence, multiset, or (i)set type. These expressions produce enumerations that bind only `x` to each value in the collection. + * For sequences, the ordering is the obvious ordering of values in the sequence. + * For (i)sets or multisets, the exact ordering is deterministic but unspecified. + * Potentially-infinite sets are only allowed if they are enumerable. + * If `x` is not a variable, then the enumeration produces an empty set of bindings once for each time the LHS appears in the RHS collection. + * For an expression of the form `A && B`, the enumeration includes all bindings that satisfy both `A` and `B`. The enumeration ordering is defined by ordering first by `A`'s ordering and then by `B`'s, and the multiplicities of bindings is multiplied. See below for the pseudocode that calculates this enumeration. + * This means that the `&&` operation is not fully symmetrical, but this is already true as verification considers it to be "short-circuiting": + `b != 0 && a/b > 1` is valid but `a/b > 1 && b != 0` is not. + * This definition is chosen such that the multiplicities of results from `A && B` is the same as `B && A`, even if the ordering is not the same. + * For an expression of the form `A || B`, the enumeration will simply produce the values enumerated by `A` followed by those enumerated by `B`. + * If the context of an expression binds a variable that is not used in the expression, such as `seq x: bool | true`, + then the default ordering for the type of the variable is used (i.e. `[false, true]` in this case). + If this type is not enumerable, then the expression is not allowed. + * For an expression of the form `!A`, the enumeration will produce all values of the types of the bound variables that do not satisfy `A`. + As above, if any of these types are not enumerable the expression is not allowed. + +Here is the pseudocode for the semantics of the ordered enumeration of bindings for `A && B`: + +``` +IEnumerable EnumerateConjunction(Expr A, Expr B) { + foreach (ActualBindings a in A) { + Expr B' := B[a]; // Substitute the given bindings into the expression B + foreach (ActualBindings b' in B') { + yield return b'; + } + } +} +``` + +And some illustrative examples using sequence comprehensions: + +```dafny +// Special cases for one variable: +(seq x: T | A && B) == (seq x | A) * (seq x | B) // Not an actual built-in operation, but a generalization of set intersection. +(seq x: T | A || B) == (seq x | A) + (seq x | B) +(seq x: T | !A) == (seq x: T) - (seq x | A) // Not an actual built-in operation, but a generalization of set subtraction. + // Only well-formed if T itself is enumerable. + +(seq x | x in [1, 2, 2] && x in [2, 2, 1]) == [1, 2, 2, 2, 2] +(seq x | x in [1, 2, 2] && x in {2, 1}) == [1, 2, 2] +(seq x | x in [1, 2, 2] || x in {2, 1}) == [1, 2, 2, 1, 2] // Ordering of last two not specified +(seq x, y | y in [[1, 2], [3, 4]] && x in y :: x) == [1, 2, 3, 4] +(seq x, y | y in [[1, 1], [1, 1]] && x in y :: x) == [1, 1, 1, 1] + +(seq x, y | y == {1, 2, 3} && x in y) == [2, 3, 1] // Ordering not specified +(seq x, y | x in y && y == {1, 2, 3}) // ==> "Error: can't calculate ordered enumeration" + +// Edge case with no bound variables (not literally supported in Dafny source, but a base case for the general algorithm) +(seq | 2 in [2, 2] :: 42) == [42, 42] +(seq | 1 in [2, 2] :: 42) == [] +``` + +## Verification + +The optimal encoding of sequence comprehensions to Boogie is something of an open question: +set comprehensions are encoded as their indicator predicates, but there is no such direct representation +for sequence comprehensions as defined here. The initial encoding will likely be with a function, +similar to the `Seq#Create` function used for set constructions, that accepts an indicator predicate. +Axioms will be included to at least state that values are elements of a sequence comprehension iff +they satisfy this predicate. This property is likely the most important to infer automatically, +although it says nothing about the ordering or multiplicity of these values. + +Since `x | x in C && P(x)` where `C` is a sequence will be a very common pattern, it will likely be worthwhile +to also include axioms for this pattern to encode the fact that elements in the result +maintain their ordering from `C`, and that the length of the result and the multiplicities of values +are all bounded from above by their analogues from `C`. + +The translation of `foreach` loops should be reducible to sequence comprehensions +and other existing features for loops in general, +as the semantics of such loops can usually be expressed as follows: + +```dafny +// A loop of this form: +foreach x1, x2, ..., xN | { + +} + +// Is semantically equivalent to: +var __s := seq x1, x2, ..., xN | :: (x1, x2, ..., xN); +for __i := 0 to |__s| { + var (x1, x2, ..., xN) := __s[__i]; + +} +``` + +This is not a complete equivalency as the domain of `foreach` loop with explicit `decreaes` clauses +are allowed to be infinite. It may make sense to support encoding potentially infinite sequences in the Boogie prelude +even if they are not (yet) supported in Dafny, to close this gap. + +## Compilation + +The most challenging part of compiling these new features is ensuring the enumeration +ordering of unordered collection types such as sets and multisets is deterministic when used in a sequence comprehension. +This is closely related to compiling let-such-that expressions (`var x :| ...; ...`), as such expressions +are implemented by enumerating through the parameters of a domain to search for a match, +and hence are influenced by this ordering. The current solution is to require any such expressions +have only one solution, which is not an option here. +As illustrated in [this paper](https://easychair.org/publications/paper/dM), the challenge is +that although the runtime values of `{1, 2}` and `{2, 1}` will compare equal to each other, +their internal representation may differ and hence their enumeration ordering may as well. + +Most if not all current Dafny compilers implement a Dafny `set` with some variation on a hash set, +which is generally a good default choice since this datatype offers constant-time access on average. +Although different copies of the same set must place each value into the same hashing bucket, multiple +values may be assigned to the same bucket, so it is necessary to use a secondary data structure of some kind for each bucket. +A common simple approach is to use a linked list, +in which case a single bucket will store elements in the order they were added and hence not meet the requirement +of a deterministic ordering based only on the set of contained elements. + +One solution is to use a binary search tree as this secondary data structure instead, assuming that a total ordering of +the elements of the type `T` is available. Some hash set implementations, such as recent versions of the Java `HashSet` type, +use a similar approach as an internal optimization when many hashing collisions occur, +so that the worst case runtime of lookups is O(log N) rather than O(N). +The included Dafny runtimes can provide an implementation of Dafny sets and multisets that use this approach consistently, +and hence provide the efficient average-case access of a hash set but still provide a deterministic enumeration order. + +The good news is that individual compilers are free to decide they do not support sequence comprehensions +whose ordering depends on any such collections. Dafny code that will be compiled using such compilers can +fall back to using a `foreach` loop to iterate over sets instead, as statements are permitted to +observe non-determinism. This means a particular compiler can still map a Dafny `set` value +directly to a native implementation of sets, such as a Java `Set`, even though these implementations will +enumerate their contents with non-deterministic ordering. + +# Drawbacks +[drawbacks]: #drawbacks + +The primary drawback to this proposal is the burden of adding the new concept of quantification ordering. +This means extra cognitive load for Dafny users, extra implementation effort for any new builtin collection types, +and additional testing to ensure that all supported quantification domains are deterministically enumerable. + +It can be argued that if any compiled Dafny program uses existing features that depend on domain enumeration, +such as `exists` expressions or `:|` statements, its runtime behavior already depends heavily on how this domain +searching is implemented. The compiler only enforces that such domains are finite and enumerable, but an inefficient +choice can lead to the search for a matching value taking orders of magnitude more time than expected. +Therefore we should already be documenting this aspect of Dafny semantics and implementation better anyway. + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + + + +# Prior art +[prior-art]: #prior-art + +Sequence comprehensions bear a strong resemblance to list comprehensions in Python, which also include similar abilities to +bind multiple variables and filter the enumerated values: `[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]` + +Most mainstream programming languages include rich libraries for working with collections and iterators, +and define much the same operations such as "filter" and "flatMap". +The proposed Dafny features provide similar functionality at a higher level +of abstraction more amenable to verification: quantification expressions are used to succinctly declare the behavior of +an operation that in spirit applies several such operations. See the "Future Possibilities" section for ideas for +pushing this philosophy even further. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +* Is inconsistency between quantification ordering and `<` an issue? (e.g. `<` on datatypes means "rank comparison" +and can't be compiled, string prefix rather than lexicographic ordering, etc) + +```dafny +function QuickSort(s: seq): seq { + // Oops, this discards multiplicity + seq x | x in s +} by method { + // ... +} +``` + +# Future possibilities +[future-possibilities]: #future-possibilities + +Adding this building block to the Dafny language opens up a wide array of tempting new features! + +## User-defined collections + +As discussed in the earlier [`for` loops RFC](https://github.com/dafny-lang/rfcs/pull/4), the semantics of +`for` or `foreach` loops in many programming languages are defined in terms of "iterator" or "enumerator" objects. +The features proposed here only support builtin collections, but could easily be extended to support +arbitrary Dafny types that provide a way to obtain an enumerator for their contents. +[This PR](https://github.com/dafny-lang/libraries/pull/37) should be a good start towards designing +the necessary interface. This is also likely the best way to provide a better user experience +for looping or quantifying over the values produced by an `iterator` type. + +## Unicode strings + +As I have previously ranted about in https://github.com/dafny-lang/dafny/issues/413, the Dafny `string` type is currently an alias +for `seq`, where `char` is defined as a UTF-16 code units, and hence allows invalid sequences according to the semantics of +Unicode. Modelling unicode correctly will involve specifying that the semantic contents of a UTF-8 string, for example, is +a sequence of Unicode code points that is not efficiently accessed with indices. The features proposed here provide ways to work with +such datatypes efficiently at runtime. + +## Array comprehensions + +The same syntax for sequence comprehensions could also be used for flexible array initialization, +since this is another case where the user has to declare an ordered source of values: + +``` +var a := new int[10] i :: i * i; +var a := new int[|s|] i :: s[i]; // Where s is a seq +var a := new int[] i | 10 <= i < 20; +``` + +It would likely be prudent to limit these cases to those where the size of the domain +is easy to infer statically, so that the size of the array to allocate is known before +enumerating the values. + +## Multiset comprehensions + +There would have similar semantics to sequence comprehensions, +where the multiplicity of results is significant but not the ordering. +This feature would be very cheap to implement once sequence comprehensions are implemented. + +## Generalized comprehensions + +As shown above, sequence comprehensions are a powerfully expressive feature, +but they can only express enumerating multiple values from a small number of source expressions, +and not the opposite direction of aggregating multiple values into one. +The existing quantifier and comprehension expressions can be viewed as specific +cases of aggregating multiple values into a single result value. +An equally powerful feature would be to generalize this pattern and define +a parameterized collection expression to aggregate a quantifier domain into +a single value: + +```dafny +collect(&&) x: T | P(x) :: Q(x) == forall x: T | P(x) :: Q(x) +collect(||) x: T | P(x) :: Q(x) == exists x: T | P(x) :: Q(x) +collect(SeqBuilder) x: T | P(x) :: Q(x) == seq x: T | P(x) :: Q(x) +collect(SetBuilder) x: T | P(x) :: Q(x) == set x: T | P(x) :: Q(x) +collect(+) x: T | P(x) :: Q(x) == (summation) +collect(*) x: T | P(x) :: Q(x) == (product) +collect(<) x: T | P(x) :: Q(x) == (minimum) +collect(Averager) x: T | P(x) :: Q(x) == (average) +collect(First(n)) x: T | P(x) :: Q(x) == Take(seq x: T | P(x) :: Q(x), n) +... +``` + +This mirrors the `Collector` concept from the Java streaming APIs, and ideally the shape of the +parameter to `collect` expressions would define similar operations for merging intermediate results +to leave the door open for parallel or even distributed computation. + +## Notes + +New version: + +"Quantifier domain" exists as a parsing concept, now define it more explicitly: a totally ordered set of bindings. Bindings have their own identity for the sake of the ordering definition, so +`x <- [1, 1, 1, 1, 1]` contains 5 distinct bindings, although each binds `x` to `1`. +Not necessarily enumerable or finite. Doesn't even need to be when compiled, because +quantifier ranges can restrict to a domain that is, e.g. `set x: real | x in setOfReals`. + +```dafny +seq x <- xs, y <- ys :: (x, y) +seq k <- m.Keys, v == m[k] :: (k, v) +multiset (k, v) <- m.Items :: v +seq Some(v) <- optionals :: v +seq x <- doubleDeckerSeq, y <- x :: y +seq x | 0 <= x < 10 +foreach x <- c { ... } +``` + +How to get the domain of a multiset? `x <- ms` is fine as long as the ordering +doesn't have to be deterministic. + +Binding patterns: +* x (: T) - natural order of T, may need heuristics to determine finiteness +* x <- C - order of C, may contain duplicates +* x by myLessThen (future) - user-provided ordering + +Provide a way to specify the ordering when you don't have a collection? +* x: int will be a wibble wobble order, but x: nat will be sequential + * Technially that's a wibble wobble order restricted to nat values! + +"Exists uniquely" quantifier? From f8726d73ce0d26ddcbbe1bab833c3fa740560ef9 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Wed, 11 May 2022 13:58:47 -0700 Subject: [PATCH 02/13] Added section on ordered quantification --- ...sions.md => 0005-ordered-quantification.md | 105 +++++++++++++----- 1 file changed, 77 insertions(+), 28 deletions(-) rename 0005-sequence-comprehensions.md => 0005-ordered-quantification.md (80%) diff --git a/0005-sequence-comprehensions.md b/0005-ordered-quantification.md similarity index 80% rename from 0005-sequence-comprehensions.md rename to 0005-ordered-quantification.md index e1e7010..05a86a1 100644 --- a/0005-sequence-comprehensions.md +++ b/0005-ordered-quantification.md @@ -1,12 +1,17 @@ -- Feature Name: sequence_comprehensions -- Start Date: 2022-05-10 +- Feature Name: ordered_quantification +- Start Date: 2022-02-22 (twosday!) - RFC PR: [dafny-lang/rfcs#9](https://github.com/dafny-lang/rfcs/pull/9) - Dafny Issue: [dafny-lang/dafny#1753](https://github.com/dafny-lang/dafny/issues/1753) # Summary [summary]: #summary -This RFC proposes adding generalized sequence comprehension expressions, similar to the existing `set` and `map` comprehension expressions and more flexible than the current `seq(5, i => i*i)` form. +This RFC proposes adding two closely-related new features to Dafny for working with collections and iteration: + +1. `foreach` loops over any enumerable domain, initially only supporting the builtin collection types but leaving the door open for user-defined collections. +2. Generalized sequence comprehension expressions, similar to the existing `set` and `map` comprehension expressions and more flexible than the current `seq(5, i => i*i)` form. + +These features both build on the fundamental new concept of *ordered quantification*, and their syntax and semantics overlap substantially. # Motivation [motivation]: #motivation @@ -32,10 +37,8 @@ method AllPairs(s: seq) returns (result: seq<(nat, nat)>) { // After: method AllPairs(s: seq) returns (result: seq<(nat, nat)>) { - foreach left <- s { - foreach right <- s { - result := result + (left, right); - } + foreach left <- s, right <- s { + result := result + (left, right); } } @@ -60,10 +63,9 @@ while ss != {} } ``` -This is functionally correct, but because set subtraction requires creating a fresh copy, -when compiled this loop requires quadratic time and memory (or at least garbage-collection work) in the size of the set. -The only way to avoid this cost is to write additional external code in the target language to interface directly with the -internal runtime implementation of `set` values. +This is functionally correct, but in the current compiler and runtimes takes quadratic time. +https://github.com/dafny-lang/dafny/issues/2062 contains a more in-depth exploration of this issue; +the overall conclusion is that improving the situation through optimization is risky and expensive. By comparison, the equivalent `foreach` loop is: @@ -77,30 +79,79 @@ The runtime implementation of this loop can use the native iteration features of Going further, it is better to avoid writing any imperative loop at all where possible. Sticking to immutable data and expressions rather than statements allows logic to be used in specifications as well as implementation, -and reduces the burden on the Dafny verifier. The proposed sequence comprehension expression allows more +and reduces the effort Dafny users have to spend on proving their code correct. The proposed sequence comprehension expression allows more logic that ultimately produces a sequence of values to be expressed as a value rather than a statement. -Just producing a sequence of the values in a set is simple using a sequence comprehension expression: `seq i <- s`. +Just producing a sequence of the values in the set above, sorted by the natural ordering of `int` values, +is simple using a sequence comprehension expression: `seq i: int | i in s`. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -Although sequence comprehensions are the main feature this RFC proposes, they depend on another new and more -more fundamental concept: quantifier domain ordering. Therefore, allow me to outline this concept first before then -describing sequence comprehensions. +Both of the proposed new features depend on another new and more fundamental concept: quantification ordering. +Therefore, allow me to outline this concept first before then describing `foreach` loops and sequence comprehensions. + +## Quantification ordering + +This concept augments the existing common syntax for quantifier domains with a notion of ordering, and allows quantified variables to +bind to duplicate values. The key points are: + +* Any quantifier domain defines a potentially infinite, partially-ordered set of *quantifier bindings*. +* The quantified variable declarations define the values each binding maps to each variable, AND how these bindings are ordered. +* The range expression after the `|` restricts the quantification to a subset of these bindings, but does not influence their ordering. +* A quantifier domain only guarantees an ordering of bindings, but is NOT prescriptive on how to enumerate this domain at runtime, if it is compiled! + This is consistent with existing expressions such as `set x: real | x in mySet && P(x)`: although the unfiltered domain of `real` values is not + enumerable, the bound `x in mySet` is, and at runtime this set is calculated by enumerating the contents of `mySet` and filtering out values + that do not satisfy `P(x)`. The new features that are affected by quantification ordering will behave similarly: `seq x: real | x in mySet && P(x)` + +There will be three supported *quantifier variable declaration* cases for determining this ordering, and others could potentially be added in the future: + +1. `x [: T]` + + In the existing syntax for quantified variables (where the type `T` may be explicitly provided or omitted and inferred), + the quantification bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will + have such a natural ordering defined, but at a minimum `int` and `real`, and any subset type or newtype based on them, + will use the natural mathematical ordering. `x: int | x in {2, 5, 3}`, for example, would bind x to `2`, `3`, and `5` in that order. + +2. `x [: T] <- C` + + In this new syntax, the quantification bindings will be defined and ordered according to the expression `C`, which must be a collection. Initially only the builtin collection types (`set`, `multiset`, `map` and `seq`) will be supported, but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` should be used instead to bind key-value pairs. + Unlike the first case, this syntax may produce duplicate bindings. The ordering of bindings is non-deterministic unless `C` is a sequence. + If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic. + + The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C ...". Note that `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. + + The expression `C` is allowed to depend on quantifier variables declared in earlier clauses, such as `x <- C, y <- x`. + +3. ` <- C` + + This is a generalization of the previous case that supports pattern matching, as in variable declaration and match statements. + It allows destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`, and filtering, as in `Some(x) <- mySetOfOptionals`. + +A single quantifier domain may include multiple such clauses separated by commas, in which case the orderings described for each clause take +lexicographic precedence. The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindings in that order: -## Quantifier domain ordering +1. `x == 1, y == 3` +1. `x == 1, y == 4` +1. `x == 2, y == 3` +1. `x == 2, y == 4` -TODO +The syntax for quantifier domains will therefore become: -There will be two cases for determining this ordering, and others could potentially be added in the future: +QuantifierVarDecl ::= + Ident [":" Type] + | Ident [":" Type] "<-" Expression + | CasePatternLocal "<-" Expression -1. +QuantifierDomain ::= + QuantifierVarDecl + { "," QuantifierVarDecl } + [ "|" Expression ] -Note that this concept and the new `<-` syntax applies to any use of quantifiers, even though ordering and multiplicity -is irrelevant for all current uses: the semantics of `set` and `map` comprehensions, `forall` and `exists` expressions, -and `forall` statements all do not depend on the order of quantification. This syntax does offer a more efficient -expression of common uses of these features, however: `set x <- s :: x + 2` is equivalent to -`set x | x in s :: x + 2`, for example. +Note that this concept and the new `<-` syntax applies to any use of quantifier domains, even though ordering and multiplicity +is irrelevant for all current uses: the semantics of `[i]set` and `map` comprehensions, `forall` and `exists` expressions, +and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates. +This syntax does offer a more efficient expression of common uses of these features, however: `set Some(x) <- s` is equivalent to +`set x | x in s && x.Some? :: x.value`, for example. ## foreach Loops @@ -171,11 +222,9 @@ foreach x: nat } ``` -TODO: Not correct any more, C has to be a deterministically ordered collection for now (?). Most `foreach` loops will take the form `foreach x <- C { ... }`, where `C` is a `seq`, `set` or `multiset`. This includes expressions like `m.Keys`, `m.Values` and `m.Items` when `m` is a `map`. -Looping over the keys and values of a map is particularly readable and aesthetically pleasing with this syntax, -which also makes it easier for the compiled program to avoid unnecessary intermediate tuple values: +Looping over the keys and values of a map is particularly clean with this syntax: ```dafny foreach (k, v) <- m.Items { From b80e931cebdde0ffe74cd370043936f51b145f50 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Wed, 11 May 2022 17:21:12 -0700 Subject: [PATCH 03/13] Mostly finished modulo a few TODOs --- 0005-ordered-quantification.md | 253 +++++++++++++-------------------- 1 file changed, 96 insertions(+), 157 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 05a86a1..ad4d0e8 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -1,6 +1,6 @@ - Feature Name: ordered_quantification - Start Date: 2022-02-22 (twosday!) -- RFC PR: [dafny-lang/rfcs#9](https://github.com/dafny-lang/rfcs/pull/9) +- RFC PR: [dafny-lang/rfcs#9](https://github.com/dafny-lang/rfcs/pull/10) - Dafny Issue: [dafny-lang/dafny#1753](https://github.com/dafny-lang/dafny/issues/1753) # Summary @@ -37,6 +37,7 @@ method AllPairs(s: seq) returns (result: seq<(nat, nat)>) { // After: method AllPairs(s: seq) returns (result: seq<(nat, nat)>) { + result := []; foreach left <- s, right <- s { result := result + (left, right); } @@ -92,51 +93,62 @@ Therefore, allow me to outline this concept first before then describing `foreac ## Quantification ordering -This concept augments the existing common syntax for quantifier domains with a notion of ordering, and allows quantified variables to -bind to duplicate values. The key points are: +Several existing Dafny features support a common syntax for quantifying over one or more variables, internally referred to +as a "quantifier domain". For example, the `x: nat, y: nat | x < 2 && y < 2` portion of the set comprehension +`set x: nat, y: nat | x < 2 && y < 2 :: (x, y)`. The boolean expression after the `|` is referred to as the *range*, +and I will label the `x: nat` and `y: nat` sections as *quantifier variable declarations*. +We can augment this existing syntax with a notion of ordering, and allow quantified variables to bind to duplicate values. The key points are: -* Any quantifier domain defines a potentially infinite, partially-ordered set of *quantifier bindings*. +* Any quantifier domain defines a potentially-infinite, partially-ordered set of *quantifier bindings*. * The quantified variable declarations define the values each binding maps to each variable, AND how these bindings are ordered. * The range expression after the `|` restricts the quantification to a subset of these bindings, but does not influence their ordering. * A quantifier domain only guarantees an ordering of bindings, but is NOT prescriptive on how to enumerate this domain at runtime, if it is compiled! - This is consistent with existing expressions such as `set x: real | x in mySet && P(x)`: although the unfiltered domain of `real` values is not - enumerable, the bound `x in mySet` is, and at runtime this set is calculated by enumerating the contents of `mySet` and filtering out values + This is consistent with existing expressions such as `set x: real | x in myFiniteSet && P(x)`: although the unfiltered domain of `real` values is not + enumerable, the bound `x in myFiniteSet` is, and at runtime this set is calculated by enumerating the contents of `myFiniteSet` and filtering out values that do not satisfy `P(x)`. The new features that are affected by quantification ordering will behave similarly: `seq x: real | x in mySet && P(x)` + is calculated via the same filtered enumeration, but collected into a sorted sequence instead. -There will be three supported *quantifier variable declaration* cases for determining this ordering, and others could potentially be added in the future: +There will be three supported variable declaration styles, and others could potentially be added in the future: 1. `x [: T]` - In the existing syntax for quantified variables (where the type `T` may be explicitly provided or omitted and inferred), - the quantification bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will + In the existing syntax for quantifier variables (where the type `T` may be explicitly provided or omitted and inferred), + the bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will have such a natural ordering defined, but at a minimum `int` and `real`, and any subset type or newtype based on them, - will use the natural mathematical ordering. `x: int | x in {2, 5, 3}`, for example, would bind x to `2`, `3`, and `5` in that order. + will use the natural mathematical ordering. `x: int | x in {2, 5, -4}`, for example, would bind x to `-4`, `2`, and `5` in that order. 2. `x [: T] <- C` - In this new syntax, the quantification bindings will be defined and ordered according to the expression `C`, which must be a collection. Initially only the builtin collection types (`set`, `multiset`, `map` and `seq`) will be supported, but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` should be used instead to bind key-value pairs. + In this new syntax, the bindings will be defined and ordered according to the expression `C`, which must be a collection. Initially only the builtin collection types (`set`, `multiset`, `map`, and `seq`) will be supported, but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` should be used instead to bind key-value pairs. Unlike the first case, this syntax may produce duplicate bindings. The ordering of bindings is non-deterministic unless `C` is a sequence. If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic. The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C ...". Note that `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. - The expression `C` is allowed to depend on quantifier variables declared in earlier clauses, such as `x <- C, y <- x`. - 3. ` <- C` - This is a generalization of the previous case that supports pattern matching, as in variable declaration and match statements. + This is a generalization of the previous case that supports pattern matching, as in variable declarations and match statements. It allows destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`, and filtering, as in `Some(x) <- mySetOfOptionals`. -A single quantifier domain may include multiple such clauses separated by commas, in which case the orderings described for each clause take -lexicographic precedence. The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindings in that order: +A single quantifier domain may include multiple such declarations separated by commas, in which case the orderings described for each clause take +lexicographic precedence. The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindings in this order: 1. `x == 1, y == 3` 1. `x == 1, y == 4` 1. `x == 2, y == 3` 1. `x == 2, y == 4` -The syntax for quantifier domains will therefore become: +In addition, collection expressions used in declarations are permitted to refer to variables declared in previous declarations. +The domain `x <- [[1, 2], [3, 4], y <- x` will therefore produce these bindings: + +1. `x == [1, 2], y == 1` +1. `x == [1, 2], y == 2` +1. `x == [3, 4], y == 3` +1. `x == [3, 4], y == 4` +The overall syntax for quantifier domains will become: + +``` QuantifierVarDecl ::= Ident [":" Type] | Ident [":" Type] "<-" Expression @@ -146,11 +158,13 @@ QuantifierDomain ::= QuantifierVarDecl { "," QuantifierVarDecl } [ "|" Expression ] +``` Note that this concept and the new `<-` syntax applies to any use of quantifier domains, even though ordering and multiplicity is irrelevant for all current uses: the semantics of `[i]set` and `map` comprehensions, `forall` and `exists` expressions, -and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates. -This syntax does offer a more efficient expression of common uses of these features, however: `set Some(x) <- s` is equivalent to +and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates. +Importantly, these extensions to the syntax and semantics of quantifier domains are all fully backwards compatible. +They do offer a more efficient expression of common uses of these existing features, however: `set Some(x) <- s` is equivalent to `set x | x in s && x.Some? :: x.value`, for example. ## foreach Loops @@ -189,14 +203,14 @@ More specifically, this expands to: ``` ForeachLoop ::= "foreach" - Ident [":" Type] { "," Ident [":" Type] } [ "|" Expression ] + QuantifierVarDecl { "," QuantifierVarDecl } [ "|" Expression ] { InvariantDecl | ModifiesDecl | DecreasesDecl } - [ Body ] + [ BlockStmt ] ``` A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall` -loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop -executes its body once for each tuple of quantified values in sequence, one at a time. +statement executes its body once for every binding of quantified values simultaneously, whereas a `foreach` loop +executes its body once for each binding of quantified values in sequence, one at a time. Similarly to set comprehensions or assign-such-that statements, the domain of a `foreach` loop must be enumerable. The following loop would produce the error `"the domain of a foreach loop must be enumerable, @@ -208,10 +222,12 @@ foreach x: real | 0.0 <= x < 1.0 { // Error: ... } ``` -The domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided. +The quantifier domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided. `decreases *` is permitted, in which the `foreach` loop may never terminate. Any other `decreases` clause can be provided to ensure the loop terminates even if the domain is potentially infinite. The following (slightly bizarre) example is legal: +TODO: better example. Perhaps convince + ```dafny foreach x: nat invariant x <= 10 @@ -222,16 +238,6 @@ foreach x: nat } ``` -Most `foreach` loops will take the form `foreach x <- C { ... }`, where `C` is a `seq`, `set` or `multiset`. -This includes expressions like `m.Keys`, `m.Values` and `m.Items` when `m` is a `map`. -Looping over the keys and values of a map is particularly clean with this syntax: - -```dafny -foreach (k, v) <- m.Items { - print "m[", k, "] == ", v; -} -``` - Note that the range expression is optional, and if omitted the loop will iterate over all possible values of the types of the bound variables. This is only permitted if all such types are enumerable, which is not true of the `real` type, for example. This supports an elegant @@ -255,6 +261,34 @@ foreach i: SmallIndex { } ``` +There is no builtin support for automatically tracking the enumerated values so far, as there is for +`iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types). +It is straightforward to track these values manually, however: + +```dafny +ghost var xs := []; +foreach x in s { + ... + xs := xs + [x]; +} +``` + +It is even possible to attach this ghost state directly to the sequence with a helper function: + +```dafny +foreach (x, xs) in WithPrefixes(c) { + ... +} +``` + +Where the signature of `WithPrefixes` is something like: + +```dafny +function method WithPrefixes(s: seq): seq<(T, ghost seq)> { + ... +} +``` + Similarly, a helper function can be provided to maintain a running index of the enumerated values: ```dafny @@ -276,8 +310,7 @@ A sequence comprehension has identical syntax to a set comprehension, except tha ``` SeqComprehension ::= "seq" - Ident [":" Type] { "," Ident [":" Type] } - "|" Expression + QuantifierVarDecl { "," QuantifierVarDecl } "|" Expression [ :: Expression ] ``` @@ -305,6 +338,8 @@ var flatMapped: seq := seq x <- c, y <- f(x) :: y; var setOfReals := {3.141, 2.718, 1.618}; var sortedList := seq x | x in setOfReals; assert sortedList == {1.618, 2.718, 3.141}; + +TODO: manual collection operations like intersection of two sequences ``` Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual, @@ -316,6 +351,8 @@ pre-condition to a lambda expression, as this issue highlights: https://github.c # Reference-level explanation [reference-level-explanation]: #reference-level-explanation +TODO: + As mentioned in the guide-level explanation, `foreach` loops and sequence comprehensions are both able to borrow concepts and implementation substantially from other features. Parsing, resolving, verifying, and compiling quantifier domains is already a mature aspect of the Dafny implementation. The most significant implementation burden @@ -324,77 +361,22 @@ treats this ordering as pure implementation detail, and applies heuristic optimi as possible. Ensuring consistent ordering is an additional constraint on this logic, applied only when the surrounding context is a sequence comprehension or `foreach` loop. -Here is a sketch of the rules that define the enumeration ordering of a quantifier domain. -These rules are chosen as generalizations of the existing semantics for set comprehensions, -such that building a set from all elements in the ordered enumeration results in the same value. - - * Every boolean expression defines an ordered enumeration of free variable bindings that satisfy it. This enumeration may include duplicate bindings, and the ordering is significant. - * The most common base case is an expression of the form `x in C`, where `C` is an expression with a sequence, multiset, or (i)set type. These expressions produce enumerations that bind only `x` to each value in the collection. - * For sequences, the ordering is the obvious ordering of values in the sequence. - * For (i)sets or multisets, the exact ordering is deterministic but unspecified. - * Potentially-infinite sets are only allowed if they are enumerable. - * If `x` is not a variable, then the enumeration produces an empty set of bindings once for each time the LHS appears in the RHS collection. - * For an expression of the form `A && B`, the enumeration includes all bindings that satisfy both `A` and `B`. The enumeration ordering is defined by ordering first by `A`'s ordering and then by `B`'s, and the multiplicities of bindings is multiplied. See below for the pseudocode that calculates this enumeration. - * This means that the `&&` operation is not fully symmetrical, but this is already true as verification considers it to be "short-circuiting": - `b != 0 && a/b > 1` is valid but `a/b > 1 && b != 0` is not. - * This definition is chosen such that the multiplicities of results from `A && B` is the same as `B && A`, even if the ordering is not the same. - * For an expression of the form `A || B`, the enumeration will simply produce the values enumerated by `A` followed by those enumerated by `B`. - * If the context of an expression binds a variable that is not used in the expression, such as `seq x: bool | true`, - then the default ordering for the type of the variable is used (i.e. `[false, true]` in this case). - If this type is not enumerable, then the expression is not allowed. - * For an expression of the form `!A`, the enumeration will produce all values of the types of the bound variables that do not satisfy `A`. - As above, if any of these types are not enumerable the expression is not allowed. - -Here is the pseudocode for the semantics of the ordered enumeration of bindings for `A && B`: - -``` -IEnumerable EnumerateConjunction(Expr A, Expr B) { - foreach (ActualBindings a in A) { - Expr B' := B[a]; // Substitute the given bindings into the expression B - foreach (ActualBindings b' in B') { - yield return b'; - } - } -} -``` - -And some illustrative examples using sequence comprehensions: - -```dafny -// Special cases for one variable: -(seq x: T | A && B) == (seq x | A) * (seq x | B) // Not an actual built-in operation, but a generalization of set intersection. -(seq x: T | A || B) == (seq x | A) + (seq x | B) -(seq x: T | !A) == (seq x: T) - (seq x | A) // Not an actual built-in operation, but a generalization of set subtraction. - // Only well-formed if T itself is enumerable. - -(seq x | x in [1, 2, 2] && x in [2, 2, 1]) == [1, 2, 2, 2, 2] -(seq x | x in [1, 2, 2] && x in {2, 1}) == [1, 2, 2] -(seq x | x in [1, 2, 2] || x in {2, 1}) == [1, 2, 2, 1, 2] // Ordering of last two not specified -(seq x, y | y in [[1, 2], [3, 4]] && x in y :: x) == [1, 2, 3, 4] -(seq x, y | y in [[1, 1], [1, 1]] && x in y :: x) == [1, 1, 1, 1] - -(seq x, y | y == {1, 2, 3} && x in y) == [2, 3, 1] // Ordering not specified -(seq x, y | x in y && y == {1, 2, 3}) // ==> "Error: can't calculate ordered enumeration" - -// Edge case with no bound variables (not literally supported in Dafny source, but a base case for the general algorithm) -(seq | 2 in [2, 2] :: 42) == [42, 42] -(seq | 1 in [2, 2] :: 42) == [] -``` - ## Verification The optimal encoding of sequence comprehensions to Boogie is something of an open question: set comprehensions are encoded as their indicator predicates, but there is no such direct representation for sequence comprehensions as defined here. The initial encoding will likely be with a function, -similar to the `Seq#Create` function used for set constructions, that accepts an indicator predicate. -Axioms will be included to at least state that values are elements of a sequence comprehension iff -they satisfy this predicate. This property is likely the most important to infer automatically, -although it says nothing about the ordering or multiplicity of these values. +similar to the `Seq#Create` function used for seq constructions, that accepts an indicator predicate. +The useful axioms about these values will fall into three categories, sketched below for the simple +case of comprehensions with a single variable bound from a sequence and no explicit term: + +`var S := seq x <- C | P(x);` -Since `x | x in C && P(x)` where `C` is a sequence will be a very common pattern, it will likely be worthwhile -to also include axioms for this pattern to encode the fact that elements in the result -maintain their ordering from `C`, and that the length of the result and the multiplicities of values -are all bounded from above by their analogues from `C`. +1. Membership: `forall x :: P(x) <==> x in S` +2. Cardinality: `|S| <= |C|` +3. Ordering: `forall i, j | 0 <= i < j < |S| :: + exists i', j' | 0 <= i' < j' < |C| :: + C[i'] == S[i] && C[j'] == S[j]` The translation of `foreach` loops should be reducible to sequence comprehensions and other existing features for loops in general, @@ -420,37 +402,11 @@ even if they are not (yet) supported in Dafny, to close this gap. ## Compilation -The most challenging part of compiling these new features is ensuring the enumeration -ordering of unordered collection types such as sets and multisets is deterministic when used in a sequence comprehension. -This is closely related to compiling let-such-that expressions (`var x :| ...; ...`), as such expressions -are implemented by enumerating through the parameters of a domain to search for a match, -and hence are influenced by this ordering. The current solution is to require any such expressions -have only one solution, which is not an option here. -As illustrated in [this paper](https://easychair.org/publications/paper/dM), the challenge is -that although the runtime values of `{1, 2}` and `{2, 1}` will compare equal to each other, -their internal representation may differ and hence their enumeration ordering may as well. - -Most if not all current Dafny compilers implement a Dafny `set` with some variation on a hash set, -which is generally a good default choice since this datatype offers constant-time access on average. -Although different copies of the same set must place each value into the same hashing bucket, multiple -values may be assigned to the same bucket, so it is necessary to use a secondary data structure of some kind for each bucket. -A common simple approach is to use a linked list, -in which case a single bucket will store elements in the order they were added and hence not meet the requirement -of a deterministic ordering based only on the set of contained elements. - -One solution is to use a binary search tree as this secondary data structure instead, assuming that a total ordering of -the elements of the type `T` is available. Some hash set implementations, such as recent versions of the Java `HashSet` type, -use a similar approach as an internal optimization when many hashing collisions occur, -so that the worst case runtime of lookups is O(log N) rather than O(N). -The included Dafny runtimes can provide an implementation of Dafny sets and multisets that use this approach consistently, -and hence provide the efficient average-case access of a hash set but still provide a deterministic enumeration order. - -The good news is that individual compilers are free to decide they do not support sequence comprehensions -whose ordering depends on any such collections. Dafny code that will be compiled using such compilers can -fall back to using a `foreach` loop to iterate over sets instead, as statements are permitted to -observe non-determinism. This means a particular compiler can still map a Dafny `set` value -directly to a native implementation of sets, such as a Java `Set`, even though these implementations will -enumerate their contents with non-deterministic ordering. +TODO: + +* Blah blah bounded pools +* Compilation already has an internal foreach-like concept. Having foreach loops in the language + means this can be one of many lowering transformations in the future. # Drawbacks [drawbacks]: #drawbacks @@ -468,7 +424,7 @@ Therefore we should already be documenting this aspect of Dafny semantics and im # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives - +TODO # Prior art [prior-art]: #prior-art @@ -483,9 +439,16 @@ of abstraction more amenable to verification: quantification expressions are use an operation that in spirit applies several such operations. See the "Future Possibilities" section for ideas for pushing this philosophy even further. +The [JMatch language](https://www.cs.cornell.edu/andru/papers/padl03.pdf), an extension of Java to support +pattern matching, includes a `foreach` loop remarkably similar to the proposal here: it is parameterized by +a boolean expression and iterates through all matching bindings. For example, the statement `foreach(a[int i] != 0) n += i;` +sums the indices of all non-zero elements of the array `a`. + # Unresolved questions [unresolved-questions]: #unresolved-questions +TODO: + * Is inconsistency between quantification ordering and `<` an issue? (e.g. `<` on datatypes means "rank comparison" and can't be compiled, string prefix rather than lexicographic ordering, etc) @@ -572,33 +535,9 @@ to leave the door open for parallel or even distributed computation. ## Notes -New version: - -"Quantifier domain" exists as a parsing concept, now define it more explicitly: a totally ordered set of bindings. Bindings have their own identity for the sake of the ordering definition, so -`x <- [1, 1, 1, 1, 1]` contains 5 distinct bindings, although each binds `x` to `1`. -Not necessarily enumerable or finite. Doesn't even need to be when compiled, because -quantifier ranges can restrict to a domain that is, e.g. `set x: real | x in setOfReals`. - -```dafny -seq x <- xs, y <- ys :: (x, y) -seq k <- m.Keys, v == m[k] :: (k, v) -multiset (k, v) <- m.Items :: v -seq Some(v) <- optionals :: v -seq x <- doubleDeckerSeq, y <- x :: y -seq x | 0 <= x < 10 -foreach x <- c { ... } -``` - -How to get the domain of a multiset? `x <- ms` is fine as long as the ordering -doesn't have to be deterministic. - Binding patterns: -* x (: T) - natural order of T, may need heuristics to determine finiteness -* x <- C - order of C, may contain duplicates * x by myLessThen (future) - user-provided ordering Provide a way to specify the ordering when you don't have a collection? * x: int will be a wibble wobble order, but x: nat will be sequential * Technially that's a wibble wobble order restricted to nat values! - -"Exists uniquely" quantifier? From b243b388d481d68172020a71854be6623bfa40f3 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Wed, 11 May 2022 17:29:47 -0700 Subject: [PATCH 04/13] Clean-up --- 0005-ordered-quantification.md | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index ad4d0e8..31cb1a8 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -120,6 +120,7 @@ There will be three supported variable declaration styles, and others could pote 2. `x [: T] <- C` In this new syntax, the bindings will be defined and ordered according to the expression `C`, which must be a collection. Initially only the builtin collection types (`set`, `multiset`, `map`, and `seq`) will be supported, but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` should be used instead to bind key-value pairs. + Unlike the first case, this syntax may produce duplicate bindings. The ordering of bindings is non-deterministic unless `C` is a sequence. If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic. @@ -203,7 +204,8 @@ More specifically, this expands to: ``` ForeachLoop ::= "foreach" - QuantifierVarDecl { "," QuantifierVarDecl } [ "|" Expression ] + QuantifierVarDecl { "," QuantifierVarDecl } + [ "|" Expression ] { InvariantDecl | ModifiesDecl | DecreasesDecl } [ BlockStmt ] ``` @@ -267,7 +269,7 @@ It is straightforward to track these values manually, however: ```dafny ghost var xs := []; -foreach x in s { +foreach x <- s { ... xs := xs + [x]; } @@ -276,7 +278,7 @@ foreach x in s { It is even possible to attach this ghost state directly to the sequence with a helper function: ```dafny -foreach (x, xs) in WithPrefixes(c) { +foreach (x, xs) <- WithPrefixes(c) { ... } ``` @@ -292,7 +294,7 @@ function method WithPrefixes(s: seq): seq<(T, ghost seq)> { Similarly, a helper function can be provided to maintain a running index of the enumerated values: ```dafny -foreach (x, i) in WithIndexes(s) { +foreach (x, i) <- WithIndexes(s) { ... } @@ -310,7 +312,8 @@ A sequence comprehension has identical syntax to a set comprehension, except tha ``` SeqComprehension ::= "seq" - QuantifierVarDecl { "," QuantifierVarDecl } "|" Expression + QuantifierVarDecl { "," QuantifierVarDecl } + "|" Expression [ :: Expression ] ``` @@ -356,10 +359,7 @@ TODO: As mentioned in the guide-level explanation, `foreach` loops and sequence comprehensions are both able to borrow concepts and implementation substantially from other features. Parsing, resolving, verifying, and compiling quantifier domains is already a mature aspect of the Dafny implementation. The most significant implementation burden -is ensuring that enumeration ordering is deterministic. The existing compilation logic for enumerating a domain -treats this ordering as pure implementation detail, and applies heuristic optimizations to try to make the search as short -as possible. Ensuring consistent ordering is an additional constraint on this logic, applied only when the surrounding context -is a sequence comprehension or `foreach` loop. +is ensuring that enumeration ordering is deterministic. ## Verification @@ -405,6 +405,12 @@ even if they are not (yet) supported in Dafny, to close this gap. TODO: * Blah blah bounded pools + +The existing compilation logic for enumerating a domain +treats this ordering as pure implementation detail, and applies heuristic optimizations to try to make the search as short +as possible. Ensuring consistent ordering is an additional constraint on this logic, applied only when the surrounding context +is a sequence comprehension or `foreach` loop. + * Compilation already has an internal foreach-like concept. Having foreach loops in the language means this can be one of many lowering transformations in the future. From 273173bbb8a917d3335bc287261c82ab5379fcb3 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Thu, 12 May 2022 12:55:11 -0700 Subject: [PATCH 05/13] Filled in the rest --- 0005-ordered-quantification.md | 181 +++++++++++++++++++++------------ 1 file changed, 117 insertions(+), 64 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 31cb1a8..275c7c4 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -80,7 +80,8 @@ The runtime implementation of this loop can use the native iteration features of Going further, it is better to avoid writing any imperative loop at all where possible. Sticking to immutable data and expressions rather than statements allows logic to be used in specifications as well as implementation, -and reduces the effort Dafny users have to spend on proving their code correct. The proposed sequence comprehension expression allows more +and reduces the effort Dafny users have to spend on proving their code correct. +The proposed sequence comprehension expression allows more logic that ultimately produces a sequence of values to be expressed as a value rather than a statement. Just producing a sequence of the values in the set above, sorted by the natural ordering of `int` values, is simple using a sequence comprehension expression: `seq i: int | i in s`. @@ -102,11 +103,17 @@ We can augment this existing syntax with a notion of ordering, and allow quantif * Any quantifier domain defines a potentially-infinite, partially-ordered set of *quantifier bindings*. * The quantified variable declarations define the values each binding maps to each variable, AND how these bindings are ordered. * The range expression after the `|` restricts the quantification to a subset of these bindings, but does not influence their ordering. -* A quantifier domain only guarantees an ordering of bindings, but is NOT prescriptive on how to enumerate this domain at runtime, if it is compiled! - This is consistent with existing expressions such as `set x: real | x in myFiniteSet && P(x)`: although the unfiltered domain of `real` values is not - enumerable, the bound `x in myFiniteSet` is, and at runtime this set is calculated by enumerating the contents of `myFiniteSet` and filtering out values - that do not satisfy `P(x)`. The new features that are affected by quantification ordering will behave similarly: `seq x: real | x in mySet && P(x)` - is calculated via the same filtered enumeration, but collected into a sorted sequence instead. + + +A quantifier domain only guarantees an ordering of bindings, +but is NOT prescriptive on how to enumerate this domain at runtime, if it is compiled! +This is consistent with existing expressions such as `set x: real | x in myFiniteSet && P(x)`: +although the unfiltered domain of `real` values is not enumerable, the bound `x in myFiniteSet` is, +and at runtime this comprehension is calculated by enumerating the contents of `myFiniteSet` +and filtering out values that do not satisfy `P(x)`. +The new features that are affected by quantification ordering will behave similarly: +`seq x: real | x in mySet && P(x)` is calculated via the same filtered enumeration, +but collected into a sorted sequence instead. There will be three supported variable declaration styles, and others could potentially be added in the future: @@ -115,24 +122,27 @@ There will be three supported variable declaration styles, and others could pote In the existing syntax for quantifier variables (where the type `T` may be explicitly provided or omitted and inferred), the bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will have such a natural ordering defined, but at a minimum `int` and `real`, and any subset type or newtype based on them, - will use the natural mathematical ordering. `x: int | x in {2, 5, -4}`, for example, would bind x to `-4`, `2`, and `5` in that order. + will use the natural mathematical ordering. + `x: int | x in {2, 5, -4}`, for example, would bind `x` to `-4`, `2`, and `5` in that order. 2. `x [: T] <- C` - In this new syntax, the bindings will be defined and ordered according to the expression `C`, which must be a collection. Initially only the builtin collection types (`set`, `multiset`, `map`, and `seq`) will be supported, but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` should be used instead to bind key-value pairs. + In this new syntax, the bindings will be defined and ordered according to the expression `C`, which must be a collection. Initially only the builtin collection types (`[i]set`, `multiset`, `[i]map`, and `seq`) will be supported, but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` should be used instead to bind key-value pairs. - Unlike the first case, this syntax may produce duplicate bindings. The ordering of bindings is non-deterministic unless `C` is a sequence. + Unlike the first case, this syntax may produce duplicate bindings. + The ordering of bindings is non-deterministic unless `C` is a sequence. If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic. - The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C ...". Note that `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. + The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C, ...". Note that `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. 3. ` <- C` This is a generalization of the previous case that supports pattern matching, as in variable declarations and match statements. It allows destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`, and filtering, as in `Some(x) <- mySetOfOptionals`. -A single quantifier domain may include multiple such declarations separated by commas, in which case the orderings described for each clause take -lexicographic precedence. The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindings in this order: +A single quantifier domain may include multiple such declarations separated by commas, +in which case the orderings described for each clause take lexicographic precedence. +The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindings in this order: 1. `x == 1, y == 3` 1. `x == 1, y == 4` @@ -140,7 +150,7 @@ lexicographic precedence. The domain `x <- [1, 2], y <- [3, 4]` will therefore s 1. `x == 2, y == 4` In addition, collection expressions used in declarations are permitted to refer to variables declared in previous declarations. -The domain `x <- [[1, 2], [3, 4], y <- x` will therefore produce these bindings: +The domain `x <- [[1, 2], [3, 4], y <- x` therefore produces these bindings: 1. `x == [1, 2], y == 1` 1. `x == [1, 2], y == 2` @@ -226,17 +236,18 @@ foreach x: real | 0.0 <= x < 1.0 { // Error: ... The quantifier domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided. `decreases *` is permitted, in which the `foreach` loop may never terminate. Any other `decreases` clause can be provided -to ensure the loop terminates even if the domain is potentially infinite. The following (slightly bizarre) example is legal: - -TODO: better example. Perhaps convince +to ensure the loop terminates even if the domain is potentially infinite. For example, the following (very slow) example collects +five arbitrary primes (assuming that Dafny can figure out how to enumerate the `allPrimes` infinite set): ```dafny -foreach x: nat - invariant x <= 10 - decreases 10 - x +var allPrimes := iset n | !exists i, j | 1 < i <= j < n :: i * j == n; +var fivePrimes := {}; +foreach p <- allPrimes + invariant |fivePrimes| <= 5 + decreases 5 - |fivePrimes| { - print x, "\n"; - if x == 10 { break; } + if |fivePrimes| == 5 { break; } + fivePrimes := fivePrimes + [x]; } ``` @@ -263,7 +274,7 @@ foreach i: SmallIndex { } ``` -There is no builtin support for automatically tracking the enumerated values so far, as there is for +There will not initially be any builtin support for automatically tracking the enumerated values so far, as there is for `iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types). It is straightforward to track these values manually, however: @@ -330,7 +341,7 @@ var filtered: seq := seq Some(x) <- s; var a: seq := ...; var b: seq := ...; assert |a| == |b|; -var zipped := seq i | 0 <= i < |a| :: (a[i], b[i]) +var zipped := seq i | 0 <= i < |a| :: (a[i], b[i]); // Map and then flatten (a.k.a. "FlatMap") var c: seq := ...; @@ -342,9 +353,19 @@ var setOfReals := {3.141, 2.718, 1.618}; var sortedList := seq x | x in setOfReals; assert sortedList == {1.618, 2.718, 3.141}; -TODO: manual collection operations like intersection of two sequences +// Intersection of sequences +var a := [4, 1, 3, 5]; +var b := [3, 1, 7]; +assert (seq x <- a | x in b) == [1, 3]; +assert (seq x <- b | x in a) == [3, 1]; ``` +Since sequence comprehensions are expressions rather than statements, they carry an additional restriction +when used in functions: their ordering must be fully deterministic. If `s` is a `set`, for example, +`seq x <- s` would be rejected in specification contexts, whereas `seq x | x in s` would be allowed. +This is very similar to the restriction on `:|` let-such-that expressions, which is not relevant for equivalent +`:|` assign-such-that statements. + Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual, but we refer to it here as a "sequence construction" expression, to disambiguate it from the proposed sequence comprehension feature. Sequence comprehensions are strictly more expressive, as any construction `seq(size, f)` can be rewritten as @@ -354,19 +375,17 @@ pre-condition to a lambda expression, as this issue highlights: https://github.c # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -TODO: - As mentioned in the guide-level explanation, `foreach` loops and sequence comprehensions are both able to borrow concepts and implementation substantially from other features. Parsing, resolving, verifying, and compiling quantifier domains is already a mature aspect of the Dafny implementation. The most significant implementation burden -is ensuring that enumeration ordering is deterministic. +is ensuring that enumeration ordering is deterministic in contexts where it needs to be. ## Verification The optimal encoding of sequence comprehensions to Boogie is something of an open question: set comprehensions are encoded as their indicator predicates, but there is no such direct representation for sequence comprehensions as defined here. The initial encoding will likely be with a function, -similar to the `Seq#Create` function used for seq constructions, that accepts an indicator predicate. +similar to the `Seq#Create` function used for seq constructions, that accepts a source sequence and indicator predicate. The useful axioms about these values will fall into three categories, sketched below for the simple case of comprehensions with a single variable bound from a sequence and no explicit term: @@ -402,46 +421,72 @@ even if they are not (yet) supported in Dafny, to close this gap. ## Compilation -TODO: - -* Blah blah bounded pools - -The existing compilation logic for enumerating a domain -treats this ordering as pure implementation detail, and applies heuristic optimizations to try to make the search as short -as possible. Ensuring consistent ordering is an additional constraint on this logic, applied only when the surrounding context -is a sequence comprehension or `foreach` loop. - -* Compilation already has an internal foreach-like concept. Having foreach loops in the language - means this can be one of many lowering transformations in the future. +The existing resolution logic for quantified domains in Dafny applies heuristics to identify +conjuncts in the range expression that define a bounded, potentially-enumerable set of values. The +[`BoundedPool`](https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/AST/DafnyAst.cs#L11487) type is used +to represent these cases, and models common patterns such as `0 <= i < n` (`IntBoundedPool`), +`x in s` (`CollectionBoundedPool`), and enum-like datatypes with no parameterized constructors (`DatatypeBoundedPool`). +Different bounded pool types have different "virtues", including whether or not they are finite or enumerable. +In compiled contexts, Dafny will produce an error if it cannot identify at least one enumerable bound. + +To support ordered quantification, this mechanism will be extended to include tracking the ordering these pools +will enumerate values with. The `x <- C` syntax will be another potential `CollectionBoundedPool` source, and usually one that encodes +an explicit enumeration order. The existing compilation logic treats ordering as pure implementation detail, +and applies optimizations to try to make the search as short as possible. Ensuring consistent ordering is an additional +constraint on this logic, applied only when the surrounding context is a sequence comprehension or `foreach` loop. + +Each target language compiler already implements methods such as `CreateForEachLoop` to emit the implementation of +iterating over a particular `BoundedPool`. +One benefit to adding `foreach` loops to Dafny is that this logic can be refactored as a pure lowering translation +phase, rewriting quantification of many kinds to explicit `foreach` loop AST nodes, which can then be translated to +target languages constructs in later phases. # Drawbacks [drawbacks]: #drawbacks The primary drawback to this proposal is the burden of adding the new concept of quantification ordering. This means extra cognitive load for Dafny users, extra implementation effort for any new builtin collection types, -and additional testing to ensure that all supported quantification domains are deterministically enumerable. +and additional testing to ensure that quantification domains are deterministically enumerable when required. + +The bounded pool heuristics described above are necessary in order to support compiling expressions such as set +comprehensions that are not computable in general. They come with the downside that it is less obvious to Dafny +programmers how their code will behave at runtime, and hence these new features come with +the same downside. Dafny is free to implement an expression like `seq x <- a | x in b` by either enumerating the values +of `a` and skipping those not in `b`, or by sorting the elements of `b` according to the ordering in `a`. This flexibility +is excellent for future optimizations, but makes it harder to reason about runtime performance. It can be argued that if any compiled Dafny program uses existing features that depend on domain enumeration, such as `exists` expressions or `:|` statements, its runtime behavior already depends heavily on how this domain searching is implemented. The compiler only enforces that such domains are finite and enumerable, but an inefficient choice can lead to the search for a matching value taking orders of magnitude more time than expected. -Therefore we should already be documenting this aspect of Dafny semantics and implementation better anyway. +Therefore we should already be documenting this aspect of Dafny semantics and implementation better anyway, potentially +including communicating this through the IDE. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -TODO +The most obvious alternative is to only provide the simpler `foreach x <- C` syntax, with only a single bound variable, +no additional filtering, and with `C` directly providing the enumeration and not just the ordering of values. +This is certainly a well-formed and sound construct, but far less powerful than the proposed version, especially +when sequence comprehensions are excluded. Offering `foreach` loops along with sequence comprehensions means we can +recommend the latter where possible, with the former as a fallback that is inevitably more complicated to verify. # Prior art [prior-art]: #prior-art Sequence comprehensions bear a strong resemblance to list comprehensions in Python, which also include similar abilities to -bind multiple variables and filter the enumerated values: `[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]` +bind multiple variables and filter the enumerated values: `[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]`. +Haskell and Cryptol also feature similar constructs. + +Although the syntax `for[each] x in c` is more common, there is precedent for using the symbol `<-` instead in other languages, +including Scala, Cryptol, and Haskell. Using something other than `in` is recommended to avoid confusion with the `in` boolean operator, +but I don't have a strong opinion about which symbol or keyword we use. Most mainstream programming languages include rich libraries for working with collections and iterators, and define much the same operations such as "filter" and "flatMap". The proposed Dafny features provide similar functionality at a higher level -of abstraction more amenable to verification: quantification expressions are used to succinctly declare the behavior of +of abstraction that will require less effort from Dafny programmers to successfully verify: +quantification expressions are used to succinctly declare the behavior of an operation that in spirit applies several such operations. See the "Future Possibilities" section for ideas for pushing this philosophy even further. @@ -453,19 +498,25 @@ sums the indices of all non-zero elements of the array `a`. # Unresolved questions [unresolved-questions]: #unresolved-questions -TODO: - -* Is inconsistency between quantification ordering and `<` an issue? (e.g. `<` on datatypes means "rank comparison" -and can't be compiled, string prefix rather than lexicographic ordering, etc) - -```dafny -function QuickSort(s: seq): seq { - // Oops, this discards multiplicity - seq x | x in s -} by method { - // ... -} -``` +It is very convenient to support ordered quantification over the natural ordering of types, but +Dafny's choice of built-in ordering concepts is currently somewhat surprising. Historical, Dafny has biased towards +defining well-founded orderings to support proofs of termination rather than intuitive orderings +users would expect. For example, `<` on datatypes is defined as "rank comparison", such that `Some(x) < Some(Some(x))`. +For strings (and more generally sequences), `<` means "proper prefix of", and the well-founded ordering is +"subsequence of", and neither is the lexicographic comparison users would expect. +As a somewhat orthogonal improvement, we may want to make breaking changes to the language to align `<` with +typical programmer expectations and the "natural ordering" concept, as well as defining a separate operator +for the well-founded orderings (see https://github.com/dafny-lang/dafny/issues/762). + +Along similar lines, will users frequently want to customize the ordering of their quantifications without +creating explicit collection values? To produce the sequence `[4, 3, 2, 1, 0]`, for example, +we could consider supporting something like `seq i by > | 0 <= i < 5`. + +Although the `seq i | i in mySet` pattern is great for expressing the sorted sequence of elements in +a set, there doesn't seem to be as direct a way to express the sorted sequence of elements in another +sequence or multiset, since the plain `i` declaration implies a natural ordering, which doesn't allow +duplicates. This could also be addressed by the above idea of customizing the ordering, if +`seq i <- mySeq by <` was allowed. # Future possibilities [future-possibilities]: #future-possibilities @@ -490,6 +541,15 @@ Unicode. Modelling unicode correctly will involve specifying that the semantic c a sequence of Unicode code points that is not efficiently accessed with indices. The features proposed here provide ways to work with such datatypes efficiently at runtime. +## Let-such-that expressions without uniqueness proofs + +An expression `var x :| P(x); ...` is currently only allowed in specification contexts +if Dafny can prove that there is exactly one such `x` that satisfies `P(x)`. +With ordered quantification in our toolbox, we could lift this requirement when +the quantification is ordered, and ensure that the first such `x` according to that +order is picked. This would be more useful if we also allowed the same additional syntax, +i.e. `var x <- C :| P(x); ...`. + ## Array comprehensions The same syntax for sequence comprehensions could also be used for flexible array initialization, @@ -539,11 +599,4 @@ This mirrors the `Collector` concept from the Java streaming APIs, and ideally t parameter to `collect` expressions would define similar operations for merging intermediate results to leave the door open for parallel or even distributed computation. -## Notes - -Binding patterns: -* x by myLessThen (future) - user-provided ordering -Provide a way to specify the ordering when you don't have a collection? -* x: int will be a wibble wobble order, but x: nat will be sequential - * Technially that's a wibble wobble order restricted to nat values! From 461e92dd50ea9b1a966c12f4e25ae38b8c319398 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Thu, 12 May 2022 13:24:48 -0700 Subject: [PATCH 06/13] A few more edits --- 0005-ordered-quantification.md | 22 +++++++++++++++------- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 275c7c4..608227c 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -104,7 +104,6 @@ We can augment this existing syntax with a notion of ordering, and allow quantif * The quantified variable declarations define the values each binding maps to each variable, AND how these bindings are ordered. * The range expression after the `|` restricts the quantification to a subset of these bindings, but does not influence their ordering. - A quantifier domain only guarantees an ordering of bindings, but is NOT prescriptive on how to enumerate this domain at runtime, if it is compiled! This is consistent with existing expressions such as `set x: real | x in myFiniteSet && P(x)`: @@ -256,7 +255,7 @@ possible values of the types of the bound variables. This is only permitted if a are enumerable, which is not true of the `real` type, for example. This supports an elegant pattern for mapping over simple datatypes and other types with few values, including subset types. -``` +```dafny datatype Suit = Clubs | Diamonds | Hearts | Spades foreach s: Suit { @@ -274,8 +273,8 @@ foreach i: SmallIndex { } ``` -There will not initially be any builtin support for automatically tracking the enumerated values so far, as there is for -`iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types). +There will not initially (see Open Questions) be any builtin support for automatically tracking the enumerated values so far, +as there is for `iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types). It is straightforward to track these values manually, however: ```dafny @@ -378,7 +377,7 @@ pre-condition to a lambda expression, as this issue highlights: https://github.c As mentioned in the guide-level explanation, `foreach` loops and sequence comprehensions are both able to borrow concepts and implementation substantially from other features. Parsing, resolving, verifying, and compiling quantifier domains is already a mature aspect of the Dafny implementation. The most significant implementation burden -is ensuring that enumeration ordering is deterministic in contexts where it needs to be. +is ensuring that enumeration ordering is threaded through, and deterministic in contexts where it needs to be. ## Verification @@ -425,12 +424,13 @@ The existing resolution logic for quantified domains in Dafny applies heuristics conjuncts in the range expression that define a bounded, potentially-enumerable set of values. The [`BoundedPool`](https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/AST/DafnyAst.cs#L11487) type is used to represent these cases, and models common patterns such as `0 <= i < n` (`IntBoundedPool`), -`x in s` (`CollectionBoundedPool`), and enum-like datatypes with no parameterized constructors (`DatatypeBoundedPool`). +`x in s` (`CollectionBoundedPool`), enum-like datatypes with no parameterized constructors (`DatatypeBoundedPool`), +and `x == C` (`ExactBoundedPool`). Different bounded pool types have different "virtues", including whether or not they are finite or enumerable. In compiled contexts, Dafny will produce an error if it cannot identify at least one enumerable bound. To support ordered quantification, this mechanism will be extended to include tracking the ordering these pools -will enumerate values with. The `x <- C` syntax will be another potential `CollectionBoundedPool` source, and usually one that encodes +will enumerate values with. The `x <- C` syntax will be another potential `CollectionBoundedPool` source, and often one that encodes an explicit enumeration order. The existing compilation logic treats ordering as pure implementation detail, and applies optimizations to try to make the search as short as possible. Ensuring consistent ordering is an additional constraint on this logic, applied only when the surrounding context is a sequence comprehension or `foreach` loop. @@ -518,6 +518,14 @@ sequence or multiset, since the plain `i` declaration implies a natural ordering duplicates. This could also be addressed by the above idea of customizing the ordering, if `seq i <- mySeq by <` was allowed. +Tracking the values enumerated so far and/or the number of iterations in a `foreach` loop +is possible with manual helper functions as illustrated above, but only when the source collection is a sequence. +It may be a good idea in the next iteration of the feature to have convenient mechanisms for this, +if they are frequently requested. They may be best addressed when adding support for +user-defined collection types and enumerators instead. Another possibility is supporting +parallel orderings as in Cryptol or Haskell, such that indexing an arbitrary collections could be expressed as +`seq x <- s (some symbol) i <- (0 to *) :: (s, i)`, but I don't see an obvious choice of syntax. + # Future possibilities [future-possibilities]: #future-possibilities From cc9c9f179974c6aaece9d617e0ac667ede171119 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Thu, 12 May 2022 13:34:57 -0700 Subject: [PATCH 07/13] Link typo --- 0005-ordered-quantification.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 608227c..8b1abb1 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -1,6 +1,6 @@ - Feature Name: ordered_quantification - Start Date: 2022-02-22 (twosday!) -- RFC PR: [dafny-lang/rfcs#9](https://github.com/dafny-lang/rfcs/pull/10) +- RFC PR: [dafny-lang/rfcs#10](https://github.com/dafny-lang/rfcs/pull/10) - Dafny Issue: [dafny-lang/dafny#1753](https://github.com/dafny-lang/dafny/issues/1753) # Summary From f6747ef452428dbc89fe8936a6f9c418ee6f5af1 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Thu, 12 May 2022 13:58:51 -0700 Subject: [PATCH 08/13] Added implementation plan --- 0005-ordered-quantification.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 8b1abb1..61086e7 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -441,6 +441,22 @@ One benefit to adding `foreach` loops to Dafny is that this logic can be refacto phase, rewriting quantification of many kinds to explicit `foreach` loop AST nodes, which can then be translated to target languages constructs in later phases. +# Implementation plan + +This RFC proposes a lot of functionality, which can be delivered in multiple smaller phases: + +1. Sequence comprehensions for a single quantified variable with a sequence source, only usable in ghost code. + 1. Depends on at least minimal support for ordered quantification. + 2. Ensures verification is effective before moving on to other derived features. +2. Compiling sequence comprehensions. +3. Other source collection types (`set`, `multiset`, etc.) +4. Declaring quantifier variables using pattern matching. +5. Multiple quantifier variables. +6. `foreach` loops. +7. etc. + +It may even be possible to implement multiple extensions in parallel. + # Drawbacks [drawbacks]: #drawbacks From 3f009946c8e82aa87855e01c0baa532088e98a4d Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Fri, 13 May 2022 07:17:08 -0700 Subject: [PATCH 09/13] Update 0005-ordered-quantification.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Clément Pit-Claudel --- 0005-ordered-quantification.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 61086e7..e46edb7 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -149,7 +149,7 @@ The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindi 1. `x == 2, y == 4` In addition, collection expressions used in declarations are permitted to refer to variables declared in previous declarations. -The domain `x <- [[1, 2], [3, 4], y <- x` therefore produces these bindings: +The domain `x <- [[1, 2], [3, 4]], y <- x` therefore produces these bindings: 1. `x == [1, 2], y == 1` 1. `x == [1, 2], y == 2` From 963abeaa68467e4eda40ea39528cc3b1fcc6335b Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Fri, 13 May 2022 13:08:10 -0700 Subject: [PATCH 10/13] Address most feedback --- 0005-ordered-quantification.md | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 61086e7..47939a8 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -139,8 +139,8 @@ There will be three supported variable declaration styles, and others could pote This is a generalization of the previous case that supports pattern matching, as in variable declarations and match statements. It allows destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`, and filtering, as in `Some(x) <- mySetOfOptionals`. -A single quantifier domain may include multiple such declarations separated by commas, -in which case the orderings described for each clause take lexicographic precedence. +When a single quantifier domain includes multiple declarations separated by commas, +the the orderings take precedence from left to right. The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindings in this order: 1. `x == 1, y == 3` @@ -149,7 +149,7 @@ The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindi 1. `x == 2, y == 4` In addition, collection expressions used in declarations are permitted to refer to variables declared in previous declarations. -The domain `x <- [[1, 2], [3, 4], y <- x` therefore produces these bindings: +The domain `x <- [[1, 2], [3, 4]], y <- x` therefore produces these bindings: 1. `x == [1, 2], y == 1` 1. `x == [1, 2], y == 2` @@ -236,7 +236,7 @@ foreach x: real | 0.0 <= x < 1.0 { // Error: ... The quantifier domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided. `decreases *` is permitted, in which the `foreach` loop may never terminate. Any other `decreases` clause can be provided to ensure the loop terminates even if the domain is potentially infinite. For example, the following (very slow) example collects -five arbitrary primes (assuming that Dafny can figure out how to enumerate the `allPrimes` infinite set): +at most five arbitrary primes (assuming that Dafny can figure out how to enumerate the `allPrimes` infinite set): ```dafny var allPrimes := iset n | !exists i, j | 1 < i <= j < n :: i * j == n; @@ -248,6 +248,12 @@ foreach p <- allPrimes if |fivePrimes| == 5 { break; } fivePrimes := fivePrimes + [x]; } +// We can't assert |fivePrimes| == 5 here, since +// we can't prove that the loop couldn't have terminated +// earlier by running out of primes. +// The decreases metric proves that the loop will terminate, +// but is only an upper bound on the number of iterations. +assert |fivePrimes| <= 5; ``` Note that the range expression is optional, and if omitted the loop will iterate over all @@ -360,8 +366,8 @@ assert (seq x <- b | x in a) == [3, 1]; ``` Since sequence comprehensions are expressions rather than statements, they carry an additional restriction -when used in functions: their ordering must be fully deterministic. If `s` is a `set`, for example, -`seq x <- s` would be rejected in specification contexts, whereas `seq x | x in s` would be allowed. +when compiled: their ordering must be fully deterministic. If `s` is a `set`, for example, +`seq x <- s` would be rejected in compiled code, whereas `seq x | x in s` would be allowed. This is very similar to the restriction on `:|` let-such-that expressions, which is not relevant for equivalent `:|` assign-such-that statements. @@ -377,7 +383,8 @@ pre-condition to a lambda expression, as this issue highlights: https://github.c As mentioned in the guide-level explanation, `foreach` loops and sequence comprehensions are both able to borrow concepts and implementation substantially from other features. Parsing, resolving, verifying, and compiling quantifier domains is already a mature aspect of the Dafny implementation. The most significant implementation burden -is ensuring that enumeration ordering is threaded through, and deterministic in contexts where it needs to be. +is ensuring that the required quantification ordering is threaded through the stack, +and checked to be deterministic when used in compiled sequence comprehensions. ## Verification @@ -462,7 +469,7 @@ It may even be possible to implement multiple extensions in parallel. The primary drawback to this proposal is the burden of adding the new concept of quantification ordering. This means extra cognitive load for Dafny users, extra implementation effort for any new builtin collection types, -and additional testing to ensure that quantification domains are deterministically enumerable when required. +and additional testing to ensure that quantification domains are deterministically ordered when required. The bounded pool heuristics described above are necessary in order to support compiling expressions such as set comprehensions that are not computable in general. They come with the downside that it is less obvious to Dafny @@ -565,9 +572,9 @@ Unicode. Modelling unicode correctly will involve specifying that the semantic c a sequence of Unicode code points that is not efficiently accessed with indices. The features proposed here provide ways to work with such datatypes efficiently at runtime. -## Let-such-that expressions without uniqueness proofs +## Compiled let-such-that expressions without uniqueness proofs -An expression `var x :| P(x); ...` is currently only allowed in specification contexts +An expression `var x :| P(x); ...` is currently only allowed in compiled contexts if Dafny can prove that there is exactly one such `x` that satisfies `P(x)`. With ordered quantification in our toolbox, we could lift this requirement when the quantification is ordered, and ensure that the first such `x` according to that From 698bf03850c21cc30b68edbaa0193cab48560eb3 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Fri, 13 May 2022 16:09:15 -0700 Subject: [PATCH 11/13] Move case patterns on quantifier variable declarations to future possibilities --- 0005-ordered-quantification.md | 39 ++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 14 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 47939a8..12cee6a 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -114,7 +114,7 @@ The new features that are affected by quantification ordering will behave simila `seq x: real | x in mySet && P(x)` is calculated via the same filtered enumeration, but collected into a sorted sequence instead. -There will be three supported variable declaration styles, and others could potentially be added in the future: +There will be two supported variable declaration styles, and others could potentially be added in the future: 1. `x [: T]` @@ -134,11 +134,6 @@ There will be three supported variable declaration styles, and others could pote The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C, ...". Note that `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. -3. ` <- C` - - This is a generalization of the previous case that supports pattern matching, as in variable declarations and match statements. - It allows destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`, and filtering, as in `Some(x) <- mySetOfOptionals`. - When a single quantifier domain includes multiple declarations separated by commas, the the orderings take precedence from left to right. The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindings in this order: @@ -160,9 +155,7 @@ The overall syntax for quantifier domains will become: ``` QuantifierVarDecl ::= - Ident [":" Type] - | Ident [":" Type] "<-" Expression - | CasePatternLocal "<-" Expression + Ident [":" Type] [ "<-" Expression ] QuantifierDomain ::= QuantifierVarDecl @@ -174,8 +167,6 @@ Note that this concept and the new `<-` syntax applies to any use of quantifier is irrelevant for all current uses: the semantics of `[i]set` and `map` comprehensions, `forall` and `exists` expressions, and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates. Importantly, these extensions to the syntax and semantics of quantifier domains are all fully backwards compatible. -They do offer a more efficient expression of common uses of these existing features, however: `set Some(x) <- s` is equivalent to -`set x | x in s && x.Some? :: x.value`, for example. ## foreach Loops @@ -294,7 +285,8 @@ foreach x <- s { It is even possible to attach this ghost state directly to the sequence with a helper function: ```dafny -foreach (x, xs) <- WithPrefixes(c) { +foreach xAndXs <- WithPrefixes(c) { + var (x, xs) := xAndXs; ... } ``` @@ -310,7 +302,8 @@ function method WithPrefixes(s: seq): seq<(T, ghost seq)> { Similarly, a helper function can be provided to maintain a running index of the enumerated values: ```dafny -foreach (x, i) <- WithIndexes(s) { +foreach xAndI <- WithIndexes(s) { + var (x, i) := xAndI; ... } @@ -340,7 +333,7 @@ using sequence comprehensions: ```dafny // Filtering to optional values var s: seq> := ...; -var filtered: seq := seq Some(x) <- s; +var filtered: seq := seq o <- s | o.Some? :: o.value; // Zipping two lists together var a: seq := ...; @@ -554,6 +547,24 @@ parallel orderings as in Cryptol or Haskell, such that indexing an arbitrary col Adding this building block to the Dafny language opens up a wide array of tempting new features! +## Pattern matching on quantifier variables + +The LHS of `<-` declarations could be extended to allow case patterns, just as variable declaration statements do. +This would allow destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`. It could also support filtering, as in `Some(x) <- mySetOfOptionals`. +This would be consistent with how case patterns are used in match statements; these two loops would be equivalent, for example: + +```dafny +foreach Some(x) <- [None, Some(42), None] { + print x; +} + +foreach o <- [None, Some(42), None] { + match o { + case Some(x) => print x; + } +} +``` + ## User-defined collections As discussed in the earlier [`for` loops RFC](https://github.com/dafny-lang/rfcs/pull/4), the semantics of From 5c5e2909b923763cca363b41e1c6677dd7b92531 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Tue, 24 May 2022 21:25:01 -0700 Subject: [PATCH 12/13] Per quantified variable domains and ranges --- 0005-ordered-quantification.md | 341 +++++++++++++++++---------------- 1 file changed, 179 insertions(+), 162 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 12cee6a..9882c23 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -81,10 +81,29 @@ The runtime implementation of this loop can use the native iteration features of Going further, it is better to avoid writing any imperative loop at all where possible. Sticking to immutable data and expressions rather than statements allows logic to be used in specifications as well as implementation, and reduces the effort Dafny users have to spend on proving their code correct. -The proposed sequence comprehension expression allows more -logic that ultimately produces a sequence of values to be expressed as a value rather than a statement. -Just producing a sequence of the values in the set above, sorted by the natural ordering of `int` values, -is simple using a sequence comprehension expression: `seq i: int | i in s`. +The proposed sequence comprehension expression allows more logic +that ultimately produces a sequence of values to be expressed as a value rather than a statement. +Just producing a sequence of the values in the set above, in a non-deterministic order, +is simple using a sequence comprehension expression: `seq i <- s`. + +These higher-level features also provide a simpler Dafny idiom for compilers to recognize +and translate to idiomatic code in a target language, which is an important requirement for some Dafny users. +A sequence comprehension like `seq e: Entry <- myList | e.active :: e.id` could be translated to a loop like the following in Java: + +```java +List result = new ArrayList<>(myList.size()); +for (Entry e : myList) { + if (e.isActive()) { + result.add(e.getID()); + } +} +``` + +Or this, which is arguably more idiomatic for most Java logic that works with collections: + +```java +List result = myList.stream().filter(Entry::isActive).map(Entry::getID).collect(Collectors.toList()); +``` # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -95,44 +114,59 @@ Therefore, allow me to outline this concept first before then describing `foreac ## Quantification ordering Several existing Dafny features support a common syntax for quantifying over one or more variables, internally referred to -as a "quantifier domain". For example, the `x: nat, y: nat | x < 2 && y < 2` portion of the set comprehension -`set x: nat, y: nat | x < 2 && y < 2 :: (x, y)`. The boolean expression after the `|` is referred to as the *range*, -and I will label the `x: nat` and `y: nat` sections as *quantifier variable declarations*. -We can augment this existing syntax with a notion of ordering, and allow quantified variables to bind to duplicate values. The key points are: +as a *quantifier domain*. For example, consider this universal quantifier: -* Any quantifier domain defines a potentially-infinite, partially-ordered set of *quantifier bindings*. -* The quantified variable declarations define the values each binding maps to each variable, AND how these bindings are ordered. -* The range expression after the `|` restricts the quantification to a subset of these bindings, but does not influence their ordering. +```dafny +forall x: nat, y: nat | 0 < x && 0 < y :: x % gcd(x, y) == 0 && y % gcd(x, y) == 0 +``` -A quantifier domain only guarantees an ordering of bindings, -but is NOT prescriptive on how to enumerate this domain at runtime, if it is compiled! -This is consistent with existing expressions such as `set x: real | x in myFiniteSet && P(x)`: -although the unfiltered domain of `real` values is not enumerable, the bound `x in myFiniteSet` is, -and at runtime this comprehension is calculated by enumerating the contents of `myFiniteSet` -and filtering out values that do not satisfy `P(x)`. -The new features that are affected by quantification ordering will behave similarly: -`seq x: real | x in mySet && P(x)` is calculated via the same filtered enumeration, -but collected into a sorted sequence instead. +The quantifier domain is this case is the `x: nat, y: nat | 0 < x && 0 < y` portion. +The boolean expression after the `|` is referred to as the *range*, +and I will label the `x: nat` and `y: nat` sections as *quantifier variable declarations* +(or alternatively, *quantified variable declarations*). -There will be two supported variable declaration styles, and others could potentially be added in the future: +We can extend the syntax and semantics of this concept in natural, +backwards-compatible ways to allow a well-defined notion of ordering over some quantification domains, +which will be the foundation of both `foreach` loops and sequence comprehensions and potentially other future language features. -1. `x [: T]` +### Quantified variable domains - In the existing syntax for quantifier variables (where the type `T` may be explicitly provided or omitted and inferred), - the bindings will be ordered according to the *natural ordering* of the type `T`. Not all Dafny types will - have such a natural ordering defined, but at a minimum `int` and `real`, and any subset type or newtype based on them, - will use the natural mathematical ordering. - `x: int | x in {2, 5, -4}`, for example, would bind `x` to `-4`, `2`, and `5` in that order. +By default, any quantified variable ranges over all values of its type, and is only further constrained by the range expression. +The first extension is to allow an optional *quantifier variable domain*, declared with the syntax `x <- C` where `C` is a collection. +Initially only the builtin collection types (`[i]set`, `multiset`, `[i]map`, and `seq`) will be allowed, +but support for user-defined collection types will be possible in the future. +If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; +`map.Items` can be used instead to bind key-value pairs. -2. `x [: T] <- C` +Assuming that we have `posNat == iset n: nat | 0 <= n`, the set comprehension above could also then be expressed as: - In this new syntax, the bindings will be defined and ordered according to the expression `C`, which must be a collection. Initially only the builtin collection types (`[i]set`, `multiset`, `[i]map`, and `seq`) will be supported, but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` should be used instead to bind key-value pairs. +```dafny +forall x <- posNat, y <- posNat :: x % gcd(x, y) == 0 && y % gcd(x, y) == 0 +``` - Unlike the first case, this syntax may produce duplicate bindings. - The ordering of bindings is non-deterministic unless `C` is a sequence. - If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic. +This aligns nicely with the variation of guarded quantifier notation in mathematics that explicitly attaches domains to variables: +$\forall x \in \mathbb{N}^+, y \in \mathbb{N}^+ \bullet x | gcd(x, y) \land y | gcd(x, y)$. + +Besides offering a slightly more succinct expression of existing quantification domains, +quantified variable domains also specify the ordering of quantification bindings, and allow variables to be bound to duplicate values. +This ordering of bindings is non-deterministic unless `C` is a sequence. +If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic. +Note that ordering and multiplicity is irrelevant for all current uses: +the semantics of `[i]set` and `[i]map` comprehensions, `forall` and `exists` expressions, +and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates. - The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C, ...". Note that `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. +The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C, ...". +`<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. +This should help clarify that while any `x` value bound from `x <- C` also satisfies `x in C`, +they are different concepts and the former carries more information than the latter. + +Quantified variable domains will be optional in all existing features that use them, +but required on variables quantified by a `foreach` loop or sequence comprehension. +This is to avoid having to define the ordering of an expression like `seq x: int`: +should this have a non-deterministic ordering, or should it be ordered +according to the natural mathematical ordering on integers? +This restriction could be lifted in the future if the answer becomes clearer +(and see also Future Possibilities). When a single quantifier domain includes multiple declarations separated by commas, the the orderings take precedence from left to right. @@ -143,7 +177,8 @@ The domain `x <- [1, 2], y <- [3, 4]` will therefore specify the following bindi 1. `x == 2, y == 3` 1. `x == 2, y == 4` -In addition, collection expressions used in declarations are permitted to refer to variables declared in previous declarations. +The domain expression of one quantified variable is allowed to reference other variables +declared earlier in the quantifier domain. The domain `x <- [[1, 2], [3, 4]], y <- x` therefore produces these bindings: 1. `x == [1, 2], y == 1` @@ -151,22 +186,42 @@ The domain `x <- [[1, 2], [3, 4]], y <- x` therefore produces these bindings: 1. `x == [3, 4], y == 3` 1. `x == [3, 4], y == 4` -The overall syntax for quantifier domains will become: +### Quantified variable ranges + +Since quantifier variable domains are values themselves, they carry well-formedness requirements with them as well. +To make quantifier domains with multiple quantified variables easier to express, +range expressions will be supported after each variable declaration, instead of as a separate component of quantifier domains. +Assuming that `seqOfInts` is of type `seq` and the function `natsFromZeroTo` accepts a `nat` rather than an `int`, +the quantifier domain `x <- seqOfInts | 0 <= x, y <- natsFromZeroTo(x)` is then well-formed. + +The overall syntax for quantifier domains will therefore become: ``` QuantifierVarDecl ::= - Ident [":" Type] [ "<-" Expression ] + Ident [":" Type] [ "<-" Expression ] { Attribute } [ "|" Expression ] QuantifierDomain ::= - QuantifierVarDecl - { "," QuantifierVarDecl } - [ "|" Expression ] + QuantifierVarDecl { "," QuantifierVarDecl } ``` -Note that this concept and the new `<-` syntax applies to any use of quantifier domains, even though ordering and multiplicity -is irrelevant for all current uses: the semantics of `[i]set` and `map` comprehensions, `forall` and `exists` expressions, -and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates. -Importantly, these extensions to the syntax and semantics of quantifier domains are all fully backwards compatible. +It is perhaps a bit surprising that this change does not break existing quantification domains, +but it only means existing range expressions will be re-interpreted as binding to the last quantifier variable, +where it can still reference all quantified variables and has the same filtering semantics. +The only wrinkle is attributes like `{:heapQuantifier}` that are currently only allowed in-between +the quantified variable declarations and the single range. To maintain backwards compatibility, +`QuantifierVarDecl` accepts attributes just before the optional range expression. +This means that the parser will accept attributes in-between variables, +but these can be easily rejected by the resolver instead. + +Note that in quantification contexts not sensitive to ordering or multiplicity, +quantified variable domains and ranges can both be rewritten back into a single range without changing semantics. +For example, note that the RHS of the following identity is equally well-formed +because `&&` is short-circuiting in Dafny: + +```dafny +(forall x <- C | P1(x), y <- f(x) | P2(x, y) :: Q(x, y)) == +(forall x, y | x in C && P1(x) && y in f(x) && P2(x, y) :: Q(x, y)) +``` ## foreach Loops @@ -183,8 +238,14 @@ The feature supports much more sophisticated uses as well, however, including binding multiple variables at once and filtering: ```dafny -var myDoubleDeckerSeq: seq> := ...; -foreach x <- myDoubleDeckerSeq, y <- x | y != 0 { +var mySeq: seq := ...; +function method getSomeValues(x: nat): seq +{ + ... +} +foreach x <- mySeq | 0 <= x, + y <- getSomeValues(x) | y != 0 +{ Process(y); } ``` @@ -204,8 +265,8 @@ More specifically, this expands to: ``` ForeachLoop ::= "foreach" - QuantifierVarDecl { "," QuantifierVarDecl } - [ "|" Expression ] + Ident [":" Type] [ "<-" Expression ] [ "|" Expression ] + { "," Ident [":" Type] [ "<-" Expression ] [ "|" Expression ] } { InvariantDecl | ModifiesDecl | DecreasesDecl } [ BlockStmt ] ``` @@ -214,19 +275,21 @@ A `foreach` loop closely resembles a `forall` statement, the key difference bein statement executes its body once for every binding of quantified values simultaneously, whereas a `foreach` loop executes its body once for each binding of quantified values in sequence, one at a time. -Similarly to set comprehensions or assign-such-that statements, the domain of a `foreach` loop -must be enumerable. The following loop would produce the error `"the domain of a foreach loop must be enumerable, +Similarly to compiled set comprehensions or assign-such-that statements, the domain of a `foreach` loop +must be enumerable, even in ghost contexts. The following loop would produce the error `"the domain of a foreach loop must be enumerable, but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`. ```dafny -foreach x: real | 0.0 <= x < 1.0 { // Error: ... +var someReals := iset n | 0.0 <= x < 1.0; +foreach x <- someReals { // Error: ... ... } ``` -The quantifier domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided. -`decreases *` is permitted, in which the `foreach` loop may never terminate. Any other `decreases` clause can be provided -to ensure the loop terminates even if the domain is potentially infinite. For example, the following (very slow) example collects +The quantifier domain is allowed to be potentially infinite if an explicit `decreases` clause is provided. +`decreases *` is permitted in compiled contexts, in which the `foreach` loop may never terminate. +Any other `decreases` clause can be provided +to ensure the loop terminates even if the domain is potentially infinite. For example, the following example collects at most five arbitrary primes (assuming that Dafny can figure out how to enumerate the `allPrimes` infinite set): ```dafny @@ -240,36 +303,13 @@ foreach p <- allPrimes fivePrimes := fivePrimes + [x]; } // We can't assert |fivePrimes| == 5 here, since -// we can't prove that the loop couldn't have terminated +// we haven't proven that the loop couldn't have terminated // earlier by running out of primes. // The decreases metric proves that the loop will terminate, // but is only an upper bound on the number of iterations. assert |fivePrimes| <= 5; ``` -Note that the range expression is optional, and if omitted the loop will iterate over all -possible values of the types of the bound variables. This is only permitted if all such types -are enumerable, which is not true of the `real` type, for example. This supports an elegant -pattern for mapping over simple datatypes and other types with few values, including subset types. - -```dafny -datatype Suit = Clubs | Diamonds | Hearts | Spades - -foreach s: Suit { - ... -} - -foreach b1: bool, b2: bool { - expect TestMyFunctionWithOptions(b1, b2) == true; -} - -type SmallIndex = x: nat | 0 <= x < 8 - -foreach i: SmallIndex { - ... -} -``` - There will not initially (see Open Questions) be any builtin support for automatically tracking the enumerated values so far, as there is for `iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types). It is straightforward to track these values manually, however: @@ -282,7 +322,7 @@ foreach x <- s { } ``` -It is even possible to attach this ghost state directly to the sequence with a helper function: +It is even possible to attach this ghost state directly to sequences with a helper function: ```dafny foreach xAndXs <- WithPrefixes(c) { @@ -306,11 +346,6 @@ foreach xAndI <- WithIndexes(s) { var (x, i) := xAndI; ... } - -// Alternatively, if s is a sequence: -foreach i, x | 0 <= i < |s| && s[i] == x { - ... -} ``` ## Sequence comprehensions @@ -321,8 +356,7 @@ A sequence comprehension has identical syntax to a set comprehension, except tha ``` SeqComprehension ::= "seq" - QuantifierVarDecl { "," QuantifierVarDecl } - "|" Expression + QuantifierDomain [ :: Expression ] ``` @@ -335,22 +369,11 @@ using sequence comprehensions: var s: seq> := ...; var filtered: seq := seq o <- s | o.Some? :: o.value; -// Zipping two lists together -var a: seq := ...; -var b: seq := ...; -assert |a| == |b|; -var zipped := seq i | 0 <= i < |a| :: (a[i], b[i]); - // Map and then flatten (a.k.a. "FlatMap") var c: seq := ...; var f: S -> seq := ...; var flatMapped: seq := seq x <- c, y <- f(x) :: y; -// Sorted sequence from a set -var setOfReals := {3.141, 2.718, 1.618}; -var sortedList := seq x | x in setOfReals; -assert sortedList == {1.618, 2.718, 3.141}; - // Intersection of sequences var a := [4, 1, 3, 5]; var b := [3, 1, 7]; @@ -360,16 +383,10 @@ assert (seq x <- b | x in a) == [3, 1]; Since sequence comprehensions are expressions rather than statements, they carry an additional restriction when compiled: their ordering must be fully deterministic. If `s` is a `set`, for example, -`seq x <- s` would be rejected in compiled code, whereas `seq x | x in s` would be allowed. +`seq x <- s` would be rejected in compiled code. This is very similar to the restriction on `:|` let-such-that expressions, which is not relevant for equivalent `:|` assign-such-that statements. -Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual, -but we refer to it here as a "sequence construction" expression, to disambiguate it from the proposed sequence comprehension feature. -Sequence comprehensions are strictly more expressive, as any construction `seq(size, f)` can be rewritten as -`seq i | 0 <= i < size :: f(i)`. They also avoid the common trap of forgetting to explicitly attach the `requires 0 <= i < size` -pre-condition to a lambda expression, as this issue highlights: https://github.com/dafny-lang/dafny/issues/1332 - # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -386,15 +403,15 @@ set comprehensions are encoded as their indicator predicates, but there is no su for sequence comprehensions as defined here. The initial encoding will likely be with a function, similar to the `Seq#Create` function used for seq constructions, that accepts a source sequence and indicator predicate. The useful axioms about these values will fall into three categories, sketched below for the simple -case of comprehensions with a single variable bound from a sequence and no explicit term: +case of comprehensions with a single variable bound from a sequence: -`var S := seq x <- C | P(x);` +`var S := seq x <- C | P(x) :: Q(x);` -1. Membership: `forall x :: P(x) <==> x in S` +1. Membership: `forall x :: x in C && P(x) ==> Q(x) in S` 2. Cardinality: `|S| <= |C|` 3. Ordering: `forall i, j | 0 <= i < j < |S| :: exists i', j' | 0 <= i' < j' < |C| :: - C[i'] == S[i] && C[j'] == S[j]` + Q(C[i']) == S[i] && Q(C[j']) == S[j]` The translation of `foreach` loops should be reducible to sequence comprehensions and other existing features for loops in general, @@ -402,12 +419,12 @@ as the semantics of such loops can usually be expressed as follows: ```dafny // A loop of this form: -foreach x1, x2, ..., xN | { +foreach x1, x2, ..., xN { } // Is semantically equivalent to: -var __s := seq x1, x2, ..., xN | :: (x1, x2, ..., xN); +var __s := seq x1, x2, ..., xN :: (x1, x2, ..., xN); for __i := 0 to |__s| { var (x1, x2, ..., xN) := __s[__i]; @@ -441,16 +458,22 @@ One benefit to adding `foreach` loops to Dafny is that this logic can be refacto phase, rewriting quantification of many kinds to explicit `foreach` loop AST nodes, which can then be translated to target languages constructs in later phases. +Note that quantifier variable domain collections do not necessarily have to be materialized as values! +In the expression `seq pair <- myMap.Items :: pair.1`, for example, a separate set value holding the contents of +`myMap.Items` does not have to be built at runtime. Only an enumerator or target-language loop of some kind +to enumerate over these values is necessary. + # Implementation plan This RFC proposes a lot of functionality, which can be delivered in multiple smaller phases: -1. Sequence comprehensions for a single quantified variable with a sequence source, only usable in ghost code. +1. Quantifier variable domains and ranges. + 1. Adds the new syntax but rewrites it immediately after parsing. +2. Sequence comprehensions for a single quantified variable with a sequence source, only usable in ghost code. 1. Depends on at least minimal support for ordered quantification. 2. Ensures verification is effective before moving on to other derived features. -2. Compiling sequence comprehensions. -3. Other source collection types (`set`, `multiset`, etc.) -4. Declaring quantifier variables using pattern matching. +3. Compiling sequence comprehensions. +4. Other source collection types (`set`, `multiset`, etc.) 5. Multiple quantifier variables. 6. `foreach` loops. 7. etc. @@ -481,12 +504,24 @@ including communicating this through the IDE. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives +The primary rationale for this design is to leverage the fact that Dafny already heavily relies on expressing +quantification directly in the language, to the point that it already has a similar `forall` statement +expressed using the same quantification syntax. Allowing quantification to also +declare sequential bindings provides a lot of expressive power for a small language change, +and is just as composable with existing language features. + The most obvious alternative is to only provide the simpler `foreach x <- C` syntax, with only a single bound variable, no additional filtering, and with `C` directly providing the enumeration and not just the ordering of values. This is certainly a well-formed and sound construct, but far less powerful than the proposed version, especially when sequence comprehensions are excluded. Offering `foreach` loops along with sequence comprehensions means we can recommend the latter where possible, with the former as a fallback that is inevitably more complicated to verify. +An earlier version of this proposal (https://github.com/dafny-lang/rfcs/pull/9) described an alternative +where the ordering and multiplicity of bindings was specified by clauses such as `x in C` +in a quantifier domain's range expression. +This had the advantage of not requiring any new syntax, +but also broke expectations of referential transparency of boolean expressions in these ordered contexts. + # Prior art [prior-art]: #prior-art @@ -514,26 +549,6 @@ sums the indices of all non-zero elements of the array `a`. # Unresolved questions [unresolved-questions]: #unresolved-questions -It is very convenient to support ordered quantification over the natural ordering of types, but -Dafny's choice of built-in ordering concepts is currently somewhat surprising. Historical, Dafny has biased towards -defining well-founded orderings to support proofs of termination rather than intuitive orderings -users would expect. For example, `<` on datatypes is defined as "rank comparison", such that `Some(x) < Some(Some(x))`. -For strings (and more generally sequences), `<` means "proper prefix of", and the well-founded ordering is -"subsequence of", and neither is the lexicographic comparison users would expect. -As a somewhat orthogonal improvement, we may want to make breaking changes to the language to align `<` with -typical programmer expectations and the "natural ordering" concept, as well as defining a separate operator -for the well-founded orderings (see https://github.com/dafny-lang/dafny/issues/762). - -Along similar lines, will users frequently want to customize the ordering of their quantifications without -creating explicit collection values? To produce the sequence `[4, 3, 2, 1, 0]`, for example, -we could consider supporting something like `seq i by > | 0 <= i < 5`. - -Although the `seq i | i in mySet` pattern is great for expressing the sorted sequence of elements in -a set, there doesn't seem to be as direct a way to express the sorted sequence of elements in another -sequence or multiset, since the plain `i` declaration implies a natural ordering, which doesn't allow -duplicates. This could also be addressed by the above idea of customizing the ordering, if -`seq i <- mySeq by <` was allowed. - Tracking the values enumerated so far and/or the number of iterations in a `foreach` loop is possible with manual helper functions as illustrated above, but only when the source collection is a sequence. It may be a good idea in the next iteration of the feature to have convenient mechanisms for this, @@ -550,7 +565,8 @@ Adding this building block to the Dafny language opens up a wide array of tempti ## Pattern matching on quantifier variables The LHS of `<-` declarations could be extended to allow case patterns, just as variable declaration statements do. -This would allow destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`. It could also support filtering, as in `Some(x) <- mySetOfOptionals`. +This would allow destructuring datatypes and tuples, as in `(k, v) <- myMap.Items`. +It could also support filtering, as in `Some(x) <- mySetOfOptionals`. This would be consistent with how case patterns are used in match statements; these two loops would be equivalent, for example: ```dafny @@ -575,6 +591,24 @@ arbitrary Dafny types that provide a way to obtain an enumerator for their conte the necessary interface. This is also likely the best way to provide a better user experience for looping or quantifying over the values produced by an `iterator` type. +## Range expressions + +This would interpret the existing `i..j` syntax, +currently only usable in sequence or array indexing expressions such as `s[i..j]`, +as an alternate option for expressing the sequence value `[i, i + 1, ..., j - 1]`. +To avoid actually materializing these values, range expressions would be implemented in the runtimes as a special case +`RangeSequence` type, similar to the existing lazily-evaluated `ConcatSequence` datatype in the C# and Java runtimes. +This would allow us to deprecate sequence constructions in favour of sequence comprehensions, +as any construction `seq(size, f)` could be rewritten as `seq i <- 0..size :: f(i)`. + +## Ordered infinite collections + +A quantifier variable domain only specifies the ordering of bindings and not necessarily how they are enumerated. +Just as a set comprehension like `set x: object | x in S1 && P(x)` +may be enumerable even though the set of allocated objects are not, +we could introduce a variation of `iset` to represent partially-ordered sets, +so that `seq x <- objectsOrderedByFoo | x in S1 && P(x)` could provide a deterministically-ordered equivalent. + ## Unicode strings As I have previously ranted about in https://github.com/dafny-lang/dafny/issues/413, the Dafny `string` type is currently an alias @@ -589,23 +623,8 @@ An expression `var x :| P(x); ...` is currently only allowed in compiled context if Dafny can prove that there is exactly one such `x` that satisfies `P(x)`. With ordered quantification in our toolbox, we could lift this requirement when the quantification is ordered, and ensure that the first such `x` according to that -order is picked. This would be more useful if we also allowed the same additional syntax, -i.e. `var x <- C :| P(x); ...`. - -## Array comprehensions - -The same syntax for sequence comprehensions could also be used for flexible array initialization, -since this is another case where the user has to declare an ordered source of values: - -``` -var a := new int[10] i :: i * i; -var a := new int[|s|] i :: s[i]; // Where s is a seq -var a := new int[] i | 10 <= i < 20; -``` - -It would likely be prudent to limit these cases to those where the size of the domain -is easy to infer statically, so that the size of the array to allocate is known before -enumerating the values. +order is picked. This would probably necessitate allowing such statements to specify +variable domains as well, i.e. `var x <- C :| P(x); ...`. ## Multiset comprehensions @@ -625,20 +644,18 @@ a parameterized collection expression to aggregate a quantifier domain into a single value: ```dafny -collect(&&) x: T | P(x) :: Q(x) == forall x: T | P(x) :: Q(x) -collect(||) x: T | P(x) :: Q(x) == exists x: T | P(x) :: Q(x) -collect(SeqBuilder) x: T | P(x) :: Q(x) == seq x: T | P(x) :: Q(x) -collect(SetBuilder) x: T | P(x) :: Q(x) == set x: T | P(x) :: Q(x) -collect(+) x: T | P(x) :: Q(x) == (summation) -collect(*) x: T | P(x) :: Q(x) == (product) -collect(<) x: T | P(x) :: Q(x) == (minimum) -collect(Averager) x: T | P(x) :: Q(x) == (average) -collect(First(n)) x: T | P(x) :: Q(x) == Take(seq x: T | P(x) :: Q(x), n) +collect(&&) x <- C | P(x) :: Q(x) == forall x <- C | P(x) :: Q(x) +collect(||) x <- C | P(x) :: Q(x) == exists x <- C | P(x) :: Q(x) +collect(SeqBuilder) x <- C | P(x) :: Q(x) == seq x <- C | P(x) :: Q(x) +collect(SetBuilder) x <- C | P(x) :: Q(x) == set x <- C | P(x) :: Q(x) +collect(+) x <- C | P(x) :: Q(x) == (summation) +collect(*) x <- C | P(x) :: Q(x) == (product) +collect(<) x <- C | P(x) :: Q(x) == (minimum) +collect(Averager) x <- C | P(x) :: Q(x) == (average) +collect(First(n)) x <- C | P(x) :: Q(x) == Take(seq x <- C | P(x) :: Q(x), n) ... ``` This mirrors the `Collector` concept from the Java streaming APIs, and ideally the shape of the parameter to `collect` expressions would define similar operations for merging intermediate results to leave the door open for parallel or even distributed computation. - - From af3027063d1c0dc59c58194883228e41ba6d31f1 Mon Sep 17 00:00:00 2001 From: Robin Salkeld Date: Fri, 27 May 2022 11:04:22 -0700 Subject: [PATCH 13/13] PR feedback --- 0005-ordered-quantification.md | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/0005-ordered-quantification.md b/0005-ordered-quantification.md index 9882c23..04304ed 100644 --- a/0005-ordered-quantification.md +++ b/0005-ordered-quantification.md @@ -26,6 +26,7 @@ support efficient random access, and are more likely to be used incorrectly by a ```dafny // Before: method AllPairs(s: seq) returns (result: seq<(nat, nat)>) { + result := []; for i := 0 to |s| { var left := s[i]; for j := 0 to |s| { @@ -138,7 +139,7 @@ but support for user-defined collection types will be possible in the future. If `C` is a `map`, the bound values will be the keys of the `map`, in order to be consistent with the meaning of `x in C`; `map.Items` can be used instead to bind key-value pairs. -Assuming that we have `posNat == iset n: nat | 0 <= n`, the set comprehension above could also then be expressed as: +Assuming that we have `posNat == iset n: nat | 0 < n`, the set comprehension above could also then be expressed as: ```dafny forall x <- posNat, y <- posNat :: x % gcd(x, y) == 0 && y % gcd(x, y) == 0 @@ -151,9 +152,9 @@ Besides offering a slightly more succinct expression of existing quantification quantified variable domains also specify the ordering of quantification bindings, and allow variables to be bound to duplicate values. This ordering of bindings is non-deterministic unless `C` is a sequence. If `C` is a `multiset`, multiple bindings with the same value of `x` will be created, but their ordering is still non-deterministic. -Note that ordering and multiplicity is irrelevant for all current uses: -the semantics of `[i]set` and `[i]map` comprehensions, `forall` and `exists` expressions, -and `forall` statements all do not depend on the order of quantification and semantically ignore duplicates. +Ordering and duplication are not relevant for `[i]set` and `[i]map` comprehensions, `forall` and `exists` expressions, +and `forall` statements. Nevertheless, it is natural to extend these comprehensions with the `x <- C` syntax, as it is equivalent to +`x | x in C` but is slightly more compact, and avoids having to repeat the `x` syntactically. The `<-` symbol would be read as "from", so a statement like `foreach x <- C { ... }` would be read as "for each x from C, ...". `<-` is not an independent operator and is intentionally distinct from the `in` boolean operator. @@ -387,6 +388,12 @@ when compiled: their ordering must be fully deterministic. If `s` is a `set This is very similar to the restriction on `:|` let-such-that expressions, which is not relevant for equivalent `:|` assign-such-that statements. +Note this restriction is technically a property of the particular compiler and runtime, and not a fundamental limitation implied by the language semantics. +Currently the various runtimes use some kind of `HashSet` datatype to represent Dafny sets, +and these generally may enumerate the value `{1, 2, 3}` in a different order than `{3, 2, 1}`, for example. +The runtime datatypes could be changed to guarantee a deterministic ordering, but this would carry a complexity and potential performance cost. +The workaround will be to provide a standard library `function by method` that sorts the result of `seq x <- s` and hence restores determinism. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -575,9 +582,9 @@ foreach Some(x) <- [None, Some(42), None] { } foreach o <- [None, Some(42), None] { - match o { - case Some(x) => print x; - } + match o + case None => + case Some(x) => print x; } ```