-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
foreach loops and sequence comprehensions #9
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaving initial comments. Yay, that looks like a very useful feature, especially seq comprehensions
foreach right | right in s { | ||
result := result + (left, right); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be good to have the following
foreach left, right | left in s && right in s {
result := result + (left, right);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that thought too - that's equally valid (and does illustrate the power of the more general foreach
) but I was a little wary of implying this is always the more idiomatic way to do it. I felt more confident it made sense in the sequence comprehension version.
// Or even better, as an expression: | ||
function AllPairs(s: seq<nat>): seq<(nat, nat)> { | ||
seq left, right | left in s && right in s | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that would be a nice addition to the language.
If you want to make your example more compelling (meaning no comprehension can emulate your basic example, you could consider doing two operations at once, like computing an out variable "maximum".
maximum := 0;
foreach left, right | left in s && right in s {
result := result + (left, right);
var tmpMax := left + right;
maximum := if tmpMax > maximum then tmpMax else maximum;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll put some thought into improving the motivating example - I think even the eventual user-facing documentation should have a good example of where a foreach
loop is necessary because a sequence comprehension doesn't cut it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it seems to me that, even if foreach
and sequence comprehensions were identical in what they could express, foreach
would feel a lot more natural if the body is very imperative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of why it feels right to me to design both features at once, as they are kind of two sides of the same fundamental coin: imperative and functional.
```dafny | ||
foreach i | i in s { | ||
print i, "\n"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be soo compelling. I hope you can break this down further and obtain an iterator for a set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a strong future possibility at least. Only supporting these higher-level operations over the elements of a set buys us a ton for now though, so I'm comfortable leaving it as future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really want to be able to write those three lines, too!
I'd also love to see set iterators in the future, but agree that it's not necessary in the first iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In line with what @samuelgruetter said above, though: since in Dafny i in [1,2,3] <==> i in [3,2,1]
, shouldn't foreach i | i in [1,2,3] { … }
be the same as foreach i | i in [3,2,1] { … }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since foreach
is not an expression but a statement, its semantics require an ordering, and we would syntactically extract this ordering from the expression. In my answer below, I explain how we could encode the ordering for the verifier, so that the two foreach statement you mention wouldn't be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not right: var s: seq<int> :| set(s) == {1,2,3}
is an expression, and it doesn't require an ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not require ordering because there is only one statement to execute :-)
To be fair, I should have said, is a statement that will execute other statements in some orders
|
||
```dafny | ||
foreach x: nat | ||
decreases 10 - x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you add the invariant that x is less or equal to 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is what happens when you can't actually run the verifier to catch your mistakes. :) Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be added is not an invariant, but a range expression. There would be two problems with an invariant x <= 10
. One problem would be that this does not hold true of all values that the foreach
statement will consider (since this example used a foreach
loop with no range expression). The other problem is that the loop invariant of a foreach
statement is not allowed to mention the index variable (this is also true of the for
statement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need a better example here - it was questionable in the first place and trying to patch it with an invariant made it worse.
Note that the range expression is optional, and if omitted the loop will iterate over all | ||
possible values of the types of the bound variables. This is only permitted if all such types | ||
are enumerable, which is not true of the `real` type, for example. This supports an elegant | ||
pattern for iterating over simple datatypes and other types with few values, including subset types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a wonderful use case for testing !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is cool!
* This means that the `&&` operation is not fully symmetrical, but this is already true as verification considers it to be "short-circuiting": | ||
`b != 0 && a/b > 1` is valid but `a/b > 1 && b != 0` is not. | ||
* This definition is chosen such that the multiplicities of results from `A && B` is the same as `B && A`, even if the ordering is not the same. | ||
* For an expression of the form `A || B`, the enumeration will simply produce the values enumerated by `A` followed by those enumerated by `B`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case was not covered even before for set comprehensions, it would complain with "heuristics cannot figure out blablah".
The solution is to manually compute the sequence on which to iterate. For example, replacing x in a || x in b
by x in (a + b)
.
I think it would make sense to omit support for finding enumerations based on ||
, and the IDE could suggest such rewritings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, the existing heuristics only consider conjuncts and give up on disjuncts. I may do as you say and trim disjuncts out of scope initially - I just wanted to make sure there was a reasonable answer to the semantics of enumerating disjuncts so we have a firm foundation. If and when there's need for it, though, I'd prefer to just implement it rather than have users rewrite their code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MikaelMayer But part of the motivation here is efficiency, and merging sets isn't a great idea for performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that's true. a + b
at least strongly implies that a new value needs to be allocated at runtime, whereas enumerating x in a || x in b
less so. There are definitely ways to optimize that (e.g. the lazy sequence concatenation logic in the C# and Java runtimes) but I like the idea of making it easier to infer the rough runtime cost just by looking at the structure of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you wanted to enumerate the elements of a + b
without duplication, you would explicitly write x in a || (x in b && !x in a)
, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup that would work! It would only be efficient if both a
and b
had efficient in
tests though.
0003-for-each.md
Outdated
seq x: T | !A == seq x: T - seq x | A // Not an actual built-in operation, but a generalization of set subtraction. | ||
// Only well-formed if T itself is enumerable. | ||
|
||
seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not
seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2] | |
seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2, 1] |
?
Also:
seq x | x in [1, 2, 2] && x in [2, 2, 1] == [1, 2, 2, 2, 2] | |
(seq x | x in [1, 2, 2] && x in [2, 2, 1]) == [1, 2, 2, 2, 2] |
If you could fix parentheses below that'd be great. Parsing will always consider the seq expression to continue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent question! The way I've defined the enumeration for conjuncts, the multiplicities are multiplied. The way this example breaks down is:
seq x | x in [1, 2, 2] && x in [2, 2, 1]
== seq x | x in [1] && x in [2, 2, 1]
+ seq x | x in [2, 2] && x in [2, 2, 1]
== [1]
+ [2, 2, 2, 2]
And thanks for catching all the missing parentheses. We'll see if I'm sufficiently annoyed by them to improve that. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. It's not obvious at all but it makes sense now. That would work the same way for a multiset, right?
Co-authored-by: Mikaël Mayer <[email protected]>
(accepting the edit resolved the conversation and even un-resolving it still won't let me reply inline :) Yes, the behaviour of multisets is similar: (seq x | x in [1, 2, 2] && x in multiset{2, 1, 2}) == [1, 2, 2, 2, 2]
(seq x | x in multiset{2, 1, 2} && x in multiset{2, 1, 2}) == [2, 2, 1, 2, 2] // Or possibly other orderings It's a bit unfortunate that although these rules are also well-defined for multiset comprehensions if we add those later, they don't line up with the semantics of multiset operations in Dafny: multiset intersection ( |
|
||
type SmallIndex = x: nat | 0 <= x < 8 | ||
|
||
foreach i: SmallIndex { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are the enumeration values computed here? Is it possible to translate x: nat | 0 <= x < 8
to something that efficiently creates the right values of x
? And what if the condition is more complex, does it become an enumeration over all natural numbers at some point?
Does the developer need to know the default order of basic types? Could there be scenarios where the developer has an incorrect assumption about the order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it definitely possible, and existing quantifiers and comprehensions already have a concept of "Bounded Pools" of values and how to iterate through them:
https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/AST/DafnyAst.cs#L11349
If you use a set comprehension like set x: nat | 0 <= x < 8
in a compiled context, the resolver already detects that the range is a single IntBoundedPool
, and the compiler will emit something like a foreach
loop in the target language to build up the runtime set value: https://github.com/dafny-lang/dafny/blob/master/Source/Dafny/Compilers/SinglePassCompiler.cs#L4671-L4686
With the existing features, if the heuristics can't figure out a finite pool to search through, you get an error. That will be the case here as well unless you're using a foreach
loop with an explicit decreases
clause.
You're right that for some cases users will want to know the enumeration ordering of some of these pools. I make the argument in other places that we should already document all of this better, since it affects the runtime performance of several existing Dafny features like the above. That will likely be a beneficial side-effect of this work.
``` | ||
datatype Suit = Clubs | Diamonds | Hearts | Spades | ||
|
||
foreach s: Suit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the type has an infinite size. How are the enumeration values generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my previous answer - does that cover it?
but is substantially more general and expressive. A very simple example only looks slightly different than expected: | ||
|
||
```dafny | ||
foreach x | x in [1, 2, 3, 4, 5] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the time complexity of computing the enumeration values? Suppose I have
foreach x | x in xs && x in ys {
}
What time complexity should I expect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, and I should include that dimension in the more finalized definition of domain enumeration.
In that particular case it should be O(|xs|
* L
), where L
is the cost of a in
membership test in the ys
collection. If ys
is a set you'd expect it to be O(1) on average, but if it's a sequence you'd expect it to be O(|ys|
).
In some cases we could likely optimize more cleverly, but we should be able to set upper bounds on the average and worst cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That xs
serves as the value generating source here seems too implicit. You're saying that x in ys && x in xs
could result in better performance if the membership test for xs
is faster than for ys
, but given that &&
is usually a lazy operator, my intuition would be to put the fastest membership test first.
which is generally a good default choice since this datatype offers constant-time access on average. | ||
Although different copies of the same set must place each value into the same hashing bucket, multiple | ||
values may be assigned to the same bucket, so it is necessary to use a secondary data structure of some kind for each bucket. | ||
A common simple approach is to use a linked list, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the requirement of a deterministic ordering based only on the set of contained elements.
I'm confused, I would think that either:
- You want to enumeration order over a collection to be defined, and then it must be deterministic as well
- You don't define the enumeration order, and then it doesn't have to be deterministic either since apparently you don't care about it, likely because the order doesn't affect the output of your program. You might be summing the elements of a set for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need the order to be deterministic so that, for example, the identity S == S' ==> (seq x | x in S)[0] == (seq x | x in S')[0]
always holds. That's critical for verification to be sound and agree with compiled runtime behavior. But you don't actually have to know whether seq x | x in {1, 2}
is actually [1, 2]
or [2, 1]
. That's what I mean by "under-specified".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the origin of the problem seems to be that you want to have an ordered iteration (like (seq x | x in S)
) over an unordered collection.
Why not disallow an ordered iteration over an unordered collection?
[prior-art]: #prior-art | ||
|
||
Sequence comprehensions bear a strong resemblance to list comprehensions in Python, which also include similar abilities to | ||
bind multiple variables and filter the enumerated values: `[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find there's a big difference to Python's comprehensions and the ones introduced in this RFC, namely that in Python a value generator has to be specified, while in this RFC the comprehension seems to extract a value generator from a type and a boolean expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. It's really only related prior art in that Python's comprehensions are values expressed in a similar way rather than loops. Sequence comprehensions are definitely more declarative and less explicit about the implementation.
|
||
// After: | ||
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) { | ||
foreach left | left in s { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not reuse the word "for"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in still support the existing for <variable> := <lo> to <hi> { ... }
syntax but also support for <variables> | <range> { ... }
?
That should be doable in the parser AFAICT. I have a slight preference for how foreach
reads but perhaps that's not worth the cost of yet another keyword. On the other hand communicating about the two different features will be easier if we can just say "for loops and foreach loops" versus "for loops over a range and for loops over a quantifier domain" or something like that. Open to opinions on this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use different keywords for for
and foreach
loops. Their loop invariants are different, and (depending on the enumerator used) the latter may have an effect on the state beyond just the index variable.
foreach k, v | k in m.Keys && m[k] == v { | ||
print "m[", k, "] == ", v; | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked it even more when I just had
foreach k, v | m[k] == v {
print "m[", k, "] == ", v;
}
but then realized I needed the k in m.Keys
part. :)
Note that the existing `seq(size, i => i + 1)` syntax is inconsistently-named in the Dafny reference manual, | ||
but we refer to it here as a "sequence construction" expression, to disambiguate it from the proposed sequence comprehension feature. | ||
Sequence comprehensions are strictly more expressive, as any construction `seq(size, f)` can be rewritten as | ||
`seq i | 0 <= i < size :: f(i)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we could eventually move seq(size, f)
to a definition in the standard library, perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely - @RustanLeino was willing to fully deprecate the old version as long as we handled the equivalent sequence comprehension equally well in verification. Having an equivalent standard library version would make dropping support for it easier!
a boolean expression and iterates through all matching bindings. For example, the statement `foreach(a[int i] != 0) n += i;` | ||
sums the indices of all non-zero elements of the array `a`. The JMatch version declares the bound variables inline in | ||
the boolean expression, which is also proposed here as a future possibility below. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another instance I'm aware of that provides a capability not available in this proposal. I don't know if it's worth the complexity of supporting, but perhaps worth considering (or perhaps including in here to say why it's not the right thing to do).
Cryptol has two sequence comprehension separators, sequential and parallel. The sequential separators work like those in Python (and Haskell), and like &&
in this proposal. For example, the cross product of two sequences is:
crossProduct xs ys = [ (x, y) | x <- xs, y <- ys ]
But you can also use a |
instead of a comma to get this:
zip xs ys = [ (x, y) | x <- xs | y <- ys ]
You can also get this by zipping two iterators together first and then iterating over the result, which isn't much harder. The slightly more powerful (but sometimes confusing!) use case is when you alternate ,
and |
in arbitrary ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! Cryptol's version is at least worth naming as prior art as it's even closer to this proposal - thanks for the pointer!
My plan is more likely to handle that case when we support user-defined collection types. I included zipping as a key higher-order operation in these experiments, and I'm fairly convinced it's maximally "Dafny-like" to require the LHS and RHS have provably-equal lengths. Between that and the fact that there's no natural boolean operator to express zipping, I'd say it's it doesn't fit as well here.
It seems to me that this RFC introduces a notion of
If I understood correctly, the semantics of a seq comprehension depends on the syntactic structure of a To give an example of such a rewrite, let's consider the expression
If I understood the proposed semantics correctly, it would evaluate to
Now add let's make the definition
and ask: Does the seq comprehension
produce the same result as the original one? If the answer is No, it would mean that applying the rewrite lemma
in the original seq comprehension would change its result, which violates basic expectations about the behavior of equality (if On the other hand, if the answer is Yes, what if I define
and now consider the expression
This one will produce
(note that to prove this equality, the Dafny verifier will unfold Now, a potential solution to this problem would be to say that no rewrites are allowed in
and consider the modified seq comprehension
Dafny should be able to prove that it's equivalent to the original one (otherwise proofs would become very cumbersome and confusing), but that does require applying the rewrite
in the original seq comprehension. This problem would go away if we syntactically separated the source collection being iterated from the filtering predicates. The above example would then be written as
which would make it clear that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this exceptional write-up, Robin.
I have left detailed comments. I have some high-level feedback, too.
- On the performance aspect, I think we shouldn't require iterators to get good performance. There is no reason why removing one value from a set should be a slow operation. In fact, it should be fast, even if we are talking about pure sets. Let's fix that.
- On determinism: at the moment, we don't need our sets, maps, etc. to have predictable ordering. Adding that is introduces a performance concern, and we should evaluate whether this is the best approach. Would it really be an issue if iteration order was unpredictable for sequences, even in expressions? You could still write sequence comprehensions in functions, and you could implement them "by method" as needed, allowing you to leverage the nondeterminism for performance and restore determinism as needed.
Note that this last point about sequence comprehensions in functions is already true today: in fact, we have sequence comprehensions already! It's already possible to ask Dafny (in a function context, where performance does not matter) for a var s: seq<T> :| forall t: T :: t in s <==> P(t)
(which is essentially a sequence comprehension with filter P
). It would be nice to have better syntax for this in ghost function contexts, but do we really need it in function method context?
|
||
// Or even better, as an expression: | ||
function AllPairs(s: seq<nat>): seq<(nat, nat)> { | ||
seq left, right | left in s && right in s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this code doesn't compute the same thing as above, does it? Is order guaranteed?
``` | ||
|
||
This is functionally correct, but because set subtraction requires creating a fresh copy, | ||
when compiled this loop requires quadratic time and memory (or at least garbage-collection work) in the size of the set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a bug in the runtime, no?
|
||
A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall` | ||
loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop | ||
executes its body once for each tuple of quantified values in sequence, one at a time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The domain of a forall statement doesn't have to be enumerable, I think, so it's not superseded by foreach
AFAICT.
but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`. | ||
|
||
```dafny | ||
foreach x: real | 0.0 <= x < 1.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is not allowed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment to that effect just to be super clear :)
} | ||
``` | ||
|
||
The domain is allowed to be potentially infinite if the loop is used in a compiled context and an explicit `decreases` clause is provided. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the domain
here? If you have an invariant bounding the variable is that still an infinite domain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I think assuming that the invariant is allowed to refer to the bounds variables at all, I don't think it should be allowed to influence the domain enumeration.
Most mainstream programming languages include rich libraries for working with collections and iterators, | ||
and define much the same operations such as "filter" and "flatMap". | ||
The proposed Dafny features provide similar functionality at a higher level | ||
of abstraction more amenable to verification: quantification expressions are used to succinctly declare the behavior of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about more amenable to verification: Do we think that the verifier will be better at dealing with seq expressions and the like than at dealing with plain for loops and/or recursive functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have said here that they should be friendlier for users, in terms of how much effort it takes to prove properties of a sequence comprehension compared to an equivalent loop or recursive function. My hope here is that the verifier will already know enough about an expression like seq x | x in mySeq && P(x)
such that no additional work is necessary.
|
||
```dafny | ||
var setOfReals := {3.141, 2.718, 1.618}; | ||
var sortedList := seq x | x is real && x in setOfReals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need x is real
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to provide a LHS that drives the desired ordering. I need to add this case (and more) to the ordering rules.
|
||
It would likely be prudent to limit these cases to those where the size of the domain | ||
is easy to infer statically, so that the size of the array to allocate is known before | ||
enumerating the values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could say that these are equivalent to sequence comprehension + cast to array? I don't think we have a conversion to arrays from sequences, do we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly, although it turns out there's a syntax I wasn't aware of similar to sequence constructions:
var arr := new int[10] (i => i * i);
|
||
This mirrors the `Collector` concept from the Java streaming APIs, and ideally the shape of the | ||
parameter to `collect` expressions would define similar operations for merging intermediate results | ||
to leave the door open for parallel or even distributed computation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should look into the loop
construct in lisp, which does all the above and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks for the link! That's roughly the shape of what I'm thinking of.
|
||
One of the first milestones when implementing this RFC will be to prototype just enough verification | ||
plumbing to evaluate how difficult proving such properties will be, and how many additional axioms | ||
may need to be added to the Dafny prelude. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a proof with Dafny as of today:
predicate method IsLower(s: set<int>, m: int) {
forall y :: y in s ==> m <= y
}
function Min(s: set<int>) : (m: int)
requires s != {}
ensures m in s
ensures IsLower(s, m)
{
var m :| m in s;
if y :| y in s && y < m then
var s' := s - {m};
assert s == s' + {m};
Min(s')
else m
}
predicate Increasing(s: seq<int>) {
forall i, j | 0 <= i < j < |s| :: s[i] <= s[j]
}
lemma IncreasingConcat(hd: int, tl: seq<int>)
requires Increasing(tl)
requires forall x | x in tl :: hd <= x
ensures Increasing([hd] + tl)
{
var s := [hd] + tl;
forall i, j | 0 <= i < j < |s| ensures s[i] <= s[j] {
assert s[j] in tl;
}
}
lemma IsLowerSeq(s: set<int>, sq: seq<int>, m: int)
requires IsLower(s, m)
requires multiset(sq) == multiset(s)
ensures forall x | x in sq :: m <= x
{
forall x | x in sq ensures m <= x {
assert x in multiset(sq);
}
}
function method Sort(sq: seq<int>) : (sq': seq<int>)
ensures Increasing(sq')
ensures multiset(sq') == multiset(sq)
lemma MultiSetAddInj<T>(b: multiset<T>, a1: multiset<T>, a2: multiset<T>)
requires b + a1 == b + a2
ensures a1 == a2
{
assert forall k :: a1[k] == a2[k] by {
forall k {
calc {
a1[k];
(b[k] + a1[k]) - b[k];
(b + a1)[k] - b[k];
(b + a2)[k] - b[k];
(b[k] + a2[k]) - b[k];
a2[k];
}
}
}
}
lemma SortedElements_Unique(s: set<int>, sq: seq<int>)
requires multiset(sq) == multiset(s)
requires Increasing(sq)
ensures sq == SortedElements(s)
{
if sq == [] {
assert |sq| == |multiset(s)| == 0;
} else {
var hd := sq[0];
var sq' := sq[1..];
var s' := s - {hd};
calc { hd in sq; hd in multiset(sq); hd in multiset(s); hd in s; }
assert sq == [hd] + sq';
assert multiset(sq) == multiset([hd]) + multiset(sq');
assert s == {hd} + s';
assert multiset(sq) == multiset([hd]) + multiset(s');
MultiSetAddInj(multiset([hd]), multiset(sq'), multiset(s'));
assert Increasing(sq');
SortedElements_Unique(s', sq');
}
}
function SortedElements(s: set<int>): (sq: seq<int>)
ensures multiset(sq) == multiset(s)
ensures Increasing(sq)
{
if s == {} then []
else
var m := Min(s);
var s := s - {m};
var sq := SortedElements(s);
IsLowerSeq(s, sq, m); IncreasingConcat(m, sq);
[m] + sq
} by method {
sq := [];
var leftover := s;
ghost var removed := {};
while leftover != {}
decreases leftover
invariant leftover !! removed
invariant multiset(sq) == multiset(removed)
invariant multiset(leftover + removed) == multiset(s)
{
var x :| x in leftover;
leftover := leftover - {x};
removed := removed + {x};
sq := sq + [x];
}
sq := Sort(sq);
SortedElements_Unique(s, sq);
}
method Foo() {
var smallSet := {1, 2};
var smallSortedSeq := SortedElements(smallSet);
// smallSortedSeq is fully specified, but can the verifier figure that out?
assert multiset(smallSortedSeq) == multiset{1, 2};
assert smallSortedSeq == [1, 2];
}
Can we think about how that proof would change?
I had another thought, related to @samuelgruetter's response, and perhaps conveying roughly the same idea in a different way. In this proposal, the |
To the excellent @samuelgruetter 's analysis of cases, I answer this:
Based on Robin's proposal, Dafny's heuristics will decide that the collection to iterate on is
That should take care of everything, including proving or disproving equality of seq expressions. |
|
||
// After: | ||
method AllPairs(s: seq<nat>) returns (result: seq<(nat, nat)>) { | ||
foreach left | left in s { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use different keywords for for
and foreach
loops. Their loop invariants are different, and (depending on the enumerator used) the latter may have an effect on the state beyond just the index variable.
|
||
// Or even better, as an expression: | ||
function AllPairs(s: seq<nat>): seq<(nat, nat)> { | ||
seq left, right | left in s && right in s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed seq
comprehension on L47 does not have an analogous set
comprehension today, because a set
comprehension with more than one bound variable is required to include a :: Expr
at the end. To look more like what we have today, L47 would be better rendered as
seq left, right | left in s && right in s :: (left, right)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, that's a typo. Will fix.
// s is a set<int> | ||
var ss := s; | ||
while ss != {} | ||
decreases |ss| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This decreases
clause is not needed, since the default decreases ss
works fine for this loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, don't tell me, tell the reference manual I copied it from. :) I can address it there separately.
The runtime implementation of this loop can use the native iteration features of the target language. | ||
|
||
Going further, it is better to avoid writing any imperative loop at all where possible. Sticking to immutable data and | ||
expressions rather than statements allows logic to be used in specifications as well as implementation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up until this point in the document, there are several feedback questions about the ordering. This will be answered later in the document, I presume, but it would be good to add a sentence somewhere here in the motivation section that reassures the reader that order will be dealt with later, and for now the examples are more intended in a dream-like fashion ("wouldn't it be nice if you could write something like this--and don't think too hard about the ordering yet").
|
||
```dafny | ||
var myDoubleDeckerSeq: seq<seq<int>> := ...; | ||
foreach x, y | x in myDoubleDeckerSeq && y in x && y != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I imagine that this particular range expression can be equivalently written as y != 0 && y in x && x in myDoubleDeckerSeq
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That version would be rejected, actually, since the rules for &&
say that the enumeration is nested left-to-right. y != 0
is enumerable (as a sub-domain of int
) but not finite, and then y in x
for each value of y
would not even be enumerable (i.e. all possible sequences that contain y
).
|
||
A `foreach` loop closely resembles a `forall` statement, the key difference being that a `forall` | ||
loop executes its body once for every tuple of quantified values simultaneously, whereas a `foreach` loop | ||
executes its body once for each tuple of quantified values in sequence, one at a time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the forall
statement is not a loop. It is an aggregate statement. Its compiled form is like a generalization of simultaneous assignment. The beauty of the forall
statement is that you use it without a loop invariant. :O This is made possible because--again--the forall
statement is not a loop in the first place!
Loops (while
, for
, foreach
) sequence their iterations. And to use them, you need a loop invariant, so that you can capture "what has been done so far".
Similarly to set comprehensions or assign-such-that statements, the domain of a `foreach` loop | ||
must be enumerable. The following loop would produce the error `"the domain of a foreach loop must be enumerable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The domain has to be enumerable only if the set comprehension or assign-such-that statement is compiled. It is not a requirement in ghost contexts. Is that true also for foreach
? It seems like a ghost foreach
could support domains that are not enumerable, provided they still are ordered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just chatted about this and came to the conclusion that a ghost foreach
should still require an enumerable domain, since otherwise it's hard to imagine the semantics of a loop where the body can observe the effects of the executions before it.
Furthermore it should also be finite, since otherwise there isn't a well-defined notion of the state of the program "after" the infinite loop. This isn't the case for a forall
statement since there is no looping, and the body executes once for each set of variable bindings simultaneously.
but Dafny's heuristics can't figure out how to produce an enumeration for 'x'"`. | ||
|
||
```dafny | ||
foreach x: real | 0.0 <= x < 1.0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope not.
|
||
```dafny | ||
foreach x: nat | ||
decreases 10 - x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be added is not an invariant, but a range expression. There would be two problems with an invariant x <= 10
. One problem would be that this does not hold true of all values that the foreach
statement will consider (since this example used a foreach
loop with no range expression). The other problem is that the loop invariant of a foreach
statement is not allowed to mention the index variable (this is also true of the for
statement).
There is no builtin support for automatically tracking the enumerated values so far, as there is for | ||
`iterator` values (see https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-iterator-types). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be needed for every loop that you want to prove something about. It would be good to have built-in support for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually hoping that many such loops can be expressed as sequence comprehensions instead, where that isn't necessary. I'm open to using the same xs
trick if we have to, but trying to leave it out of the initial scope.
Thank you SO much for all the thoughtful reviews and comments everyone! I honestly wasn't expecting this much to go through so quickly and it's going to take me multiple days to respond, but rest assured I have no intention of ignoring any questions or feedback. Quite the contrary, I'm deeply enjoying the discussion. :) One thing I know already is that I need to make my sketch of the enumeration domain calculation more complete and rigorous, so I'm definitely going to tackle that as a response to several comments. |
I was intending to clarify how I've defined the ordered enumeration of quantifier domains, but after thinking about some of the common feedback here I've decided it really would be better not to extract it implicitly out of the range expression. I believe I have an alternative that loses barely any useful expressivity, if any. It will take some time to revise the whole proposal, but in a nutshell:
The only case this doesn't handle cleanly is the |
That's a very interesting idea, but I think it raises new questions. For example: seq i in [0, 1, 2, 3, 4], y in x[F(i+2)], z in w[y] | i % 3 != 0 && y % 2 == 0 && z %2 == 1 :: z How would you compile that? You cannot emit the range after iterating over the three variables, that would be under-performing at best, or even cause crashes at worst (for example if the condition of Consider now the much lower complexity of your original proposal, both for verification and compilation: seq i: int, y: int, z: int | i in [0, 1, 2, 3, 4] && i % 3 != 0 && y in x[F(i+2)] && y % 2 == 0 && z in w[y] && z %2 == 1 :: z Now it's clear how it is going to be both compiled and verified, just following the order of the operands. If you are really concerned about differentiating condition from enumeration, you could add the binary operator "<-" to the set of expressions, which translates to "in", except for comprehensions. seq i: int, y: int, z: int | i <- [0, 1, 2, 3, 4] && i % 3 != 0 && y <- x[F(i+2)] && y % 2 == 0 && z <- w[y] && z %2 == 1 :: z That way, it would be obvious that the operator "<-" is not tolerated in expressions other than comprehensions so they cannot be refactored. Moreover, you could easily distinguish an enumeration from a pure test, so you could actually have this: set i: int | i <- A && i in B does not mean the same computationally as set i: int | i in A && i <- B The first one will enumerate A (and ensure |
@MikaelMayer Yes, the new form would sacrifice some expressive power, in favour of avoiding the confusion of overloading boolean expressions with ordering/multiplicity semantics. In your example as written, I would expect the seq i in (seq i in [0, 1, 2, 3, 4] | i % 3 != 0),
y in (seq y in x[F(i+2)] | y % 2 == 0),
z in w[y] | z % 2 == 1
:: z I think this aligns well with the fact that the LHS of an I should also say I'm hoping I can weaken the statement that " var setOfReals := {3.141, 2.718, 1.618};
// Ordered by the implicit `real` type of `x`
// At runtime, setOfReals is the enumeration source,
// but the results are then sorted.
var sortedList := seq x | x in setOfReals;
assert sortedList == [1.618, 2.718, 3.141]; |
@cpitclaudel - I've been meaning to respond to your top-level comments for a while, apologies for the delay. :)
It's not a question of the runtimes being inefficient though, it's a more fundamental problem with sets being immutable values. If you have the statement
Making the ordering unpredictable in expressions leads to unsoundness unfortunately (and thanks to @RustanLeino for the example, which I've tweaked to use a sequence comprehension instead of function method Pick(s: set<int>): int
requires s != {}
{
(seq x in s)[0]
}
method Main() {
var s0 := {6, 28, 496};
var s1 := {6, 496} + {28};
assert s0 == s1;
var x, y := Pick(s0), Pick(s1);
assert x == y;
if x != y {
var badnessEnsues := 5 / 0;
}
}
You're correct that you can describe any notion of sequence comprehensions in a specification context (your expression is weaker than what I'm proposing since it doesn't specify ordering or multiplicity, but I'm sure there's a more complicated version that does as Mikael outlined earlier :). I've definitely seen a lot of compiled Dafny code that needs to process |
Well, let's not use the simplest implementation then :) Pure sets are very fast in practice, and certainly don't need to be quadratic. A simple balanced tree guarantees
Not more so than it does today with |
My bad, I'm talking about the simplest way to compile the source language semantics to the target runtime, not the simplicity of the runtime representation of var s := {1, 2, 3};
var x := 2;
var oldS := s;
s := s - {x};
print oldS;
print s; The way the code generation currently works would produce output something like: var s = new TreeSet<int>({1, 2, 3});
var x = 2;
var oldS = s;
var sTmp = new TreeSet<int>(s);
sTmp.Remove(x);
s = sTmp;
Console.Write(oldS); // {1, 2, 3}
Console.Write(s); // {1, 3} Having to make the fresh copy of var s = new TreeSet<int>({1, 2, 3});
var x = 2;
var oldS = s;
s.Remove(x);
Console.Write(oldS); // {1, 3} <- incorrect
Console.Write(s); // {1, 3} I was describing alternate translation strategies above for optimizing this, such as creating a
The missing part is that a This is why I'm proposing an implementation strategy for making more domains predictably enumerable, which as a side-effect would allow us to remove this uniquely-determined restriction on compiled |
The form you suggest is an interesting workaround, but 1) Unless you support iterator comprehensions, it will build all the intermediate data structures so we lose both performance and memory and 2) As a user, I don’t think it would be very friendly to write it this way, with a redundant “seq I in (seq I in” , and only for the last variable I can put a condition in the range and 3) “x in y” is still a predicate and people will still wonder why they can’t refactor it, it’s because as the same syntax as a predicate. 4) How would you encode integer ranges anyway, when you have the lower and upper bound?
At this point, since this is a proposal, one might propose to just add the conditions between the ranges like this:
seq i: int if 0 <= I <= 1000 && i % 3 != 0,
y in x[F(i+2)] if y % 2 == 0),
z in w[y] if z % 2 == 1
:: z
But then, wait, this syntax is really isomorphic with your initial proposal, the only thing that differs is that we can explicit what is the range and what is the test in the presence of “in” expressions.
Hence, I would love to have your feedback on introducing something like the new left arrow operator that indicates a “in” but with the hint that it’s going to be enumerated.
From: Robin Salkeld ***@***.***>
Sent: Tuesday, April 12, 2022 6:10 PM
To: dafny-lang/rfcs ***@***.***>
Cc: Mayer, Mikael ***@***.***>; Mention ***@***.***>
Subject: Re: [dafny-lang/rfcs] foreach loops and sequence comprehensions (PR #9)
@MikaelMayer<https://github.com/MikaelMayer> Yes, the new form would sacrifice some expressive power, in favour of avoiding the confusion of overloading boolean expressions with ordering/multiplicity semantics. In your example as written, I would expect the F(i+2) subexpression to be rejected as not well-formed, if F had a precondition that was only satisfied if i % 3 != 0 as specified in the range. That can be worked around by pulling the necessary filters into nested sequence comprehensions though, which still don't necessarily need to be materialized as actual sequence values at runtime:
seq i in (seq i [0, 1, 2, 3, 4] | i % 3 != 0),
y in (seq y in x[F(i+2)] | y % 2 == 0),
z in w[y] | z % 2 == 1
:: z
I think this aligns well with the fact that the LHS of an && expression can be assumed true on the RHS, but Dafny does not currently go out of its way to find other true expressions from elsewhere in the program to try to make the RHS well-formed.
I should also say I'm hoping I can weaken the statement that "in expressions on bound variables define the enumeration source" to "in expressions on bound variables define the ordering of results". That would give the heuristics more leeway to choose a more efficient enumeration strategy, and allow cases where the variable type is ordered but not enumerable (which was previously a less crisp Future Possibility, but seems more natural with this version):
var setOfReals := {3.141, 2.718, 1.618};
// Ordered by the implicit `real` type of `x`
// At runtime, setOfReals is the enumeration source,
// but the results are then sorted.
var sortedList := seq x | x in setOfReals;
assert sortedList == {1.618, 2.718, 3.141};
—
Reply to this email directly, view it on GitHub<#9 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AA3PFNYTYSA5COTMA3EB54TVEX7D7ANCNFSM5SEEJZWA>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
I'm not sure I agree. With the right TreeSet representation (specifically, an immutable treeset), copies are
Eventually we do want to do this, but we don't need to right now: The C# We could even get rid of the log factor if we didn't have to be deterministic — but unfortunately we do.
I think that would be bad: when I write a function with |
@MikaelMayer I'm not sold on |
Indeed, maybe just I am confused. You win 3) :-) |
Design of the first step of dafny-lang/dafny#1753, and follow-up to #4
cc @samuelgruetter, @backtracking, @mschlaipfer, @parno, @RustanLeino
UPDATE: I'm planning on revising this soon based on ideas from the awesome discussion so far, so any interested parties should at least be aware of this comment, and may want to wait for the next revision (and I may even cut a fresh RFC to reset and avoid any confusion).