As with most languages, Dafny syntax is defined in two levels. First the stream of input characters is broken up into tokens. Then these tokens are parsed using the Dafny grammar.
The Dafny grammar is designed as an attributed grammar, which is a conventional BNF-style set of productions, but in which the productions can have arguments. The arguments control some alternatives within the productions, such as whether an alternative is allowed or not in a specific context. These arguments allow for a more compact and understandable grammar.
The precise, technical details of the grammar are presented together in Section 17. The expository parts of this manual present the language structure less formally. Throughout this document there are embedded hyperlinks to relevant grammar sections, marked as grammar.
Dafny source code files are readable text encoded in UTF-8.
All program text other than the contents of comments, character, string and verbatim string literals
consists of printable and white-space ASCII characters,
that is, ASCII characters in the range !
to ~
, plus space, tab,
carriage return and newline (ASCII 9, 10, 13, 32) characters.
(In some past versions of Dafny, non-ASCII, unicode representations of some mathematical symbols were
permitted in Dafny source text; these are no longer recognized.)
String and character literals and comments may contain any unicode character, either directly or as an escape sequence.
The characters used in a Dafny program fall into four groups:
- White space characters: space, tab, carriage return and newline
- alphanumerics: letters, digits, underscore (
_
), apostrophe ('
), and question mark (?
) - punctuation:
(){}[],.`;
- operator characters (the other printable characters)
Except for string and character literals, each Dafny token consists of a sequence of consecutive characters from just one of these groups, excluding white-space. White-space is ignored except that it separates tokens and except in the bodies of character and string literals.
A sequence of alphanumeric characters (with no preceding or following additional
alphanumeric characters) is a single token. This is true even if the token
is syntactically or semantically invalid and the sequence could be separated into
more than one valid token. For example, assert56
is one identifier token,
not a keyword assert
followed by a number; ifb!=0
begins with the token
ifb
and not with the keyword if
and token b
; 0xFFFFZZ
is an illegal
token, not a valid hex number 0xFFFF
followed by an identifier ZZ
.
White-space must be used to separate two such tokens in a program.
Somewhat differently, operator tokens need not be separated.
Only specific sequences of operator characters are recognized and these
are somewhat context-sensitive. For example, in seq<set<int>>
, the grammar
knows that >>
is two individual >
tokens terminating the nested
type parameter lists; the right shift operator >>
would never be valid here. Similarly, the
sequence ==>
is always one token; even if it were invalid in its context,
separating it into ==
and >
would always still be invalid.
In summary, except for required white space between alphanumeric tokens, adding or removing white space between tokens can never result in changing the meaning of a Dafny program. For most of this document, we consider Dafny programs as sequences of tokens.
This section defines character classes used later in the token definitions. In this section
- a backslash is used to start an escape sequence (so for example
'\n'
denotes the single linefeed character) - double quotes enclose the set of characters constituting a character class
- enclosing single
quotes are used when there is just one character in the class
(perhaps expressed with a
\
escape character) +
indicates the union of two character classes-
is the set-difference between the two classesANY
designates all unicode characters.
name | description |
---|---|
letter | ASCII upper or lower case letter; no unicode characters |
digit | base-ten digit ("0123456789") |
posDigit | digits, excluding 0 ("123456789") |
posDigitFrom2 | digits excluding 0 and 1 ("23456789") |
hexdigit | a normal hex digit ("0123456789abcdefABCDEF") |
special | `?_" |
cr | carriage return character (ASCII 10) |
lf | line feed character (ASCII 13) |
tab | tab character (ASCII 9) |
space | space character (ASCII 32) |
|
nondigitIdChar | characters allowed in an identifier, except digits (letter + special) idchar | characters allowed in an identifier (nondigitIdChar + digits) nonidchar | characters not in identifiers (ANY - idchar) charChar | characters allowed in a character constant (ANY - ''' - '\' - cr - lf) stringChar | characters allowed in a string constant (ANY - '"' - '\' - cr - lf) verbatimStringChar | characters allowed in a verbatim string constant (ANY - '"')
The special characters are the characters in addition to alphanumeric characters that are allowed to appear in a Dafny identifier. These are
'
because mathematicians like to put primes on identifiers and some ML programmers like to start names of type parameters with a'
,_
because computer scientists expect to be able to have underscores in identifiers, and?
because it is useful to have?
at the end of names of predicates, e.g.,Cons?
.
A nonidchar
is any character except those that can be used in an identifier.
Here the scanner generator will interpret ANY
as any unicode character.
However, nonidchar
is used only to mark the end of the !in
token;
in this context any character other than whitespace or printable ASCII
will trigger a subsequent scanning or parsing error.
Comments are in two forms.
- They may go from
/*
to*/
. - They may go from
//
to the end of the line.
A comment is identified as a token during the tokenization of
input text and is then discarded for the purpose of interpreting the
Dafny program. (It is retained to enable auto-formatting
and provide accurate source locations for error messages.)
Thus comments are token separators: a/*x*/b
becomes two tokens
a
and b
.
Comments may be nested, but note that the nesting of multi-line comments is behavior that is different from most programming languages. In Dafny,
method m() {
/* comment
/* nested comment
*/
rest of outer comment
*/
}
is permitted; this feature is convenient for commenting out blocks of program statements that already have multi-line comments within them. Other than looking for end-of-comment delimiters, the contents of a comment are not interpreted. Comments may contain any characters.
Note that the nesting is not fool-proof. In
method m() {
/* var i: int;
// */ line comment
var j: int;
*/
}
and
method m() {
/* var i: int;
var s: string := "a*/b";
var j: int;
*/
}
the */
inside the line comment and the string are seen as the end of the outer
comment, leaving trailing text that will provoke parsing errors.
Like many other languages, Dafny permits documentation comments in a program file. Such comments contain natural language descriptions of program elements and may be used by IDEs and documentation generation tools to present information to users.
In Dafny programs.
- Documentation comments (a) either begin with
/**
or (b) begin with//
or /*` in specific locations - Doc-comments may be associated with any declaration, including type definitions, export declarations, and datatype constructors.
- They may be placed before or after the declaration.
- If before, it must be a
/**
comment and may not have any blank or white-space lines between the comment and the declaration. - If after, any comments are placed after the signature (with no intervening lines), but before any
specifications or left-brace that starts a body, and may be
//
or/**
or/*
comments. - If doc-comments are in both places, only the comments after the declaration are used.
- If before, it must be a
- Doc-comments after the declaration are preferred.
- If the first of a series of single-line or multi-line comments is interpreted as a doc-string, then any subsequent comments are appended to it, so long as there are no intervening lines, whether blank, all white-space or containing program text.
- The extraction of the doc-string from a multiline comment follow these rules
- On the first line, an optional
*
right after/*
and an optional space are removed, if present - On other lines, the indentation space (with possibly one star in it) is removed, as if the content was supposed to align with A if the comment started with
/** A
for example.
- On the first line, an optional
- The documentation string is interpreted as plain text, but it is possible to provide a user-written plugin that provides other interpretations. VSCode as used by Dafny interprets any markdown syntax in the doc-string.
Here are examples:
const c0 := 8
/** docstring about c0 */
/** docstring about c1 */
const c1 := 8
/** first line of docstring */
const c2 := 8
/** second line of docstring */
const c3 := 8
// docstring about c3
// on two lines
const c4 := 8
// just a comment
// just a comment
const c5 := 8
Datatype constructors may also have comments:
datatype T = // Docstring for T
| A(x: int,
y: int) // Docstring for A
| B() /* Docstring for B */ |
C() // Docstring for C
/** Docstring for T0*/
datatype T0 =
| /** Docstring for A */
A(x: int,
y: int)
| /** Docstring for B */
B()
| /** Docstring for C */
C()
As can export
declarations:
module M {
const A: int
const B: int
const C: int
const D: int
export
// This is the eponymous export set intended for most clients
provides A, B, C
export Friends extends M
// This export set is for clients who need to know more of the
// details of the module's definitions.
reveals A
provides D
}
2.6. Tokens (grammar) {#sec-tokens}
The Dafny tokens are defined in this section.
Dafny has a set of reserved words that may not be used as identifiers of user-defined entities. These are listed here.
In particular note that
array
,array2
,array3
, etc. are reserved words, denoting array types of given rank. However,array1
andarray0
are ordinary identifiers.array?
,array2?
,array3?
, etc. are reserved words, denoting possibly-null array types of given rank, but notarray1?
orarray0?
.bv0
,bv1
,bv2
, etc. are reserved words that denote the types of bitvectors of given length. The sequence of digits after 'array' or 'bv' may not have leading zeros: for example,bv02
is an ordinary identifier.
In general, an ident
token (an identifier) is a sequence of idchar
characters where
the first character is a nondigitIdChar
. However tokens that fit this pattern
are not identifiers if they look like a character literal
or a reserved word (including array or bit-vector type tokens).
Also, ident
tokens that begin with an _
are not permitted as user identifiers.
A digits
token is a sequence of decimal digits (digit
), possibly interspersed with
underscores for readability (but not beginning or ending with an underscore).
Example: 1_234_567
.
A hexdigits
token denotes a hexadecimal constant, and is a sequence of hexadecimal digits (hexdigit
)
prefaced by 0x
and
possibly interspersed with underscores for readability (but not beginning or ending with an underscore).
Example: 0xffff_ffff
.
A decimaldigits
token is a decimal fraction constant, possibly interspersed with underscores for readability (but not beginning or ending with an underscore).
It has digits both before and after a single period (.
) character. There is no syntax for floating point numbers with exponents.
Example: 123_456.789_123
.
The escapedChar
token is a multi-character sequence that denotes a non-printable or non-ASCII character.
Such tokens begin with a backslash characcter (\
) and denote
a single- or double-quote character, backslash,
null, new line, carriage return, tab, or a
Unicode character with given hexadecimal representation.
Which Unicode escape form is allowed depends on the value of the --unicode-char
option.
If --unicode-char:false
is stipulated,
\uXXXX
escapes can be used to specify any UTF-16 code unit.
If --unicode-char:true
is stipulated,
\U{X..X}
escapes can be used to specify any Unicode scalar value.
There must be at least one hex digit in between the braces, and at most six.
Surrogate code points are not allowed.
The hex digits may be interspersed with underscores for readability
(but not beginning or ending with an underscore), as in \U{1_F680}
.
The braces are part of the required character sequence.
Note that although Unicode letters are not allowed in Dafny identifiers, Dafny does support Unicode in its character, string, and verbatim strings constants and in its comments.
The charToken
token denotes a character constant.
It is either a charChar
or an escapedChar
enclosed in single quotes.
A stringToken
denotes a string constant.
It consists of a sequence of stringChar
and escapedChar
characters enclosed in
double quotes.
A verbatimStringToken
token also denotes a string constant.
It is a sequence of any verbatimStringChar
characters (which includes newline characters),
enclosed between @"
and "
, except that two
successive double quotes represent one quote character inside
the string. This is the mechanism for escaping a double quote character,
which is the only character needing escaping in a verbatim string.
Within a verbatim string constant, a backslash character represents itself
and is not the first character of an escapedChar
.
The ellipsisToken
is the character sequence ...
and is typically used to designate something missing that will
later be inserted through refinement or is already present in a parent declaration.
A basic ordinary identifier is just an ident
token.
It may be followed by a sequence of suffixes to denote compound entities.
Each suffix is a dot (.
) and another token, which may be
- another
ident
token - a
digits
token - the
requires
reserved word - the
reads
reserved word
Note that
- Digits can be used to name fields of classes and destructors of
datatypes. For example, the built-in tuple datatypes have destructors
named 0, 1, 2, etc. Note that as a field or destructor name, a digit sequence
is treated as a string, not a number: internal
underscores matter, so
10
is different from1_0
and from010
. m.requires
is used to denote the precondition for methodm
.m.reads
is used to denote the things that methodm
may read.
A NoUSIdent
is an identifier except that identifiers with a leading
underscore are not allowed. The names of user-defined entities are
required to be NoUSIdent
s or, in some contexts, a digits
.
We introduce more mnemonic names
for these below (e.g. ClassName
).
A no-underscore-identifier is required for the following:
- module name
- class or trait name
- datatype name
- newtype name
- synonym (and subset) type name
- iterator name
- type variable name
- attribute name
A variation, a no-underscore-identifier or a digits
, is allowed for
- datatype member name
- method or function or constructor name
- label name
- export id
- suffix that is a typename or constructor
All user-declared names do not start with underscores, but there are internally generated names that a user program might use that begin with an underscore or are just an underscore.
A wild identifier is a no-underscore-identifier except that the singleton
_
is allowed. The _
is replaced conceptually by a unique
identifier distinct from all other identifiers in the program.
A _
is used when an identifier is needed, but its content is discarded.
Such identifiers are not used in expressions.
Wild identifiers may be used in these contexts:
- formal parameters of a lambda expression
- the local formal parameter of a quantifier
- the local formal parameter of a subset type or newtype declaration
- a variable declaration
- a case pattern formal parameter
- binding guard parameter
- for loop parameter
- LHS of update statements
A qualified name starts with the name of a top-level entity and then is followed by
zero or more DotSuffix
s which denote a component. Examples:
Module.MyType1
MyTuple.1
MyMethod.requires
A.B.C.D
The identifiers and dots are separate tokens and so may optionally be separated by whitespace.
Identifiers are typically declared in combination with a type, as in
var i: int
However, Dafny infers types in many circumstances, and in those, the type can be omitted. The type is required for field declarations and formal parameters of methods, functions and constructors (because there is no initializer). It may be omitted (if the type can be inferred) for local variable declarations, pattern matching variables, quantifiers,
Similarly, there are circumstances in which the identifier name is not needed, because it is not used. This is allowed in defining algebraic datatypes.
In some other situations a wild identifier can be used, as described above.
2.7.4. Quantifier Domains (grammar) {#sec-quantifier-domains}
Several Dafny constructs bind one or more variables to a range of possible values.
For example, the quantifier forall x: nat | x <= 5 :: x * x <= 25
has the meaning
"for all integers x between 0 and 5 inclusive, the square of x is at most 25".
Similarly, the set comprehension set x: nat | x <= 5 :: f(x)
can be read as
"the set containing the result of applying f to x, for each integer x from 0 to 5 inclusive".
The common syntax that specifies the bound variables and what values they take on
is known as the quantifier domain; in the previous examples this is x: nat | x <= 5
,
which binds the variable x
to the values 0
, 1
, 2
, 3
, 4
, and 5
.
Here are some more examples.
x: byte
(where a value of typebyte
is an int-based numberx
in the range0 <= x < 256
)x: nat | x <= 5
x <- integerSet
x: nat <- integerSet
x: nat <- integerSet | x % 2 == 0
x: nat, y: nat | x < 2 && y < 2
x: nat | x < 2, y: nat | y < x
i | 0 <= i < |s|, y <- s[i] | i < y
A quantifier domain declares one or more quantified variables, separated by commas. Each variable declaration can be nothing more than a variable name, but it may also include any of three optional elements:
-
The optional syntax
: T
declares the type of the quantified variable. If not provided, it will be inferred from context. -
The optional syntax
<- C
attaches a collection expressionC
as a quantified variable domain. Here a collection is any value of a type that supports thein
operator, namely sets, multisets, maps, and sequences. The domain restricts the bindings to the elements of the collection:x <- C
impliesx in C
. The example above can also be expressed asvar c := [0, 1, 2, 3, 4, 5]; forall x <- c :: x * x <= 25
. -
The optional syntax
| E
attaches a boolean expressionE
as a quantified variable range, which restricts the bindings to values that satisfy this expression. In the example abovex <= 5
is the range attached to thex
variable declaration.
Note that a variable's domain expression may reference any variable declared before it,
and a variable's range expression may reference the attached variable (and usually does) and any variable declared before it.
For example, in the quantifier domain i | 0 <= i < |s|, y <- s[i] | i < y
, the expression s[i]
is always well-formed
because the range attached to i
ensures i
is a valid index in the sequence s
.
Allowing per-variable ranges is not fully backwards compatible, and so it is not yet allowed by default;
the --quantifier-syntax:4
option needs to be provided to enable this feature (See Section 13.9.5).
2.7.5. Numeric Literals (grammar) {#sec-numeric-literals}
Integer and bitvector literals may be expressed in either decimal or hexadecimal (digits
or hexdigits
).
Real number literals are written as decimal fractions (decimaldigits
).