This page introduces the basic syntax of the Wolfram Language and explains the structure of the syntax files. src/syntaxes/simplest.yaml is a direct implementation of this page.
Note: the syntax definition uses some YAML tags which can be found in the extended schema.
This overview relies on a few basic regular expressions. They are called variables and are inserted automatically into the syntax files through Mustache during the build process.
- alnum: [0-9a-zA-Z]
- number: (?:\d+\.?|\.\d)\d*
- symbol: [$a-zA-Z]+[$0-9a-zA-Z]*
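The substitution step can be pictured with a small Python sketch. It is not the project's build script: a plain regular-expression replacement stands in for Mustache here, and the names VARIABLES and expand are hypothetical.

```python
import re

# Variables copied from the list above.
VARIABLES = {
    "alnum": r"[0-9a-zA-Z]",
    "number": r"(?:\d+\.?|\.\d)\d*",
    "symbol": r"[$a-zA-Z]+[$0-9a-zA-Z]*",
}


def expand(template: str, variables: dict = VARIABLES) -> str:
    """Replace every {{name}} placeholder with the regex stored under that name."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], template)


# The named-character pattern used later on this page, after expansion.
print(expand(r"\\\[{{alnum}}+\]"))
```

An unknown placeholder raises a KeyError in this sketch, which is one way to catch a misspelled variable name early.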
The simplest syntax definition for Wolfram Language supports the following syntax:
- Shebang
- Numbers
- Strings
- Operators
- Variables
- Functions
- Patterns
- Bracketing
- Box Forms
- Comment blocks
- Shorthand expressions
- Escaping before newlines
See here for the shebang definition. It's easy to support such a syntax: \A(#!).*(?=$).
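To try the shebang pattern outside the editor, a Python sketch like the following may help. It assumes the grammar engine matches one line at a time, so re.MULTILINE is used to give $ the meaning "end of line"; the helper name shebang_line is hypothetical.

```python
import re

# \A matches only at the very start of the text, even with re.MULTILINE,
# while $ (with re.MULTILINE) matches at the end of each line.
SHEBANG = re.compile(r"\A(#!).*(?=$)", re.MULTILINE)


def shebang_line(source: str):
    """Return the shebang line of `source`, or None if the first line is not one."""
    m = SHEBANG.match(source)
    return m.group(0) if m else None


print(shebang_line("#!/usr/bin/env wolframscript\nPrint[1]"))
print(shebang_line("Print[1]\n#!not a shebang"))
```

Only a #! on the first line counts; the second call returns None because the marker appears on line two.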
In Wolfram Language, numbers can:
- have a base: 2^^10, 11^^a.a
- have a precision: 2`10, 11`
- have an accuracy: 2``10, 11``
- be in scientific form: 2*^10, 2*^-1.1
So a complete syntax for numbers should be:
(?x)
(?:
([1-9]\d*\^\^) # base
((?:{{alnum}}+\.?|\.{{alnum}}){{alnum}}*) # value
|
({{number}}) # value
)
(?:
(\`\`(?:{{number}})?) # accuracy
|
(\`(?:{{number}})?) # precision
)?
(\*\^[+-]?{{number}})? # exponent
Note: ^^, `, `` and *^ should not be treated as operators.
Reference: Input Syntax.
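The number pattern can be checked against the sample literals with a short Python sketch. The {{alnum}} and {{number}} variables are substituted by hand through an f-string, and re.VERBOSE replaces the leading (?x); the constant names are hypothetical.

```python
import re

ALNUM = r"[0-9a-zA-Z]"
NUMBER = r"(?:\d+\.?|\.\d)\d*"

NUMBER_PATTERN = re.compile(
    rf"""
    (?:
      ([1-9]\d*\^\^)                        # 1: base
      ((?:{ALNUM}+\.?|\.{ALNUM}){ALNUM}*)   # 2: value (based)
    |
      ({NUMBER})                            # 3: value (plain)
    )
    (?:
      (\`\`(?:{NUMBER})?)                   # 4: accuracy
    |
      (\`(?:{NUMBER})?)                     # 5: precision
    )?
    (\*\^[+-]?{NUMBER})?                    # 6: exponent
    """,
    re.VERBOSE,
)

SAMPLES = ["2^^10", "11^^a.a", "2`10", "11`", "2``10", "11``", "2*^10", "2*^-1.1"]

for literal in SAMPLES:
    match = NUMBER_PATTERN.fullmatch(literal)
    print(literal, match.groups() if match else "no match")
```

Accuracy is tried before precision so that a double backtick is not consumed as an empty precision followed by a stray backtick.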
A string in Wolfram Language must be enclosed in a pair of " and can contain the following special syntaxes:
Some special characters have names, and can be matched with \\\[{{alnum}}+\].
Note: not every \\\[{{alnum}}+\] is valid grammar, but the simplest syntax definition does not provide a list of supported names.
In Wolfram Language, some characters can be "escaped" while others cannot. Try the following code in Mathematica:
Reap[
Scan[
Sow[#, Quiet @ Check[Length @ Characters @ ToExpression["\"\\" <> # <> "\""], -1]] &,
CharacterRange[33, 126]
],
_,
#1 -> StringJoin[#2] &
] // Last
You can obtain the following result:
- disappeared: <>
- unchanged: #$',-89;=?]{|}~
- escaped: !"%&()*+/@\^_`bfnrt
- errored: other characters
The first three kinds of characters can be placed after a \ while characters of the last kind cannot.
The Wolfram Language also supports encoded characters:
- 3-digit octal: \\[0-7]{3}
- 2-digit hexadecimal: \\\.[0-9A-Fa-f]{2}
- 4-digit hexadecimal: \\:[0-9A-Fa-f]{4}
Note: a sequence that begins with \, \. or \: and is followed by at least one digit (or hexadecimal digit) but does not match any of the forms above is illegal.
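The three encoded forms can be combined into a single alternation. The Python sketch below does that with the expressions listed above; the name ENCODED_CHAR is hypothetical, and the sketch does not try to flag the illegal sequences mentioned in the note.

```python
import re

ENCODED_CHAR = re.compile(
    r"""
    \\[0-7]{3}             # 3-digit octal
  | \\\.[0-9A-Fa-f]{2}     # 2-digit hexadecimal
  | \\:[0-9A-Fa-f]{4}      # 4-digit hexadecimal
    """,
    re.VERBOSE,
)

# A raw string keeps the backslashes exactly as they would appear in source code.
text = r"A\101\.41\:0041"
print(ENCODED_CHAR.findall(text))
```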
A string can also include box forms, which are introduced later on. In the simplest syntax, however, box forms inside strings are not supported.
There are a great many operators in Wolfram Language, but syntax definitions for them are easy to write: you only need to look them up and list them in a proper sequence. I divided them into 15 categories:
Replace:
/. ReplaceAll
//. ReplaceRepeated
Call:
@ Prefix
@@ Apply
@@@ Apply
/@ Map
//@ MapAll
// Postfix
~ Infix
@* Composition
/* RightComposition
Comparison:
> Greater
< Less
>= GreaterEqual
<= LessEqual
== Equal
!= Unequal
=== SameQ
=!= UnsameQ
Logical:
! Not
|| Or
&& And
Assignment:
= Set
:= SetDelayed
^= UpSet
^:= UpSetDelayed
/: TagSet (TagUnset, TagSetDelayed)
=. Unset
+= AddTo
-= SubtractFrom
*= TimesBy
/= DivideBy
Rule:
-> Rule
:> RuleDelayed
<-> TwoWayRule
Condition:
/; Condition
Repeat:
.. Repeated
... RepeatedNull
Arithmetic:
+ Plus
- Minus, Subtract
* Times
/ Divide
^ Power
. Dot
! Factorial
!! Factorial2
' Derivative
** NonCommutativeMultiply
++ Increment, PreIncrement
-- Decrement, PreDecrement
Flow:
<< Get
>> Put
>>> PutAppend
String:
<> StringJoin
~~ StringExpression
| Alternatives
Span:
;; Span
Compound:
; CompoundExpression
Function:
& Function
Definition:
? Definition
?? FullDefinition
Note: Some operators may not be included in the list if they are declared in other scopes.
Named characters can also be recognized as operators.
Reference: Operators.
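One reading of "a proper sequence" is that longer operators must be tried before their own prefixes, because regular-expression alternation takes the first branch that matches. The Python sketch below illustrates that assumption with a handful of the operators above; build_matcher is a hypothetical helper.

```python
import re

# A few operators from the Replace and Call categories, in arbitrary order.
OPERATORS = ["@", "@@", "@@@", "/@", "//@", "//", "/.", "//."]


def build_matcher(operators):
    """Join the operators into one alternation, longest first."""
    ordered = sorted(operators, key=len, reverse=True)
    return re.compile("|".join(re.escape(op) for op in ordered))


NAIVE = re.compile("|".join(re.escape(op) for op in OPERATORS))
ORDERED = build_matcher(OPERATORS)

print(NAIVE.match("@@@ list").group())    # stops at the first branch that fits
print(ORDERED.match("@@@ list").group())  # finds the whole operator
```

With the unsorted list, the single-character branch wins and the remaining characters would be tokenized separately.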
A general variable consists of symbols joined by ` (a symbol before a ` is called a "context").
match: (`?(?:{{symbol}}`)*){{symbol}}
name: variable.other.wolfram
captures: !raw
  1: variable.other.context.wolfram
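The match expression of this rule can be exercised in Python to see what the first capture group holds. The {{symbol}} variable is substituted by hand, and the constant names are hypothetical.

```python
import re

SYMBOL = r"[$a-zA-Z]+[$0-9a-zA-Z]*"

# Group 1 is the context part, including the trailing backtick.
VARIABLE = re.compile(rf"(`?(?:{SYMBOL}`)*){SYMBOL}")

for name in ["x1", "`x", "System`Private`x1"]:
    match = VARIABLE.fullmatch(name)
    print(name, "-> context:", repr(match.group(1)))
```

For a symbol without any context, group 1 is the empty string rather than missing.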
Functions are no different from variables in Wolfram Language, but a syntax definition should color them as functions. Here are some basic ways to identify a function:
- a variable placed before (@{1,3}|//?@|[/@]\*)
- a variable placed after (//|[@/]\*)
- a variable in an even position in an expression joined with ~
- a variable placed after a PatternTest (which is introduced in the next part)
Apart from functions, patterns have two forms:
- the shorthand form of Pattern, that is, a variable before :(?=[^:>=])
- the shorthand form of Blank and Default, that is, a variable before
(?x)
(_\.) # Default
|
(_{1,3}) # Blank, BlankSequence, BlankNullSequence
({{identifier}})? # Head (here "identifier" means variable)
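The blank expression can be tried on a few pattern suffixes in Python. In this sketch {{identifier}} is taken to be the general variable pattern (symbols with optional contexts), as the comment suggests; the constant names are hypothetical, and the braces of the quantifier are doubled only because the pattern is an f-string.

```python
import re

SYMBOL = r"[$a-zA-Z]+[$0-9a-zA-Z]*"
VARIABLE = rf"`?(?:{SYMBOL}`)*{SYMBOL}"

BLANK = re.compile(
    rf"""
    (_\.)             # 1: Default
  |
    (_{{1,3}})        # 2: Blank, BlankSequence, BlankNullSequence
    ({VARIABLE})?     # 3: Head
    """,
    re.VERBOSE,
)

for suffix in ["_.", "_", "_Integer", "__List", "___"]:
    print(suffix, BLANK.fullmatch(suffix).groups())
```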
After a pattern, there may be some additional syntaxes other than expressions:
- Optional: :
- PatternTest: ?
However, coloring them properly is quite difficult and is beyond the scope of this page.
There are many kinds of bracketing in the Wolfram Language. A general bracketing rule should be like this:
begin: \(
beginCaptures: !all punctuation.section.parens.begin.wolfram
end: \)
endCaptures: !all punctuation.section.parens.end.wolfram
name: meta.parens.wolfram
patterns: !push expressions
In the simplest syntax definition, we only need to support the following bracketing:
- parens: ( and )
- braces: { and }
- brackets: [ and ]
- association: <| and |>
- parts: [[ and ]]
- box: \( and \)
Reference: The Four Kinds of Bracketing in the Wolfram Language.
A box form is a nested scope with all expression rules and some special syntaxes:
- \\` : FormBox
- \\@ : SqrtBox
- \\/ : FractionBox
- \\[%&+_^] : x-scriptBox (x can be Sub/Super/Over/Under/...)
- \\\* : box constructors
Reference: String Representation of Boxes.
A comment block is wrapped in a pair of (* and *):
begin: \(\*
end: \*\)
patterns: !push comment-block
Note: in the inner scope of a comment block, the rule itself must be included because the following syntax is legal in Wolfram Language and can be found in some .wl files:
(* ::Input:: *)
(*(* some *)
(* comments *)*)
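The effect of including the rule inside itself can be imitated with a depth counter. The Python sketch below is only an illustration of nested comment matching, not how the grammar engine works internally; comment_end is a hypothetical helper.

```python
import re

COMMENT_TOKEN = re.compile(r"\(\*|\*\)")


def comment_end(text: str, start: int = 0) -> int:
    """Return the index just past the comment opening at `start`, or -1 if it never closes."""
    if not text.startswith("(*", start):
        raise ValueError("no comment opens at this position")
    depth = 0
    for token in COMMENT_TOKEN.finditer(text, start):
        # Every (* goes one level deeper, every *) comes back up one level.
        depth += 1 if token.group() == "(*" else -1
        if depth == 0:
            return token.end()
    return -1


sample = "(*(* some *)\n(* comments *)*) x = 1"
end = comment_end(sample)
print(repr(sample[:end]))
```

Without the depth counter, the comment would end at the first *) and the remaining text would be colored as code.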
There are also some syntaxes which correspond to a function but cannot simply be treated as operators.
- Out: %(\d*|%*)
- MessageName: (::)\s*({{alnum}}+)
- Slot, SlotSequence: (#[a-zA-Z]{{alnum}}*|##?\d*)
- Get, Put, PutAppend: (<<|>>>?) *([a-zA-Z0-9`/.!_:$*~?\\-]+) *(?=[\)\]\},;]|$)
Reference: Wolfram Language Syntax.
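The shorthand expressions can be checked one token at a time in Python. This sketch substitutes {{alnum}} by hand and uses fullmatch, so each sample is tested in isolation rather than scanned inside a larger line; SHORTHAND and classify are hypothetical names.

```python
import re

ALNUM = r"[0-9a-zA-Z]"

SHORTHAND = {
    "Out": re.compile(r"%(\d*|%*)"),
    "MessageName": re.compile(rf"(::)\s*({ALNUM}+)"),
    "Slot": re.compile(rf"(#[a-zA-Z]{ALNUM}*|##?\d*)"),
    "Get/Put/PutAppend": re.compile(
        r"(<<|>>>?) *([a-zA-Z0-9`/.!_:$*~?\\-]+) *(?=[\)\]\},;]|$)"
    ),
}


def classify(token: str):
    """Return the name of the first shorthand pattern that matches the whole token."""
    for name, pattern in SHORTHAND.items():
        if pattern.fullmatch(token):
            return name
    return None


for token in ["%%", "%12", "::usage", "#name", "##2", "<< Package`Name`", ">>> out.m"]:
    print(token, "->", classify(token))
```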
Finally, a backslash placed before a newline (\\\r?\n) escapes the newline.