While writing a JavaDoc extractor, I noticed that the lexing rules do not appear to follow the description in the documentation, which states:
The order in which terminal rules are defined is critical as the lexer will always return the first match.
In the first screenshot, the grammar extracts the correct text into the syntax tree, so the remaining task is to define terminal rules that ignore everything else.
After adding the 'IGNORE' rule, the syntax tree no longer contains the earlier matches; the later 'IGNORE' rule has taken precedence.
This appears to contradict the documented requirement about the order of terminal rules.
@NigelWSewell It seems like the documentation skipped over the small detail that we move terminals that can potentially match whitespace characters to the front as a performance optimization. See here.
Note that unlike in Xtext, it's not recommended in Langium to have a catch-all terminal. Langium's underlying lexer implementation (Chevrotain) works quite differently from ANTLR and catch-all terminals will always lead to trouble (even if the order of tokens is correct). A catch-all token will always consume the rest of the input, as even making it non-greedy doesn't work.
Instead, lexer errors are dealt with on a diagnostics level, and unexpected characters are simply omitted from the token stream.
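To make the two points above concrete, here is a minimal, self-contained sketch of a first-match lexer in TypeScript. It is not Langium's or Chevrotain's code: `TerminalRule`, `LexedToken`, `matchAt`, `lex` and the three example terminals are hypothetical names chosen for illustration, and the model ignores hidden terminals, keywords and regex flags.

```typescript
// Illustrative model only: none of these names come from Langium or Chevrotain.
interface TerminalRule {
    name: string;
    pattern: RegExp;
}

interface LexedToken {
    type: string;
    text: string;
}

interface LexResult {
    tokens: LexedToken[];
    errors: string[];
}

// Try to match `pattern` exactly at `offset` (regex flags are ignored for simplicity).
function matchAt(pattern: RegExp, input: string, offset: number): string | undefined {
    const sticky = new RegExp(pattern.source, 'y');
    sticky.lastIndex = offset;
    const match = sticky.exec(input);
    if (match !== null && match[0].length > 0) {
        return match[0];
    }
    return undefined;
}

// First-match lexer: terminals are tried in array order at every offset.
function lex(input: string, terminals: TerminalRule[]): LexResult {
    const tokens: LexedToken[] = [];
    const errors: string[] = [];
    let offset = 0;
    while (offset < input.length) {
        let matched = false;
        for (const terminal of terminals) {
            const text = matchAt(terminal.pattern, input, offset);
            if (text !== undefined) {
                tokens.push({ type: terminal.name, text: text });
                offset += text.length;
                matched = true;
                break; // the first matching terminal wins
            }
        }
        if (!matched) {
            // Unexpected character: report it and leave it out of the token stream.
            errors.push(`unexpected character '${input.charAt(offset)}' at offset ${offset}`);
            offset += 1;
        }
    }
    return { tokens: tokens, errors: errors };
}

const JAVADOC: TerminalRule = { name: 'JAVADOC', pattern: /\/\*\*[\s\S]*?\*\// };
const WS: TerminalRule = { name: 'WS', pattern: /\s+/ };
const IGNORE: TerminalRule = { name: 'IGNORE', pattern: /[\s\S]+/ }; // catch-all

const javaSource = 'int x; /** doc */ int y;';

// No catch-all: the comment is tokenized, the other characters become errors.
const withoutCatchAll = lex(javaSource, [JAVADOC, WS]);

// Catch-all declared last: it still swallows the whole input at offset 0.
const withCatchAll = lex(javaSource, [JAVADOC, WS, IGNORE]);
```

In this model, `withoutCatchAll` holds one `JAVADOC` token (plus whitespace tokens) and ten reported errors, while `withCatchAll` holds a single `IGNORE` token spanning the entire input, even though `IGNORE` comes last in the list.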
@msujew That would explain the behaviour well enough.
Is there a workaround for this? Either:
A way of forcing strict declaration order.
Ignoring other syntax errors.
A complete non-whitespace character set to catch other unwanted text.
Something else I've not thought of.
Either way I'm sure this is a question/mistake many people from ANTLR/Xtext will encounter so this can be a good opportunity to improve the documentation.
Not directly in the grammar, though you can override the DefaultTokenBuilder to prevent the behavior. We should probably add a flag to disable the optimization.
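As a rough illustration of what such a flag would change, the sketch below models the reordering in isolation. It does not use Langium's API: `orderTerminals`, its `keepDeclarationOrder` parameter, and the single-space whitespace check are assumptions made for this example, and a real fix would go through a custom token builder instead.

```typescript
// Illustrative model only: none of these names come from Langium or Chevrotain.
interface TerminalRule {
    name: string;
    pattern: RegExp;
}

// Anchor a copy of the pattern at the start of the input (regex flags are ignored).
function matchesAtStart(rule: TerminalRule, input: string): boolean {
    return new RegExp('^(?:' + rule.pattern.source + ')').test(input);
}

// Assumption: "can match whitespace" is approximated by matching a single space.
function canMatchWhitespace(rule: TerminalRule): boolean {
    return matchesAtStart(rule, ' ');
}

// Hypothetical switch: keepDeclarationOrder = true disables the reordering.
function orderTerminals(rules: TerminalRule[], keepDeclarationOrder: boolean): TerminalRule[] {
    if (keepDeclarationOrder) {
        return rules.slice();
    }
    // Whitespace-capable terminals move to the front; relative order is kept (a simplification).
    const front = rules.filter(rule => canMatchWhitespace(rule));
    const rest = rules.filter(rule => !canMatchWhitespace(rule));
    return front.concat(rest);
}

// Name of the first rule (in the given order) that matches at the start of the input.
function firstMatch(rules: TerminalRule[], input: string): string | undefined {
    for (const rule of rules) {
        if (matchesAtStart(rule, input)) {
            return rule.name;
        }
    }
    return undefined;
}

const declared: TerminalRule[] = [
    { name: 'JAVADOC', pattern: /\/\*\*[\s\S]*?\*\// },
    { name: 'IGNORE', pattern: /[\s\S]+/ },
];
const sample = '/** doc */ int x;';

// With the reordering, IGNORE can match whitespace, moves to the front, and wins.
const optimized = firstMatch(orderTerminals(declared, false), sample);

// With strict declaration order, JAVADOC is tried first and wins.
const strict = firstMatch(orderTerminals(declared, true), sample);
```

Here `optimized` is `'IGNORE'` and `strict` is `'JAVADOC'`, which matches the behaviour reported in the issue: the later rule wins only because it can match whitespace.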
Either way I'm sure this is a question/mistake many people from ANTLR/Xtext will encounter so this can be a good opportunity to improve the documentation.
I assume so as well. We should probably mention that in the docs.