I'd like to define a terminal that matches any word except certain specific words.
Here's why: trying this code
```python
import parglare

grammar = r"""
Sentence: The? object_name=Identifier "is" A Identifier DOT;
Identifier: IdentifierWord+;

terminals
The: /(?i)The/;
A: /(?i)An?/;
IdentifierWord: /\w+/;
DOT: ".";
"""

text = """
The apple is a fruit.
"""

g = parglare.Grammar.from_string(grammar)
p = parglare.Parser(g, debug=True, consume_input=False)
result = p.parse(text)
print(result)
```
fails, as expected, with `Can't disambiguate between: <IdentifierWord(The)> or <The(The)>`, because `IdentifierWord` matches every word. So what I'd like is for `IdentifierWord` not to match certain words, such as "the" and "a". However, when I change the definition of the `IdentifierWord` terminal so that it uses a negative lookahead to exclude those words,

```
IdentifierWord: /(?!The|a)\w+/;
```

the above code fails with

```
Error at 2:4:"\nThe **> apple is a" => Expected: IdentifierWord but found <A(a)>
```
I don't understand why this happens. It seems to find the "a" at the beginning of "apple" and treat it as the `A` terminal. I'm also not sure I'm solving this the best way: is there some other way I should structure this sort of grammar, or a better way to define a terminal that matches all words except certain ones?
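The first error can be reproduced with Python's `re` module alone. This is a simplified sketch that assumes the lexer tries each terminal's regex anchored at the current input position, roughly what `pattern.match(text, pos)` does; `candidates` is a hypothetical helper, not part of parglare. Both terminals match the same three characters "The", so a longest-match rule cannot choose between them:

```python
import re

# Terminal regexes copied from the grammar in the question.
TERMINALS = {
    "The": re.compile(r"(?i)The"),
    "IdentifierWord": re.compile(r"\w+"),
}

text = "\nThe apple is a fruit.\n"


def candidates(pos):
    """Hypothetical helper: return every terminal that matches at `pos`."""
    found = {}
    for name, regex in TERMINALS.items():
        m = regex.match(text, pos)  # matching starts exactly at pos
        if m:
            found[name] = m.group()
    return found


pos = text.index("The")  # skip the leading newline
print(candidates(pos))  # {'The': 'The', 'IdentifierWord': 'The'}
```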
The word `apple` is not matched by `(?!The|a)\w+` because the negative lookahead matches the `a` at its beginning, so the whole match is rejected. You need to make sure the negative assertion takes the word boundary into account. Try this: `(?!(The|a)\b)\w+`.
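A quick way to see the difference is to run both patterns through Python's `re` module. This assumes parglare compiles regex terminals with `re` and matches them at the current input position (my understanding, not something stated in this thread); `matched` is a hypothetical helper:

```python
import re

# Original attempt: rejects any word that merely *starts* with "The" or "a".
OLD = re.compile(r"(?!The|a)\w+")
# Suggested fix: the \b makes the lookahead reject only the whole words "The" and "a".
NEW = re.compile(r"(?!(The|a)\b)\w+")


def matched(regex, word):
    """Return the text `regex` matches at the start of `word`, or None."""
    m = regex.match(word)
    return m.group() if m else None


WORDS = ["The", "a", "apple", "Theory", "fruit"]
results = {word: (matched(OLD, word), matched(NEW, word)) for word in WORDS}

for word, (old, new) in results.items():
    print(f"{word:8} old={old!r:10} new={new!r}")
```

With the original pattern, "apple" and "Theory" are rejected along with "The" and "a"; with the `\b` version only the bare words "The" and "a" are rejected. The lookahead is still case-sensitive, whereas the grammar's `The` and `A` terminals use `(?i)`; in Python's `re`, a scoped flag such as `(?!(?i:the|an?)\b)\w+` would also exclude "the", "A" and "an".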