Does not parse across multiple Lines #13

Open
IARI opened this issue Aug 16, 2018 · 5 comments

Comments

IARI commented Aug 16, 2018

I have to apologize in advance: I have no idea about lexing and parsing.

When I try to build a simple parser and feed it content with multiple lines, the tokenizer fails:

object : Grammar<String>() {
    val singleToken by token(""".+""")
    override val rootParser: Parser<String> by zeroOrMore(singleToken) map { it.joinToString("#") }
}.parseToEnd("fuu \nbar")

com.github.h0tk3y.betterParse.parser.ParseException: Could not parse input: UnparsedRemainder(startsWith=no token matched for "bar" at 4 (1:5))
at com.github.h0tk3y.betterParse.parser.ParserKt.toParsedOrThrow(Parser.kt:66)
at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:26)

Somehow, after the newline, the \G in the wrapping allInOnePattern of the DefaultTokenizer (Tokenizer.kt#L42) does not match anymore.
What am I doing wrong here?
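As a standalone illustration of that \G behaviour, here is a minimal sketch using plain java.util.regex (independent of better-parse's actual tokenizer code):

fun main() {
    // \G only matches at the end of the previous match (or at the start of input),
    // so once `.` refuses to match the '\n', the anchored pattern can never re-attach.
    val matcher = java.util.regex.Pattern.compile("""\G.+""").matcher("fuu \nbar")
    while (matcher.find()) println("matched: '${matcher.group()}'")
    // prints only: matched: 'fuu '   ("bar" is never reached)
}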

silmeth commented Aug 17, 2018

Your regex does not match every character – it won't match a newline.

Better-parse uses standard Kotlin regular expressions, by default with the default flags. You need the DOT_MATCHES_ALL option for . in a regex to match a newline character.

Alternatively, you should be able to use a regex like (.|\n|\r)+ instead of .+ with the default options.
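For reference, a standalone sketch of the two suggestions with plain Kotlin Regex (independent of better-parse):

fun main() {
    val plain = Regex(""".+""")
    val dotAll = Regex(""".+""", RegexOption.DOT_MATCHES_ALL)
    val alternative = Regex("""(.|\n|\r)+""")

    println(plain.matches("fuu \nbar"))       // false: '.' does not match '\n'
    println(dotAll.matches("fuu \nbar"))      // true
    println(alternative.matches("fuu \nbar")) // true
}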

IARI (Author) commented Aug 17, 2018

Thanks @silmeth, but it doesn't seem to be the problem.
I've tried it with DOT_MATCHES_ALL:

val singleToken by token(""".+""".toRegex(RegexOption.DOT_MATCHES_ALL))

But at this point it doesn't help: the DefaultTokenizer just takes all the tokens, extracts their pattern strings, and builds its own regex (Tokenizer.kt#L42), and from what I can tell it does not remember the regex options.

Consequently, I reimplemented the DefaultTokenizer by modifying the existing one. This is what I came up with:
https://gist.github.com/IARI/91011233658d386f1f1aefd2450537f2#file-mytokenizer-kt-L14

I made sure that it receives regex options, and called it in my Grammar using:

override val tokenizer: Tokenizer by lazy {
    MyTokenizer(tokens, RegexOption.MULTILINE, RegexOption.DOT_MATCHES_ALL)
}

The result is still the originally described error.

Could this have to do with the behavior of Java's Scanner, which is used by the tokenizer, and the delimiter used for it?
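For reference, a minimal sketch of Scanner's default delimiter behaviour (this only shows what the delimiter does on its own, not how the DefaultTokenizer actually uses the Scanner):

fun main() {
    // java.util.Scanner's default delimiter is whitespace, which includes '\n',
    // so token-style reads split the input at the newline.
    val scanner = java.util.Scanner("fuu \nbar")
    while (scanner.hasNext()) println(scanner.next())
    // prints:
    // fuu
    // bar
}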

h0tk3y (Owner) commented Aug 17, 2018

@IARI, in the 0.3.5 update, I've added regex option transformation to embedded flags. Before 0.3.5, you could also just add the regex embedded flags into the pattern string.
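A minimal sketch of that embedded-flag workaround, reusing the token from the original snippet ((?s) is the standard inline form of DOT_MATCHES_ALL in Java/Kotlin regex; whether it also resolves the \G behaviour reported above is not confirmed in this thread):

// (?s) turns on DOT_MATCHES_ALL from this point in the pattern,
// so the flag travels with the pattern string itself.
val singleToken by token("""(?s).+""")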

IARI (Author) commented Aug 17, 2018

That's nice, but as stated, as far as I understand it, that doesn't solve the problem. So far, it has only worked without the \G.

IARI (Author) commented Sep 18, 2018

Any hints?
