Does not parse across multiple Lines #13

Open
IARI opened this issue Aug 16, 2018 · 5 comments

Comments

IARI commented Aug 16, 2018

I have to apologize in advance: I have no idea about lexing and parsing.

When I try to build a simple parser and feed it content with multiple lines, the tokenizer fails:

object : Grammar<String>() {
    val singleToken by token(""".+""")
    override val rootParser: Parser<String> by zeroOrMore(singleToken) map { it.joinToString("#") }
}.parseToEnd("fuu \nbar")

com.github.h0tk3y.betterParse.parser.ParseException: Could not parse input: UnparsedRemainder(startsWith=no token matched for "bar" at 4 (1:5))
at com.github.h0tk3y.betterParse.parser.ParserKt.toParsedOrThrow(Parser.kt:66)
at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:26)

Somehow, after the newline, the \G in the wrapping allInOnePattern of the DefaultTokenizer (Tokenizer.kt#L42) does not match anymore.
What am I doing wrong here?
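As a standalone illustration of that \G behaviour, here is a minimal sketch using plain java.util.regex (independent of better-parse's actual tokenizer code):

fun main() {
    // \G only matches at the end of the previous match (or at the start of input),
    // so once `.` refuses to match the '\n', the anchored pattern can never re-attach.
    val matcher = java.util.regex.Pattern.compile("""\G.+""").matcher("fuu \nbar")
    while (matcher.find()) println("matched: '${matcher.group()}'")
    // prints only: matched: 'fuu '   ("bar" is never reached)
}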

silmeth commented Aug 17, 2018

Your regex does not match every character – it won't match a newline.

Better-parse uses standard Kotlin regular expressions, by default with the default flags. You need the DOT_MATCHES_ALL option for . in a regex to match a newline character.

Alternatively, you should be able to use a regex like (.|\n|\r)+ instead of .+ with the default options.
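For reference, a standalone sketch of the two suggestions with plain Kotlin Regex (independent of better-parse):

fun main() {
    val plain = Regex(""".+""")
    val dotAll = Regex(""".+""", RegexOption.DOT_MATCHES_ALL)
    val alternative = Regex("""(.|\n|\r)+""")

    println(plain.matches("fuu \nbar"))       // false: '.' does not match '\n'
    println(dotAll.matches("fuu \nbar"))      // true
    println(alternative.matches("fuu \nbar")) // true
}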

IARI (Author) commented Aug 17, 2018

Thanks @silmeth, but it doesn't seem to be the problem.
I've tried it with DOT_MATCHES_ALL:

val singleToken by token(""".+""".toRegex(RegexOption.DOT_MATCHES_ALL))

But at this point it doesn't help: the DefaultTokenizer just takes all the tokens, extracts their pattern strings, and builds its own regex (Tokenizer.kt#L42), and from what I can tell it does not remember the regex options.

Consequently, I reimplemented the DefaultTokenizer by modifying the existing one. This is what I came up with:
https://gist.github.com/IARI/91011233658d386f1f1aefd2450537f2#file-mytokenizer-kt-L14

I made sure that it receives regex options, and called it in my Grammar using:

override val tokenizer: Tokenizer by lazy {
    MyTokenizer(tokens, RegexOption.MULTILINE, RegexOption.DOT_MATCHES_ALL)
}

The result is still the originally described error.

Could this have to do with the behavior of Java's Scanner, which is used by the tokenizer, and the delimiter used for it?
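For reference, a minimal sketch of Scanner's default delimiter behaviour (this only shows what the delimiter does on its own, not how the DefaultTokenizer actually uses the Scanner):

fun main() {
    // java.util.Scanner's default delimiter is whitespace, which includes '\n',
    // so token-style reads split the input at the newline.
    val scanner = java.util.Scanner("fuu \nbar")
    while (scanner.hasNext()) println(scanner.next())
    // prints:
    // fuu
    // bar
}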

h0tk3y (Owner) commented Aug 17, 2018

@IARI, in the 0.3.5 update, I've added regex option transformation to embedded flags. Before 0.3.5, you could also just add the regex embedded flags into the pattern string.
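A minimal sketch of that embedded-flag workaround, reusing the token from the original snippet ((?s) is the standard inline form of DOT_MATCHES_ALL in Java/Kotlin regex; whether it also resolves the \G behaviour reported above is not confirmed in this thread):

// (?s) turns on DOT_MATCHES_ALL from this point in the pattern,
// so the flag travels with the pattern string itself.
val singleToken by token("""(?s).+""")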

IARI (Author) commented Aug 17, 2018

That's nice, but as stated, as far as I understand it, that doesn't solve the problem. So far, it has only worked without the \G.

IARI (Author) commented Sep 18, 2018

Any hints?
