-
-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parser/printer could be more lazy (to handle prefix of infinite stream) #9
Comments
I think you're definitely right. It would be nice if However, like you say, I'm not sure how to do this in practice. The parser wants nested parenthesis (in order for the output printer to print the right amount of indentation), so I'm not sure how to go about implementing this. If I were to take a guess at it, I might start playing around with the streaming parsers like pipes-attoparsec or conduit-extra's Data.Conduit.Attoparsec module. I'm pretty sure there are incremental streaming parsers for things like json and xml, so I think it should work in theory. If you (or anyone else) wants to take a shot at implementing this, I would be happy to help. It might be easier to tackle this after implementing #16. |
Edward Kmett left a good comment about this on reddit:
|
Would something like this work? https://gist.github.com/andrew-lei/565d7fa9e71059a59d979371c604562c |
I've just realised a couple details are wrong (e.g. I've ignored legal Haskell operators and this only lexes for non-negative integers), but those should be able to be fixed. What to do about non-legal Haskell? |
@andrew-lei Thanks for the work on this. I think something like your code posted above is definitely on the right track!
pretty-simple doesn't really care about Haskell syntax, per se. It is really only interested in parens, braces, brackets, commas, and quotes. Right now, the only "non-legal" input to pretty-simple is something where braces, brackets, etc is not balanced. However, ideally pretty-simple should still try to pretty-print these types of input. There are three open issues related to this:
A version of "pretty-simple" that can stream output (like you're working on above), should ideally just ignore stray parens, braces, etc. Issue #16 is about splitting pretty-simple's current parser into a lexer/parser. However, after thinking about it a little, it is possible that all we really need is a lexer. It might be possible to figure out the output indentation and color based entirely on lexed input instead of fully parsed input. I think your solution above is on the right track as far as this is concerned. |
Yes, that's what I was thinking, but I was considering a few possible issues relating to lexing for a number.
Since it would make most sense to store numbers as strings (since we aren't doing anything with numbers anyway) perhaps it would just make sense to store anything that starts being interpreted as a number as a string. Or we could stop lexing the rest as a number, and store the rest under another constructor, e.g. 'SyntaxError', which could be highlighted as red just as an unmatched paren would be. Or numbers could be parsed strictly instead of lazily, and if something turns out not to be a number, it gets read as Other or something, if we're not worried about some number whose string representation is potentially very long (which certainly seems unlikely but not sure if it can be ruled out). In any case, if numbers are lexed lazily I don't believe it would be possible to simply backtrack and say it isn't a number or something any more.
|
Currently pretty-simple doesn't do anything special with numbers at all, so if you take the time to implement anything at all, it will be an improvement over the current state of affairs. As far as I know, pretty-simple is only used for debugging. It is used for printing out large datatypes in a manner that is simple to read. I don't think we would lose any users if we don't handle floating points numbers and negative numbers perfectly. For an initial implementation, I think just printing out all numbers in green (and ignoring floating point numbers and negative numbers) would be sufficient. That being said, I do appreciate you taking the time to think about this. If you want to tackle how to get floating-point and negative numbers working, that would be great! Let me respond to your questions inline:
I think you're on the right track here. I don't know if we'd be able to parse lazily as well as backtracking when something isn't a number. I think you are right that it should be fine to store numbers as strings, since that is basically what we are treating them as. I would be hesitant to treat poorly formatted numbers as a syntax error, since I can easily imagine the Dates like
If you wanted to try implementing this, I would say go for it! It sounds like it might work as you describe 👍 |
Yes, you are right about UTCTime and Day. To further complicate matters, Day uses dashes in between parts of the date (which could interfere with the negative case since a space after the dash would cause Is the following fully general? A number
The moment the lexer for a number encounters a space, non-'e' alpha, bracket, quote, or a second 'e', that becomes part of the next token. Or should a number be parsed strictly/eagerly, and a failed number then goes as an 'Other'? |
These rules sound good to me. This way, a
This would be fine with me as well. For instance, if some |
OK, I have a working lazy parser in my fork of the repo. Proof of laziness: > take 10 . unCommaSeparated . unBracket . head . expressionParse . show $ [1..]
[[Other "1"],[Other "2"],[Other "3"],[Other "4"],[Other "5"],[Other "6"],[Other "7"],[Other "8"],[Other "9"],[Other "10"]] Almost all of the tests pass, I'll look into the ones that don't (see below). Unfortunately, pShow is still not lazy, so something in the render method must also be changed. Failed test: ### Failure in src/Text/Pretty/Simple.hs:210: expression `pPrintOpt defaultOutputOptionsNoColor (1, (2, "foo\nbar\nbaz", 3))'
expected: ( 1
,
( 2
, "foo
bar
baz"
, 3
)
)
but got: ( 1
,
( 2
, "foo\nbar\nbaz"
, 3
)
)
Examples: 50 Tried: 47 Errors: 0 Failures: 1 |
This looks great! Feel free to fix up the tests as you see fit. If you need any help or have any questions with the tests let me know. I would be happy to help. |
The test is right, I believe I'm just not properly catching escaped characters (which I can work out later). There seems to be several barriers to laziness for the outputs end. First, I believe the other problems involve the state execution requiring a traversal to the end of the list, and the representation of the outputs as a strict I'll have to think about how to deal with running the printer state, since the solution there isn't immediately obvious to me. |
I believe probably the way to handle the printer state is to move the output list out of the printer state and use a Also, is it possible to use the monadic parser combinator libraries lazily? In my solution I parsed it manually, but I suspect if I were more familiar with any of these libraries I could figure out a way to use them lazily. I haven't seen any examples of usage on infinite input. |
Your two proposed solutions here sound good to me. I think it is more important to get lazy (streaming?) parsing/printing working than it is to make sure that
This also sounds good to me. To be honest, I haven't really looked deeply at the output-printing Haskell code, so feel free to make any changes necessary to get this working! Even big sweeping changes are completely fine!
The problem with some of the parsing combinator libraries is their return type. For instance here is parsec's runParser :: Parsec a -> Text -> Either ParseError a The problem here is the return type. The entire Your code from above is basically just returning a I'm not sure what the best way to do this is. Although now that I think about it, I haven't actually tried Parsec in this case. So I dunno, maybe it all just works out. |
Yes, that was my conclusion about Parsec as well, which was why I went that route. I think I did see something about a lazy As I've mentioned, the This risks the same issue you were worried about with other show instances doing something weird as we were talking about with respect to numbers, but probably fewer things use a show instance to print out quotes containing non-standard escape characters or single backslashes than printing something that starts with a number. At any rate, it probably is not optimal to try to parse a string and switch out for something else if it fails to be a valid string since strings can potentially be infinite in length. |
@andrew-lei Thanks for thinking more about this. I'm not sure if I have a good answer here. I had two thoughts:
However, I trust your judgement with this, so I'm definitely willing to go with whatever you deem is the best course of action. |
I think if we parsed string literals lazily, we could design some post-processing step that takes a lazy list of string literals and invalid escape sequences that allows configuration for unusual cases. On the other hand, I don't think it would be possible to go the other way if we start out by parsing string literals strictly. Incidentally, this is what I was thinking of doing for numbers, which is part of why I don't have a separate constructor for numbers in my current fork (the other reason being that I wanted to keep the tests correct). We could also just go with leaving the escaping in. After all, |
Yeah, this sounds good to me!
Ideally, I think I would like the default This is because Thinking about the end-users, I think the benefit from actually printing the newlines in However, I definitely agree that making this one of the configuration options would be really nice. Users that want it to work differently could easily do that! |
Well, andrew-lei/pretty-simple@a8c00fb makes > take 100 . show . expressionsToOutputs . expressionParse . show $ repeat 'a'
"[Output {outputNestLevel = NestLevel {unNestLevel = 0}, outputOutputType = OutputStringLit \"aaaaaaaa" |
I believe Perhaps we can extract a function that returns the next bit, either as a EDIT: Or perhaps keep the Builders and run the fold on the Builders outside of the monad. |
andrew-lei/pretty-simple@b341085 makes Proof of laziness: > import Data.Text.Lazy as T
> T.take 100 . pShowNoColor $ [1..]
"[ 1\n, 2\n, 3\n, 4\n, 5\n, 6\n, 7\n, 8\n, 9\n, 10\n, 11\n, 12\n, 13\n, 14\n, 15\n, 16\n, 17\n, 18\n, 19\n, 20\n, 21\n, 22" No additional tests have been broken, currently at 47/50; the stringlit issue must still be resolved. I'll probably be able to get that done either today or tomorrow, and submit a pull request then. |
That sounds great! Thanks again for working on this. |
So it's actually pretty simple to solve these last tests, throw in a As I've mentioned, I think it would work well as part of a post-processing step that would allow a user to perform more general sorts of manipulation on the intermediate bits. So the post-processor should be a function either I was thinking one way to do this would be to perhaps have a constructor that contains the parsed value and either the type information (string literal, number, datetime, etc.) or the colour. Anyway, this would require a bit more of a redesign than is strictly necessary for laziness (plus, I think it would make sense to open it as a separate issue so we can discuss this further), so I think perhaps this should wait until later, and in the meantime I'll just use |
I think I see what you did there ;-)
I think both of these ideas sound really good. It would be great to have a post-processing step that would allow a user a way to perform arbitrary manipulation on the output. This would allow pretty-simple to be used in quite a general manner to Haskell types in many different ways.
This sounds like a good plan. It would be great to get a release out with the work you've done so far! |
Just an idea. It seems that the input is evaluated fully, before any output appears.
Try
pPrint $ repeat (False,())
which shows nothing, whileshow $ repeat (False,())
will produce output.At least in a ghci session it's nice to see prefixes of (potentially) infinite streams appear as the are computed, as in
filter (\q -> show q == reverse(show q)) $ map (^2) [1..]
or whatever.Now I see that is a challenge for the parser (it wants nested parentheses, so it will wait
for the closing one always?) and for the printer (layout combinators may be strict as well).
The text was updated successfully, but these errors were encountered: