-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Shrubbery parser as a sample #110
Conversation
@microsoft-github-policy-service agree |
Small question: where do the test files come from? If they have been copied in from another repo, could we potentially clone that repo as part of the build. to avoid having all of these in the repo? Not a big deal if not, just curious. You can see an example of what I'm talking about in the parser directory: if(TRIESTE_BUILD_PARSER_TESTS)
enable_testing()
add_subdirectory(test)
if(NOT IS_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/JSONTestSuite)
execute_process(COMMAND ${GIT_EXECUTABLE} clone --depth=1 https://github.com/nst/JSONTestSuite
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
OUTPUT_QUIET)
endif()
if(NOT IS_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/yaml-test-suite)
execute_process(COMMAND ${GIT_EXECUTABLE} clone --depth=1 --branch data-2022-01-17 https://github.com/yaml/yaml-test-suite
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
OUTPUT_QUIET)
endif()
endif() It would also be neat for these tests to be added either via a test driver like json_test.cc or by adding them in CMake: add_test(NAME json_trieste COMMAND json_trieste test -f WORKING_DIRECTORY $<TARGET_FILE_DIR:json_trieste>) Anything to have more robust CI :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these comments are non-blocking, i.e., more suggestions for improvements that could be done in the future.
I would like (if possible) to not host the test files in this repo, though.
// on the previous line. When parsing the program above, the Indent will be | ||
// ((0,0), 2), indicating that the established indentation level is line 0 | ||
// colum 0, and that this line is continued on column 2. | ||
struct Indent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general note on Trieste usage, there are two strategies that may make your life considerably easier as regards meaningful whitespace. The first, and perhaps simplest, is to make the parser even dumber and have it explicitly capture whitespace as Whitespace
and NewLine
tokens, i.e.:
R"(\r?\n)" >> [](auto& m){
m.add(NewLine);
}
R"([ \t]+)" >> [](auto& m){
m.add(Whitespace);
}
You can then defer decisions about indents etc. to a dedicated pass in Trieste, where you'll have much more flexibility and context to make your decisions. This works rather well. Another option is to use much the same pattern, but to use pattern matching to measure the indent explicitly while processing the end of the previous line, like so:
R"(\r?\n([ \t]*))" >> [](auto& m){
size_t indent = m.match(1).len;
// your indent logic
}
This lookahead can save a bunch of bookkeeping (you can preemptively push or pop indent blocks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about a version with the explicit whitespace tokens, but tried writing the Shrubbery parser as a way to stress the parser module more than the existing samples (when this was developed, infix was the only other sample in the repo). It is also nice how, modulo whitespace sensitivity, the Trieste parser sort of looks like a generalized shrubbery parser. The second option does look a lot nicer than what I am doing right now though, so I will give that a spin!
samples/shrubbery/shrubbery.h
Outdated
@@ -0,0 +1,26 @@ | |||
#pragma once | |||
|
|||
#include <trieste/driver.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best practice is now to include trieste/trieste.h
.
inline const auto Id = TokenDef("id"); | ||
inline const auto Lhs = TokenDef("lhs"); | ||
inline const auto Rhs = TokenDef("rhs"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best practice is to export your final WF definition here, as otherwise people will not be able to easily use the ASTs you produce downstream.
samples/shrubbery/reader.cc
Outdated
return { | ||
"merge alternatives", | ||
wf_alternatives, | ||
dir::topdown, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be turned into a dir::once
pass by adding ++
to the captures below? I'll give an example of what I mean.
samples/shrubbery/reader.cc
Outdated
[](Match& _) { return Alt << (Block << *_[Alt]); }, | ||
|
||
// Merge sequence of alternatives into a single alternative | ||
(T(Alt)[Alt] * (T(Alt) << T(Block)[Block])) >> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential alternative to allow dir::once
?
(T(Alt)[Alt] * (T(Alt) << T(Block)))++[Group] >>
[](Match& _) {
Node alt = _(Alt);
for(auto& node : _[Group]){
alt << node->front();
}
return alt;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(NB this requires my PR which changes NodeRange
to a span)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having foreach loops for NodeRange
looks very nice! Are the parentheses in the pattern off by one? Reading the code, I would have guessed that the Group
capture should not include the Alt
capture, but only the subsequent nodes (so that we can add their blocks under the alt
node in the effect.
samples/shrubbery/reader.cc
Outdated
return { | ||
"drop separators", | ||
wf_no_semis_or_commas, | ||
dir::topdown, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a dir::bottomup|dir::once
pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are dir::bottomup
passes more efficient than dir::topdown
passes? I can see how dir::once
is preferrable.
samples/shrubbery/reader.cc
Outdated
return { | ||
"check parsing", | ||
wf_check_parser, | ||
dir::topdown, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a dir::bottomup|dir::once
pass I believe
I wrote them myself. Some of them come from the documentation of shrubbery notation. I have been looking for an existing test suite without success (also there would need to be a filtering process since I don't support all of Shrubbery notation). |
OK in that case I suppose we can put them here for now. Maybe one potential improvement down the line would be to put them in a YAML file and write a test driver to process them. |
@matajoh I have addressed most of your comments now.
I also added a final rewriting pass that takes the tree to a shape that matches the grammar of shrubbery notation (this is presented in |
c32055d
to
bf5b1c2
Compare
samples/infix/.DS_Store
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't commit this file.
This PR adds a sample that parses Shrubbery Notation. It complements the
infix
example by having a slightly more involved parsing pass (whereas the three rewriting passes are quite straightforward). In particular, it gives an example of handling indentation sensitivity. There is a README that gives an overview of Shrubbery parsing and the implementation. The source code has explanatory comments, but assumes that the reader has done theinfix
tutorial and therefore understands the basics of parsing and pattern matching.The implementation relies on #90.