Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tooling to debug pattern matching. #135

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xFrednet
Copy link

@xFrednet xFrednet commented Aug 21, 2024

It can be hard to determine why a certain pattern isn't matching with the given tree. This PR adds a Dbg function which can be wrapped around the pattern and effect to debug the matching step.

The function has a custom pattern that wraps around the original pattern. It then counts how many nodes have been matched until the match failed. The debug macro has a defined start token, that engages the debugging. The results are remitted as error nodes to the AST.

Example

Let's consider the following AST as an example:

(if)
(int 1:5)
(leq)
(int 1:6)
(then)
(int 2:10)
(else)
(int 3:100)

I wrote the following rule, to match the expressions:

T(If)
    * (((Any * --T(Then))++ * Any)[Op]) * T(Then)
    * (((Any * --T(Else))++)[Then]) * T(Else)
    * Any++[Else] >>
  [](Match& _)
  {
    // [..]
  },

However, the match sadly fails. Now I can add the Dbg function as a wrapper:

+ Dbg(If,
 T(If)
     * (((Any * --T(Then))++ * Any)[Op]) * T(Then)
     * (((Any * --T(Else))++)[Then]) * T(Else)
     * Any++[Else] >>
   [](Match& _)
   {
     // [..]
   },
+ ),

The first If indicates that the debugging should start at the first If token it encounters. The function generates the following output:

(error
    (debugmsg 26:match failed after 5 nodes)
    (debugmsg 20:capture 'then': )
    (debugmsg 21:capture 'op': 
        (int 1:5)
        (leq)
        (int 1:6))
    (debugmsg 13:matched nodes
        (if)
        (int 1:5)
        (leq)
        (int 1:6)
        (then)))
    (int 2:10)
    (else)
    (int 3:100))

It's indicated that the first five nodes (if)(int)(leq)(int)(then) could be matched.

Planned Improvements:

  • The console output currently only prints the message. Is there a simple way to show the children nodes and also show the "source code" representation of what has been matched?
  • Document the function in a more visible place. Do you have any suggestions where?

My C++ is a bit rusty, so there are probably several unidiomatic expressions. I'll be happy to fix them.

I also encountered a problem with intrusive_ptr where I couldn't pass intrusive_ptr<DebugPattern> into a function expecting PatternPtr. Is there a simple way to do this conversion?

Also, I wasn't sure how to properly test this, since the usage of this function should be an exception and not the rule. 😅


cc: @EliasC

@xFrednet

This comment was marked as off-topic.

Copy link
Member

@mjp41 mjp41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xFrednet thanks for looking into this. I am unclear on the precise aims of this. As you are only generating pretty print on the final state of a match can we get the same information in the debugger without code changes?

Adding logging to the pattern matching code be very helpful for investigating why something failed internally.

Does this provide the part of the pattern that matched, and the part that didn't?

I have been wondering if we should have an AST for rewrite patterns, and I think that could help with this kind of debugging experience as you would be able to return the continuation that has not matched. Currently there is no printable representation for that.

include/trieste/rewrite.h Show resolved Hide resolved
Comment on lines +1071 to +1072
bool match(NodeIt& it, const Node& parent, Match& match) const& override
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure on what this will produce in the failure case for a disjunction? Will it only report the failure of the final disjunct?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is sadly a limitation of this non-intrusive approach.

@xFrednet
Copy link
Author

@xFrednet thanks for looking into this. I am unclear on the precise aims of this. As you are only generating pretty print on the final state of a match can we get the same information in the debugger without code changes?

You could say that this is "only generating pretty print". It should be possible to get the same information with a debugger, but then the question becomes where you set the break points?

Patterns are checked for every node. So I don't believe that you can just say break ./trieste/include/trieste/rewrite.h. You would want to say, "once the pattern started matching, then break". And then you would still need to step though all the match infrastructure to get the matched groups etc.

This is meant to get this information fast.

Adding logging to the pattern matching code be very helpful for investigating why something failed internally.

I haven't seen any logging in the existing infrastructure and therefore didn't consider this possibility. If we have general logging, it would likely be more flexible. I'm uncertain how we would add it, but it's certainly a valid alternative. Assuming that it doesn't ruin performance.

Does this provide the part of the pattern that matched, and the part that didn't?

It shows which nodes have been matched and what groups have been captured. The current implementation can't say was part didn't match as that would require a more intrusive approach.

I have been wondering if we should have an AST for rewrite patterns, and I think that could help with this kind of debugging experience as you would be able to return the continuation that has not matched. Currently there is no printable representation for that.

Interesting idea! I was considering to manually walk the rewrite pattern to include information, like which PatternDef failed, but the current infrastructure doesn't quite allow it, since the PatternDefs call their continuation manually.

It would also feel like a more elementary change that I'm not yet comfortable making in Trieste 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants