rx, a program for compiling sets of regular expressions #488
…ng at \n-terminated patterns in-situ. This is just so much less complicated.
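Reading `\n`-terminated patterns in situ can be sketched roughly as follows. This assumes the whole pattern file has already been read into one writable buffer that carries a trailing `'\0'`. `split_patterns_in_situ()` is a hypothetical helper for illustration, not rx's actual code. Each newline is overwritten with a terminator, so every pattern becomes a C string pointing into the original buffer and nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split buf[0..len) into \n-terminated patterns in place.
 * Assumes buf[len] == '\0', so a final line without a trailing '\n' is
 * still a valid C string. Stores up to max pointers in lines[] and returns
 * the total number of patterns found (empty lines count as empty patterns).
 */
static size_t
split_patterns_in_situ(char *buf, size_t len, char **lines, size_t max)
{
	char *start = buf;
	char *end = buf + len;
	size_t n = 0;

	assert(buf != NULL && buf[len] == '\0');

	while (start < end) {
		char *nl = memchr(start, '\n', (size_t) (end - start));

		if (nl == NULL) {
			nl = end;   /* last pattern; buf[len] already terminates it */
		} else {
			*nl = '\0'; /* terminate this pattern where the '\n' was */
		}

		if (n < max) {
			lines[n] = start; /* points into buf; no copy is made */
		}
		n++;

		start = nl + 1;
	}

	return n;
}
```

A trailing newline at the end of the file does not produce an extra empty pattern, but a blank line in the middle does.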
I think that, at the time of writing, fsm_example() is broken, because I get no output here. It's possibly related to #438.
…ray. This sets the scene for multi-file pattern lists later on, but it also seems to keep scopes tighter and make things simpler as a byproduct.
Now this always equals the count passed in when the return status is 1, and the return status is always 1 when the count is large enough. In every situation here, we know the count is large enough.
This paves the way for iterating over multiple input files.
When only one file is given, the endid is per pattern (that is, the pattern's line number in the input file). When multiple files are given, the endid is the argv[] index of the file, so all patterns within each file share the same endid.
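The endid rule above fits in a few lines. This is a hypothetical sketch: `pattern_endid()` is an illustrative name, not a function from rx, and whether line numbers start at 0 or 1 is left to the caller.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick the endid for one pattern.
 * One input file: each pattern gets its own endid, its line number.
 * Several input files: every pattern in a file shares the file's argv[] index.
 */
static unsigned
pattern_endid(int nfiles, int argv_index, unsigned line_number)
{
	assert(nfiles >= 1 && argv_index >= 0);

	if (nfiles == 1) {
		return line_number;           /* per-pattern endid */
	}

	return (unsigned) argv_index;     /* per-file endid */
}
```

With several files, a match therefore tells you which file matched, but not which line within it.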
This should find the same result anyway; there's just no need to go to the trouble of constructing an AST here when we already know the pattern is a literal.
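Why a literal needs no AST: its automaton is just a linear chain of states with one edge per character, so it can be built straight from the bytes. The sketch below uses a made-up `struct literal_dfa` rather than libfsm's API, and it matches whole (anchored) inputs only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LITERAL 64

/* Toy DFA for a literal: state i has one edge, labelled lit[i], to state i + 1.
 * State len is the only end state. */
struct literal_dfa {
	char lit[MAX_LITERAL + 1];
	size_t len;
};

/* Build the chain directly from the literal's bytes: no parsing, no AST. */
static bool
literal_dfa_build(struct literal_dfa *dfa, const char *s)
{
	size_t n;

	assert(dfa != NULL && s != NULL);

	n = strlen(s);
	if (n > MAX_LITERAL) {
		return false;
	}

	memcpy(dfa->lit, s, n + 1);
	dfa->len = n;
	return true;
}

/* Run the chain over the whole input; accept only if we stop in the end state. */
static bool
literal_dfa_match(const struct literal_dfa *dfa, const char *input)
{
	size_t state = 0;

	for (; *input != '\0'; input++) {
		if (state == dfa->len || *input != dfa->lit[state]) {
			return false; /* no outgoing edge for this character */
		}
		state++;
	}

	return state == dfa->len;
}
```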
… handle in the caller.
I'm not particularly thrilled about the handling for the different AMBIG_ modes here. I think it might eventually make sense to move this into libfsm proper and share it with the other CLI tools.
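For readers unfamiliar with the problem: when several patterns match the same string, one end state carries several endids, and some policy must decide what to report. The sketch below is illustrative only. The policy names are hypothetical and are not rx's actual AMBIG_ modes. "Earliest" is assumed to mean the lowest endid, which matches the earliest pattern when endids are line numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ambiguity policies; rx's real AMBIG_ modes may differ. */
enum ambig_policy {
	POLICY_ERROR,    /* several endids on one state is an error */
	POLICY_EARLIEST, /* keep only the lowest endid */
	POLICY_MULTIPLE  /* keep every endid */
};

/*
 * Resolve the n endids on one end state. Writes the surviving ids to out[]
 * (which must have room for n entries) and returns how many were kept, or
 * -1 if the policy rejects the ambiguity.
 */
static int
resolve_endids(enum ambig_policy policy, const unsigned *ids, size_t n, unsigned *out)
{
	assert(ids != NULL && out != NULL && n > 0);

	if (n == 1) {
		out[0] = ids[0]; /* no ambiguity */
		return 1;
	}

	switch (policy) {
	case POLICY_ERROR:
		return -1;

	case POLICY_EARLIEST: {
		unsigned min = ids[0];
		for (size_t i = 1; i < n; i++) {
			if (ids[i] < min) {
				min = ids[i];
			}
		}
		out[0] = min;
		return 1;
	}

	case POLICY_MULTIPLE:
		for (size_t i = 0; i < n; i++) {
			out[i] = ids[i];
		}
		return (int) n;
	}

	return -1; /* not reached for valid policies */
}
```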
I'm not sure why I thought this was necessary; I tested, and the trie code seems happy with an empty string. That's fortunate, because I'm a believer in recursive data structures.
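The empty string needs no special case in a recursive trie. Inserting `""` just reaches the base case at the root and marks the root as an end, so the empty pattern matches only the empty input. This toy byte-wise trie illustrates the point. It is not libfsm's trie code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy trie: each child is itself a trie, i.e. a recursive structure. */
struct trie {
	bool end;                 /* some pattern ends exactly at this node */
	struct trie *child[256];
};

static struct trie *
trie_new(void)
{
	return calloc(1, sizeof (struct trie)); /* no children, not an end */
}

/* Recursive insert; the empty string is the base case. */
static bool
trie_add(struct trie *t, const char *s)
{
	unsigned char c = (unsigned char) *s;

	assert(t != NULL);

	if (c == '\0') {
		t->end = true;
		return true;
	}

	if (t->child[c] == NULL) {
		t->child[c] = trie_new();
		if (t->child[c] == NULL) {
			return false; /* out of memory */
		}
	}

	return trie_add(t->child[c], s + 1);
}

static bool
trie_contains(const struct trie *t, const char *s)
{
	for (; *s != '\0'; s++) {
		t = t->child[(unsigned char) *s];
		if (t == NULL) {
			return false;
		}
	}
	return t->end;
}

static void
trie_free(struct trie *t)
{
	if (t == NULL) {
		return;
	}
	for (int i = 0; i < 256; i++) {
		trie_free(t->child[i]);
	}
	free(t);
}
```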
This makes sense to me.
I already mentioned the error where it generates C code that attempts to write into a const unless -u is set. That seems like it came from an earlier PR, though.
    assert(!fsm_empty(fsm));

    if (!fsm_setendid(fsm, id)) {
What's the intent for setting this after minimisation?
The intent is that there's only one endid for this FSM (as the comment just above says), so there's no need to carry endids through the various transformations as we construct it. I set the ID after construction simply because there's no need to set it any earlier.
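Because the whole FSM carries a single endid, stamping it onto every end state once, after construction and minimisation, gives the same result as threading it through each step: every end state would carry the same id either way. This is a toy sketch. `struct toy_fsm`, `toy_setendid()` and `NO_ENDID` are hypothetical stand-ins and do not model libfsm's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_ENDID 0u /* hypothetical "unset" marker for this sketch */

struct toy_state {
	bool end;
	unsigned endid;
};

struct toy_fsm {
	struct toy_state s[16];
	size_t n;
};

/*
 * Set one endid on every end state of an already-built FSM.
 * Returns false if there are no end states to label (compare the
 * !fsm_empty() assertion in the excerpt above).
 */
static bool
toy_setendid(struct toy_fsm *fsm, unsigned id)
{
	bool any = false;

	assert(fsm != NULL && id != NO_ENDID);

	for (size_t i = 0; i < fsm->n; i++) {
		if (fsm->s[i].end) {
			fsm->s[i].endid = id;
			any = true;
		}
	}

	return any;
}
```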
    fprintf(stderr, "overriding dialect by extension for %s: %s\n",
        argv[arg], ext);
    dialect = dialect_name(ext);
    if (override_dialect) {
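The sketch below shows one way to pick a dialect from a file's extension, applied only when an override flag is set. It borrows the name `override_dialect` from the excerpt above. `choose_dialect()` and the extension table are hypothetical, and the dialect names rx really accepts may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical extension-to-dialect table, for illustration only. */
static const struct { const char *ext; const char *dialect; } ext_map[] = {
	{ "pcre",    "pcre"    },
	{ "literal", "literal" },
	{ "glob",    "glob"    },
};

/*
 * Choose the dialect for one input file. The extension only overrides
 * the default when override_dialect is set, so a file name never silently
 * changes how its patterns are parsed.
 */
static const char *
choose_dialect(const char *path, const char *default_dialect, int override_dialect)
{
	const char *ext;

	assert(path != NULL && default_dialect != NULL);

	if (!override_dialect) {
		return default_dialect;
	}

	ext = strrchr(path, '.');
	if (ext == NULL || ext[1] == '\0' || strchr(ext, '/') != NULL) {
		return default_dialect; /* no usable extension on the file name */
	}
	ext++;

	for (size_t i = 0; i < sizeof ext_map / sizeof *ext_map; i++) {
		if (strcmp(ext, ext_map[i].ext) == 0) {
			fprintf(stderr, "overriding dialect by extension for %s: %s\n", path, ext);
			return ext_map[i].dialect;
		}
	}

	return default_dialect; /* unknown extension: keep the default */
}
```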
It seems like a good idea to make this require a flag, rather than being fully automatic behavior.
It is! -s
From the manpage:
You can get some resource stats with -Q:
There are a few small fixes and things on this branch that superficially have nothing to do with rx. That's because I originally had much more groundwork here, which I've pulled out into separate PRs (especially #485 and #486, but also others). I want to keep the history for rx itself intact, rather than rebase away the stuff I moved out to other PRs. So I've merged over from main and left the few seemingly unrelated fixes in place without rebasing them out.
rx was named by @averymcnab