Backported pcre-anchor tests and fixes, found while fuzzing on capture branch. #453
Merged: katef merged 18 commits into main from sv/backported-fixes-from-fuzzing-on-capture-branch (Jan 3, 2024)
These are anchoring edge cases found by fuzzing. They have to do with incomplete pruning of repeated nodes that are unsatisfiable due to anchoring, nesting of anchors inside groups and/or ALT subtrees, etc. The fixes will go in their own commit.
There are extra error codes in #440 for regexes that aren't UNSATISFIABLE per se, but depend on particular corner cases in PCRE that probably aren't worth supporting in an automata-based implementation. Add a test case for one, tests/pcre/in48.re: `^a|$[^x]b*`

This is a tricky one to handle properly; according to PCRE it should match either "a<anything...>" OR "\n", but nothing else. The newline match is because `$` is a non-input-consuming check that evaluation is either at the end of input, or at a newline immediately before the end. In this case `$[^x]b*` matches exactly one newline; it's equivalent to "$\n". This probably isn't worth supporting, but we can detect cases where a potential newline match appears after a `$` and reject them as an unsupported PCRE behavior.
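The `$` behaviour described above can be checked with a small sketch. It uses Python's `re` module as a stand-in for PCRE, on the assumption (true for default, non-multiline mode) that both treat `$` as matching at the end of input or just before a final newline. The helper name `matches` is only illustrative, and the search is unanchored.

```python
import re

# Python's `re` stands in for PCRE here: without multiline mode, `$` matches
# at the very end of the input or just before a newline that ends the input.
IN48 = re.compile(r"^a|$[^x]b*")


def matches(pattern, text):
    """Return True if the pattern matches anywhere in text (unanchored search)."""
    return pattern.search(text) is not None


# Left branch: anything starting with 'a'.
print(matches(IN48, "abc"))   # True

# Right branch: `$` succeeds just before the final newline, then `[^x]`
# consumes that newline. `b*` can only ever match the empty string here.
print(matches(IN48, "\n"))    # True

# No character is left for `[^x]` once `$` has matched at the end of input.
print(matches(IN48, "b"))     # False
print(matches(IN48, ""))      # False

# A newline that is not the last character does not satisfy `$`.
print(matches(IN48, "\nb"))   # False
```

In this sketch the right-hand branch behaves like `$\n`: the only text it can consume is the single trailing newline.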
`^(?:$^|a)(?:$^|^)*`: This should be equivalent to `^(?:^$|a)`, because the first group is either fully anchored (making the second group redundant) or must match 'a' (making it impossible for the second group to be anchored at the start), but that information wasn't being passed properly to the second group.

`$[^a]*` was returning AST_ANALYSIS_ERROR_UNSUPPORTED_PCRE while inside the repeat node (`[]*`), so analysis didn't flag the `^` immediately after. If a repeat node that can match 0 times is unsupported, unsatisfiable, etc., just set its max count to 0 and continue. With this change, analysis finds the `^` and treats the whole regex like `$^|$^`, which reduces to `^$`.

The fixes for both of these will be integrated in later commits -- I'm integrating changes from a working branch that had several intermediate steps committed to push to another computer for fuzzing.
If a repeat node's subtree returns an error but has a min count of 0, then prune it by setting its max count to 0. If its error is returned directly then the analysis won't see nodes that appear after it.
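A minimal sketch of this pruning rule follows, written in Python rather than the project's C. Every name in it (`Literal`, `Unsupported`, `Repeat`, `Concat`, `AnalysisError`, `analyse`) is hypothetical and exists only for illustration; it is not the project's AST or analysis API.

```python
from dataclasses import dataclass
from typing import List, Optional


class AnalysisError(Exception):
    """Hypothetical stand-in for an 'unsupported' or 'unsatisfiable' result."""


@dataclass
class Literal:
    char: str


@dataclass
class Unsupported:
    """Placeholder for a subtree that the analysis rejects."""
    reason: str


@dataclass
class Repeat:
    child: object
    min_count: int
    max_count: Optional[int]  # None means unbounded


@dataclass
class Concat:
    children: List[object]


def analyse(node, seen):
    """Walk the tree, recording each literal reached in `seen`."""
    if isinstance(node, Literal):
        seen.append(node.char)
    elif isinstance(node, Unsupported):
        raise AnalysisError(node.reason)
    elif isinstance(node, Repeat):
        try:
            analyse(node.child, seen)
        except AnalysisError:
            if node.min_count > 0:
                raise  # the subtree is required, so the error must propagate
            node.max_count = 0  # optional subtree: prune it and carry on
    elif isinstance(node, Concat):
        for child in node.children:
            analyse(child, seen)


# The optional repeat is pruned, so the literal after it is still analysed.
tree = Concat([Literal("a"), Repeat(Unsupported("pcre corner case"), 0, None), Literal("b")])
seen = []
analyse(tree, seen)
print(seen)                        # ['a', 'b']
print(tree.children[1].max_count)  # 0
```

If the `except` branch re-raised unconditionally, the walk would stop at the repeat node and `"b"` would never be visited, which is the failure mode the commit message describes.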
Found while fuzzing.
This should normally be set when allocation fails, but when built for fuzzing, analysis can pre-emptively reject inputs that would need excessive memory (due to large repetition counts, for example), so in that case make sure an error code is set to avoid triggering an assertion later on.
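The contract can be sketched as follows, again in Python and with made-up names throughout: `Err`, `ERR_TOO_LARGE`, `MAX_REPEAT`, `build_repeat` and `compile_or_report` are assumptions, and the source does not say which error code is actually set. The point is only that every failure path, including the early fuzzing-only rejection, records a code before returning.

```python
from dataclasses import dataclass

ERR_NONE = 0
ERR_TOO_LARGE = 1   # hypothetical error code
MAX_REPEAT = 1000   # hypothetical limit used only in fuzzing builds


@dataclass
class Err:
    code: int = ERR_NONE


def build_repeat(count, err, fuzzing=True):
    """Return a list of placeholder states, or None with err.code set."""
    if fuzzing and count > MAX_REPEAT:
        # Pre-emptive rejection: nothing was allocated, but the caller still
        # expects an error code whenever the result is None.
        err.code = ERR_TOO_LARGE
        return None
    return ["state"] * count


def compile_or_report(count):
    """Return (result, error code), checking the failure-path contract."""
    err = Err()
    result = build_repeat(count, err)
    if result is None:
        # This is the kind of later assertion that an unset code would trip.
        assert err.code != ERR_NONE, "failure reported without an error code"
        return None, err.code
    return result, ERR_NONE


print(compile_or_report(3))
print(compile_or_report(10**6))
```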
Lots of other bugs in linking have been indirect consequences of an ancestor ALT node that changed its top-down x,y linkage to global or the global start loop instead. Avoiding modification here eliminates a lot of noise. While trying to simultaneously satisfy the pcre-anchor tests 92, 99, and 100 I took a step back, thought through the system as a whole, and realized this was where their linkage was getting messed up.
Rather than modifying this directly, add a function, which has its own asserts and logging.
katef approved these changes (Jan 3, 2024)
This PR adds `tests/pcre-anchor/` tests 82-100, which are hand-reduced versions of incorrectly handled regex inputs found during fuzzing.

Most of these have to do with:
- The `^` anchor in `(a|$)(b|^)` should not link to the unanchored start loop ("pincer anchors").
- `$[^a]` should not match "x" but should match "x\n" according to PCRE.