-
Notifications
You must be signed in to change notification settings - Fork 892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix excessive backtracking in regex engine by introducing backtrackig limit #311
base: master
Are you sure you want to change the base?
Conversation
The patch fixes a 4 years old issue reported by ClusterFuzz. This issue is possibly responsible (among others) for the poor fuzzing results. |
Minimal qjs repro of the infinite loop that tries to match the empty string non-greedily: var r = new RegExp("()*?a");
r.exec(","); Execute: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment! @chqrlie you probably want to have a look.
@@ -1995,6 +1996,10 @@ static intptr_t lre_exec_backtrack(REExecContext *s, uint8_t **capture, | |||
|
|||
for(;;) { | |||
// printf("top=%p: pc=%d\n", th_list.top, (int)(pc - (bc_buf + RE_HEADER_LEN))); | |||
if (++s->backtrack_count > 1000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this should likely be a define
with some sane default value, but overridable at compile time.
I think it is better to fix the non greedy matching first. This patch only hides the real problem. |
The commit |
@bellard Thanks for the fix, it really fixed the recursion in the previous test case! However, I have another repro: var r = new RegExp("(.+)+a");
r.exec("VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV"); |
You are right (I think), but I believe we also agree that such inputs are either useless or malicious. Therefore, if we limit the number of recursive calls they trigger, we won't be restricting the actual capabilities of the tool, but we can prevent malicious loops. (I killed the above code after 6 minutes, and who knows how long it would have run. And the input isn't even that long...) |
Oh, it seems that the comment I replied to has disappeared in the meantime. |
I see no problem with your example because it really requires a large number of backtracking (try it in another JS engine). It is the same problem as having a JS function which takes a long time to execute. However, there is a real problem that should be addressed: the interrupt handler registered with JS_SetInterruptHandler() is not called during regexp execution which prevents the user from easily interrupting their execution. |
@bellard Is there any update on the correct method for registering an interrupt handler during regex execution? Alternatively, could the above patch serve as a temporary solution to address the frequent timeout issues during fuzzing, which are causing early exits and poor performance? |
@bellard: I think I found a solution: we can create a fuzzer-friendly build, as suggested by the libFuzzer documentation, using the |
Clone the full repo, otherwise the known-good commit we want to test against won't be available.
Gentle ping |
The regex engine is prone to excessive backtracking, leading to timeouts, especially while fuzzing. This commit introduces a backtracking counter and a limit of 1000 backtracking steps. When this limit is exceeded during fuzzing, the regex engine aborts to prevent excessive backtracking. For this, the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro is used, as per suggested by the documentation of libFuzzer.
The change LGTM (perhaps the name could use some bikesheding) but I have no commit rights here :-) |
@saghul Thanks for the LGTM! The question about the name of the environment variable is an "easy" one, since it is a constant set by OSS-Fuzz and it is also the officially recommended name by LLVM for creating fuzzer friendly build from the target. :-) |
The regex engine was prone to excessive backtracking, leading to timeouts and infinite loops, particularly with patterns involving nested quantifiers.
This commit introduces a backtracking counter and a limit of 1000 backtracking steps. When this limit is exceeded, the regex engine aborts to prevent excessive backtracking.