Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if this is the correct fix, or if it was supposed to be
||
instead of&&
. Were you able to identify based on your understanding of regex character classes what this is trying to do and which would be the correct fix? I'm hesitant to make either change without knowing what was originally intended. Ideally we'd be able to test the fix also makes something work that previously wouldn't work. E.g. if there is some regex pattern that wouldn't work before and would now work after this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm making suggestions based on static code review so you are right to question them.
Searching for "[:" makes sense because there are predefined character classes in POSIX ( https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions#Character_classes ).
If using a regex with one of those character classes works with the fix but not without, that would show that the fix is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller of this function is here in
parse_regex()
:So it has already found the leading
[
. So it would make sense if this were looking for something like[[:lower:]]
Note this regex parsing code is only used in the phishing signature regex parsing code for PDB and WDB signatures.
To test this, I made this PDB signature:
to monitor the display domain name "bankofamerica.com" found in emails for suspicious real-URL's.
Then I made this WDB signature to test the regex features:
This silly signature will allow the real-URL to be
eba5.com
(or any other digit aftereba
), but anything other than that (such asebay.com
, orebat.com
) will not be ignored and will alert as suspicious.I tested by scanning this email, with variations of the link:
I tested ClamAV 1.2.0 to see if using the
[[:digit:]]
character class works.E.g. I ran commands like this:
It appears to work already, though I didn't debug step through line by line to see how it works. But I find that strange, because I did test in a debugger with this PR and was able to confirm it working, and also running the "new" code:
In short, it's not clear to me if this code is really needed. Though it does seem to work okay with your change. But it also works okay like this:
As a side note: testing character classes relating to upper/lower character changes will not work as expected because the text is all converted to lowercase before matching.