preg_match is inconsistent in how it handles unmatched capturing groups #17934
Comments
Would PREG_UNMATCHED_AS_NULL solve that issue for you? If I remember correctly, it was introduced to solve exactly this problem in a mostly backwards-compatible way: #2526
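A minimal sketch of what the flag changes. It assumes PHP 8.0 or later (for the `int|string` union type) and relies on the PHP 7.4+ behaviour of also reporting trailing unmatched groups as null. The patterns `/(a)|(b)/` and `/(a?)b/` and the helper `groupCaptured()` are hypothetical illustrations, not code from this thread.

```php
<?php
// Hypothetical helper: with PREG_UNMATCHED_AS_NULL every group is present,
// and an unmatched group is null. isset() is false for null but true for
// '', so it separates "not captured" from "captured an empty string".
function groupCaptured(array $match, int|string $group): bool
{
    return isset($match[$group]);
}

$pattern = '/(a)|(b)/';

preg_match($pattern, 'a', $left, PREG_UNMATCHED_AS_NULL);
// $left is [0 => 'a', 1 => 'a', 2 => null]  (trailing group now present)

preg_match($pattern, 'b', $right, PREG_UNMATCHED_AS_NULL);
// $right is [0 => 'b', 1 => null, 2 => 'b']  (null instead of '')

// A group that genuinely captures an empty string stays ''.
preg_match('/(a?)b/', 'b', $empty, PREG_UNMATCHED_AS_NULL);
// $empty is [0 => 'b', 1 => '']

var_dump(groupCaptured($left, 2));  // bool(false)
var_dump(groupCaptured($right, 1)); // bool(false)
var_dump(groupCaptured($empty, 1)); // bool(true)
```

With the flag, both "left side" and "right side" unmatched groups are reported the same way, so a single check answers "did group N participate?".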
@cmb69 Thanks; I wasn't aware of that flag. It does seem to fix the inconsistency. I guess this is one of the many secret PHP flags that you should always know to pass :-)
Seems like this is not a bug then. 🙂
I mean, I would consider it a bug, just perhaps one maintained for the sake of backwards compatibility. It would be nice if this were at least documented. The documentation on preg_match says this about
However, this is not true; as the original example I posted shows, without this flag unmatched subpatterns might be reported as an empty string, or they might not be reported at all. At least in the documentation, then, perhaps there's room to clarify this and recommend this flag more strongly?
It's more that always passing this flag would unnecessarily hurt performance, and PCRE is very performance-sensitive. It would be nice if we could drop unmatched groups entirely, but that may cause backwards-compatibility (BC) issues.
Why does that flag hurt performance? I've not looked at the internals, but I'd think that PCRE needs to parse the pattern and identify the name/index of each capturing group as a prerequisite to matching. So the only added cost would be pre-populating the match array with those extra keys/values. Obviously that's not zero overhead, but I would expect it to be less than the cost of parsing/matching.
Compiled patterns are heavily cached, and PCRE even compiles them to native machine code using a JIT, so populating the match array can be comparatively expensive. But I haven't actually verified how much of a difference this flag makes.
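One rough way to check the performance question empirically is a micro-benchmark like the sketch below. The pattern, subject, iteration count and the `timeMatches()` helper are arbitrary, hypothetical choices; no results from this thread are implied, and timings will vary with the PHP build and whether the PCRE JIT (`pcre.jit`) is enabled.

```php
<?php
// Hypothetical helper: time $iterations calls of preg_match() with $flags.
// hrtime(true) is a monotonic clock returning nanoseconds.
function timeMatches(string $pattern, string $subject, int $flags, int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject, $m, $flags);
    }
    return (hrtime(true) - $start) / 1e6; // elapsed milliseconds
}

// Only group 1 matches 'a', so by default the match array has 2 entries,
// while PREG_UNMATCHED_AS_NULL fills in all 6 (groups 2..5 as null).
$pattern = '/(a)|(b)|(c)|(d)|(e)/';
$subject = 'a';
$iterations = 100000;

$default = timeMatches($pattern, $subject, 0, $iterations);
$asNull  = timeMatches($pattern, $subject, PREG_UNMATCHED_AS_NULL, $iterations);

printf("default:                %.2f ms\n", $default);
printf("PREG_UNMATCHED_AS_NULL: %.2f ms\n", $asNull);
```

A pattern with many trailing groups that usually do not participate is chosen deliberately, since that is where the extra array population would matter most.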
Description
The following code:
Resulted in this output:
But I expected this output instead:
I would expect that a capturing group only appears in the match if the pattern captured that group. However, it seems like in some cases groups on the left side of an alternation will appear with an empty string as the value while groups on the right side are omitted when they aren't captured.
This makes it difficult to ask "Did group N get captured?" because, depending on the structure of the regex, sometimes "not captured" is reported as an empty string and sometimes as an omitted key. The problem is even more confusing if an empty string is a possible capture for the group; in that case there's no way to tell what happened without using PREG_OFFSET_CAPTURE, which reports an offset of -1 for the unmatched groups.
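The behaviour described above can be reproduced with a small hypothetical pattern, `/(a)|(b)/`; it is an illustration, not the original example from this report (which is not preserved here). The sketch uses default flags except where PREG_OFFSET_CAPTURE is passed explicitly.

```php
<?php
$pattern = '/(a)|(b)/';

// Default flags: group 2 did not participate and is the last group,
// so it is omitted from the result entirely.
preg_match($pattern, 'a', $left);
// $left is [0 => 'a', 1 => 'a']

// Default flags: group 1 did not participate but comes before a group
// that did, so it is filled in with an empty string.
preg_match($pattern, 'b', $right);
// $right is [0 => 'b', 1 => '', 2 => 'b']

// PREG_OFFSET_CAPTURE: the unmatched group 1 is reported as ['', -1];
// the -1 offset is what distinguishes it from a real empty capture.
preg_match($pattern, 'b', $withOffsets, PREG_OFFSET_CAPTURE);
// $withOffsets is [['b', 0], ['', -1], ['b', 0]]

var_dump(array_key_exists(2, $left)); // bool(false)
var_dump($right[1] === '');           // bool(true)
var_dump($withOffsets[1][1] === -1);  // bool(true)
```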
PHP Version
PHP 8.2.12
Operating System
Windows 11