preg_match is inconsistent in how it handles unmatched capturing groups #17934

Open
mikethea1 opened this issue Feb 25, 2025 · 7 comments

@mikethea1

Description

The following code:

<?php
$pattern = '/A(?<a>a)|B(?<b>b)/';
preg_match($pattern, 'Aa', $matches);
echo json_encode($matches)."\n";
preg_match($pattern, 'Bb', $matches);
echo json_encode($matches)."\n";

Resulted in this output:

{"0":"Aa","a":"a","1":"a"}
{"0":"Bb","a":"","1":"","b":"b","2":"b"}

But I expected this output instead:

{"0":"Aa","a":"a","1":"a"}
{"0":"Bb","b":"b","2":"b"}

I would expect that a capturing group only appears in the match if the pattern captured that group. However, it seems like in some cases groups on the left side of an alternation will appear with an empty string as the value while groups on the right side are omitted when they aren't captured.

This makes it difficult to easily ask "Did group N get captured?" because, depending on the structure of the regex, sometimes "not captured" will report as empty string and sometimes it will report as an omitted key. The problem is even more confusing if empty string was a possible capture for the group; in that case there's no way to tell what happened without using PREG_OFFSET_CAPTURE which gives -1 for the extraneous matches.
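To illustrate the PREG_OFFSET_CAPTURE workaround mentioned above, here is a minimal sketch. The helper name groupCaptured() is hypothetical (it is not part of PHP), and the int|string parameter type assumes PHP 8.0 or later. It treats a group as captured only if its key exists and its offset is not -1.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether a group took part
// in the match. Expects $matches from preg_match() with PREG_OFFSET_CAPTURE.
function groupCaptured(array $matches, int|string $group): bool
{
    // Trailing unmatched groups are omitted from $matches entirely;
    // earlier unmatched groups appear as ["", -1].
    return isset($matches[$group]) && $matches[$group][1] !== -1;
}

$pattern = '/A(?<a>a)|B(?<b>b)/';

preg_match($pattern, 'Bb', $matches, PREG_OFFSET_CAPTURE);
var_dump(groupCaptured($matches, 'a')); // bool(false): present, but offset is -1
var_dump(groupCaptured($matches, 'b')); // bool(true)

preg_match($pattern, 'Aa', $matches, PREG_OFFSET_CAPTURE);
var_dump(groupCaptured($matches, 'a')); // bool(true)
var_dump(groupCaptured($matches, 'b')); // bool(false): key is absent
```

This works even when the empty string is a legitimate capture, since a real empty capture has a non-negative offset.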

PHP Version

PHP 8.2.12

Operating System

Windows 11

@cmb69
Member

cmb69 commented Feb 26, 2025

Would PREG_UNMATCHED_AS_NULL solve that issue for you? If I remember correctly, it was introduced to solve this problem in a mostly backwards-compatible way: #2526

@mikethea1
Author

@cmb69 thanks; wasn't aware of that flag. It does seem like this fixes the inconsistency. I guess this is one of the many secret PHP flags that you should always know to pass :-)
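For reference, a minimal sketch of the original example with the flag applied. The output comments assume PHP 7.4 or later, where trailing unmatched groups are also reported as null; the variable names $first and $second are only for illustration.

```php
<?php
$pattern = '/A(?<a>a)|B(?<b>b)/';

// With PREG_UNMATCHED_AS_NULL, every group gets a key in both results.
preg_match($pattern, 'Aa', $first, PREG_UNMATCHED_AS_NULL);
preg_match($pattern, 'Bb', $second, PREG_UNMATCHED_AS_NULL);

echo json_encode($first), "\n";  // {"0":"Aa","a":"a","1":"a","b":null,"2":null}
echo json_encode($second), "\n"; // {"0":"Bb","a":null,"1":null,"b":"b","2":"b"}

// "Did group b get captured?" then reduces to a null check.
var_dump($first['b'] !== null);  // bool(false)
var_dump($second['b'] !== null); // bool(true)
```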

@iluuu1994
Member

Seems like this is not a bug then. 🙂

iluuu1994 closed this as not planned on Feb 26, 2025

@mikethea1
Author

> Seems like this is not a bug then. 🙂

I mean, I would consider it a bug, just perhaps one maintained for the sake of backwards compatibility. It would be nice if this were at least documented. The documentation on preg_match says this about PREG_UNMATCHED_AS_NULL:

> If this flag is passed, unmatched subpatterns are reported as null; otherwise they are reported as an empty string.

However, this is not true; as the original example I posted shows, without this flag unmatched subpatterns might be reported as an empty string or they might not be reported at all.

Perhaps there's at least room in the documentation to clarify this and to recommend this flag more strongly?

@iluuu1994
Member

> suggest using this flag more strongly

It's more that this will unnecessarily hurt performance. PCRE is very performance sensitive.

It would be nice if we could remove unmatched groups, but this may cause BC issues.

@mikethea1
Author

> It's more that this will unnecessarily hurt performance. PCRE is very performance sensitive.

Why does that flag hurt performance? I've not looked at the internals, but I'd think that PCRE needs to parse the pattern and identify the name/index of each capturing group as a prerequisite to matching. So the only added cost would be pre-populating the match array with those extra keys/values. Obviously that's not zero overhead, but I would expect it to be less than the cost of parsing/matching.

@iluuu1994
Member

PCRE is heavily cached, and even compiles to native machine code using a JIT. Population can be comparatively expensive. But I haven't actually verified how much of a difference this flag makes.
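Since the cost of the flag was not measured in this thread, here is a rough micro-benchmark sketch that anyone could run to check it. The helper timeMatches() and the iteration count are assumptions made for illustration, hrtime() requires PHP 7.3 or later, and the int|float return type assumes PHP 8.0 or later. Results will vary by machine, PHP build, and pattern, so treat the numbers as indicative only.

```php
<?php
// Hypothetical helper (not a PHP built-in): total nanoseconds spent running
// preg_match() $iterations times with the given flags.
function timeMatches(string $pattern, string $subject, int $flags, int $iterations): int|float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject, $matches, $flags);
    }
    return hrtime(true) - $start;
}

$pattern = '/A(?<a>a)|B(?<b>b)/';
$iterations = 200000; // arbitrary; raise it for more stable numbers

// Warm up so the compiled pattern is already in PHP's regex cache.
preg_match($pattern, 'Aa');

$plain  = timeMatches($pattern, 'Aa', 0, $iterations);
$asNull = timeMatches($pattern, 'Aa', PREG_UNMATCHED_AS_NULL, $iterations);

printf("default flags:          %.1f ms\n", $plain / 1e6);
printf("PREG_UNMATCHED_AS_NULL: %.1f ms\n", $asNull / 1e6);
```

The subject 'Aa' is chosen because it leaves group b unmatched at the end of the pattern, which is the case where the flag adds entries to the match array.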
