Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Technical Question] Weird behavior of a functional - Is it intended ? #13196

Open
1 task done
DEKHTIARJonathan opened this issue Jan 31, 2025 · 9 comments
Open
1 task done
Labels
S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior

Comments

@DEKHTIARJonathan
Copy link

Description

I have a quick technical question on this function: https://github.com/pypa/pip/blob/f5ff4fa27dd991f209f2a9f9283a7c1561221b2c/src/pip/_internal/req/constructors.py#L44C1-L53C34

def _strip_extras(path: str) -> Tuple[str, Optional[str]]:
    m = re.match(r"^(.+)(\[[^\]]+\])$", path)
    extras = None
    if m:
        path_no_extras = m.group(1)
        extras = m.group(2)
    else:
        path_no_extras = path

    return path_no_extras, extras

It's being used to separate package_name[extra] into package_name and [extra]

However its seems that this function (the regex specifically) does not work if you use a version specifier ==1.0.0 or >=1.0.0, is it intended ?

import re
from typing import Optional, Tuple


def _strip_extras(path: str) -> Tuple[str, Optional[str]]:
    m = re.match(r"^(.+)(\[[^\]]+\])$", path)
    extras = None
    if m:
        path_no_extras = m.group(1)
        extras = m.group(2)
    else:
        path_no_extras = path

    return path_no_extras, extras


if __name__ == "__main__":
    print(f"{_strip_extras('package_name[extra]')=}").  # 
    print(f"{_strip_extras('package_name[extra]==1.0.0')=}")
    print(f"{_strip_extras('package_name[extra]>=1.0.0')=}")
_strip_extras('package_name[extra]')=('package_name', '[extra]')
_strip_extras('package_name[extra]==1.0.0')=('package_name[extra]==1.0.0', None)
_strip_extras('package_name[extra]>=1.0.0')=('package_name[extra]>=1.0.0', None)

Expected behavior

The expected behavior would probably be, instead of

_strip_extras('package_name[extra]==1.0.0')=('package_name[extra]==1.0.0', None)
_strip_extras('package_name[extra]>=1.0.0')=('package_name[extra]>=1.0.0', None)

getting this:

_strip_extras('package_name[extra]==1.0.0')=('package_name==1.0.0', '[extra]')
_strip_extras('package_name[extra]>=1.0.0')=('package_name>=1.0.0', '[extra]')

pip version

Any

Python version

Any

OS

Any

How to Reproduce

See above

Output

No response

Code of Conduct

@DEKHTIARJonathan DEKHTIARJonathan added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Jan 31, 2025
@pfmoore
Copy link
Member

pfmoore commented Jan 31, 2025

Taking a quick look, it appears to only be called on a path or a URL, neither of which can have a version.

@notatallshaw
Copy link
Member

Taking a quick look, it appears to only be called on a path or a URL, neither of which can have a version.

I did the same and came to the same conclusion, could do with better documentation on it's limitations, but appears to be correctly being used.

@DEKHTIARJonathan
Copy link
Author

DEKHTIARJonathan commented Jan 31, 2025

@pfmoore If I use my debugger and do pip install package[extra]==1.0.0 it's absolutely being called. And it returns ("package[extra]==1.0.0", None)

I was having fun to see if I could correct the regex to be more generic, it would probably make smthg like this:

import re
from typing import Optional, Tuple


def _strip_extras(path: str) -> Tuple[str, Optional[str]]:
    pattern = r"^(.+?)\[(.*?)\](?:([=~><!]+)(.+))?$"
    match = re.match(pattern, path)

    if match:
        base_name = match.group(1)
        bracketed_part = f"[{match.group(2)}]"
        # Handles cases with/without versions
        version_specifier = (match.group(3) or "") + (match.group(4) or "")
        return f"{base_name}{version_specifier}", bracketed_part
    return path, None


if __name__ == "__main__":
    print(f"{_strip_extras('package_name[extra]')=}")         # ('package_name', '[extra]')
    print(f"{_strip_extras('package_name[extra]==1.0.0')=}")  # ('package_name==1.0.0', '[extra]')
    print(f"{_strip_extras('package_name[extra]>=1.0.0')=}")  # ('package_name>=1.0.0', '[extra]')
    print(f"{_strip_extras('package_name[extra]<=1.0.0')=}")  # ('package_name<=1.0.0', '[extra]')
    print(f"{_strip_extras('package_name[extra]=~1.0.0')=}")  # ('package_name=~1.0.0', '[extra]')

@notatallshaw
Copy link
Member

notatallshaw commented Jan 31, 2025

IMO, where possible we should be moving away from pip specific regex and to relying on packaging instead.

>>> from packaging.requirements import Requirement

>>> Requirement('package_name[extra]==1.0.0').extras
{'extra'}

@DEKHTIARJonathan
Copy link
Author

Oh that's nice ! What do you think @pfmoore ?
That could be an easy fix what @notatallshaw proposes

@notatallshaw
Copy link
Member

That could be an easy fix what @notatallshaw proposes

Fix what exactly? Is this causing an issue or failure for something?

Just because it's called in a way that doesn't seem intuitive doesn't mean that code path results in an error.

@DEKHTIARJonathan
Copy link
Author

it's propagating extra=None in a bunch of places where it's clearly incorrect.
I'm trying to track down if it has any impact (I'm chasing a weird situation in PEP 771 where pip install package[extra] and pip install package[extra]==1.0.0 (1.0.0 being the only version available) doesn't lead to the same result.

The problem is most certainly in my code/changes, but still this could be a "silent issue" in most cases.

@notatallshaw
Copy link
Member

I would say you're welcome to try and "fix" functions like this, but be aware that code later on might be assuming that requirements are provided this way, so it might end up requiring a refactor of more parts of the codebase.

It's important that optimizations like #12095 aren't accidentally broken (though, I will verify complex real world cases for any PR submitted that affects resolution).

@pfmoore
Copy link
Member

pfmoore commented Jan 31, 2025

I agree with @notatallshaw that we should be using packaging for this sort of thing. However, it's also true that right now, the current behaviour doesn't appear to be causing any bugs. I don't think we should just change things on the basis that they are "clearly incorrect" - better to understand what's going on first.

it's propagating extra=None in a bunch of places where it's clearly incorrect.

Can you give an example? Not just that _strip_extras('package_name[extra]==1.0.0') gives ('package_name[extra]==1.0.0', None), but why it matters that it gives this value?

In parse_req_from_line, _strip_extras is called on a variable path, which is calculated as os.path.normpath(os.path.abspath(name)). So that use can never get a value of 'package_name[extra]==1.0.0' passed to it. In parse_editable, it's passed the value of editable_req, which is documented as

Accepted requirements:
svn+http://blahblah@rev#egg=Foobar[baz]&subdirectory=version_subdir
.[some_extra]

Again, the value 'package_name[extra]==1.0.0' isn't a valid input.

So either the problem is that you're using parse_req_from_line or parse_editable incorrectly, or you're calling _strip_extras yourself, in which case I'd say don't do that, it's clearly intended to be private to the functions in constructors.py, and for anything other than the specific uses I've discussed, it's preferable to use packaging.

It's not that we like having code like this, but it's been round unchanged for 7 years, if not longer - we may not even have been using packaging at that point. And changing things simply to "tidy up" is always a risk - there's a lot of technical debt here, but with the number of users we have, using pip in incredibly weird and wonderful ways, we have to take a cautious approach. In a lot of ways, https://xkcd.com/1172/ feels like it was written about pip... 🙁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants