
Type annotation for itertools.zip_longest() lacks constraint on how many values in each tuple may match the fillvalue. #12733

Open
Ferroin opened this issue Oct 3, 2024 · 2 comments

Comments

@Ferroin

Ferroin commented Oct 3, 2024

I stumbled across this issue when working on some code structured like this:

# seq1 has a type of list[int]
# seq2 has a type of list[str]
# Neither has None anywhere in their values

for item in itertools.zip_longest(seq1, seq2, fillvalue=None):
    assert item[0] is not None or item[1] is not None

    match item:
        case (None, v2):
            ...
        case (v1, None) if v1 > some_value:
            ...
        case (v1, None):
            ...
        case (v1, v2):
            ...

The assert will never actually fire assuming typing constraints on the lists are met, because the typing of the two lists being passed to itertools.zip_longest() is such that item cannot be (None, None). However, the type hints of itertools.zip_longest() indicate in this case that the type of item in the above code sample is tuple[int | None, str | None]. This means that type checkers (and anybody looking only at the type information and not the semantics of itertools.zip_longest()) think that that assert could fail.
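A small runtime check illustrates the point. The sample values below are made up for illustration; only `itertools.zip_longest()` itself comes from the code above.

```python
import itertools

# Hypothetical sample data: neither list contains None.
seq1: list[int] = [1, 2, 3]
seq2: list[str] = ["a"]

pairs = list(itertools.zip_longest(seq1, seq2, fillvalue=None))

# zip_longest stops once the longest input is exhausted, so every
# yielded tuple carries at least one real value from some input.
all_have_value = all(a is not None or b is not None for a, b in pairs)
print(pairs, all_have_value)
```

The shorter list is padded with the fill value, but no tuple consists of fill values alone.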

What’s worse is that this seems to have knock-on effects that confuse at least mypy badly enough to make outlandish claims: that the first case in the match statement is impossible because the first item in the tuple would have to have a type that is a subtype of both str and None, that the type of v1 in the second case is int | None, and that the final case is completely unreachable. I have not tested with other type checkers, but my experience with other parts of the ecosystem as a whole suggests that they would also be confused by this code.

It’s technically possible to update this code so that the type checker actually understands that all of that is absolute hogwash, but doing so requires adding extra checks to each case statement to ensure that v1 and v2 are not None, resulting in code that is both significantly more verbose and slower at runtime, all just to satisfy type checking.

In theory, it should be possible to fix this for the specific cases of defined numbers of iterables being passed to itertools.zip_longest() by changing the typing of the return values. For the example above, changing the return type of the overload case for two iterables and a specified fillvalue to the following should resolve the issue:

zip_longest[tuple[_T1, _T2] | tuple[_T1, _T] | tuple[_T, _T2]]

I’d be happy to put together a PR to do this, but it’s a lot of typing that doesn’t seem like it can be easily done programmatically, and I wanted to confirm whether such a solution would even be considered acceptable before actually starting on it since it quickly gets very ugly for cases of more iterables (requiring 2ⁿ-1 total tuple types for n iterables).
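To give a sense of how the count grows, the union members can be enumerated as strings. This is only an illustrative sketch: `union_members` is a hypothetical helper, not part of typeshed, and it does not address the harder part of writing and maintaining the overloads themselves.

```python
import itertools


def union_members(n: int, fill: str = "_T") -> list[str]:
    """List the tuple types for n iterables, excluding the all-fill tuple."""
    names = [f"_T{i}" for i in range(1, n + 1)]
    members: list[str] = []
    # Each mask says, per position, whether that slot holds the fill value.
    for mask in itertools.product((False, True), repeat=n):
        if all(mask):
            continue  # zip_longest never yields a tuple of only fill values
        slots = [fill if filled else name for name, filled in zip(names, mask)]
        members.append(f"tuple[{', '.join(slots)}]")
    return members


print(" | ".join(union_members(2)))
print(len(union_members(3)), len(union_members(4)))
```

For two iterables this yields three members, and the count roughly doubles with each additional iterable.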

@JelleZijlstra
Member

@hauntsaninja
Collaborator

The next release of mypy (1.12) should match pyright here, thanks to a recent brianschubert PR
