Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rule Suggestion min/max instead of sorted(l)[0] #10463

Closed
Skylion007 opened this issue Mar 18, 2024 · 9 comments · Fixed by #10868
Closed

Rule Suggestion min/max instead of sorted(l)[0] #10463

Skylion007 opened this issue Mar 18, 2024 · 9 comments · Fixed by #10868
Assignees
Labels
rule Implementing or modifying a lint rule

Comments

@Skylion007
Copy link
Contributor

Skylion007 commented Mar 18, 2024

Saw this in code:

a = sorted(l)[0]
a = sorted(l, reverse=True)[-1] # not sure about this one....

should be

a = min(l)

Additionally, the following code pattern can also be simplified

a = sorted(l, reverse=True)[0] 
a = sorted(l)[-1] # not sure about this, sort stability could make it output a different value, right?

should be

a = max(l)

Is there already a pylint rule for this? I wouldn't be surprised. If not, let's add it as a RUFF rule.

I am not sure about the [-1] variants, as there may be some subtlety there with sort stability?

@charliermarsh charliermarsh added the rule Implementing or modifying a lint rule label Mar 18, 2024
@ottaviohartman
Copy link
Contributor

This seems nice for my first rule contribution. It occurs way more than I was expecting from this GitHub search, though some of these are not actual cases of this violation.

min() and max() support a key argument as well, so the fix would forward that to these functions.

Is there already a pylint rule for this? I wouldn't be surprised. If not, let's add it as a RUFF rule.

Not from a quick search in their docs.

@charliermarsh do you think this is something I could tackle? Any guidance would be nice but not necessary until I hit roadblocks.

Performance impact of fixing this violation

Test script

import timeit
import random

def method_sorted(a):
    return sorted(a)[0]

def method_min(a):
    return min(a)

sizes = [10**i for i in range(1, 5)]  # Sizes: 10, 100, 1000, ...
number_of_executions = 1000

for size in sizes:
    a = [random.randint(1, 1000) for _ in range(size)]
    
    time_sorted = timeit.timeit(lambda: method_sorted(a), number=number_of_executions)
    time_min = timeit.timeit(lambda: method_min(a), number=number_of_executions)
    
    speedup = time_sorted / time_min
    print(f"Size: {size}, sorted(a)[0] time: {time_sorted:.5f}s, min(a) time: {time_min:.5f}s, Speedup: {speedup:.2f}x")

On my M1 pro:

Size: 10, sorted(a)[0] time: 0.00024s, min(a) time: 0.00021s, Speedup: 1.17x
Size: 100, sorted(a)[0] time: 0.00279s, min(a) time: 0.00118s, Speedup: 2.36x
Size: 1000, sorted(a)[0] time: 0.05002s, min(a) time: 0.01093s, Speedup: 4.58x
Size: 10000, sorted(a)[0] time: 0.88557s, min(a) time: 0.10932s, Speedup: 8.10x

Results are less impressive, but still good, with key=lambda x: x:

Size: 10, sorted(a)[0] time: 0.00067s, min(a) time: 0.00065s, Speedup: 1.03x
Size: 100, sorted(a)[0] time: 0.00629s, min(a) time: 0.00433s, Speedup: 1.45x
Size: 1000, sorted(a)[0] time: 0.08372s, min(a) time: 0.04047s, Speedup: 2.07x
Size: 10000, sorted(a)[0] time: 1.23920s, min(a) time: 0.40743s, Speedup: 3.04x

@Skylion007 Skylion007 changed the title Rule Suggestion min/max instead of sorted(l)[0] Rule Suggestion min/max instead of sorted(l)[0] Mar 19, 2024
@dosisod
Copy link
Contributor

dosisod commented Mar 20, 2024

Just implemented this as FURB192 in Refurb: dosisod/refurb#333

@boolean-light
Copy link
Contributor

boolean-light commented Mar 20, 2024

Python sort and sorted are guaranteed to be stable, so it could have problem like this:

from operator import itemgetter
data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]

min(data, key=itemgetter(0))  # ('blue', 1)
sorted(data, key=itemgetter(0))[0]  # ('blue', 1)
sorted(data, key=itemgetter(0), reverse=True)[-1]  # ('blue, 2')

In this case, the order of ('red', 1), ('red', 2) and ('blue', 1), ('blue', 2) is retained for both sorted call, which cause the result of the two to become different.
So we need to be careful when adding the rule for [-1], though I think it is not very frequent to intentionally use sorted function that way.

@zanieb
Copy link
Member

zanieb commented Mar 20, 2024

I would be okay documenting that as a caveat of the rule, seems like a weird edge-case.

@Skylion007
Copy link
Contributor Author

Skylion007 commented Mar 22, 2024

We should probably flag those unstable sorts as unsafe fixes if we go through with them.

@zanieb
Copy link
Member

zanieb commented Mar 22, 2024

How can we identify them as unstable?

@Skylion007
Copy link
Contributor Author

We should test it out using the example provided above but I think only the '[-1]' index pattern is unstable.

@Skylion007
Copy link
Contributor Author

Actually to clarify, all these sorts are stable, but reverse=True reverse the order unstable elements apparently. (And reverse=True, '[-1]'). So only the first proposed fix using min (and it's reversed version) will always be guaranteed to be the same. I suppose. Maybe that warrants its own rule in that case or a rule toggle not flag it when it could change the returned value on ties.

@ottaviohartman
Copy link
Contributor

I'd like to start working on this if nobody has started yet :)

charliermarsh added a commit that referenced this issue Apr 23, 2024
)

## Summary

Fixes #10463

Add `FURB192` which detects violations like this:

```python
# Bad
a = sorted(l)[0]

# Good
a = min(l)
```

There is a caveat that @Skylion007 has pointed out, which is that
violations with `reverse=True` technically aren't compatible with this
change, in the edge case where the unstable behavior is intended. For
example:

```python
from operator import itemgetter
data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]

min(data, key=itemgetter(0))  # ('blue', 1)
sorted(data, key=itemgetter(0))[0]  # ('blue', 1)
sorted(data, key=itemgetter(0), reverse=True)[-1]  # ('blue, 2')
```

This seems like a rare edge case, but I can make the `reverse=True`
fixes unsafe if that's best.

## Test Plan

This is unit tested.

## References

https://github.com/dosisod/refurb/pull/333/files

---------

Co-authored-by: Charlie Marsh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants