Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparison between bytes and string in defining a frozenset throws exception #1236

Open
chaofanhan opened this issue Aug 6, 2020 · 9 comments · May be fixed by #1286
Open

Comparison between bytes and string in defining a frozenset throws exception #1236

chaofanhan opened this issue Aug 6, 2020 · 9 comments · May be fixed by #1286
Labels

Comments

@chaofanhan
Copy link

https://github.com/python-hyper/hyper-h2/blob/3b0b241d79f5a9ff9382bbc038f84862e0d80abf/src/h2/utilities.py#L20-L26.

Hi, when a python process runs with a flag -bb, the above part of code will throw exception and make h2 not work. May I ask why we define both bytes and string in the frozenset? Is it possible to use only bytes or string? Frozenset will compare keys for deduplication.

@stroeder
Copy link

stroeder commented Feb 7, 2022

I agree this should be fixed, especially this module is used in many other stacks. Sets with mixed string types seem totally broken to me.

Maybe a custom set-class instead of frozenset?

@stroeder
Copy link

stroeder commented Feb 7, 2022

I tried to fix this but I give up for now due to lack of time. This seems really seriously broken! This whole module package serves as a good example why you should have typing.

IMO the devs have to decide where in the call-stack to decode the lower protocol data and refactor everything else above that. Especially remove the really strange kludges like h2.utility._custom_startswith().

@straz
Copy link

straz commented Jan 5, 2023

Even without -bb, this throws noisy warnings:

[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:20)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:29)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:39)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:46)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:55)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:60)

These warnings should at least be less noisy. Perhaps using warnings.simplefilter? reference

@pquentin
Copy link

This also affects urllib3 who runs tests with -bb to catch issues, but will have to stop to adopt h2.

@BYK
Copy link

BYK commented Oct 31, 2024

So I want to fix this with a PR. It seems like this is done for efficiency reasons that said I also looked at hyper/hpack and it looks like we already need to convert everything to bytes before encoding. That said the _to_bytes() helper there converts everything into a string first so, we always go through that conversion. That means there's no reason to keep headers in bytes in h2 land for efficiency so we can convert everything into strings as a first step and continue life with string comparisons and string-sets.

Does that sound plausible? If yes, PR incoming; if no, please help me understand why :)

@BYK
Copy link

BYK commented Oct 31, 2024

Pinging @Kriechi to get attention as I think otherwise this will just be missed.

@Kriechi
Copy link
Member

Kriechi commented Nov 9, 2024

@BYK I'm not familiar enough with this part of the code base to recommend such a larger refactoring. I think there is an implicit API expectation to the users of the h2 library that they can pass in both bytes and str headers, so we need to retain that backward-compatibility, until we cut a major release (which I don't see happening in the near future).

To focus on the issue itself: would separating the bytes vs. str checks solve the exception throwing? First check which class the header key and value are, and then compare them against the frozenset for its correct type? I could also see a custom frozenset implementation or class that wraps this behaviour into something like a type-agnostic compare function that does not throw an exception like the current code does.

@BYK
Copy link

BYK commented Nov 12, 2024

@Kriechi sorry for the delay and not being clear enough. My intention is keeping the h2 API unchanged. I'll list what I have in mind here and then create a sample PR to demonstrate that tomorrow:

  1. We do allow str or bytes as header names or values or in a mixed fashion in h2.
  2. h2 employs some mixed sets (containing values of type str and bytes) to identify some special headers
  3. This triggers Python's bytes-str comparison as it just checks each item in the set
  4. Splitting these sets, determining the data type, and then using the appropriate one is proposed earlier but it comes with a notable performance penalty. Just determining the data type essentially doubles the time it takes to make these comparisons. Given that these are done multiple times for each request, I think this is not the best way forward.
  5. I have noticed that we don't "normalize" the data type of the headers and just pass them down to hpack for transport
  6. In hpack it normalizes all headers by casting them to a str first and then to bytes. So we pay the price of this conversion there already and there's no reason for h2 to keep the headers dict types as whatever is passed to it.
  7. My proposal (the PR I'm going to create) is then removing all bytes variants from these sets and converting all headers to str in one go in h2. This will not only make things easier and faster on h2 side. It should not have any additional overhead downstream in hpack as it already casts things to str and str -> str conversion should be free.

Hope I did a better job explaining here but if not I'll be following up with the PR tomorrow anyway.

BYK added a commit to BYK/h2 that referenced this issue Nov 13, 2024
Fixes python-hyper#1236.

This patch makes all header operations operate on `bytes` and converts all headers and values to bytes before operation. With a follow up patch to `hpack` it should also increase efficiency as currently, `hpack` casts everything to a `str` first before converting back to bytes: https://github.com/python-hyper/hpack/blob/02afcab28ca56eb5259904fd414baa89e9f50266/src/hpack/hpack.py#L150-L151
@BYK BYK linked a pull request Nov 13, 2024 that will close this issue
@BYK
Copy link

BYK commented Nov 13, 2024

@Kriechi okay the PR is up -- please see my note regarding formatting. Happy to work on that part if review turns out to be daunting.

Btw. I had to go the other way around and use bytes for everything instead of converting everything to str. We need an upstream patch to hpack to avoid that unnecessary bytes -> str -> bytes dance.

BYK added a commit to BYK/h2 that referenced this issue Nov 14, 2024
Fixes python-hyper#1236.

This patch makes all header operations operate on `bytes` and converts all headers and values to bytes before operation. With a follow up patch to `hpack` it should also increase efficiency as currently, `hpack` casts everything to a `str` first before converting back to bytes: https://github.com/python-hyper/hpack/blob/02afcab28ca56eb5259904fd414baa89e9f50266/src/hpack/hpack.py#L150-L151
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants