Comparison between bytes and string in defining a frozenset throws exception #1236

chaofanhan · 2020-08-06T20:35:34Z

https://github.com/python-hyper/hyper-h2/blob/3b0b241d79f5a9ff9382bbc038f84862e0d80abf/src/h2/utilities.py#L20-L26.

Hi, when a Python process runs with the -bb flag, the code above throws an exception and h2 stops working. May I ask why both bytes and str values are defined in the frozenset? Is it possible to use only bytes or only str? The frozenset compares its keys for deduplication, and that comparison is what fails.
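A minimal sketch of how the report can be reproduced, assuming CPython. The set literal below is a stand-in, not the actual definition in h2/utilities.py, and the header name `:method` is only an illustrative choice. The same one-line snippet is run in a child interpreter with and without `-bb`:

```python
import subprocess
import sys

# Stand-in for a mixed bytes/str set; not copied from h2.
SNIPPET = "frozenset([b':method', ':method'])"


def run_snippet(*flags):
    """Run SNIPPET in a fresh interpreter started with the given flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


plain = run_snippet()        # no bytes/str comparison checks
strict = run_snippet("-bb")  # bytes/str comparisons become errors

print("without -bb, exit code:", plain.returncode)
print("with -bb, exit code:", strict.returncode)
print(strict.stderr.strip())
```

In CPython an ASCII str and the bytes object with the same content generally hash to the same value, so building the set falls back to an equality check between the two, which is the comparison `-bb` rejects.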

stroeder · 2022-02-07T22:00:42Z

I agree this should be fixed, especially since this module is used in many other stacks. Sets with mixed string types seem totally broken to me.

Maybe a custom set-class instead of frozenset?

stroeder · 2022-02-07T23:39:45Z

I tried to fix this, but I'm giving up for now due to lack of time. This seems really seriously broken! This whole package serves as a good example of why you should have typing.

IMO the devs have to decide where in the call stack to decode the lower-level protocol data and then refactor everything above that. In particular, remove the really strange kludges like h2.utilities._custom_startswith().

straz · 2023-01-05T20:48:09Z

Even without -bb, this throws noisy warnings:

[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:20)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:29)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:39)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:46)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:55)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:60)

These warnings should at least be less noisy. Perhaps using warnings.simplefilter?
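As a rough sketch of that idea: `warnings.simplefilter` cannot restrict a filter to one module, so the example below swaps in `warnings.filterwarnings`, which accepts a `module` regex. The helper name `silence_h2_bytes_warnings` is hypothetical, and the assumption is that the warnings are attributed to the `h2.utilities` module, as the file paths in the log above suggest.

```python
import warnings


def silence_h2_bytes_warnings():
    """Ignore BytesWarning attributed to h2.utilities (hypothetical helper)."""
    warnings.filterwarnings(
        "ignore",
        category=BytesWarning,
        module=r"h2\.utilities",  # regex matched against the module name
    )


# Call once at application startup, before h2 is imported.
silence_h2_bytes_warnings()
```

Because `filterwarnings` inserts the new rule at the front of the filter list, it should take precedence over the error filter installed by `-bb`. It only hides the symptom; the mixed comparison still happens.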

pquentin · 2023-11-15T23:16:53Z

This also affects urllib3, which runs its tests with -bb to catch issues but will have to stop doing so in order to adopt h2.

BYK · 2024-10-31T15:47:50Z

So I want to fix this with a PR. It seems like this was done for efficiency reasons. That said, I also looked at hyper/hpack, and it looks like we already need to convert everything to bytes before encoding. The _to_bytes() helper there converts everything into a string first, so we always go through that conversion. That means there's no reason to keep headers as bytes in h2 land for efficiency, so we can convert everything into strings as a first step and continue life with string comparisons and string sets.

Does that sound plausible? If yes, PR incoming; if no, please help me understand why :)

BYK · 2024-10-31T15:48:22Z

Pinging @Kriechi to get attention as I think otherwise this will just be missed.

Kriechi · 2024-11-09T10:00:19Z

@BYK I'm not familiar enough with this part of the code base to recommend such a large refactoring. I think there is an implicit API expectation among users of the h2 library that they can pass in both bytes and str headers, so we need to retain that backward compatibility until we cut a major release (which I don't see happening in the near future).

To focus on the issue itself: would separating the bytes vs. str checks solve the exception throwing? First check which class the header key and value are, and then compare them against the frozenset for its correct type? I could also see a custom frozenset implementation or class that wraps this behaviour into something like a type-agnostic compare function that does not throw an exception like the current code does.

BYK · 2024-11-12T22:06:45Z

@Kriechi sorry for the delay and not being clear enough. My intention is keeping the h2 API unchanged. I'll list what I have in mind here and then create a sample PR to demonstrate that tomorrow:

1. We do allow str or bytes as header names or values, even mixed, in h2.
2. h2 employs some mixed sets (containing values of both str and bytes) to identify certain special headers.
3. This triggers Python's bytes-str comparison, as Python simply checks each item in the set.
4. Splitting these sets, determining the data type, and then using the appropriate set was proposed earlier, but it comes with a notable performance penalty. Just determining the data type essentially doubles the time it takes to make these comparisons. Given that they are done multiple times for each request, I think this is not the best way forward.
5. I have noticed that we don't "normalize" the data type of the headers and just pass them down to hpack for transport.
6. hpack normalizes all headers by casting them to str first and then to bytes. So we already pay the price of this conversion there, and there's no reason for h2 to keep the header types as whatever was passed to it.
7. My proposal (the PR I'm going to create) is then to remove all bytes variants from these sets and convert all headers to str in one go in h2. This will not only make things easier and faster on the h2 side; it should also add no overhead downstream in hpack, as hpack already casts things to str and a str -> str conversion should be free.
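The claim above that determining the type first roughly doubles the lookup time could be checked with a small `timeit` comparison. This is only a sketch: the set contents and function names are hypothetical, the baseline is a lookup in a single-typed set, and the ratio will vary by machine and Python version.

```python
import timeit

# Illustrative single-typed sets; not copied from h2.
PSEUDO_STR = frozenset([":method", ":scheme", ":authority", ":path"])
PSEUDO_BYTES = frozenset(n.encode("ascii") for n in PSEUDO_STR)


def lookup_direct(name):
    """Baseline: one membership test against a single-typed set."""
    return name in PSEUDO_BYTES


def lookup_dispatch(name):
    """Determine the type first, then test against the matching set."""
    if isinstance(name, bytes):
        return name in PSEUDO_BYTES
    return name in PSEUDO_STR


def measure(func, name=b":path", number=200_000):
    """Total seconds for `number` calls of func(name)."""
    return timeit.timeit(lambda: func(name), number=number)


if __name__ == "__main__":
    direct = measure(lookup_direct)
    dispatch = measure(lookup_dispatch)
    print(f"direct:   {direct:.4f}s")
    print(f"dispatch: {dispatch:.4f}s ({dispatch / direct:.2f}x)")
```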

Hope I did a better job explaining here but if not I'll be following up with the PR tomorrow anyway.

Fixes python-hyper#1236. This patch makes all header operations operate on `bytes` and converts all headers and values to bytes before operation. With a follow up patch to `hpack` it should also increase efficiency as currently, `hpack` casts everything to a `str` first before converting back to bytes: https://github.com/python-hyper/hpack/blob/02afcab28ca56eb5259904fd414baa89e9f50266/src/hpack/hpack.py#L150-L151

BYK · 2024-11-13T22:04:39Z

@Kriechi okay the PR is up -- please see my note regarding formatting. Happy to work on that part if review turns out to be daunting.

Btw. I had to go the other way around and use bytes for everything instead of converting everything to str. We need an upstream patch to hpack to avoid that unnecessary bytes -> str -> bytes dance.
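To illustrate the bytes-everywhere direction, here is a minimal sketch, not the code from the PR. The names `as_bytes`, `normalize_headers`, and `SPECIAL_HEADERS` are hypothetical, and encoding str values as UTF-8 is an assumption:

```python
# Bytes-only set of special header names (illustrative contents).
SPECIAL_HEADERS = frozenset([b":method", b":scheme", b":authority", b":path"])


def as_bytes(value):
    """Return value as bytes; str is encoded as UTF-8 (an assumption)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


def normalize_headers(headers):
    """Convert every (name, value) pair to bytes once, up front."""
    return [(as_bytes(name), as_bytes(value)) for name, value in headers]


headers = normalize_headers([(":method", b"GET"), (b"accept", "*/*")])
special = [name for name, _ in headers if name in SPECIAL_HEADERS]
print(headers)
print(special)
```

After this single conversion step, every later membership test compares bytes with bytes, so the public API can keep accepting both types while the internal sets hold only one.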

sethmlarson added the Bug label Feb 21, 2021

bdraco mentioned this issue Nov 15, 2021

Bump httpx from 0.19.0 to 0.21.0 home-assistant/core#59723

Merged

22 tasks

pquentin mentioned this issue Nov 15, 2023

Run one test using Hypercorn urllib3/urllib3#3190

Merged

mib1185 mentioned this issue Apr 10, 2024

Python BytesWarning in h2 lib home-assistant/core#115379

Closed

BYK linked a pull request Nov 13, 2024 that will close this issue

fix: No more BytesWarnings #1286

Open
