-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stable_hash(::Regex) #66
Comments
Yes, that sounds useful to me. I am wrapping up some work in #55 to improve the stability of hashes with a new API design for customizing how types get hashed. I think this improvement makes sense to make after that merges. |
In the meantime, I believe the following should be an effective workaround. (Though I haven't tested it). StableHashTraits.hash_method(::Regex) = FnHash(x -> (x.pattern, x.compile_options, r.match_options)) Or, for a specific context, you could do struct HashRegex{T}
parent::T
end
StableHashTraits.parent_context(x::HashRegex) = x.parent
StableHashTraits.hash_method(::Regex, ::HashRegex) = FnHash(x -> (x.pattern, x.compile_options, r.match_options))
stable_hash(r"a*b*", HashRegex(HashVersion{3}())) |
Resolved by #58 |
I don't think #58 actually fixes this, it's just now feasible to fix. Need to add tests and methods of |
I've looked into this again and it seems to working for Here's what the mutable struct Regex <: AbstractPattern
pattern::String
compile_options::UInt32
match_options::UInt32
regex::Ptr{Cvoid} # Cvoid===Nothing
end For But actually, julia> bytes2hex(stable_hash(Ptr{Nothing}(); version=4))
"17cadb739b5ed376bb7c939f139debd2cba4530a93cb7ad7bbc1e61dbd3387eb"
julia> bytes2hex(stable_hash(Ptr{Nothing}(1234); version=4))
"17cadb739b5ed376bb7c939f139debd2cba4530a93cb7ad7bbc1e61dbd3387eb"
julia> bytes2hex(stable_hash(Ptr{Int}(1234); version=4))
"17cadb739b5ed376bb7c939f139debd2cba4530a93cb7ad7bbc1e61dbd3387eb" so my conclusion is that it works now. |
The natural follow-up question is if I'm fine with any handling of |
Here's a simple transformer for transform_type(::Type{Regex}) = "Base.Regex"
function transformer(::Type{Regex}, ::HashVersion{4})
# This skips the compiled regex which is stored as a Ptr{Nothing}
return Transformer(pick_fields(:pattern, :compile_options, :match_options))
end |
Ah... interesting. Sorry, I somehow missed your updates in the issue here. I think we probably need some unit tests for
|
After digging a little further, it looks like it has to do with how bitstypes (primitive types) are handled. Here's an example showing what's going on for primitive type FakePtr 64 end
FakePtr(x) = Base.bitcast(FakePtr, x)
StableHashTraits.transform_type(::Type{FakePtr}) = "Ptr" As we can see, the FakePtr and Ptr hashes the same. (And the value stored in the bitstype is ignored.) julia> p = Ptr{Int}(1)
Ptr{Int64} @0x0000000000000001
julia> fp = FakePtr(2)
FakePtr(0x0000000000000002)
julia> bytes2hex(stable_hash(p; version=4))
"17cadb739b5ed376bb7c939f139debd2cba4530a93cb7ad7bbc1e61dbd3387eb"
julia> bytes2hex(stable_hash(fp; version=4))
"17cadb739b5ed376bb7c939f139debd2cba4530a93cb7ad7bbc1e61dbd3387eb" Also note that julia> StableHashTraits.hash_trait(p)
StructTypes.UnorderedStruct()
julia> fieldtypes(typeof(p))
() In conclusion, bitstypes are treated as structs without any fields. Is there any policy on how to handle bitstypes in |
Here I think the issue is that by default 🤔 maybe if a type has a |
Erroring by default sounds good to me. These are edge cases where the user should take action that is suitable for their situation, e.g. by defining their own hash context. |
I have just realized that the This is because julia> StructTypes.StructType(r"a")
StructTypes.StringType() The regex will get hashed using function stable_hash_helper(str, hash_state, context, ::StructTypes.StringType)
nested_hash_state = start_nested_hash!(hash_state)
update_hash!(nested_hash_state, str isa AbstractString ? str : string(str), context)
return end_nested_hash!(hash_state, nested_hash_state)
end where the important part is I discovered this accidentally when running the tests on Julia 1.9 (and earlier). This feels less than ideal. The string representation of Regex can potentially change again, if someone decides it can be printed more nicely somehow. What do you think is the best course of action?
|
I agree that relying on the Given that 1.3 was only recently released, it seems relatively safe to switch to this behavior. It also isn't obvious to me which would be the right choice if we go with a "string" solution: using the So at the moment I am leaning towards simply using the transform approach, and declaring the old behavior for Regex a bug (because it wasn't stable). |
Regex
is amutable struct
which causes it tostable_hash
differently each time.I think it would make sense to make it hash in a stable manner by default. Do you agree?
(The reason why it is a
mutable struct
is so it can interact with the GC to free the pointer to the compiled regex.)Here's the corresponding
hash
function inBase
, indicating that we should takepattern
,compile_options
andmatch_options
into account.The text was updated successfully, but these errors were encountered: