Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __hash__ method to Substance class #226

Merged
merged 3 commits into from
Mar 5, 2024

Conversation

DNIIBOY
Copy link
Contributor

@DNIIBOY DNIIBOY commented Feb 24, 2024

Allow substances to be used in dictionaries, sets etc. by including a hash method in the class.

@jeremyagray
Copy link
Collaborator

Two questions:

  1. Wouldn't the hash need to be computed on all the attributes used in __eq__? Otherwise, two substances differing in only their data would be unequal and hash identically. Tests to check for this should probably be included. The documentation states that equal objects must hash identically and this code seems to do that, but I would also expect substances with equal hashes to be equal and they won't necessarily be without considering the data and composition attributes.
  2. Since tuples are hashable, why not just use return hash(self.attrs)? Your function is pretty much how tuples are hashed anyway, I think (unless I'm missing something...).

Looking through the rest of the classes in chemistry.py, it looks like there are other instances of __eq__ and __hash__ with similar issues, so maybe all of them need to be addressed simultaneously especially regarding which attributes really need to be checked for equality and then use the same attributes for hashing.

@DNIIBOY
Copy link
Contributor Author

DNIIBOY commented Feb 24, 2024

  1. The reason for not including all elements of self.attrs, is that data and composition are both dictionaries, which are unhashable. We could look at converting their types and then hashing? or maybe just hashing the elements within them? The issue here is that dictionaries are supposed to be unsorted, so just changing their type will allow two identical objects to be hashed to different values.
  2. Doing hash(self.attrs) will just hash the strings of self.attrs so the hash will be identical for all objects.

The idea behind only hashing these fields, was that self.name and the others were defined by data and composition, but i'm not sure if this assumption is correct.

@bjodah
Copy link
Owner

bjodah commented Feb 25, 2024

We can hash the "tuple of the sorted items", e.g.:
tuple(sorted(some_dict.items())).
Another alternative (maybe a separate issue, and not necessarily adressed in this PR), would be to use frozendict from PyPI.

@DNIIBOY
Copy link
Contributor Author

DNIIBOY commented Feb 25, 2024

We can hash the "tuple of the sorted items", e.g.: tuple(sorted(some_dict.items())). Another alternative (maybe a separate issue, and not necessarily adressed in this PR), would be to use frozendict from PyPI.

I agree and made the change, should i fix the spelling error stopping CI as well?

@bjodah
Copy link
Owner

bjodah commented Feb 25, 2024 via email

@DNIIBOY
Copy link
Contributor Author

DNIIBOY commented Feb 25, 2024

All right, i'll leave it for another PR

@Gustav2
Copy link

Gustav2 commented Mar 3, 2024

Whats the status? I really need this for a project, can it be merged soon?

@bjodah bjodah merged commit 7839245 into bjodah:master Mar 5, 2024
1 check failed
@bjodah
Copy link
Owner

bjodah commented Mar 5, 2024

Thank you @DNIIBOY , much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants