Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression #35

Merged
merged 2 commits into from
Jul 29, 2024
Merged

Fix regression #35

merged 2 commits into from
Jul 29, 2024

Conversation

sam-kleiner
Copy link
Contributor

Had a typo in domain lookup where when looking up related urls, I accidentally looked up the original again

@sam-kleiner
Copy link
Contributor Author

sam-kleiner commented Jul 29, 2024

Sorry about the regression there, this should sort it out.

It does bring up a question though, which might show my ignorance on how the whois is meant to work. As you can see I had to update some of the values in the test. Looks like this is because of your recursive_merge function.

Am I right to assume from context anything from the related should be overriding the original call's values? Based on the whois.com screenshot that merge looks to set the correct values for registration_date and expiration_date, but it looks like the last_changed_date is lost (but that does show up in whois.com). I don't really understand whois deep enough to know if that's 100% a bug or not, and if there is other info that could be lost. Is the ToS and url fields changing a problem, or is that intended?

Here is the diff for follow related. Left is no follow, right is follow.

Screenshot from 2024-07-29 13-47-00

Screenshot from 2024-07-29 13-52-09

@meeb
Copy link
Owner

meeb commented Jul 29, 2024

Ah right you are, good catch. I missed that as well. Thanks!

@meeb meeb merged commit 4f57540 into meeb:main Jul 29, 2024
5 checks passed
@meeb
Copy link
Owner

meeb commented Jul 29, 2024

Bundled into 3.0.2.

And yes generally the related RDAP endpoint would have more information so it should overwrite the initial results. This shouldn't remove values from the first response, but overwrite any duplicate rows before being sent to the parser.

I don't personally find replacing the TOS to just being the related info TOS URL to be a massive problem, although if it was being pedantic returning both URLs would likely be more "correct". This wasn't the previous behavior because querying and overlaying related server data is a new-ish feature and it would have changed the output data format to return multiple TOS URLs.

@meeb
Copy link
Owner

meeb commented Jul 29, 2024

In recursive_merge the new value doesn't overwrite the old one if the value isn't truthy so that last_change_date of null from the related info endpoint shouldn't overwrite the valid one from the primary endpoint:

https://github.com/meeb/whoisit/blob/main/whoisit/utils.py#L161

@sam-kleiner
Copy link
Contributor Author

sam-kleiner commented Jul 29, 2024

Cool that makes sense. Based on that though the last_changed_date is still being lost. Doing some more research it looks like its any item that comes back as a list in the raw could lose values.

Based on the values you care about in your parsers this recursive_merge_lists should select related values from the list while preserving the original value if it does not exist in the related. This does fix the missing last_changed_date. It dedupes those lists based on eventAction or title (depending on the item).

Thoughts?

Screenshot from 2024-07-29 14-48-59

This was just a quick test, there is probably a better/cleaner solution

@sam-kleiner
Copy link
Contributor Author

sam-kleiner commented Jul 29, 2024

Here is the raw events from the query in my previous comment, so you can see last changed is there, but does not end up in parsed

Screenshot from 2024-07-29 14-52-54

@meeb
Copy link
Owner

meeb commented Jul 29, 2024

Yeah something like that would probably be helpful and improve result quality, thanks. Pop in an isinstance(v, list) check as well and that would be fine. Feel free to add that as a PR.

There isn't anything about merging results anywhere I've found so there's no a standard I'm following here. I do it just to have a "query" return a singular "result" and have a nicer interface. I would imagine the proper way to do this is just to make one query, see the related info endpoints, then query them manually. You can do this if you want with calling whoisit.domain(...raw=True, follow_related=False) then manually making the related endpoint query if you wanted. Of course, minimal data loss is preferable and the related info endpoint typically has more detailed information.

(Mostly additional details for anyone stumbling over this in the future).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants