add support for http://hesperomys.com/ #144

Open
jhpoelen opened this issue Jan 25, 2023 · 17 comments
Labels
enhancement New feature or request

Comments

@jhpoelen
Member

I suggest adding support for http://hesperomys.com/ to Nomer.

related to discussions in
mammaldiversity/mammaldiversity.github.io#22 mammaldiversity/mammaldiversity.github.io#23
involving @JelleZijlstra @n8upham

@JelleZijlstra

That would be great! Let me know if you need any help.

Hesperomys makes a distinction between taxa and names. Taxon URLs are of the form http://hesperomys.com/t/32496, with a redirect from the currently valid name (http://hesperomys.com/t/Agathaeromys). Name URLs use /n/, like http://hesperomys.com/n/59009, with a redirect for the original name (e.g. https://hesperomys.com/n/Agathaeromys).
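
For illustration, here is a minimal sketch of how a client might build and classify URLs under the scheme described above. The function names are hypothetical, and only the `/t/` and `/n/` prefixes mentioned in this comment are handled; other URL types on the site are not covered.

```python
import re

BASE = "http://hesperomys.com"

# Path prefixes as described in the comment: /t/ for taxa, /n/ for names.
_KINDS = {"t": "taxon", "n": "name"}
_PATTERN = re.compile(r"^https?://hesperomys\.com/([tn])/([^/?#]+)$")


def hesperomys_url(kind, key):
    """Build a taxon ('t') or name ('n') URL from a numeric id or a name string."""
    if kind not in _KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{BASE}/{kind}/{key}"


def parse_hesperomys_url(url):
    """Return (record type, key, is_numeric_id), or None for unrecognized URLs."""
    match = _PATTERN.match(url.strip())
    if match is None:
        return None
    kind, key = match.groups()
    return _KINDS[kind], key, key.isdigit()


print(parse_hesperomys_url("http://hesperomys.com/t/32496"))
print(parse_hesperomys_url("https://hesperomys.com/n/Agathaeromys"))
```

A non-numeric key, such as `Agathaeromys`, corresponds to the redirecting form of the URL, so a client would still need to follow the redirect to learn the numeric id.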

@jhpoelen
Member Author

jhpoelen commented Jan 25, 2023

@JelleZijlstra glad to hear that you are excited and eager to help.

I do have a question:

Would you happen to publish your database as a whole dataset?

I am trying to figure out how to access and index your databases for fast (offline) access via Nomer.

@JelleZijlstra

I currently don't. The database is not in a practical format (a 194 MB SQLite database) and I'd rather not publish some of the internal data. However, I can generate files in a more usable format (e.g., a CSV with all names or all taxa). One thing I learned from talking to @n8upham was that versioning is important, so I am now putting together a plan to introduce a versioning scheme: every time I update the public website, I increment the version and save a copy of the database. I could then also generate a data file in CSV format and publish it.

What kind of format would be useful for you?
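
As a rough sketch of the versioned CSV export idea described above, assuming a table called `name` with the columns shown (both are hypothetical, since the real database schema is not given in this thread), an export step might look like this:

```python
import csv
import io
import sqlite3


def export_names_csv(conn, version, out):
    """Write every row of the hypothetical `name` table as CSV, tagged with the release version."""
    cursor = conn.execute(
        "SELECT id, original_name, author, year FROM name ORDER BY id"
    )
    writer = csv.writer(out)
    header = ["version"] + [column[0] for column in cursor.description]
    writer.writerow(header)
    for row in cursor:
        writer.writerow([version, *row])


def snapshot_filename(version):
    """Hypothetical naming convention: one file per published version."""
    return f"hesperomys-names-v{version}.csv"


# Tiny in-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name (id INTEGER PRIMARY KEY, original_name TEXT, author TEXT, year TEXT)"
)
conn.execute("INSERT INTO name VALUES (2756, 'Platypus Anatinus', 'Shaw', '1799')")

buffer = io.StringIO()
export_names_csv(conn, version=1, out=buffer)
print(snapshot_filename(1))
print(buffer.getvalue())
```

Selecting columns explicitly, rather than using `SELECT *`, is one way to keep internal fields out of the published file.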

@jhpoelen
Member Author

Thanks for your prompt reply.

> What kind of format would be useful for you?

Any digital format that is easy for you to generate.

@jhpoelen
Member Author

A SQLite data dump would do just fine 😄

@jhpoelen
Member Author

And you might make biologists happy by producing tabular text files like CSV, ideally in denormalized form, so that no joins are needed.
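
To make "denormalized" concrete, here is a small sketch in which the join is done once at export time, so each CSV row carries both the name and its taxon. The two-table layout and all column names are assumptions made for illustration; they are not the actual Hesperomys schema.

```python
import csv
import io
import sqlite3

# Hypothetical normalized layout: names point at taxa through taxon_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon (id INTEGER PRIMARY KEY, valid_name TEXT);
    CREATE TABLE name (
        id INTEGER PRIMARY KEY,
        original_name TEXT,
        taxon_id INTEGER REFERENCES taxon(id)
    );
    INSERT INTO taxon VALUES (1958, 'Ornithorhynchus anatinus');
    INSERT INTO name VALUES (2756, 'Platypus Anatinus', 1958);
    """
)

DENORMALIZED_QUERY = """
    SELECT name.id AS name_id,
           name.original_name AS original_name,
           taxon.id AS taxon_id,
           taxon.valid_name AS taxon_name
    FROM name
    JOIN taxon ON taxon.id = name.taxon_id
    ORDER BY name.id
"""


def denormalized_csv(connection):
    """Run the join once and return flat CSV text with one self-contained row per name."""
    cursor = connection.execute(DENORMALIZED_QUERY)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


print(denormalized_csv(conn))
```

The trade-off is repetition: taxon details are copied onto every name row, which makes the file larger but lets it be used directly in a spreadsheet.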

@JelleZijlstra

> And you might make biologists happy by producing tabular text files like CSV, ideally in denormalized form, so that no joins are needed.

I wrote a quick export format for the MDD people already; if you email me (my email is on my profile), I can send it to you too as a sample. It's not the whole database, but it would give a sense of what the data would look like. I can adjust the export script to add additional information and then publish those files regularly.

@jhpoelen
Member Author

> I wrote a quick export format for the MDD people already,

Great!

Your proposed sample would help me get started with the integration, especially if you share the example publicly or are OK with it being made public.

@jhpoelen
Member Author

I've temporarily added your snapshot of mammalia.csv to https://github.com/jhpoelen/hesperomys/ to help prototype an integration with your dataset. Happy to make changes if needed. I attempted to credit your work, but I am sure more can be done.

Also, I have another question.

Is there a way to infer the linked taxon and the type of name-taxon relation from mammalia.csv?

I was able to find the id for the name (e.g., http://hesperomys.com/n/2756 Platypus Anatinus Shaw, 1799), but wasn't able to locate the information needed to generate text like "Valid name for Ornithorhynchus anatinus" with a link to taxon http://hesperomys.com/t/1958. However, I do see the taxonomic information related to the taxon (e.g., "

2756,http://hesperomys.com/n/2756,Mammalia,Monotremata,Ornithorhynchidae,Ornithorhynchus,Ornithorhynchus anatinus,,Platypus Anatinus,anatinus,Shaw,http://hesperomys.com/h/62284,1799,Text to Plates 385–386,,,"Shaw, G. 1799. Platypus anatinus. The Duck-billed Platypus. The Naturalists' Miscellany: containing accurate and elegant coloured figures of the most curious and beautiful productions of nature; with descriptions in Latin and English in the Linnaean manne [From {Mammalia-review (MSW3)}: Nat. Misc., vol. 10, pl. 385-386] [From {Mammalia-review (MSW3)}: Nat. Misc., vol. 10, pl. 385-386] [From {Mammalia-review (MSW3)}: Nat. Misc., vol. 10, pl. 385-386] [From {Mammalia-review (MSW3)}: Nat. Misc., vol. 10, pl. 385-386] [From {Mammalia-review (MSW3)}: Nat. Misc., vol. 10, pl. 385-386] [From {Australia (Jackson & Groves 2015).pdf}: Shaw G (1799) The Naturalist's Miscellany: or coloured figures of natural objects, drawn and described immediately from nature. F.P. Nodder & Co., London. Volume 10. Text to Plates 385–386. [Jun 1799]]",http://hesperomys.com/cg/596,New South Wales,"""Australia, New South Wales, New Holland (= Sydney)."" (Wilson & Reeder, 2005, http://hesperomys.com/a/9291); ""Sydney, New South Wales, Australia."" (Jackson & Groves, 2015, http://hesperomys.com/a/34474)",,,,,available
), but somehow wasn't able to find the taxon id.

@JelleZijlstra Do you have any suggestions on how to resolve related taxon ids for a name id?
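
For reference, a minimal sketch of what can be read from such a row today. The header row of the export is not shown in this thread, so every column label below is a guess made for illustration, and only the first 13 fields of the quoted row are used.

```python
import csv
import io

# First 13 fields of the row quoted above; the remaining fields are omitted here.
ROW_EXCERPT = (
    "2756,http://hesperomys.com/n/2756,Mammalia,Monotremata,Ornithorhynchidae,"
    "Ornithorhynchus,Ornithorhynchus anatinus,,Platypus Anatinus,anatinus,Shaw,"
    "http://hesperomys.com/h/62284,1799"
)

# Hypothetical labels: the real header of the export is not shown in the thread.
ASSUMED_COLUMNS = [
    "name_id", "name_link", "class", "order", "family", "genus", "species",
    "col_8", "original_name", "root_name", "author", "author_link", "year",
]


def parse_row(line):
    """Parse one CSV line with the csv module, which also copes with quoted fields."""
    fields = next(csv.reader(io.StringIO(line)))
    return dict(zip(ASSUMED_COLUMNS, fields))


record = parse_row(ROW_EXCERPT)
# The name id and the species are present, but no taxon id appears among these fields.
print(record["name_link"], "->", record["species"])
```

Under these assumed labels, the species name identifies the taxon only as text, which is why a separate taxon id or link column is needed to resolve http://hesperomys.com/t/1958 from name 2756.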


@JelleZijlstra

Great, thanks!

As for the name/taxon link, that information isn't included in the current export, sorry. I'll add a column for the taxon link to the Name exporter, and also a column for the status ("valid", "synonym" and a few other options).
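
Once those two columns exist, a consumer could generate the relation text directly from a row. The sketch below assumes hypothetical column names (`status`, `taxon_name`, `taxon_link`); the phrase used for synonyms is also an assumption, since only the "Valid name for" wording appears in this thread.

```python
# Phrases per status value; only "Valid name for" is taken from the thread.
STATUS_PHRASES = {
    "valid": "Valid name for",
    "synonym": "Synonym of",  # assumed wording
}


def describe_relation(row):
    """Turn a row with status and taxon columns into a short human-readable sentence."""
    status = row["status"]
    # Fall back to a generic phrase for the "few other options" not listed here.
    phrase = STATUS_PHRASES.get(status, f"Name ({status}) for")
    return f"{phrase} {row['taxon_name']} ({row['taxon_link']})"


example_row = {
    "name_id": "2756",
    "status": "valid",
    "taxon_name": "Ornithorhynchus anatinus",
    "taxon_link": "http://hesperomys.com/t/1958",
}
print(describe_relation(example_row))
```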

@jhpoelen
Member Author

Excellent! Happy to continue the integration work once you share the updated export. I hope it isn't too much work to create a new mammalia.csv. In fact, please feel free to create a pull request for https://github.com/jhpoelen/hesperomys if you feel comfortable doing that.

@JelleZijlstra

Sounds good! Let me know if you have any other feedback about ways to make the format more useful to you.

Should I include fossils as well as extant species?

@jhpoelen
Member Author

> Should I include fossils as well as extant species?

Yes please!

@JelleZijlstra

And should I include higher-rank names as well as species-group names? (The current format was for comparing to species in the MDD, which is why I only included extant species within Mammalia.)

@jhpoelen
Member Author

@JelleZijlstra Thanks for the suggestions.

Ideally, all information would be included in the export, with each row being a denormalized, independent representation of a name relation.

I also try to be pragmatic, so I'd rather have an updated export with some minor items missing soon than the "perfect" export many months from now.

Curious to see what you come up with.

@jhpoelen
Member Author

By the way, I've had some success exporting hierarchical data as line-delimited JSON (one JSON object per line). But JSON may understandably alienate some folks who are more comfortable in table/spreadsheet land.
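
A minimal sketch of that one-object-per-line layout, using only the Python standard library. The record structure and field names are made up for illustration and are not a proposed Hesperomys format.

```python
import io
import json

# Hypothetical nested record: a name, the taxon it belongs to, and the relation.
records = [
    {
        "name": {"id": 2756, "original_name": "Platypus Anatinus"},
        "taxon": {"id": 1958, "valid_name": "Ornithorhynchus anatinus"},
        "status": "valid",
    },
]


def write_json_lines(items, out):
    """Serialize each item as a single line of JSON followed by a newline."""
    for item in items:
        out.write(json.dumps(item, ensure_ascii=False) + "\n")


def read_json_lines(source):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in source if line.strip()]


buffer = io.StringIO()
write_json_lines(records, buffer)
buffer.seek(0)
round_tripped = read_json_lines(buffer)
print(round_tripped[0]["taxon"]["valid_name"])
```

Because every line is a complete object, such a file can be streamed and processed line by line without loading the whole export into memory.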

@JelleZijlstra

JelleZijlstra commented Jan 26, 2023

> Ideally, all information would be included in the export, with each row being a denormalized, independent representation of a name relation.

Thanks, that's a good guideline to work with.

> And, I also try to be pragmatic, so I'd rather have an updated export with some minor items missing sooner rather than the "perfect" export many months from now.

For the most part these are very easy changes: JelleZijlstra/taxonomy@5e8b7ca. But definitely agree that working now is better than perfect a long time in the future.

@jhpoelen added the "enhancement" (New feature or request) label on Jun 4, 2024