Taxonomy cleaner is not populating tidInterpreted when matching on a newly-added CoL name #429

Closed
themerekat opened this issue Dec 11, 2023 · 5 comments
Labels: enhancement (New feature or request)

Comments

@themerekat
Collaborator

themerekat commented Dec 11, 2023

For example, this name was added to the taxonomic thesaurus when I ran the cleaning tool previously: https://allasiatcn.org/taxa/index.php?taxon=6279

But when I ran it again, I found that the specimen had not actually been linked to the new taxonomic name:
[screenshot]

@themerekat themerekat changed the title from "Taxonomy cleaner is not populated tidInterpreted when matching on a newly-added CoL name" to "Taxonomy cleaner is not populating tidInterpreted when matching on a newly-added CoL name" Dec 11, 2023
@themerekat themerekat self-assigned this Dec 11, 2023
@themerekat themerekat added the bug Something isn't working label Dec 11, 2023
@themerekat themerekat removed their assignment Dec 11, 2023
@themerekat themerekat added the high priority This issue has been considered a priority and will be developed soon label Dec 11, 2023
@egbot
Member

egbot commented Dec 12, 2023

I'm not able to reproduce this issue. I'm finding that the occurrences are being indexed to the thesaurus.
However, the indexing step happens at the end of the processing. Thus, if the page times out or is closed before the processing finishes, the occurrences will not be indexed to the new names. Judging by the output in the screenshot, I'm guessing that is what happened in this case.

@egbot
Member

egbot commented Dec 12, 2023

Running the cleaning and linking script after each addition of a name would significantly slow down the process. However, we could trigger that process after every hundred or so names are added. That way, if the process fails before completion, most of the occurrences will already be indexed. Unindexed names are often resolved in short order, since that same cleaning process gets triggered by other tools. Occurrences rarely remain unlinked to the thesaurus for long.
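
For illustration only, here is a minimal Python sketch of the "index after every hundred or so names" idea. It is not Symbiota's code: `add_names_with_periodic_indexing`, `add_name`, `index_occurrences`, and `batch_size` are hypothetical names standing in for the name-adding step and the cleaning and linking script.

```python
from typing import Callable, Iterable, List


def add_names_with_periodic_indexing(
    names: Iterable[str],
    add_name: Callable[[str], None],
    index_occurrences: Callable[[], None],
    batch_size: int = 100,
) -> int:
    """Add names one by one and run the indexing step every `batch_size` additions.

    `add_name` and `index_occurrences` are hypothetical callbacks, not real
    Symbiota functions. Returns the number of times indexing ran.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    index_runs = 0
    pending = 0  # names added since the last indexing run

    for name in names:
        add_name(name)
        pending += 1
        if pending >= batch_size:
            # Index now, so a later timeout loses at most one batch of links.
            index_occurrences()
            index_runs += 1
            pending = 0

    if pending:
        # Final pass for the last, partial batch.
        index_occurrences()
        index_runs += 1

    return index_runs


if __name__ == "__main__":
    added: List[str] = []
    runs = add_names_with_periodic_indexing(
        (f"Taxon {i}" for i in range(250)),
        add_name=added.append,
        index_occurrences=lambda: None,  # placeholder for the real indexing step
    )
    print(len(added), runs)
```

With this shape, an interrupted run leaves at most `batch_size` newly added names without linked occurrences, at the cost of running the indexing step more often.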

@egbot egbot removed bug Something isn't working high priority This issue has been considered a priority and will be developed soon labels Dec 12, 2023
@themerekat
Collaborator Author

Ah, ok, thanks for the clarification! That makes a lot of sense.

@egbot
Member

egbot commented Dec 12, 2023

I'm going to leave this open with an enhancement tag. Until then, it's best to add taxa in smaller batches (e.g. <= 2000). If a larger run fails, run the tool again with a limit of 10, which will trigger the indexing script.

@egbot egbot reopened this Dec 12, 2023
@egbot egbot self-assigned this Dec 12, 2023
@egbot egbot added the enhancement New feature or request label Dec 12, 2023
@themerekat themerekat transferred this issue from BioKIC/Symbiota Dec 12, 2023
@themerekat
Collaborator Author

Sounds good! I've moved it to the symbiota-docs repo, where we put optional enhancements.

@themerekat themerekat closed this as not planned Jun 12, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Symbiota issue triage Jun 12, 2024