
Too fuzzy name searching with name2taxid #111

Open
tao-bioinfo opened this issue Feb 8, 2025 · 2 comments

tao-bioinfo commented Feb 8, 2025

See also #88.

$ echo 'Pholoe glabra' | taxonkit name2taxid -f
Pholoe glabra   2975041
$ echo 2975041 | taxonkit lineage
2975041 cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Petrosaviidae;Asparagales;Asphodelaceae;Asphodeloideae;Gasteria;Gasteria carinata;Gasteria carinata var. glabra (Salm-Dyck)

The genus has changed (Pholoe -> Gasteria), which seems unreasonable: the fuzzy search is too fuzzy.

The name Pholoe glabra comes from this BOLD record: https://v4.boldsystems.org/index.php/Public_RecordView?processid=BIOMB241-23
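One way to catch hits like this in existing output is to check whether the query's genus appears anywhere in the lineage that `taxonkit lineage` prints. Below is a minimal Python sketch of such a post-filter. It assumes the semicolon-separated lineage format shown above and that the first word of a binomial is its genus; `genus_consistent` is a hypothetical helper, not part of TaxonKit.

```python
def genus_consistent(query: str, lineage: str) -> bool:
    """True if the query's first word (assumed genus) is a whole rank name in the lineage.

    `lineage` is a semicolon-separated string, as printed by `taxonkit lineage`.
    """
    words = query.split()
    if not words:
        return False
    ranks = [r.strip() for r in lineage.split(";")]
    return words[0] in ranks


if __name__ == "__main__":
    # Shortened from the `taxonkit lineage` output for taxid 2975041 above.
    lineage = ("Liliopsida;Asparagales;Asphodelaceae;Asphodeloideae;Gasteria;"
               "Gasteria carinata;Gasteria carinata var. glabra (Salm-Dyck)")
    print(genus_consistent("Pholoe glabra", lineage))  # prints False
```

A `False` result only marks a hit for manual review; it says nothing about which taxid, if any, is correct.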

shenwei356 (Owner) commented

The official NCBI Taxonomy website doesn't return useful matches either. For taxonomic names that have changed a lot, you can query the name history at https://taxonomy.onecodex.com/ or https://github.com/shenwei356/taxid-changelog. However, neither source returned any hit.

From Google:

Pholoe glabra is a species of marine worm in the genus Pholoe

Other sources

It looks like the species exists, but NCBI Taxonomy has not recorded it.

Trying Pholoe glabra:

# nothing found in taxid-changelog
$ zcat taxid-changelog.csv.gz | csvtk grep -I -f name -i -p 'Pholoe glabra'
taxid,version,change,change-value,name,rank,lineage,lineage-taxids

# search the old taxdump files directly
$ fd names.dmp.gz | rush 'zgrep -i "Pholoe glabra" || true'
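The same case-insensitive substring search can be sketched in Python, assuming the standard NCBI `names.dmp` layout (tax_id, name_txt, unique name, name class, with fields separated by `\t|\t` and each line ending in `\t|`). The function names are hypothetical; this mirrors `zgrep -i` rather than anything TaxonKit does internally.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (tax_id, name_txt, name_class)


def parse_names_dmp(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (tax_id, name_txt, name_class) from names.dmp lines, skipping malformed ones."""
    for line in lines:
        # Drop the trailing "\t|" terminator, then split on the "\t|\t" separator.
        fields = line.rstrip("\n").rstrip("|").rstrip("\t").split("\t|\t")
        if len(fields) >= 4:
            yield fields[0], fields[1], fields[3]


def search_names(lines: Iterable[str], query: str) -> List[Record]:
    """Case-insensitive substring match on name_txt, like `zgrep -i`."""
    q = query.lower()
    return [rec for rec in parse_names_dmp(lines) if q in rec[1].lower()]


def search_names_dmp_gz(path: str, query: str) -> List[Record]:
    """Run search_names over a gzip-compressed names.dmp file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return search_names(fh, query)
```

For example, `search_names_dmp_gz("names.dmp.gz", "Pholoe glabra")` (placeholder path) returns an empty list when no record contains that string, which matches the empty `zgrep` output above.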

Trying only the genus Pholoe, which looks like what you're looking for:

$ echo Pholoe | taxonkit name2taxid
Pholoe  222012

$ echo 222012 | taxonkit lineage -nr
222012  cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Annelida;Polychaeta;Errantia;Phyllodocida;Pholoidae;Pholoe       Pholoe  genus

Species

222012 [genus] Pholoe
  222013 [species] Pholoe minuta
  318818 [species] Pholoe baltica
  328599 [species] Pholoe pallida
  1888207 [species] Pholoe longa
  2594901 [species] Pholoe assimilis
  2594902 [species] Pholoe inornata
  2644711 [no rank] unclassified Pholoe
    862950 [species] Pholoe sp. CMC01
    868068 [species] Pholoe sp. CMC02
    ...
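The manual step above, falling back to the genus when the full binomial is missing, could be automated roughly as follows. This is only a sketch: `resolve_with_genus_fallback` is hypothetical, and the `index` dict (lower-cased names to taxids) is assumed to have been built beforehand, for example from `names.dmp`. The result records how it was resolved, so a genus-level hit is not mistaken for a species-level one.

```python
from typing import Dict, Optional, Tuple


def resolve_with_genus_fallback(
    name: str, index: Dict[str, int]
) -> Optional[Tuple[int, str, str]]:
    """Try the full name first, then only its first word (assumed to be the genus).

    `index` maps lower-cased names to taxids. Returns (taxid, matched name, how)
    where `how` is "exact" or "genus fallback", or None if both lookups fail.
    """
    words = name.split()
    if not words:
        return None
    full = " ".join(words)
    if full.lower() in index:
        return index[full.lower()], full, "exact"
    genus = words[0]
    if genus.lower() in index:
        # Resolved only to genus level; the species itself is not in the index.
        return index[genus.lower()], genus, "genus fallback"
    return None


if __name__ == "__main__":
    # Toy index using two taxids from the output above.
    index = {"pholoe": 222012, "pholoe minuta": 222013}
    print(resolve_with_genus_fallback("Pholoe glabra", index))
    # prints (222012, 'Pholoe', 'genus fallback')
```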

tao-bioinfo commented Feb 8, 2025

In my opinion, in such cases it would be better to return a blank result, even with -f.
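A minimal sketch of that proposal, assuming a lookup that always prefers an exact name match and, when it has to fall back to fuzzy matching, accepts only candidates from the query's own genus; otherwise the taxid column stays blank. Python's `difflib.get_close_matches` is used purely as a stand-in fuzzy matcher (it is not what TaxonKit's `-f` uses), and `strict_name2taxid`, `format_row`, and the `cutoff` default are hypothetical.

```python
import difflib
from typing import Dict, Optional


def strict_name2taxid(query: str, index: Dict[str, int], cutoff: float = 0.9) -> Optional[int]:
    """Exact match first; otherwise accept a fuzzy hit only within the same genus.

    `index` maps lower-cased names to taxids. difflib is only a stand-in for
    whatever fuzzy matcher is actually used.
    """
    key = " ".join(query.split()).lower()
    if not key:
        return None
    if key in index:
        return index[key]
    genus = key.split()[0]
    for cand in difflib.get_close_matches(key, list(index), n=5, cutoff=cutoff):
        # Reject candidates whose first word (assumed genus) differs from the query's.
        if cand.split()[:1] == [genus]:
            return index[cand]
    return None  # blank result rather than a cross-genus guess


def format_row(query: str, taxid: Optional[int]) -> str:
    """Two tab-separated columns; the taxid column is left empty when unresolved."""
    return f"{query}\t{'' if taxid is None else taxid}"
```

With an index that lacks `pholoe glabra`, this would be expected to print `Pholoe glabra` followed by an empty taxid column instead of a Gasteria taxid. The genus check alone does not stop a close but wrong species within the same genus from being accepted; the cutoff controls that trade-off.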
