Problem when BLAST taxdb becomes temporarily out of sync with the NCBI taxonomy db #23

Open
stenglein-lab opened this issue Oct 26, 2023 · 0 comments

Comments

@stenglein-lab
Owner

stenglein-lab commented Oct 26, 2023

An issue can arise when the BLAST taxdb (https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz) becomes temporarily out of sync with the NCBI taxonomy database (https://ftp.ncbi.nih.gov/pub/taxonomy). This can happen when the NCBI taxonomy database is updated and released but the corresponding update and release of the BLAST taxdb lags behind.

In this case, BLAST, using the most recent version of taxdb.tar.gz, can report a taxid in the staxid column of its output that no longer exists in the most recent NCBI taxonomy database.

Because this pipeline downloads both the most recent BLAST taxdb and the most recent NCBI taxonomy database, it is exposed to this problem. When it occurs, it is difficult to detect: the pipeline may fail to report a virus sequence, and it does so silently.
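One way to make the failure visible rather than silent would be to check every staxid in the BLAST output against the taxids listed in nodes.dmp from the taxonomy dump. The following is a minimal sketch in Python, not the pipeline's actual code. The function names and the staxid_col argument are hypothetical, and it assumes tab-delimited BLAST output in which the position of the staxid (or staxids) column is known.

```python
from typing import Dict, Iterable, List, Set


def load_known_taxids(nodes_lines: Iterable[str]) -> Set[str]:
    """Collect taxids from nodes.dmp lines; the taxid is the first '|'-delimited field."""
    known: Set[str] = set()
    for line in nodes_lines:
        if line.strip():
            known.add(line.split("|", 1)[0].strip())
    return known


def find_unknown_staxids(
    blast_lines: Iterable[str],
    known_taxids: Set[str],
    staxid_col: int,  # assumption: 0-based index of the staxid column in the output
) -> Dict[str, List[str]]:
    """Map each staxid missing from the taxonomy to the query IDs that hit it."""
    unknown: Dict[str, List[str]] = {}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # The staxids column can hold several taxids separated by ';'
        for taxid in fields[staxid_col].split(";"):
            if taxid not in known_taxids:
                unknown.setdefault(taxid, []).append(fields[0])
    return unknown
```

Both functions accept any iterable of lines, so open file handles for nodes.dmp and the BLAST output can be passed directly. A non-empty result could then be logged as a warning or treated as an error, depending on how strict the pipeline should be.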

Example

This happened to me on Oct 26, 2023, when a BLASTN query aligned to D. melanogaster Nora virus sequences such as KP970098.1.

This sequence is currently assigned to species Drosophila melanogaster Nora virus, with taxid 3071212. It looks like this assignment reflects a relatively recent reorganization of its taxonomy. Before the reorganization, this sequence was assigned to species "Nora virus", with taxid 363716.

As of Oct 26, 2023, the most recent taxdb.tar.gz file lists accession KP970098.1 as being assigned to taxid 363716, but that taxid no longer exists in the most recent NCBI taxonomy database.
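The NCBI taxonomy dump also ships merged.dmp (old taxid to new taxid) and delnodes.dmp (deleted taxids), which could be used to work out what happened to a taxid that is missing from nodes.dmp. The sketch below is a hedged illustration in Python with hypothetical function names. I have not verified whether taxid 363716 actually appears in either file, so treat this as a diagnostic idea rather than a confirmed fix.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple


def parse_dmp(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the stripped fields of each non-empty '|'-delimited .dmp line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith("|"):
            line = line[:-1]  # drop the trailing line terminator
        yield [field.strip() for field in line.split("|")]


def load_merged(merged_lines: Iterable[str]) -> Dict[str, str]:
    """Build an old-taxid -> new-taxid map from merged.dmp lines."""
    return {fields[0]: fields[1] for fields in parse_dmp(merged_lines)}


def load_first_field(dmp_lines: Iterable[str]) -> Set[str]:
    """Collect the first field of each line (works for nodes.dmp and delnodes.dmp)."""
    return {fields[0] for fields in parse_dmp(dmp_lines)}


def classify_taxid(
    taxid: str,
    known: Set[str],
    merged: Dict[str, str],
    deleted: Set[str],
) -> Tuple[str, Optional[str]]:
    """Return (status, usable_taxid) for a taxid reported by BLAST."""
    if taxid in known:
        return ("current", taxid)
    if taxid in merged:
        return ("merged", merged[taxid])  # the replacement taxid
    if taxid in deleted:
        return ("deleted", None)
    return ("unknown", None)  # absent from all three files
```

If a missing taxid turns out to be "merged", the pipeline could substitute the replacement taxid. The "deleted" and "unknown" cases would still need to be reported explicitly.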

Possible Solutions

I'm honestly not sure how to fix this. Whether it occurs depends on the timing of NCBI database releases, and any particular manifestation is temporary: it goes away once the BLAST taxdb catches up to the taxonomy db.

One possible solution is to use fixed versions of the taxdb and taxonomy files that are known to be in sync (if such a pair exists).

@stenglein-lab stenglein-lab changed the title Potential problem when BLAST taxdb becomes temporarily out of sync with the NCBI taxonomy db Problem when BLAST taxdb becomes temporarily out of sync with the NCBI taxonomy db Oct 27, 2023