stenglein-lab changed the title from "Potential problem when BLAST taxdb becomes temporarily out of sync with the NCBI taxonomy db" to "Problem when BLAST taxdb becomes temporarily out of sync with the NCBI taxonomy db" on Oct 27, 2023
An issue can arise when the BLAST taxdb (https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz) becomes temporarily out of sync with the NCBI taxonomy database (https://ftp.ncbi.nih.gov/pub/taxonomy). This can happen when the NCBI taxonomy database is updated and released but the corresponding update and release of the BLAST taxdb lags behind.
In this case, BLAST, using the most recent version of taxdb.tar.gz, can report a taxid in the staxid column of its output that no longer exists in the most recent NCBI taxonomy database.
Since this pipeline downloads both the most recent BLAST taxdb and the most recent NCBI taxonomy database, this can happen here. When it occurs, it is difficult to detect: the pipeline may fail to report a virus sequence, and it does so silently.
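One way to make the failure visible is to check every taxid that BLAST reports against the taxids in the downloaded taxonomy dump. The sketch below is not part of the pipeline. It assumes the taxonomy dump's `nodes.dmp` file (fields separated by `\t|\t`, taxid in the first field) and tab-delimited BLAST output in which the subject accession is in the second column and the staxid column position is passed in by the caller. The function names and the column layout are hypothetical and depend on the `-outfmt` string actually used.

```python
from typing import Iterable, List, Set, Tuple


def load_taxids(nodes_lines: Iterable[str]) -> Set[int]:
    """Collect all taxids (first field) from the lines of nodes.dmp."""
    taxids: Set[int] = set()
    for line in nodes_lines:
        if line.strip():
            # Fields are separated by "\t|\t"; the first field is the taxid.
            taxids.add(int(line.split("|", 1)[0].strip()))
    return taxids


def find_stale_hits(
    blast_lines: Iterable[str],
    known_taxids: Set[int],
    staxid_col: int,
) -> List[Tuple[str, int]]:
    """Return (subject accession, taxid) pairs whose taxid is not in known_taxids.

    Assumes tab-delimited BLAST output with the subject accession in
    column index 1 (an assumption about the -outfmt string).
    """
    stale: List[Tuple[str, int]] = []
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= staxid_col:
            continue  # skip malformed or short lines
        # The taxid field may hold several ';'-separated values.
        for token in fields[staxid_col].split(";"):
            if token.isdigit() and int(token) not in known_taxids:
                stale.append((fields[1], int(token)))
    return stale
```

Called with open file handles, for example `find_stale_hits(open("hits.tsv"), load_taxids(open("nodes.dmp")), staxid_col=2)`, a non-empty result would indicate that the two databases are out of sync, so the pipeline could warn or stop instead of dropping hits silently. The file names here are placeholders.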
Example
This happened to me on Oct 26, 2023, when a BLASTN query aligned to D. melanogaster Nora virus sequences such as KP970098.1.
This sequence is currently assigned to species Drosophila melanogaster Nora virus, with taxid 3071212. It looks like this assignment reflects a relatively recent reorganization of its taxonomy. Before the reorganization, this sequence was assigned to species "Nora virus", with taxid 363716.
As of Oct 26, 2023, the most recent taxdb.tar.gz file lists accession KP970098.1 as being assigned to taxid 363716, but that taxid no longer exists in the most recent NCBI taxonomy database.
Possible Solutions
I'm honestly not sure how to fix this. Whether it occurs depends on the timing of NCBI database releases, and any particular manifestation of it is temporary: it goes away once the BLAST taxdb catches up to the taxonomy db.
One possible solution is to use fixed versions of the taxdb and taxonomy files that are in sync with each other (if such a pair exists?).
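Another possible mitigation, offered only as a sketch, is to translate retired taxids to their current equivalents before looking them up. The NCBI taxonomy dump includes a `merged.dmp` file that maps old taxids to the taxids they were merged into, using the same `\t|\t` field separator as the other dump files. Whether taxid 363716 is actually listed there has not been verified, so this may not cover the case above. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def load_merged(merged_lines: Iterable[str]) -> Dict[int, int]:
    """Parse merged.dmp lines into an {old_taxid: new_taxid} mapping."""
    merged: Dict[int, int] = {}
    for line in merged_lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 2 and parts[0].isdigit() and parts[1].isdigit():
            merged[int(parts[0])] = int(parts[1])
    return merged


def resolve_taxid(
    taxid: int,
    known_taxids: Set[int],
    merged: Dict[int, int],
) -> Optional[int]:
    """Return a taxid present in known_taxids, following merges if needed.

    Returns None when the taxid is neither current nor resolvable
    through the merged mapping (for example, if it was deleted).
    """
    seen: Set[int] = set()
    # Follow the chain of merges, guarding against cycles.
    while taxid not in known_taxids and taxid in merged and taxid not in seen:
        seen.add(taxid)
        taxid = merged[taxid]
    return taxid if taxid in known_taxids else None
```

A `None` result would still need to be reported rather than ignored, since it means the hit cannot be placed in the current taxonomy.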