Dear DNAzoo annotation team,
Thank you for building such an amazing consortium with chromosome-level assembly data. I am currently comparing some DNAzoo assemblies with assemblies on NCBI using BUSCO analyses, and I have a small question. I noticed that in some `<genomename>functional.transcripts.fasta` files, the transcripts include `-` characters, which BUSCO does not accept.

For example, in the Bryde's whale annotation file `Balaenoptera_edeni_HiC.fasta_v2.functional.transcripts.fasta.gz`, there are records that look like this:
>Balaenoptera_027329-RA transcript Name:"Similar to Bzw1 Basic leucine zipper and W2 domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:2 AED:0.00 eAED:0.00 QI:0|-1|1|1|-1|0|1|219|268
CTATGTTGACTGGTGTTCTTCTGGCTAATGGAACACTTAATGCATCCATTCTTAATAGCC
TTTATAATGAGAATTTGGTTAAAGAAGGGGTTTCAGCAGCTTTTGCTGTAAAGCTCTTTA
AATCATGGATAAATGAAAAAGATATCAATGCAGTAGCTGCAAGTCTTCGGAAAGTCAGCA
TGGATAACAGACTGATGGAACTTTTTCCTGCCAATAAACAAAGCGTTGAACACTTCACTA
AGTATTTTACTGAGGCAGGCTTGAAAGAACTCTCAGAGTATGTTCGAAATCAGCAAACCA
TAGGAGCTCGAAAGGAACTCCAGAAAGAACTTCAAGAACAGATGTCCCGTGGTGATCCAT
TTAAGGATATAATTTTGTATGTCAAGGAGGAGATGAAAAAAAACAACATCCCAGAACCCG
TTGTCATTGGGATAGTCTGGTCCAGCGTAATGAGCACCGTGGAATGGAACAAAAAGGAAG
AGCTTGTAGCAGAGCAGGCCATCAAGCACTTGAAGCAATACAGCCCTCTACTTGCTGCCT
TTACTACTCAAGGTCAGTCTGAGCTGACTCTGTTACTGAAGATTCAGGAGTATTGCTATG
ACAACATTCATTTCATGAAAGCCTTCCAGAAAATCGTGGTGCTTTTTTATAAAGCTGAAG
TCCTGAGTGAAGAGCCCATTTTGAAGTGGTATAAAGATGCACATGTTGCAAAGGGAAAAA
GTGTCTTCCTTGAGCAAATGAAAAAGTTTGTAGAGTGGCTCAAAAATGCTGAAGAAGAAT
CTGAGTCTGAAGCTGAAGAAGTTAGGAGTAATGGA--------CCCCGGCATGGCAAACA
GTTGAAGAACGGAGAAAACTGGATAGCTGACCT-TCCAGATAGTTGTTGGCACTCAGAAC
CACC-----TCAAG-----TACA--GCCATCCAAACCAGTAATTACATTGCTGCATTATT
TCTGTGTTAACTGTGAAAT-CTG--CTGCTTGTCTGTACCCTTGAAATGGAA-TAAAATT
TC-ATG
However, `-` is not a valid base in the BUSCO software. Should I remove the `-` characters (`GA--------CC` to `GACC`) or change them to `N` (`GA--------CC` to `GANNNNNNNNCC`) for BUSCO compatibility?

Thank you very much in advance, and thank you for making this amazing pipeline available!
Best,
Meixi
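
P.S. To make the two options concrete, here is a minimal Python sketch of what I mean. It is only an illustration and not part of the DNAzoo or BUSCO tooling: the function name `clean_gaps` and its `mode` parameter are hypothetical. It assumes that header lines start with `>` and must be left untouched, since headers such as `Balaenoptera_027329-RA` also contain `-`.

```python
def clean_gaps(lines, mode="remove"):
    """Yield FASTA lines with '-' handled in sequence lines only.

    mode="remove": delete every '-' (GA--------CC -> GACC)
    mode="mask":   replace every '-' with 'N' (GA--------CC -> GANNNNNNNNCC)

    Hypothetical helper for illustration; header lines (starting with '>')
    are passed through unchanged because they may legitimately contain '-'.
    """
    if mode not in ("remove", "mask"):
        raise ValueError("mode must be 'remove' or 'mask'")
    replacement = "" if mode == "remove" else "N"
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # header: keep as-is
        else:
            yield line.replace("-", replacement)


# Small in-memory example using a fragment like the one in the record above.
example = [">Balaenoptera_027329-RA transcript", "GA--------CC"]
removed = list(clean_gaps(example, mode="remove"))
masked = list(clean_gaps(example, mode="mask"))
```

Removing the characters shortens the transcript, while masking with `N` keeps the original coordinates, so which one is appropriate depends on what the `-` characters represent in the annotation output. For the real file, the lines could be read with `gzip.open(path, "rt")` from the standard library and passed to the same function.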