Submitting sequences to databases like NCBI often requires genome annotations. Since we align sequences to a well-annotated reference sequence, we can 'lift' this annotation over to the query sequences.
Essentially, for each feature in the annotation, we would record the beginning and end coordinates of the feature (and its subfeatures) on the query sequence.
This could, for example, happen alongside the extraction in nextclade/packages/nextclade/src/translate/extract.rs (line 10 in c9c28ad), and would require adding a map aln_to_qry in nextclade/packages/nextclade/src/coord/coord_map_global.rs (line 16 in c9c28ad). Building that map could happen via the function make_aln_to_ref_map.
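To make the coordinate lifting concrete, here is a minimal standalone sketch in Rust. It is not Nextclade's actual code: the function names, the representation of the alignment as two gapped byte strings, and the use of half-open 0-based ranges are all assumptions made for illustration. Only the idea of an aln_to_qry map comes from the proposal above.

```rust
/// Hypothetical helper: for each reference position, the alignment column
/// that holds it (gap columns in the aligned reference are skipped).
fn make_ref_to_aln_map(aligned_ref: &[u8]) -> Vec<usize> {
    aligned_ref
        .iter()
        .enumerate()
        .filter(|(_, &c)| c != b'-')
        .map(|(col, _)| col)
        .collect()
}

/// Hypothetical helper: for each alignment column, the query position it
/// holds, or None where the query has a gap (a deletion relative to the reference).
fn make_aln_to_qry_map(aligned_qry: &[u8]) -> Vec<Option<usize>> {
    let mut qry_pos = 0;
    aligned_qry
        .iter()
        .map(|&c| {
            if c == b'-' {
                None
            } else {
                let pos = qry_pos;
                qry_pos += 1;
                Some(pos)
            }
        })
        .collect()
}

/// Lifts a half-open, 0-based reference range `begin..end` to query coordinates.
/// Returns None if no position of the feature is present in the query.
/// Query insertions inside the feature are included in the lifted range.
fn lift_range(
    ref_to_aln: &[usize],
    aln_to_qry: &[Option<usize>],
    begin: usize,
    end: usize,
) -> Option<(usize, usize)> {
    let mut covered = (begin..end.min(ref_to_aln.len()))
        .filter_map(|ref_pos| aln_to_qry[ref_to_aln[ref_pos]]);
    let first = covered.next()?;
    let last = covered.last().unwrap_or(first);
    Some((first, last + 1))
}
```

With the aligned pair `ACG-TACGT` (reference) and `A-GTTAC--` (query), a feature at reference range 1..5 lifts to query range 1..5, while a feature at 6..8 is fully deleted in the query and yields None. How partially deleted features should be reported (truncated, flagged, or dropped) is a design decision this sketch leaves open.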
For the output, the simplest option would probably be to stream the new GFF entries into one common GFF file in which the chromosome (seqid) column contains the index/ID of the query sequence. One could also write a separate file for each sequence, but that can generate a lot of files when run on a large dataset.
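The single-file output could look roughly like the following sketch. The `LiftedFeature` struct, the function names, and the `nextclade` value in the source column are hypothetical and chosen only for illustration. The nine tab-separated columns and the 1-based inclusive coordinates follow the GFF3 format.

```rust
use std::io::{self, Write};

/// Hypothetical record for a feature already lifted to query coordinates.
struct LiftedFeature {
    feature_type: String, // e.g. "gene" or "CDS"
    name: String,
    begin: usize, // 0-based, inclusive
    end: usize,   // 0-based, exclusive
    strand: char, // '+' or '-'
}

/// Writes the GFF3 version pragma once at the top of the shared file.
fn write_gff_header<W: Write>(out: &mut W) -> io::Result<()> {
    writeln!(out, "##gff-version 3")
}

/// Streams one feature as a GFF3 line, using the query sequence ID as seqid.
/// Score and phase are left as "."; real CDS records would need a phase, and
/// attribute values would need escaping, both omitted in this sketch.
fn write_gff_record<W: Write>(
    out: &mut W,
    seq_id: &str,
    feature: &LiftedFeature,
) -> io::Result<()> {
    writeln!(
        out,
        "{}\tnextclade\t{}\t{}\t{}\t.\t{}\t.\tName={}",
        seq_id,
        feature.feature_type,
        feature.begin + 1, // GFF3 start is 1-based
        feature.end,       // an exclusive 0-based end equals the inclusive 1-based end
        feature.strand,
        feature.name
    )
}
```

Because each record carries its own sequence ID, records can be written as soon as a query sequence has been processed, so the whole annotation never has to be held in memory.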