Annotation of data in query coordinates #1571

rneher · 2025-02-23T09:11:15Z

For submission of sequences to databases like NCBI, genome annotations are often required. Since we align sequences to a well-annotated reference sequence, we can 'lift' this annotation to the query sequences.

Essentially, for each feature in the annotation, we would record the beginning and end coordinates of the feature (and its subfeatures) on the query sequence.

This could, for example, happen alongside the extraction here:

nextclade/packages/nextclade/src/translate/extract.rs

Line 10 in c9c28ad

    
    pub fn extract_cds_from_aln(seq_aln: &[Nuc], cds: &Cds, coord_map_global: &CoordMapGlobal) -> Vec<Nuc> {

It would require adding an aln_to_qry map here:

nextclade/packages/nextclade/src/coord/coord_map_global.rs

Line 16 in c9c28ad

ref_to_aln_table: Vec<NucAlnGlobalPosition>,

(This could happen via the function make_aln_to_ref_map.)
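To make the idea concrete, here is a minimal sketch of how an alignment-to-query map could be built and then used to lift a reference range to query coordinates. It does not use Nextclade's actual types: sequences are plain ASCII bytes with `-` as the gap character instead of `Nuc`, the reference-to-alignment table is a plain `Vec<usize>` instead of `CoordMapGlobal`, and the function names `make_ref_to_aln_map`, `make_aln_to_qry_map` and `lift_range` are hypothetical. Ranges are assumed to be 0-based and half-open.

```rust
/// Hypothetical helper: for every reference position, the alignment column
/// that holds it (columns where the aligned reference has a gap are skipped).
fn make_ref_to_aln_map(ref_aln: &[u8]) -> Vec<usize> {
    ref_aln
        .iter()
        .enumerate()
        .filter_map(|(i, &c)| (c != b'-').then_some(i))
        .collect()
}

/// Hypothetical helper: for every alignment column, the index of the query
/// nucleotide in that column, or `None` where the aligned query has a gap.
fn make_aln_to_qry_map(qry_aln: &[u8]) -> Vec<Option<usize>> {
    let mut next_qry_pos = 0;
    qry_aln
        .iter()
        .map(|&c| {
            if c == b'-' {
                None
            } else {
                let pos = next_qry_pos;
                next_qry_pos += 1;
                Some(pos)
            }
        })
        .collect()
}

/// Lift a half-open reference range `[begin, end)` to a half-open query range.
/// Returns `None` if the range is empty, out of bounds, or entirely deleted
/// in the query.
fn lift_range(
    ref_to_aln: &[usize],
    aln_to_qry: &[Option<usize>],
    begin: usize,
    end: usize,
) -> Option<(usize, usize)> {
    if begin >= end || end > ref_to_aln.len() {
        return None;
    }
    // Alignment columns spanned by the feature (half-open).
    let aln_begin = ref_to_aln[begin];
    let aln_end = ref_to_aln[end - 1] + 1;
    if aln_end > aln_to_qry.len() {
        return None;
    }
    // Query positions present inside that span, skipping gap columns.
    let mut covered = aln_to_qry[aln_begin..aln_end].iter().flatten();
    let first = *covered.next()?;
    let last = covered.last().copied().unwrap_or(first);
    Some((first, last + 1))
}
```

With the aligned pair `ACG-TA` (reference) and `A-GCTA` (query), lifting the reference range `[1, 4)` gives the query range `[1, 4)`: the deleted `C` is skipped and the inserted `C` is included. A feature that is completely deleted in the query yields `None`, which a real implementation would probably want to report rather than silently drop.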

In the output, the simplest thing would probably be to stream the new GFF entries into a common GFF file in which the chromosome (seqid) column contains the index or ID of the sequence. One could also write a separate file for each sequence, but that can generate a lot of files when run on a large dataset.
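As a rough illustration of the single-file option, the sketch below writes one GFF3 line per lifted feature to any `Write` sink, using the query sequence name as the seqid. The struct `LiftedFeature`, the functions `write_gff3_header` and `write_gff3_record`, and the source value `nextclade` are assumptions made for this example, not existing Nextclade code. Coordinates are assumed to come in as 0-based half-open ranges and are converted to the 1-based inclusive form GFF3 uses. The phase column is left as `.`, so CDS features would need a real phase value, and attribute values are not escaped.

```rust
use std::io::{self, Write};

/// Hypothetical record for one feature lifted to query coordinates.
struct LiftedFeature<'a> {
    seq_name: &'a str,     // query sequence index or ID, used as the GFF seqid
    feature_type: &'a str, // e.g. "gene"
    begin: usize,          // 0-based, inclusive
    end: usize,            // 0-based, exclusive
    strand: char,          // '+' or '-'
    id: &'a str,           // value for the ID attribute
}

/// Write the GFF3 version pragma once at the top of the shared file.
fn write_gff3_header<W: Write>(out: &mut W) -> io::Result<()> {
    writeln!(out, "##gff-version 3")
}

/// Append one tab-separated GFF3 line for a lifted feature.
fn write_gff3_record<W: Write>(out: &mut W, f: &LiftedFeature) -> io::Result<()> {
    writeln!(
        out,
        "{}\tnextclade\t{}\t{}\t{}\t.\t{}\t.\tID={}",
        f.seq_name,
        f.feature_type,
        f.begin + 1, // GFF3 start is 1-based
        f.end,       // an exclusive 0-based end equals the inclusive 1-based end
        f.strand,
        f.id
    )
}
```

Because the writer is generic over `Write`, the same function could append to one shared file as results arrive for each sequence, or to a per-sequence file if that layout is preferred.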


rneher added the good first issue, help wanted, needs triage, and t:feat labels on Feb 23, 2025

ivan-aksamentov mentioned this issue Feb 23, 2025

feat: output Genebank feature table file #982

Draft

12 tasks

ivan-aksamentov removed the good first issue, help wanted, and needs triage labels on Feb 23, 2025

ivan-aksamentov linked a pull request Mar 9, 2025 that will close this issue

feat: annotation of data in query coordinates #1578

Open

7 tasks
