Variant Annotation adding new columns
VEP has a number of Plugins and it's quite easy to add a column from VCF, BED, GFF, GTF, BigWig - see VEP Custom.
I suggest running VEP manually via the command line first; there are test VCFs for GRCh37/GRCh38 in annotation/tests/test_data/
This is also useful to check how the output is formatted, and whether it varies per transcript.
Determine whether the field varies per transcript (such as amino_acids/exon) or not (eg population frequency)
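One way to check this on VEP's VCF output is to compare a field across the per-transcript entries of the CSQ INFO tag. The sketch below assumes the usual VEP VCF layout (a "Format:" list in the CSQ header line, comma-separated transcript entries, pipe-separated fields). The function names, header and sample values are made up for illustration and are not part of VariantGrid.

```python
import re


def csq_field_names(header_line):
    """Extract the pipe-separated field names from VEP's CSQ header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("No 'Format:' section found in CSQ header")
    return match.group(1).strip().split("|")


def fields_varying_per_transcript(field_names, info):
    """Return the names of CSQ fields whose value differs between transcripts."""
    csq = None
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            csq = entry[len("CSQ="):]
            break
    if csq is None:
        return set()
    # One row per transcript, one cell per CSQ field
    rows = [transcript.split("|") for transcript in csq.split(",")]
    varying = set()
    for index, name in enumerate(field_names):
        if len({row[index] for row in rows}) > 1:
            varying.add(name)
    return varying


# Made-up header and INFO values, for illustration only
HEADER = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Feature|EXON|gnomAD_AF">')
INFO = "DP=20;CSQ=A|ENST0001|5/12|0.01,A|ENST0002|4/11|0.01"

names = csq_field_names(HEADER)
print(sorted(fields_varying_per_transcript(names, INFO)))
```

A single record can only show that a field varies, not that it never does, so it is worth running this over several variants from the test VCFs before deciding.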
Add the column to AbstractVariantAnnotation (which is the base class of both VariantAnnotation and VariantTranscriptAnnotation) if you need a copy per transcript, otherwise add it to VariantAnnotation.
To save space, use a choices field if it uses a limited number of text values.
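A choices field stores a short code in place of the full text. A minimal sketch of the idea follows, using a hypothetical column with two text values; PREDICTION_CHOICES and encode_prediction are names invented for this example. The list of pairs is in the shape Django accepts for a field's choices argument.

```python
# Hypothetical example values: (stored code, human-readable label) pairs,
# in the shape accepted by the `choices` argument of a Django model field.
PREDICTION_CHOICES = [
    ("D", "deleterious"),
    ("T", "tolerated"),
]

# Reverse lookup, used when loading annotation text into the database
CODE_BY_LABEL = {label: code for code, label in PREDICTION_CHOICES}


def encode_prediction(text):
    """Map an annotation text value to its one-character code (None if empty)."""
    if not text:
        return None
    return CODE_BY_LABEL[text]  # a KeyError flags a value missing from the choices
```

On the model this would look something like models.CharField(max_length=1, choices=PREDICTION_CHOICES, null=True), so each row stores one character and Django still displays the label.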
After changing the model, run python3 manage.py makemigrations annotation to generate the schema migration script.
If you want to display the annotation field on an analysis grid, you need to first create a new snpdb.VariantGridColumn record via a data migration, eg:
python3 manage.py makemigrations snpdb --empty --name "variant_grid_column_new_exon_field"
Then add the new record (see snpdb/migrations/0002_initial_data.create_columns to see how initial default columns are created) eg:
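VariantGridColumn.objects.create(grid_column_name='exon',
                                 # Path to the annotation field; keyword assumed to be variant_column
                                 # (the original repeated grid_column_name, which is a syntax error)
                                 variant_column='variantannotation__exon',
                                 annotation_level='T',
                                 label='Exon',
                                 description='Number(s) of affected exon(s)',
                                 model_field=True,
                                 queryset_field=True)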
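Inside the empty migration, the record is normally created from a function that looks the model up through the apps registry passed in by Django, not by importing it directly. The sketch below shows one possible shape. The function names are made up, and the variant_column keyword is an assumption about the VariantGridColumn model, so check it against snpdb/migrations/0002_initial_data.

```python
def add_exon_column(apps, schema_editor):
    """Forward step: create the grid column using the historical model."""
    # apps.get_model returns the model as it existed at this migration
    VariantGridColumn = apps.get_model("snpdb", "VariantGridColumn")
    VariantGridColumn.objects.create(
        grid_column_name="exon",
        variant_column="variantannotation__exon",  # assumed field name
        annotation_level="T",
        label="Exon",
        description="Number(s) of affected exon(s)",
        model_field=True,
        queryset_field=True,
    )


def remove_exon_column(apps, schema_editor):
    """Reverse step: delete the record so the migration can be unapplied."""
    VariantGridColumn = apps.get_model("snpdb", "VariantGridColumn")
    VariantGridColumn.objects.filter(grid_column_name="exon").delete()
```

In the generated migration file, add migrations.RunPython(add_exon_column, remove_exon_column) to the operations list.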
Users can add this column to their custom columns via the settings page, or if you'd like it added to default columns, create CustomColumn records in the data migration.
The command line for VEP is generated in annotation.vep_annotation.get_vep_command and is driven by the database records annotation.ColumnVEPField.
If you're adding a Plugin or new custom VCF for the first time, you'll need to modify VEPPlugin or VEPCustom and then run an annotation model migration. If the plugin requires annotation data, you'll need to add a record to settings, eg settings.ANNOTATION["GRCh37"]["vep_config"], and then reference that via the PLUGINS mapping in get_vep_command().
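To give a feel for what the generated command has to contain, here is a rough sketch of building VEP's --plugin and --custom arguments from configured paths. It is not the code in get_vep_command: the helper names, config keys and file paths are hypothetical, and it uses the positional form of --custom, so check the VEP documentation for the version you run.

```python
def plugin_args(name, *params):
    """Build a --plugin argument pair: plugin name, then comma-separated parameters."""
    return ["--plugin", ",".join([name, *map(str, params)])]


def custom_vcf_args(path, short_name, fields, exact=True):
    """Build a --custom argument pair for a VCF source (positional form)."""
    annotation_type = "exact" if exact else "overlap"
    force_report_coordinates = "0"
    parts = [path, short_name, "vcf", annotation_type,
             force_report_coordinates, *fields]
    return ["--custom", ",".join(parts)]


# Hypothetical config, standing in for paths kept in settings
vep_config = {
    "my_plugin_data": "/data/annotation/my_plugin.tsv.gz",
    "my_custom_vcf": "/data/annotation/my_custom.vcf.gz",
}

cmd = ["vep", "--vcf", "--input_file", "in.vcf", "--output_file", "out.vcf"]
cmd += plugin_args("MyPlugin", vep_config["my_plugin_data"])
cmd += custom_vcf_args(vep_config["my_custom_vcf"], "MyCustom", ["AF", "AC"])
print(" ".join(cmd))
```

Keeping the paths in per-build settings and only the names in code means the same command builder works for both GRCh37 and GRCh38.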
Add a new data migration:
python3 manage.py makemigrations annotation --empty --name "annotation_vep_new_exon_field"
See annotation/migrations/0003_initial_data.populate_column_vep_fields for examples.
If you want the new column to be exported via VCF (save a grid as VCF in an analysis) then you can also add a ColumnVCFInfo record to the annotation migration (see annotation/migrations/0003_initial_data.populate_column_vcf_info for examples).
The VEP VCF file is processed by annotation.vcf_files.bulk_vep_vcf_annotation_inserter.BulkVEPVCFAnnotationInserter
If you need to format/modify a field from VCF into data, add a formatter in _add_vep_field_handlers
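A formatter is just a function from the raw VCF text to the value stored in the database. The sketch below shows the general pattern with a per-column lookup. It does not reproduce the internals of _add_vep_field_handlers, and every name in it (including the my_score column) is hypothetical. It relies on VEP joining multiple values within one field using "&".

```python
def empty_to_none(raw):
    """Default handling: store empty strings as None, pass other text through."""
    return raw if raw != "" else None


def max_of_multiple(raw):
    """VEP joins multiple values with '&'; keep the highest numeric value."""
    values = [float(value) for value in raw.split("&") if value]
    return max(values) if values else None


# Hypothetical mapping of column name -> formatter
FIELD_FORMATTERS = {
    "my_score": max_of_multiple,
}


def format_record(raw_record):
    """Apply the per-column formatter, falling back to empty_to_none."""
    formatted = {}
    for column, raw in raw_record.items():
        formatter = FIELD_FORMATTERS.get(column, empty_to_none)
        formatted[column] = formatter(raw)
    return formatted


print(format_record({"exon": "5/12", "my_score": "0.2&0.7", "domains": ""}))
```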
To display this new field on the variant details page, you need to add it to variant_details.html
There's a management command to run the currently configured VEP pipeline, and "--test" runs it over the test VCFs:
python3.8 manage.py vep_run --test --genome-build=GRCh37
python3.8 manage.py vep_run --test --genome-build=GRCh38
This will re-generate the annotated test VCFs with the new fields, and you can add a new test for the column to annotation.tests.test_annotation_vcf