Need distance to flanking upstream and downstream exons and genes #1834

Open
Sanaz01 opened this issue Jan 28, 2025 · 2 comments


Sanaz01 commented Jan 28, 2025

Hi,
I am trying to annotate my list of SNP variants using VEP in the Docker container ensemblorg/ensembl-vep:release_113.3.
The command line is as follows:

vep --input_file inputfiles/variants.vcf --output_file vep_results --overlaps --plugin NearestGene --plugin NearestExonJB --safe --cache --cache_version 106 --nearest=gene --fasta reference_genome.fa --numbers --total_length --biotype --protein --hgvs --hgvsg --symbol

How can I get the following information in my output file?

  1. For intron_variant, report the IDs of the exons that flank the variant and the distance to each of them (both upstream and downstream)
  2. For intergenic_variant, report the IDs of the genes that flank the variant and the distance to each of them (both upstream and downstream)

Thank you in advance!

dglemos self-assigned this Jan 29, 2025
Contributor

dglemos commented Jan 30, 2025

Hi @Sanaz01,

For intron_variants, report the exon IDs that flank this region and return distance to them (both upstream and downstream)

The NearestExonJB plugin returns the nearest exon junction boundary to a coding sequence variant, and it only returns one exon. More importantly, it is not designed for your use case: it returns the exons that overlap the variant rather than those nearest to it.
We are going to investigate whether the existing plugin can be extended to cover this requirement.
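In the meantime, one option is to compute the flanking exons outside VEP. Below is a minimal shell/awk sketch, assuming you have already extracted the exons of the relevant transcript (for example from an Ensembl GTF of the same release as your cache) as tab-separated "exon_id start end" lines in 1-based inclusive coordinates. flanking_exons is a hypothetical helper, not part of VEP or NearestExonJB, and the distance convention (1 means the exon's nearest base is adjacent to the variant) is an assumption.

```shell
#!/usr/bin/env bash
# flanking_exons POS STRAND < exons.tsv
# Hypothetical helper, NOT part of VEP or the NearestExonJB plugin.
# stdin : one exon per line, "exon_id<TAB>start<TAB>end", 1-based inclusive
#         coordinates, all exons belonging to a single transcript.
# POS   : 1-based position of an intronic variant in that transcript.
# STRAND: transcript strand, "+" or "-".
# Prints the upstream and downstream flanking exon (relative to the
# transcript) and the distance in bp from the variant to each exon.
flanking_exons() {
  awk -v pos="$1" -v strand="$2" -F'\t' '
    {
      # Keep the closest exon ending below the variant (lo_*) and the
      # closest exon starting above it (hi_*); overlapping exons are skipped.
      if ($3 < pos && (lo_id == "" || pos - $3 < lo_d)) {
        lo_id = $1; lo_d = pos - $3
      } else if ($2 > pos && (hi_id == "" || $2 - pos < hi_d)) {
        hi_id = $1; hi_d = $2 - pos
      }
    }
    END {
      if (lo_id == "") { lo_id = "NA"; lo_d = "NA" }
      if (hi_id == "") { hi_id = "NA"; hi_d = "NA" }
      # On the minus strand, upstream means higher genomic coordinates
      if (strand == "-") {
        print "upstream", hi_id, hi_d
        print "downstream", lo_id, lo_d
      } else {
        print "upstream", lo_id, lo_d
        print "downstream", hi_id, hi_d
      }
    }'
}
```

For example, for a plus-strand transcript with exons E1 at 100-200 and E2 at 500-600, a variant at position 300 gives E1 upstream at 100 bp and E2 downstream at 200 bp; with strand "-" the two labels swap.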

For intergenic_variant, report the gene IDs that flank this region and return distance to them (both upstream and downstream)

The NearestGene plugin reports the genes nearest to an intergenic variant, both upstream and downstream.
By default it returns only one gene, but this can be changed with the limit option (default: 1 gene). You can also change the maximum search range with the max_range option (default: 10000 bp).
The current limitation is that it reports only the gene IDs, not their distance to the variant.
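Until distances are reported, they can be computed after annotation. A minimal shell/awk sketch, assuming gene coordinates are available as a BED-like file with columns chrom, start, end, gene_id (0-based half-open, as in BED) and that the variant position is the 1-based VCF position. flanking_genes is a hypothetical helper, not part of VEP or NearestGene. It labels genes by genomic side (lower or higher coordinate) rather than by gene strand; strand-aware upstream/downstream labels would need each gene's strand as an extra column.

```shell
#!/usr/bin/env bash
# flanking_genes CHROM POS < genes.bed
# Hypothetical helper, NOT part of VEP or the NearestGene plugin.
# stdin     : BED-like lines "chrom<TAB>start<TAB>end<TAB>gene_id"
#             (0-based, half-open coordinates).
# CHROM, POS: chromosome and 1-based VCF position of an intergenic variant.
# Prints the closest gene on the lower-coordinate side ("left") and on the
# higher-coordinate side ("right"), each with its distance in bp
# (1 means the gene's nearest base is adjacent to the variant).
flanking_genes() {
  awk -v chrom="$1" -v pos="$2" -F'\t' '
    $1 == chrom {
      start = $2 + 1   # BED start converted to a 1-based first base
      end = $3 + 0     # BED end is already the 1-based last base
      if (end < pos && (left_id == "" || pos - end < left_d)) {
        left_id = $4; left_d = pos - end
      } else if (start > pos && (right_id == "" || start - pos < right_d)) {
        right_id = $4; right_d = start - pos
      }
    }
    END {
      if (left_id == "")  { left_id = "NA";  left_d = "NA" }
      if (right_id == "") { right_id = "NA"; right_d = "NA" }
      print "left", left_id, left_d
      print "right", right_id, right_d
    }'
}
```

For example, with GENE_A at BED 100-200 and GENE_B at BED 500-800 on chr1, a variant at chr1:400 gives GENE_A 200 bp to the left and GENE_B 101 bp to the right.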

Author

Sanaz01 commented Feb 4, 2025

Hi @dglemos
Thank you so much for your feedback. I investigated a little further and found the following:
Command:

vep --input_file inputfiles/variants.vcf --output_file vep_results --overlaps --plugin NearestGene,limit=3,max_range=5000 --plugin NearestExonJB,intronic=1 --safe --cache --cache_version 106 --nearest=gene --fasta reference_genome.fa --numbers --total_length --biotype --protein --hgvs --hgvsg --symbol
  1. Following Update NearestExonJB (VEP_plugins#771), I tested the intronic argument of the NearestExonJB plugin. The output is as expected. Thank you for making the changes.
  2. For variants with intergenic_variant as the consequence, could there be a quick fix that adds the distance to the upstream and downstream genes, similar to the fix above?
    If so, I would use the option --distance 100 to reduce the number of upstream_gene_variant and downstream_gene_variant consequences and increase the number of intergenic_variant consequences. Is this a good idea, given that I am only interested in whether variants fall in intergenic regions and how far they are from genes?

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants