Add MechPredict plugin #772

Open. Wants to merge 23 commits into base branch main.


@ainefairbrother (Collaborator) commented Feb 5, 2025

JIRA ticket: ENSVAR-6662

Description

This PR adds the MechPredict plugin, which annotates missense variants with one of three predicted gene-level mechanisms:

  • Dominant-negative (DN)
  • Gain-of-function (GOF)
  • Loss-of-function (LOF)

MechPredict does this by reading in gene-level probabilities predicted by an external model and assigning the most likely mechanism based on empirically derived cut-offs described in the related manuscript. For example, if gene A has the probabilities DN = 0.2, GOF = 0.3 and LOF = 0.9, the returned interpretation would be "gene_predicted_as_associated_with_loss_of_function_mechanism".
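The worked example above can be illustrated with a small shell sketch. This is a simplification: it returns whichever of pDN, pGOF and pLOF is highest, whereas the plugin applies the empirically derived cut-offs from the manuscript, which are not reproduced here. The function name mech_label is hypothetical and it prints only the short mechanism codes, not the plugin's full interpretation strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper, illustration only: picks the mechanism with the
# highest probability. The real plugin uses empirically derived cut-offs
# from the manuscript, which are NOT implemented here.
# Usage: mech_label <pDN> <pGOF> <pLOF>
mech_label() {
  awk -v dn="$1" -v gof="$2" -v lof="$3" 'BEGIN {
    # Start with DN, then replace it if GOF or LOF is strictly larger
    label = "DN"; best = dn + 0
    if (gof + 0 > best) { label = "GOF"; best = gof + 0 }
    if (lof + 0 > best) { label = "LOF"; best = lof + 0 }
    print label
  }'
}

mech_label 0.2 0.3 0.9   # prints LOF (gene A from the example above)
```

Ties resolve to the earlier mechanism in DN, GOF, LOF order, which is an arbitrary choice made for this sketch.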

Notes

  • New VEP fields added by the plugin:
    • MechPredict_pDN: Numeric
    • MechPredict_pGOF: Numeric
    • MechPredict_pLOF: Numeric
    • MechPredict_interpretation: Character
  • The plugin only annotates transcript-variant pairs with missense_variant as the consequence. This is because the methods the authors used to generate the predictions were optimised to assess missense mutations, the most common protein-altering mutations.
  • The plugin reads in MechPredict_input.tsv, which can be generated using the instructions in the module's header.
  • One known exception was found during testing:
    • The 'test with 50 missense variants - should annotate all' test annotates only 49 variants. I believe this is due to VEP's most severe consequence functionality: if a variant-transcript pair has more than one consequence, VEP assigns the more severe one.
    • As such, in the affected case, start_lost is assigned instead of missense_variant, so missense is not reported as a consequence and the variant is not annotated by MechPredict.
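With --vcf output, the four MechPredict fields are written into the CSQ INFO field, in the order declared by the "Format:" string of the CSQ header line, with one "|"-separated entry per transcript and entries separated by ",". The sketch below pulls MechPredict_interpretation out per transcript. It assumes VEP's default CSQ INFO key and that the Feature column is present in the output; the function name extract_mechpredict is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS, Feature and MechPredict_interpretation
# for every CSQ entry that carries a MechPredict annotation.
# Assumes VEP's default "CSQ" INFO key in an uncompressed VCF.
extract_mechpredict() {
  awk -F'\t' -v OFS='\t' '
    /^##INFO=<ID=CSQ,/ {
      # The CSQ field order is declared in the header after "Format: "
      fmt = $0
      sub(/.*Format: /, "", fmt)
      sub(/">.*$/, "", fmt)
      n = split(fmt, names, "|")
      for (i = 1; i <= n; i++) idx[names[i]] = i
      next
    }
    /^#/ { next }
    {
      # INFO is column 8; find the CSQ=... key among the ";"-separated keys
      m = split($8, info, ";")
      for (j = 1; j <= m; j++) {
        if (substr(info[j], 1, 4) != "CSQ=") continue
        # One ","-separated entry per transcript, "|"-separated values inside
        k = split(substr(info[j], 5), entries, ",")
        for (e = 1; e <= k; e++) {
          split(entries[e], vals, "|")
          interp = vals[idx["MechPredict_interpretation"]]
          if (interp != "") print $1, $2, vals[idx["Feature"]], interp
        }
      }
    }' "$1"
}
```

For example, extract_mechpredict MechPredict_test_missense_out.vcf would list one line per annotated transcript; VEP's own filter_vep script is an alternative for more complex queries.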

Testing

Test with 50 missense variants - should annotate all

# run vep with MechPredict
./vep --input_file /hps/software/users/ensembl/variation/fairbrot/data/test-data/clinvar_20210102_missense_50.vcf.gz \
--output_file /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_missense_out.vcf \
--format vcf \
--vcf \
--dir_plugins /hps/software/users/ensembl/variation/fairbrot/VEP_plugins \
--plugin MechPredict,file=/nfs/production/flicek/ensembl/variation/data/MechPredict/MechPredict_input.tsv \
--offline \
--cache \
--cache_version 113 \
--dir_cache /nfs/production/flicek/ensembl/variation/data/VEP/tabixconverted \
--assembly GRCh38 \
--fasta /nfs/production/flicek/ensembl/variation/data/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# check output: how many data lines carry a MechPredict interpretation?
grep -v "^#" /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_missense_out.vcf | \
    grep -c "_mechanism"
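To see which variant accounts for the 49/50 result described in the Notes, the sketch below lists the data lines that received no MechPredict interpretation and flags whether any CSQ entry on that line was called missense_variant. A row marked no_missense_consequence fits the start_lost explanation; a row marked missense_present would point to some other cause. The function name report_unannotated is hypothetical, and the check is a plain substring match on the INFO column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS, ID, REF, ALT and a status flag for
# every data line in a VEP VCF that has no MechPredict interpretation.
report_unannotated() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                 # skip header lines
    $8 ~ /_mechanism/ { next }    # annotated by MechPredict: skip
    {
      status = ($8 ~ /missense_variant/) ? "missense_present" : "no_missense_consequence"
      print $1, $2, $3, $4, $5, status
    }' "$1"
}
```

Running it on MechPredict_test_missense_out.vcf should print a single row if the 49/50 result is the only discrepancy.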

Test with 50 intron variants - should annotate none

# run vep with MechPredict
./vep --input_file /hps/software/users/ensembl/variation/fairbrot/data/test-data/clinvar_20210102_intron_50.vcf.gz \
--output_file /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_intron_out.vcf \
--format vcf \
--vcf \
--dir_plugins /hps/software/users/ensembl/variation/fairbrot/VEP_plugins \
--plugin MechPredict,file=/nfs/production/flicek/ensembl/variation/data/MechPredict/MechPredict_input.tsv \
--offline \
--cache \
--cache_version 113 \
--dir_cache /nfs/production/flicek/ensembl/variation/data/VEP/tabixconverted \
--assembly GRCh38 \
--fasta /nfs/production/flicek/ensembl/variation/data/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

# check output: how many data lines carry a MechPredict interpretation?
grep -v "^#" /hps/software/users/ensembl/variation/fairbrot/MechPredict/MechPredict_test_intron_out.vcf | \
    grep -c "_mechanism"
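The two manual checks could be turned into pass/fail assertions so they can be rerun after changes to the plugin. In this sketch the function name check_mechpredict_count is hypothetical, and the expected counts (49 for the missense file, 0 for the intron file) come from the results reported in this PR rather than from the test titles.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the number of MechPredict-annotated data lines
# in a VEP VCF with an expected count; returns non-zero on mismatch.
check_mechpredict_count() {
  local vcf="$1" expected="$2" observed
  # grep -c prints 0 (rather than nothing) when no lines match
  observed=$(grep -v '^#' "$vcf" | grep -c '_mechanism')
  if [ "$observed" -eq "$expected" ]; then
    echo "PASS: $vcf ($observed annotated)"
  else
    echo "FAIL: $vcf (expected $expected, got $observed)" >&2
    return 1
  fi
}

# Usage with the outputs produced above:
#   check_mechpredict_count .../MechPredict_test_missense_out.vcf 49
#   check_mechpredict_count .../MechPredict_test_intron_out.vcf 0
```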
