Step 1:
Make sure you have mamba/conda installed.
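A quick way to confirm this step is to look for either tool on your PATH. The sketch below is not part of ParseFold; the variable name CONDA_TOOL is only illustrative.

```shell
# Look for mamba first, then conda, and remember which one was found.
if command -v mamba >/dev/null 2>&1; then
    CONDA_TOOL=mamba
elif command -v conda >/dev/null 2>&1; then
    CONDA_TOOL=conda
else
    CONDA_TOOL=""
fi

if [ -n "$CONDA_TOOL" ]; then
    echo "Found $CONDA_TOOL: $("$CONDA_TOOL" --version)"
else
    echo "Neither mamba nor conda is on PATH; install one before continuing."
fi
```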
Step 2:
Check your GPU configuration and settings.
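On a machine with an NVIDIA GPU, one possible check is to query the driver with nvidia-smi. This is only a sketch under that assumption; the variable GPU_TOOL_FOUND is illustrative, and other GPU vendors need a different tool.

```shell
# Assumes an NVIDIA GPU; nvidia-smi ships with the NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_TOOL_FOUND=yes
    # Print GPU name, total memory, and driver version as CSV.
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv \
        || echo "nvidia-smi is installed but could not talk to the driver."
else
    GPU_TOOL_FOUND=no
    echo "nvidia-smi not found; check that the GPU driver is installed."
fi
```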
Step 3:
# Clone Repository
git clone https://github.com/AmirAsgary/ParseFold-MHC.git
# install
cd ParseFold-MHC
./install.sh
conda activate parsefold_mhc
There are two main ways to use ParseFold: --mode wrapper and --mode modeling. The default is wrapper mode. In this mode you can parallelize multiple predictions with --run parallel, which makes the template engineering pipeline much faster. In modeling mode you can predict only a single structure, and the inputs are provided in a bash script, whereas wrapper mode requires a dataframe.
PEPTIDE='NLVPMVATV'
# MHC-I
MHC_SEQ='AGSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGCYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMCAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDA'
HLAALLEL='HLA-B*5301'
DF='data/example/wrapper_input_example.tsv'
# MHC-II
MHC_ALPHA_CHAIN_SEQ='IKEEHVIIQAEFYLNPDQSGEFMFDFDGDEIFHVDMAKKETVWRLEEFGRFASFEAQGALANIAVDKANLEIMTKRSNYTPITNVPPEVTVLTNS'
MHC_BETA_CHAIN_SEQ='DTRPRFLEYSTSECHFFNGTERVRFLDRYFYNQEEYVRFDSDVGEFRAVTELGRPDEEYWNSQKDFLEDRRAAVDTYCRHNYGVGESFTVQRRVH'
HLAALLEL_II='HLA-DRA*01/HLA-DRB1*0101'
python run_parsefold.py \
--mode modeling \
--peptide "$PEPTIDE" \
--mhc_seq "$MHC_SEQ" \
--mhc_type 1 \
--output_dir outputs/basic_mhci \
--predict_anchor
We do not recommend setting anchors manually; it is better to predict them with --predict_anchor. If you do need to set them, pass them with --anchors:
python run_parsefold.py \
--mode modeling \
--peptide "$PEPTIDE" \
--mhc_seq "$MHC_ALPHA_CHAIN_SEQ/$MHC_BETA_CHAIN_SEQ" \
--mhc_type 2 \
--output_dir outputs/mhcii_manual \
--anchors "[2,5,7,9]" \
--id "custom_id123"
python run_parsefold.py \
--mode modeling \
--peptide "$PEPTIDE" \
--mhc_fasta mhc_sequence.fasta \
--mhc_type 1 \
--mhc_allele "$HLAALLEL" \
--output_dir outputs/fasta_input \
--predict_anchor
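The command above reads the MHC sequence from mhc_sequence.fasta. If you only have the sequence as a string, a minimal sketch for writing such a file is shown below. Using the allele name as the FASTA header is an assumption; the source does not state what header ParseFold expects.

```shell
# Values copied from the example variables in this README.
MHC_SEQ='AGSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGCYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMCAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDA'
HLAALLEL='HLA-B*5301'
FASTA_PATH='mhc_sequence.fasta'

# One header line (assumed format: ">" plus allele name), then the sequence.
printf '>%s\n%s\n' "$HLAALLEL" "$MHC_SEQ" > "$FASTA_PATH"

# Show the header to confirm the file was written.
head -n 1 "$FASTA_PATH"
```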
We highly recommend using a dataframe (--df path/dataframe) as input and running wrapper mode. You can run it in two ways: --run parallel or --run single. Parallel mode runs the predictions in parallel according to the resources you define with --max_ram (GB per job) and --max_cores (total cores to use). Single mode runs the predictions one by one in a for loop.
You need to create a tab-separated dataframe like the one below:
id peptide mhc_allele mhc_type anchors
ex1 NLVPMVATV HLA-B*5301 1 1;8
ex2 AAGASSLLL HLA-A*0201 1
ex3 SLLPEPPDAPDAPP HLA-DRB1*04:01 2
ex4 SLLPEPPDAPDAPP HLA-DRB1*04:01 2 2;4;6;9
Rows with an empty anchors field will have their anchors predicted.
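Because the columns must be separated by real tab characters, it can help to generate the file with printf and check it before running. The sketch below assumes every row carries all five fields (with anchors possibly empty); the file name my_wrapper_input.tsv and the check itself are illustrative and not part of ParseFold.

```shell
# Hypothetical location for the input dataframe.
DF_PATH="$(mktemp -d)/my_wrapper_input.tsv"

# Header plus three rows; \t is a real tab, and ex2 leaves anchors empty.
printf 'id\tpeptide\tmhc_allele\tmhc_type\tanchors\n'           >  "$DF_PATH"
printf 'ex1\tNLVPMVATV\tHLA-B*5301\t1\t1;8\n'                   >> "$DF_PATH"
printf 'ex2\tAAGASSLLL\tHLA-A*0201\t1\t\n'                      >> "$DF_PATH"
printf 'ex3\tSLLPEPPDAPDAPP\tHLA-DRB1*04:01\t2\t2;4;6;9\n'      >> "$DF_PATH"

# Count lines that do not have 5 tab-separated fields,
# or data rows whose mhc_type is neither 1 nor 2.
BAD_ROWS=$(awk -F'\t' 'NF != 5 || (NR > 1 && $4 != 1 && $4 != 2) {n++} END {print n+0}' "$DF_PATH")
echo "rows with problems: $BAD_ROWS"
```

The resulting path can then be passed to --df in place of the example file.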
python run_parsefold.py \
--mode wrapper \
--run single \
--df "$DF" \
--output_dir outputs/wrapper_serial \
--num_templates 4 \
--num_recycles 3
The valid models are model_1_ptm, model_2_ptm, model_3_ptm, model_4_ptm, model_5_ptm, model_1, model_2, model_3, model_4, and model_5 from the original AlphaFold params. If you want to try a fine-tuned model, provide its path, e.g. --fine_tuned_model_path AFfine/af_params/params_finetune/params/model_ft_mhc_20640.pkl, and its name, e.g. --models model_2_ptm_ft. Make sure the name contains _ft so that it is interpreted as a fine-tuned model.
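Putting the two fine-tuning flags together might look like the sketch below. It checks the _ft naming rule first and only prints the command so that it can be reviewed before running. The output directory outputs/wrapper_finetuned and the variable names are illustrative; the flags and paths are the ones mentioned above.

```shell
DF='data/example/wrapper_input_example.tsv'
FT_PATH='AFfine/af_params/params_finetune/params/model_ft_mhc_20640.pkl'
MODEL_NAME='model_2_ptm_ft'

# The model name must contain _ft to be treated as fine-tuned.
case "$MODEL_NAME" in
    *_ft*) NAME_OK=yes ;;
    *)     NAME_OK=no ;;
esac

if [ "$NAME_OK" = yes ]; then
    # echo prints the command instead of executing it; drop echo to run it.
    echo python run_parsefold.py --mode wrapper --run single \
        --df "$DF" --output_dir outputs/wrapper_finetuned \
        --fine_tuned_model_path "$FT_PATH" --models "$MODEL_NAME"
else
    echo "Model name '$MODEL_NAME' does not contain _ft."
fi
```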
# --max_ram: GB per job; --max_cores: total cores to use
python run_parsefold.py \
--mode wrapper \
--run parallel \
--df "$DF" \
--output_dir outputs/wrapper_parallel \
--max_ram 2 \
--max_cores 16 \
--num_templates 5 \
--num_recycles 3 \
--models model_2_ptm model_3_ptm
python run_parsefold.py \
--mode wrapper \
--run parallel \
--df "$DF" \
--output_dir outputs/wrapper_highmem \
--max_ram 2 \
--max_cores 32 \
--num_recycles 6 \
--best_n_templates 4 \
--n_homology_models 2