DeepPPF

A deep learning framework for predicting protein families.

Usage

Preprocess the raw dataset by running the preprocessing.py file.
Then, compute the word2vec embedding by running the w2v.py file.
Index the amino acids of the protein sequences by running the train_data_index.py and test_data_index.py files.
Train and test the model by running the deepPPF.py file.
Fine-tune the model by running the model_transfer.py file.
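
The steps above can be chained in a small driver script. The following is only a sketch: it assumes that each script runs without command-line arguments and is launched from the repository root, neither of which this README confirms. The names PIPELINE, build_commands, and run_pipeline are hypothetical helpers and are not part of the repository.

```python
import subprocess
import sys

# Script names are taken from the usage steps; the order follows the README.
# Assumption: each script takes no command-line arguments.
PIPELINE = [
    "preprocessing.py",
    "w2v.py",
    "train_data_index.py",
    "test_data_index.py",
    "deepPPF.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one command (as an argument list) per script, in order."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in turn; check=True stops at the first failure."""
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling run_pipeline() from the repository root would run the five steps in sequence and raise subprocess.CalledProcessError if any of them exits with a non-zero status. To include fine-tuning, model_transfer.py could be appended to the list.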

data.zip contains the GPCR and POG protein sequences.
The COG protein sequences can be downloaded from http://epigenomics.snu.ac.kr/DeepFam/data.zip
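
A possible way to fetch and unpack the COG archive from Python is sketched below. The URL is the one given above; the helper names download and extract, and any local file or directory names passed to them, are hypothetical. The layout of the archive's contents is not described here, so the sketch only lists what it extracts.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL as given in this README.
DATA_URL = "http://epigenomics.snu.ac.kr/DeepFam/data.zip"


def download(url, dest):
    """Download url to dest, creating parent directories if needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(zip_path, out_dir):
    """Extract a zip archive into out_dir and return the sorted member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return sorted(zf.namelist())
```

For example, extract(download(DATA_URL, "downloads/deepfam_data.zip"), "cog_data") would save the archive under a different name from the data.zip shipped with this repository, so the two files are not confused, and unpack it into a separate directory.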