A deep learning framework for predicting protein families.
- Preprocess the raw dataset by running the preprocessing.py file.
- Then, compute the word2vec embedding by running the w2v.py file.
- Index the amino acids of the protein sequences by running the train_data_index.py and test_data_index.py files.
- Train and test the model by running the deepPPF.py file.
- Fine-tune the model by running the model_transfer.py file.
- data.zip contains the GPCR and POG protein sequences.
- The COG protein sequences can be downloaded from http://epigenomics.snu.ac.kr/DeepFam/data.zip
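
The steps above can be chained in a small driver script. The following is only a sketch: it assumes each script is run from the repository root, takes no command-line arguments, and that the datasets are already in the locations the scripts expect. The helper names `build_commands` and `run_pipeline` are hypothetical and not part of this repository.

```python
import subprocess
import sys

# Script names are taken from the steps listed above, in the order given there.
PIPELINE = [
    "preprocessing.py",        # preprocess the raw dataset
    "w2v.py",                  # compute the word2vec embedding
    "train_data_index.py",     # index amino acids of the training sequences
    "test_data_index.py",      # index amino acids of the test sequences
    "deepPPF.py",              # train and test the model
    "model_transfer.py",       # fine-tune the model
]


def build_commands(scripts=PIPELINE):
    """Return one command per script, using the current Python interpreter.

    Assumption: the scripts need no command-line arguments.
    """
    return [[sys.executable, name] for name in scripts]


def run_pipeline(repo_dir=".", scripts=PIPELINE):
    """Run each script in order from repo_dir (hypothetical helper).

    check=True stops the pipeline at the first script that exits with an error,
    so later steps never run on missing or partial outputs.
    """
    for cmd in build_commands(scripts):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True, cwd=repo_dir)
```

Calling `run_pipeline()` from the repository root would run all six steps in sequence. Passing a shorter list, for example `run_pipeline(scripts=PIPELINE[:5])`, would skip the fine-tuning step.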