BioDolphin

Automatically curate new entries from PDB

Set up the conda environment

cd BioDolphin
conda env create -f environment.yaml
conda activate biodolphin_env
pip install biopython

Make sure the files are placed correctly

Place the previous version of the dataset, a .txt file (for example, BioDolphin_vr1.0.txt), in /data.
Make sure the lipid annotation file (lipid_annotations.txt) is also in /data.
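
A quick check that both input files are in place can save a failed run. The sketch below is only an illustration: it assumes the data directory is reachable as a relative path from where you run it and uses the example file names from this README. The helper name missing_inputs is hypothetical and not part of BioDolphin.

```python
from pathlib import Path


def missing_inputs(data_dir, required_names):
    """Return the required file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in required_names if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # File names are the examples used in this README; adjust to your version.
    required = ["BioDolphin_vr1.0.txt", "lipid_annotations.txt"]
    missing = missing_inputs("data", required)
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All input files found.")
```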

Run the code

Generate a report file

python main.py --report
This generates a report file listing the PDB entries that are missing for each lipid.
Note that some of the listed PDB entries may not contain any protein, and those will not be curated.
The generated file is named with the current date at the end, for example: Report_MissingEntry_2024-09-06.txt
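
Because the report name carries the date, it changes from run to run. The sketch below shows how such a name could be built, assuming the pattern is Report_MissingEntry_ followed by the date in YYYY-MM-DD form. That pattern is inferred from the single example above, and report_filename is a hypothetical helper, not a function in main.py.

```python
from datetime import date


def report_filename(day=None):
    """Build a report name such as Report_MissingEntry_2024-09-06.txt."""
    day = day or date.today()
    # isoformat() gives YYYY-MM-DD, matching the example name above.
    return f"Report_MissingEntry_{day.isoformat()}.txt"


print(report_filename(date(2024, 9, 6)))  # Report_MissingEntry_2024-09-06.txt
```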

Step1 for main.py

python main.py -d BioDolphin_vr1.0.txt -l lipid_annotations.txt -o BioDolphin_vr1.1 -r Report_MissingEntry_2024-09-06.txt --step1

-d: BioDolphin_vr1.0.txt (the current full BioDolphin dataset)
-l: lipid_annotations.txt (the lipid annotation file, which maps each lipid CCD code to its annotation)
-o: BioDolphin_vr1.1 (the name of the next version of the BioDolphin dataset)
-r: Report_MissingEntry_2024-09-06.txt (the report file generated in the previous step)
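
For readers who want to see how these options fit together, here is a minimal argparse sketch that accepts the flags mentioned in this README. It is an illustration only and not the parser in main.py: the help strings, the defaults, and whether the real script uses argparse at all are assumptions.

```python
import argparse


def build_parser():
    """Illustrative parser for the options listed above (not the real main.py)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--report", action="store_true", help="generate the missing-entry report")
    parser.add_argument("-d", help="current full BioDolphin dataset")
    parser.add_argument("-l", help="lipid annotation file")
    parser.add_argument("-o", help="name of the next dataset version")
    parser.add_argument("-r", help="report file from the previous step")
    parser.add_argument("--step1", action="store_true")
    parser.add_argument("--step2", action="store_true")
    return parser


# Parse the same arguments as the Step1 command shown above.
args = build_parser().parse_args(
    ["-d", "BioDolphin_vr1.0.txt", "-l", "lipid_annotations.txt",
     "-o", "BioDolphin_vr1.1", "-r", "Report_MissingEntry_2024-09-06.txt", "--step1"]
)
print(args.o, args.step1)  # BioDolphin_vr1.1 True
```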

After this step, FASTA files are generated. Upload them manually to the DeepLoc web server to get protein subcellular localization predictions.
Follow the instructions printed to the terminal to prepare for the next step.

Get residue numbers

source run_resnum.sh
This will submit 9 Slurm jobs (using run_resnum.slurm), each of which runs parse_resnum.py.
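
How the 9 jobs divide the work is not described here. One common way to spread a list of entries across a fixed number of jobs is to cut it into near-equal contiguous chunks, as in the sketch below. This is an assumption for illustration, not the logic of run_resnum.sh or parse_resnum.py, and both split_into_chunks and the placeholder IDs are hypothetical.

```python
def split_into_chunks(items, n_chunks=9):
    """Split items into n_chunks contiguous parts whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Placeholder IDs for illustration only.
ids = [f"entry{i}" for i in range(20)]
chunks = split_into_chunks(ids)
print([len(c) for c in chunks])  # [3, 3, 2, 2, 2, 2, 2, 2, 2]
```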

Step2 for main.py

python main.py -o BioDolphin_vr1.1 --step2

This step adds the DeepLoc results and residue numbers to the dataset and writes the final updated dataset to /result.
Current number of entries: 107849

zoeyxyang/BioDolphin

Folders and files

Latest commit

History

Repository files navigation

BioDolphin

Set up the conda environment

Make sure the files are placed correctly

Run the code

Generate a report file

Step1 for main.py

Get residue numbers

Step2 for main.py

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages