06. Refinement of BGCs Belonging to GCF
Boundary prediction of BGCs by antiSMASH is often approximate and can conjoin multiple discrete BGCs if they are located near each other on a genome. This issue was highlighted in a recent study by the Kovacs lab that explored the BGC diversity of the genus Bacillus.
We thus developed lsaBGC-Refiner.py to allow users to automatically curate BGCs belonging to a GCF and retain only the features found between two user-specified homolog groups.
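The pruning idea can be illustrated with a minimal sketch. This is not the implementation in lsaBGC-Refiner.py. It assumes each BGC is available as a list of gene coordinates and that a gene-to-homolog-group mapping has already been derived from the OrthoFinder matrix. The `Gene` type, the `refine_bgc` function, and the homolog group identifiers are hypothetical, and treating the boundary genes as part of the retained region is also an assumption.

```python
from typing import Dict, List, NamedTuple, Optional


class Gene(NamedTuple):
    """A gene in a BGC (hypothetical, simplified representation)."""
    locus_tag: str
    start: int
    end: int


def refine_bgc(
    genes: List[Gene],
    gene_to_hg: Dict[str, str],
    first_boundary_hg: str,
    second_boundary_hg: str,
) -> Optional[List[Gene]]:
    """Keep only genes lying between the two boundary homolog groups.

    Returns None when either boundary homolog group is absent from the BGC.
    Assumption: the boundary genes themselves are retained.
    """
    first = [g for g in genes if gene_to_hg.get(g.locus_tag) == first_boundary_hg]
    second = [g for g in genes if gene_to_hg.get(g.locus_tag) == second_boundary_hg]
    if not first or not second:
        return None

    # The retained region spans the outermost coordinates of the boundary
    # genes, so the order in which the boundaries are given does not matter.
    boundary_genes = first + second
    region_start = min(g.start for g in boundary_genes)
    region_end = max(g.end for g in boundary_genes)

    return [g for g in genes if g.start >= region_start and g.end <= region_end]


# Toy BGC with five genes; the identifiers below are made up for illustration.
genes = [
    Gene("g1", 1, 900),
    Gene("g2", 1000, 1900),
    Gene("g3", 2000, 2900),
    Gene("g4", 3000, 3900),
    Gene("g5", 4000, 4900),
]
gene_to_hg = {
    "g1": "OG0000010",
    "g2": "OG0000020",
    "g3": "OG0000030",
    "g4": "OG0000040",
    "g5": "OG0000050",
}

refined = refine_bgc(genes, gene_to_hg, "OG0000020", "OG0000040")
print([g.locus_tag for g in refined])  # ['g2', 'g3', 'g4']
```

In this sketch, a BGC that lacks either boundary homolog group yields `None` so that the caller can decide whether to skip it. How the actual program handles such cases is not described here.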
As an example, one of the major BGCs we found in M. luteus genomes was predicted to encode a terpene-related metabolite. Netzer et al. 2010 functionally characterized this BGC and identified the key genes and their roles in generating the terpene carotenoid sarcinaxanthin. Using it as a reference, we whittled down the raw BGC predictions from antiSMASH to produce refined BGC GenBank files and visualized the results before and after refinement with lsaBGC-See.py.
usage: lsaBGC-Refiner.py [-h] -g GCF_LISTING -m ORTHOFINDER_MATRIX [-i GCF_ID] -o OUTPUT_DIRECTORY [-p BGC_PREDICTION_SOFTWARE] -b1 FIRST_BOUNDARY_HOMOLOG -b2 SECOND_BOUNDARY_HOMOLOG
Program: lsaBGC-Refiner.py
Author: Rauf Salamzade
Affiliation: Kalan Lab, UW Madison, Department of Medical Microbiology and Immunology
This program will take in a list of homologous (ideally orthologous) BGC genbanks belonging to a single GCF and
whittle them down to include only annotations/features in between user specified homolog groups. It is particularly
useful for curation of GCFs which feature distinct BGCs aggregated together due to close physical proximity as
described in: https://msystems.asm.org/content/6/2/e00057-21/article-info
optional arguments:
-h, --help show this help message and exit
-g GCF_LISTING, --gcf_listing GCF_LISTING
BGC listings file for a gcf. Tab delimited: 1st column lists sample name while the 2nd column is the path to a BGC prediction in Genbank format.
-m ORTHOFINDER_MATRIX, --orthofinder_matrix ORTHOFINDER_MATRIX
OrthoFinder homolog by sample matrix.
-i GCF_ID, --gcf_id GCF_ID
GCF identifier.
-o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
Output directory.
-p BGC_PREDICTION_SOFTWARE, --bgc_prediction_software BGC_PREDICTION_SOFTWARE
Software used to predict BGCs (Options: antiSMASH, DeepBGC, GECCO).
Default is antiSMASH.
-b1 FIRST_BOUNDARY_HOMOLOG, --first_boundary_homolog FIRST_BOUNDARY_HOMOLOG
Identifier for the first homolog group to be used as boundary for pruning BGCs.
-b2 SECOND_BOUNDARY_HOMOLOG, --second_boundary_homolog SECOND_BOUNDARY_HOMOLOG
Identifier for the second homolog group to be used as boundary for pruning BGCs.
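As a rough illustration of the inputs described above, the following sketch reads a two-column, tab-delimited GCF listing and assembles an argument list using only the flags shown in the usage message. The helper names, sample names, file paths, GCF identifier, and homolog group identifiers are all hypothetical placeholders, and the sketch only builds the command without running it.

```python
import csv
import io
import shlex
from typing import List, Optional, TextIO, Tuple


def read_gcf_listing(handle: TextIO) -> List[Tuple[str, str]]:
    """Parse a listing: column 1 is the sample name, column 2 the BGC Genbank path.

    A list of pairs is returned rather than a dict because a sample may
    contribute more than one BGC to a GCF.
    """
    listing = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected two tab-delimited columns, got: {row!r}")
        listing.append((row[0].strip(), row[1].strip()))
    return listing


def build_refiner_command(
    gcf_listing: str,
    orthofinder_matrix: str,
    output_directory: str,
    first_boundary_homolog: str,
    second_boundary_homolog: str,
    gcf_id: Optional[str] = None,
    bgc_prediction_software: Optional[str] = None,
) -> List[str]:
    """Assemble an lsaBGC-Refiner.py argument list from the documented flags."""
    cmd = [
        "lsaBGC-Refiner.py",
        "-g", gcf_listing,
        "-m", orthofinder_matrix,
        "-o", output_directory,
        "-b1", first_boundary_homolog,
        "-b2", second_boundary_homolog,
    ]
    # -i and -p are shown in brackets in the usage message, so they are optional.
    if gcf_id is not None:
        cmd += ["-i", gcf_id]
    if bgc_prediction_software is not None:
        cmd += ["-p", bgc_prediction_software]
    return cmd


# Hypothetical listing content; real paths would point at BGC Genbank files.
example_listing = "sampleA\t/path/to/sampleA_region1.gbk\nsampleB\t/path/to/sampleB_region3.gbk\n"
entries = read_gcf_listing(io.StringIO(example_listing))

cmd = build_refiner_command(
    gcf_listing="GCF_listing.txt",          # hypothetical file name
    orthofinder_matrix="Orthogroups.tsv",   # hypothetical file name
    output_directory="Refiner_Results/",    # hypothetical directory
    first_boundary_homolog="OG0000020",     # hypothetical identifier
    second_boundary_homolog="OG0000040",    # hypothetical identifier
    gcf_id="GCF_1",                         # hypothetical identifier
)
print(shlex.join(cmd))
```

Checking the listing before launching the program can catch malformed rows early. The assembled list could then be passed to `subprocess.run` if lsaBGC-Refiner.py is on the `PATH`.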