02. The Object Oriented Core of lsaBGC
Rauf Salamzade edited this page May 24, 2022
Much of lsaBGC's core functionality is implemented in three classes: Pan, GCF, and BGC.
Together with lsaBGC's open-source license, this object-oriented infrastructure makes it possible for users to incorporate these classes and their methods into their own Python programs. To show how easy the infrastructure is to use, here is an excerpt from the program lsaBGC-Refine:
```python
from lsaBGC import util
from lsaBGC.classes.GCF import GCF

# outdir, gcf_listing_file, gcf_id, orthofinder_matrix_file,
# first_boundary_homolog and second_boundary_homolog are defined earlier in
# lsaBGC-Refine; outdir is assumed to end with '/'.

# Create logging object
log_file = outdir + 'Progress.log'
logObject = util.createLoggerObject(log_file)

# Create GCF object
GCF_Object = GCF(gcf_listing_file, gcf_id=gcf_id, logObject=logObject)

# Step 1: Process GCF listing file
logObject.info("Processing BGC Genbanks from GCF listing file.")
GCF_Object.readInBGCGenbanks(comprehensive_parsing=True)
logObject.info("Successfully parsed BGC Genbanks and associated with unique IDs.")

# Step 2: Parse OrthoFinder homolog vs. sample matrix
logObject.info("Starting to parse OrthoFinder homolog vs sample information.")
gene_to_hg, hg_genes, hg_median_copy_count, hg_prop_multi_copy = util.parseOrthoFinderMatrix(orthofinder_matrix_file, GCF_Object.pan_genes)
GCF_Object.inputHomologyInformation(gene_to_hg, hg_genes, hg_median_copy_count, hg_prop_multi_copy)
logObject.info("Successfully parsed homolog matrix.")

# Check whether both boundary homolog groups are associated with the GCF
try:
    assert first_boundary_homolog in hg_genes
    assert second_boundary_homolog in hg_genes
except AssertionError as e:
    logObject.error("Unable to determine one or both boundary homolog groups in set of homolog groups associated with GCF!")
    raise RuntimeError("Unable to determine one or both boundary homolog groups in set of homolog groups associated with GCF!") from e

# Step 3: Refine BGCs based on user specifications
logObject.info("Beginning refinement of BGCs.")
new_gcf_listing_file = outdir + gcf_id + '.txt'
GCF_Object.refineBGCGenbanks(new_gcf_listing_file, outdir, first_boundary_homolog, second_boundary_homolog)
logObject.info("Refinement of BGC Genbanks complete!")
```
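Step 2 of the lsaBGC-Refine excerpt passes the GCF object four summaries of the homolog-group-by-sample matrix (`gene_to_hg`, `hg_genes`, `hg_median_copy_count`, `hg_prop_multi_copy`). The sketch below shows one way such summaries could be computed. It is not lsaBGC's `util.parseOrthoFinderMatrix`, and the function name `summarize_homolog_matrix` and its `restrict_to` parameter (playing the role of `GCF_Object.pan_genes`) are hypothetical. It assumes a tab-delimited matrix laid out like OrthoFinder's `Orthogroups.tsv` (an `Orthogroup` header followed by one column per sample, each cell listing genes separated by `, `), and it assumes copy counts are taken only over samples that carry the homolog group.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def summarize_homolog_matrix(
    lines: Iterable[str], restrict_to: Optional[Set[str]] = None
) -> Tuple[Dict[str, str], Dict[str, Set[str]], Dict[str, float], Dict[str, float]]:
    """Summarize a homolog-group-by-sample matrix (hypothetical helper).

    Assumed layout: header row ("Orthogroup", then sample names), then one row
    per homolog group whose cells list that sample's genes separated by ", ".
    If restrict_to is given, only genes in that set are considered.
    """
    gene_to_hg: Dict[str, str] = {}
    hg_genes: Dict[str, Set[str]] = defaultdict(set)
    hg_median_copy_count: Dict[str, float] = {}
    hg_prop_multi_copy: Dict[str, float] = {}

    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row of sample names
    for row in reader:
        if not row:
            continue
        hg, cells = row[0], row[1:]
        copy_counts = []  # copy counts for samples that carry this homolog group
        for cell in cells:
            genes = [g.strip() for g in cell.split(",") if g.strip()]
            if restrict_to is not None:
                genes = [g for g in genes if g in restrict_to]
            if not genes:
                continue  # homolog group absent from this sample
            for gene in genes:
                gene_to_hg[gene] = hg
                hg_genes[hg].add(gene)
            copy_counts.append(len(genes))
        if copy_counts:
            hg_median_copy_count[hg] = statistics.median(copy_counts)
            multi = sum(1 for count in copy_counts if count > 1)
            hg_prop_multi_copy[hg] = multi / len(copy_counts)

    return gene_to_hg, dict(hg_genes), hg_median_copy_count, hg_prop_multi_copy


if __name__ == "__main__":
    matrix = io.StringIO(
        "Orthogroup\tS1\tS2\tS3\n"
        "OG0000001\tS1_g1\tS2_g1, S2_g2\tS3_g1\n"
        "OG0000002\tS1_g2\t\tS3_g2\n"
    )
    gene_to_hg, hg_genes, median_cc, prop_multi = summarize_homolog_matrix(matrix)
    print(gene_to_hg["S2_g2"])  # OG0000001
    print(prop_multi)
```

Because the function accepts any iterable of lines, a real file can be passed directly, e.g. `with open(path, newline="") as handle: summarize_homolog_matrix(handle)`.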
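Step 3 of the excerpt refines each BGC to the region delimited by two boundary homolog groups. A minimal sketch of that trimming idea follows, assuming each BGC is already held in memory as an ordered list of `(gene_id, homolog_group)` pairs. The function `trim_between_boundaries` is hypothetical and is not lsaBGC's `refineBGCGenbanks`, which works with GenBank files and may handle multi-copy boundaries, strand, or contig edges differently. The boundaries may appear in either order, so reverse-oriented BGCs are covered, and when a boundary group occurs more than once this sketch keeps the shortest span between the two.

```python
from typing import List, Tuple

# Hypothetical representation: a BGC as an ordered list of
# (gene_id, homolog_group) pairs along the locus.
Locus = List[Tuple[str, str]]


def trim_between_boundaries(genes: Locus, first_hg: str, second_hg: str) -> Locus:
    """Return the genes between two boundary homolog groups, inclusive."""
    first_idx = [i for i, (_, hg) in enumerate(genes) if hg == first_hg]
    second_idx = [i for i, (_, hg) in enumerate(genes) if hg == second_hg]
    if not first_idx or not second_idx:
        raise ValueError(
            f"Boundary homolog group(s) missing: {first_hg!r} and/or {second_hg!r}"
        )
    # Pick the pair of boundary occurrences that spans the fewest genes.
    i, j = min(
        ((a, b) for a in first_idx for b in second_idx),
        key=lambda pair: abs(pair[0] - pair[1]),
    )
    start, end = min(i, j), max(i, j)
    return genes[start : end + 1]


if __name__ == "__main__":
    bgc = [("g1", "OG5"), ("g2", "OG1"), ("g3", "OG2"), ("g4", "OG3"), ("g5", "OG7")]
    print(trim_between_boundaries(bgc, "OG3", "OG1"))
    # [('g2', 'OG1'), ('g3', 'OG2'), ('g4', 'OG3')]
```

Raising `ValueError` for a missing boundary plays the same role as the boundary check in the excerpt, which stops the program before refinement when either homolog group is not associated with the GCF.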
