Implement OWL parsing with iterparse #2
Comments
Not exactly sure what you had in mind, but parsing the PC12 dump on my laptop takes like 30 minutes and melts my lap, so it would be great :)

    from typing import Optional

    import pybiopax
    import pystow


    def ensure_pc_detailed(version: Optional[str], force: bool = False):
        # Look up the latest Pathway Commons version if none is given
        if version is None:
            import bioversions

            version = bioversions.get_version("pathwaycommons")
        url = f"https://www.pathwaycommons.org/archives/PC2/v{version}/PathwayCommons{version}.Detailed.BIOPAX.owl.gz"
        # Download the archive once and reuse the cached copy afterwards
        path = pystow.ensure("bio", "pathwaycommons", version, url=url, force=force)
        return pybiopax.model_from_owl_gz(path)
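For reference, here is a minimal sketch of what streaming with iterparse could look like, using only the standard library. It is not pybiopax's implementation: the function name `count_biopax_objects` is made up for illustration, and it assumes the file is BioPAX Level 3 RDF/XML where each object is a direct child of the `rdf:RDF` root. It only counts objects per class, but the same loop is where each element would be turned into a model object before being discarded.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the dump uses the BioPAX Level 3 namespace.
BP_NS = "http://www.biopax.org/release/biopax-level3.owl#"


def count_biopax_objects(path):
    """Stream a gzipped BioPAX OWL file and count top-level objects by class."""
    counts = Counter()
    with gzip.open(path, "rb") as file:
        context = ET.iterparse(file, events=("start", "end"))
        _, root = next(context)  # "start" event of the rdf:RDF root
        depth = 1
        for event, elem in context:
            if event == "start":
                depth += 1
                continue
            depth -= 1
            if depth == 1:
                # A direct child of the root has just been fully parsed.
                if elem.tag.startswith("{" + BP_NS + "}"):
                    counts[elem.tag.split("}", 1)[1]] += 1
                # Drop processed children so memory use stays flat.
                root.clear()
    return counts
```

Clearing the root after each top-level element is what keeps memory bounded; without it, iterparse still builds the whole tree in the background.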
One approach here is to load the RDF/XML into a SQLite database, and then operate over triples in the relational database.

There are various ways to do a fast load of RDF/XML into SQLite. For semantic-sql we use rdftab.rs, but we have plans to wrap the Rust in Python (INCATools/semantic-sql#41).

Of course, RDF triples are still quite a low-level way of working with a higher-level representation like OWL or BioPAX instances, but this can be abstracted in a number of ways, YMMV, e.g. views, SQLAlchemy code, basic Python routines.

...

IMHO, having a set of SQLite downloads of all available pathway BioPAX files would be appealing to a lot of people...
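To make the "triples plus views" idea concrete, here is a toy sketch using Python's built-in sqlite3 module. The three-column `statements` table and the helper names are simplifications chosen for illustration, not the actual schema produced by rdftab.rs or semantic-sql; only the view name `rdf_type_statement` is borrowed from the query shown in this thread.

```python
import sqlite3


def load_triples(conn, triples):
    """Store (subject, predicate, object) triples in a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements "
        "(subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", triples)
    conn.commit()


def create_type_view(conn):
    """Expose rdf:type triples through a view, hiding the raw triple table."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS rdf_type_statement AS "
        "SELECT subject, object FROM statements WHERE predicate = 'rdf:type'"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical example triples, written as CURIEs
    load_triples(
        conn,
        [
            ("ex:r1", "rdf:type", "biopax:BiochemicalReaction"),
            ("ex:r1", "biopax:left", "ex:p1"),
            ("ex:p1", "rdf:type", "biopax:Protein"),
        ],
    )
    create_type_view(conn)
    for row in conn.execute("SELECT subject, object FROM rdf_type_statement"):
        print(row)
```

Higher-level views (for example, one per BioPAX class or property) could be layered on in the same way, so that callers never touch the raw triples.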
As a demonstrator, I put up a version of Reactome here: https://s3.amazonaws.com/bbop-sqlite/reactome-Homo-sapiens.db.gz

I haven't tried PC yet, but once you have the initial download, sqlite obviously bypasses the need for any start-up parse, making it quite nice for interactive exploration.

You can query things at a (very low-level) RDF level:

    sqlite> select subject from rdf_type_statement where object = 'biopax:BiochemicalReaction' limit 5;
    reactome.biopax:BiochemicalReaction1
    reactome.biopax:BiochemicalReaction10
    reactome.biopax:BiochemicalReaction100
    reactome.biopax:BiochemicalReaction1000
    reactome.biopax:BiochemicalReaction10000

It looks like you are parsing BioPAX as XML rather than OWL, so I am not sure if this would be a simple drop-in replacement.

FWIW, you can even use OAK to treat it as a (strangely behaved) OWL "ontology":

    runoak -i sqlite:obo:reactome-Homo-sapiens descendants .desc//p=t biopax:BiochemicalReaction .and t~Calmodulin
    reactome.biopax:BiochemicalReaction11326 ! IQGAPs bind F-actin, which is inhibited by calmodulin
    reactome.biopax:BiochemicalReaction10284 ! Calmodulin activates Cam-PDE 1
    reactome.biopax:BiochemicalReaction10300 ! Inactive catalytic PP2B is activated by the binding of calmodulin
    reactome.biopax:BiochemicalReaction1327 ! Calcium binds calmodulin
    reactome.biopax:BiochemicalReaction10662 ! Active calmodulin binds CAMK2
    reactome.biopax:BiochemicalReaction8747 ! Sepiapterin reductase (SPR) is phosphorylated by Ca2+/calmodulin-dependent protein kinase II
    reactome.biopax:BiochemicalReaction6200 ! CaMKK binds activated calmodulin in the nucleus
    reactome.biopax:BiochemicalReaction6208 ! CaMKK binds activated calmodulin in the cytosol
    reactome.biopax:BiochemicalReaction6204 ! Calmodulin binds CAMK4
    reactome.biopax:BiochemicalReaction6210 ! CAMK1 binds calmodulin
    reactome.biopax:BiochemicalReaction6196 ! Activated calmodulin binds ADCY1,ADCY8
    reactome.biopax:BiochemicalReaction6199 ! Activated calmodulin dissociates from CaMKII-gamma
    reactome.biopax:BiochemicalReaction6197 ! Calmodulin-activated adenylate cyclases ADCY1 and ADCY8 generate cAMP
    reactome.biopax:BiochemicalReaction6184 ! CaMKII binds activated calmodulin
    reactome.biopax:BiochemicalReaction6183 ! Calcium binds calmodulin at the synapse
    reactome.biopax:BiochemicalReaction11015 ! S-Farn-Me KRAS4B binds calmodulin
    reactome.biopax:BiochemicalReaction11016 ! Calmodulin dissociates KRAS4B from the plasma membrane
    reactome.biopax:BiochemicalReaction11291 ! MYLK (MLCK) Active Calmodulin Binding
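For anyone who wants to run the `rdf_type_statement` query from Python rather than the sqlite shell, something like the following sketch should work once the `.db.gz` file has been downloaded. The helper names `open_sqlite_gz` and `subjects_of_type` are hypothetical, and the only schema detail assumed is what the query above shows: a `rdf_type_statement` relation with `subject` and `object` columns.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def open_sqlite_gz(gz_path, db_path=None):
    """Decompress a gzipped SQLite file once, then open the result."""
    gz_path = Path(gz_path)
    # By default, strip the trailing ".gz" (foo.db.gz -> foo.db)
    db_path = Path(db_path) if db_path else gz_path.with_suffix("")
    if not db_path.exists():
        with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return sqlite3.connect(db_path)


def subjects_of_type(conn, cls, limit=5):
    """Return up to `limit` subjects whose rdf:type is the given CURIE."""
    query = "SELECT subject FROM rdf_type_statement WHERE object = ? LIMIT ?"
    return [row[0] for row in conn.execute(query, (cls, limit))]
```

With the Reactome file saved locally, usage would be along the lines of `conn = open_sqlite_gz("reactome-Homo-sapiens.db.gz")` followed by `subjects_of_type(conn, "biopax:BiochemicalReaction")`. The decompression cost is paid once; later runs open the `.db` file directly.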
This would be better for some very large OWL files