Implement OWL parsing with iterparse #2

bgyori · 2020-06-15T15:09:56Z

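Parsing with iterparse would work better for some very large OWL files, since it processes the document incrementally instead of building the whole tree in memory first.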
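A minimal sketch of what streaming could look like, using the standard library's `xml.etree.ElementTree.iterparse` rather than anything in pybiopax. It assumes BioPAX Level 3 RDF/XML in which each direct child of `rdf:RDF` is one resource. It only counts resources by class and clears each one once it has been closed, so the in-memory tree does not keep growing. `count_biopax_types` and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of BioPAX Level 3 classes (an assumption about the input files)
BP_NS = "http://www.biopax.org/release/biopax-level3.owl#"
BP_PREFIX = "{" + BP_NS + "}"


def count_biopax_types(source) -> Counter:
    """Stream RDF/XML and count top-level BioPAX resources by class name.

    `source` is a filename or a binary file object (e.g. from gzip.open).
    """
    counts = Counter()
    root = None
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the enclosing rdf:RDF element
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # we just closed a direct child of rdf:RDF
            if elem.tag.startswith(BP_PREFIX):
                counts[elem.tag[len(BP_PREFIX):]] += 1
            root.clear()  # drop finished resources so the tree stays small
    return counts


sample = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:BiochemicalReaction rdf:ID="r1">
    <bp:displayName>Calcium binds calmodulin</bp:displayName>
  </bp:BiochemicalReaction>
  <bp:BiochemicalReaction rdf:ID="r2"/>
  <bp:Protein rdf:ID="p1"/>
</rdf:RDF>"""

print(count_biopax_types(io.BytesIO(sample)))
```

For a `.owl.gz` dump, `gzip.open(path, "rb")` can be passed as `source`. Building full pybiopax objects this way would additionally require resolving `rdf:resource` cross-references, for example in a second pass.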

cthoyt · 2021-11-30T16:50:06Z

Not exactly sure what you had in mind, but parsing the PC12 dump on my laptop takes like 30 minutes and melts my lap, so it would be great :)

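from typing import Optional

import pybiopax
import pystow


def ensure_pc_detailed(version: Optional[str] = None, force: bool = False):
    """Download (if needed) and parse the Pathway Commons Detailed BioPAX dump."""
    if version is None:
        import bioversions

        # Look up the current Pathway Commons release number
        version = bioversions.get_version("pathwaycommons")

    url = f"https://www.pathwaycommons.org/archives/PC2/v{version}/PathwayCommons{version}.Detailed.BIOPAX.owl.gz"
    # Download once and cache locally; force=True re-downloads
    path = pystow.ensure("bio", "pathwaycommons", version, url=url, force=force)
    # Parse the whole gzipped OWL file into a BioPAX model (this is the slow step)
    return pybiopax.model_from_owl_gz(path)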

cmungall · 2022-07-13T15:40:07Z

One approach here is to load the RDF/XML into a SQLite database and then operate over the triples in the relational database.
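A rough sketch of the idea, using only Python's built-in `sqlite3`. The single `statements` table (literal values in `value`, resource objects in `object`) is a simplified, hypothetical schema, not the semantic-sql one, and the triples are assumed to come from whatever RDF parser produces them. The `ex:` identifiers are made up.

```python
import sqlite3

# Hypothetical, simplified schema: one row per triple.
SCHEMA = """
CREATE TABLE IF NOT EXISTS statements (
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT,
    value TEXT
);
CREATE INDEX IF NOT EXISTS idx_sp ON statements (subject, predicate);
CREATE INDEX IF NOT EXISTS idx_po ON statements (predicate, object);
"""


def load_triples(conn: sqlite3.Connection, triples) -> None:
    """Bulk-insert (subject, predicate, object, value) tuples in one transaction."""
    conn.executescript(SCHEMA)
    with conn:  # commit once at the end rather than once per row
        conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?)", triples)


conn = sqlite3.connect(":memory:")
load_triples(conn, [
    ("ex:r1", "rdf:type", "biopax:BiochemicalReaction", None),
    ("ex:r1", "biopax:displayName", None, "Calcium binds calmodulin"),
    ("ex:p1", "rdf:type", "biopax:Protein", None),
])
rows = conn.execute(
    "SELECT subject FROM statements "
    "WHERE predicate = 'rdf:type' AND object = 'biopax:BiochemicalReaction'"
).fetchall()
print(rows)  # [('ex:r1',)]
```

The indexes on (subject, predicate) and (predicate, object) are a guess at the access patterns that matter most here: "everything about X" and "everything of type Y".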

There are various ways to do a fast load of RDF/XML into SQLite. For semantic-sql we use rdftab.rs, but we have plans to wrap the Rust code in Python (INCATools/semantic-sql#41).

Of course, RDF triples are still quite a low-level way of working with a higher-level representation like OWL or BioPAX instances, but this can be abstracted in a number of ways (YMMV), e.g. views, SQLAlchemy code, or basic Python routines. ...
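To illustrate the "views" option: a hypothetical `biochemical_reaction` view that turns raw triples into one row per reaction, so callers never have to touch triples. The `statements` table layout and the `ex:` rows are made-up stand-ins for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, value TEXT);
INSERT INTO statements VALUES
    ('ex:r1', 'rdf:type', 'biopax:BiochemicalReaction', NULL),
    ('ex:r1', 'biopax:displayName', NULL, 'Calcium binds calmodulin'),
    ('ex:r2', 'rdf:type', 'biopax:BiochemicalReaction', NULL);

-- Hypothetical view: one row per reaction with its display name, if any.
CREATE VIEW biochemical_reaction AS
SELECT t.subject AS id, n.value AS display_name
FROM statements AS t
LEFT JOIN statements AS n
    ON n.subject = t.subject AND n.predicate = 'biopax:displayName'
WHERE t.predicate = 'rdf:type' AND t.object = 'biopax:BiochemicalReaction';
""")

for row in conn.execute("SELECT id, display_name FROM biochemical_reaction ORDER BY id"):
    print(row)
# ('ex:r1', 'Calcium binds calmodulin')
# ('ex:r2', None)
```

The LEFT JOIN keeps reactions that have no display name; an inner join would silently drop them.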

IMHO, having a set of SQLite downloads of all available pathway BioPAX files would appeal to a lot of people...

cmungall · 2022-07-13T22:45:30Z

As a demonstrator, I put up a version of Reactome here: https://s3.amazonaws.com/bbop-sqlite/reactome-Homo-sapiens.db.gz
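One way to fetch and open such a file using only the standard library is sketched below. It downloads the `.db.gz` once, decompresses it next to the archive, and opens it with `sqlite3`. The helper names `gunzip_file` and `fetch_db` are hypothetical.

```python
import gzip
import shutil
import sqlite3
import urllib.request
from pathlib import Path

REACTOME_DB_URL = "https://s3.amazonaws.com/bbop-sqlite/reactome-Homo-sapiens.db.gz"


def gunzip_file(src: Path, dst: Path) -> Path:
    """Decompress src (.gz) to dst in chunks, without loading it all into memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def fetch_db(url: str = REACTOME_DB_URL, directory: Path = Path(".")) -> sqlite3.Connection:
    """Download the gzipped SQLite file once, decompress it once, and open it."""
    gz_path = Path(directory) / url.rsplit("/", 1)[-1]
    db_path = gz_path.with_suffix("")  # strip the trailing ".gz"
    if not db_path.exists():
        if not gz_path.exists():
            urllib.request.urlretrieve(url, gz_path)
        gunzip_file(gz_path, db_path)
    return sqlite3.connect(db_path)
```

Later calls find the decompressed `.db` already on disk and just open it, which is where the "no start-up parse" benefit comes from.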

I haven't tried PC yet, but once you have the initial download, SQLite obviously bypasses the need for any start-up parse, which makes it quite nice for interactive exploration.

You can query things at the (very low) RDF level:

sqlite> select subject from rdf_type_statement where object = 'biopax:BiochemicalReaction' limit 5;
reactome.biopax:BiochemicalReaction1
reactome.biopax:BiochemicalReaction10
reactome.biopax:BiochemicalReaction100
reactome.biopax:BiochemicalReaction1000
reactome.biopax:BiochemicalReaction10000
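The same query can be run from Python with the built-in `sqlite3` module. The sketch below parameterizes the type and adds `ORDER BY` so the result order is deterministic. To stay self-contained it builds a tiny stand-in `rdf_type_statement` table; that the real one has at least `subject` and `object` columns is taken from the query above, while the rest of the layout and the `Protein1` row are assumptions.

```python
import sqlite3


def instances_of(conn: sqlite3.Connection, biopax_type: str, limit: int = 5) -> list:
    """Return up to `limit` subjects whose rdf:type is `biopax_type`."""
    rows = conn.execute(
        "SELECT subject FROM rdf_type_statement WHERE object = ? ORDER BY subject LIMIT ?",
        (biopax_type, limit),
    )
    return [subject for (subject,) in rows]


# Stand-in for the downloaded database (made-up rows, same table name).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE rdf_type_statement (subject TEXT, predicate TEXT, object TEXT)")
demo.executemany(
    "INSERT INTO rdf_type_statement VALUES (?, 'rdf:type', ?)",
    [
        ("reactome.biopax:BiochemicalReaction1", "biopax:BiochemicalReaction"),
        ("reactome.biopax:BiochemicalReaction10", "biopax:BiochemicalReaction"),
        ("reactome.biopax:Protein1", "biopax:Protein"),
    ],
)
print(instances_of(demo, "biopax:BiochemicalReaction"))
```

Passing the type and limit as `?` parameters, rather than formatting them into the SQL string, avoids quoting problems with CURIEs.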

It looks like you are parsing BioPAX as XML rather than as OWL, so I am not sure whether this would be a simple drop-in replacement.

https://github.com/indralab/pybiopax/blob/7a90a177a8a08274931b8f9df52f916751ab5e37/pybiopax/biopax/model.py#L109-L123

FWIW, you can even use OAK to treat it as a (strangely behaved) OWL "ontology":

runoak -i sqlite:obo:reactome-Homo-sapiens descendants .desc//p=t biopax:BiochemicalReaction .and t~Calmodulin
reactome.biopax:BiochemicalReaction11326 ! IQGAPs bind F-actin, which is inhibited by calmodulin
reactome.biopax:BiochemicalReaction10284 ! Calmodulin activates Cam-PDE 1
reactome.biopax:BiochemicalReaction10300 ! Inactive catalytic PP2B is activated by the binding of calmodulin
reactome.biopax:BiochemicalReaction1327 ! Calcium binds calmodulin
reactome.biopax:BiochemicalReaction10662 ! Active calmodulin binds CAMK2
reactome.biopax:BiochemicalReaction8747 ! Sepiapterin reductase (SPR) is phosphorylated by Ca2+/calmodulin-dependent protein kinase II
reactome.biopax:BiochemicalReaction6200 ! CaMKK binds activated calmodulin in the nucleus
reactome.biopax:BiochemicalReaction6208 ! CaMKK binds activated calmodulin in the cytosol
reactome.biopax:BiochemicalReaction6204 ! Calmodulin binds CAMK4
reactome.biopax:BiochemicalReaction6210 ! CAMK1 binds calmodulin
reactome.biopax:BiochemicalReaction6196 ! Activated calmodulin binds ADCY1,ADCY8
reactome.biopax:BiochemicalReaction6199 ! Activated calmodulin dissociates from CaMKII-gamma
reactome.biopax:BiochemicalReaction6197 ! Calmodulin-activated adenylate cyclases ADCY1 and ADCY8 generate cAMP
reactome.biopax:BiochemicalReaction6184 ! CaMKII binds activated calmodulin
reactome.biopax:BiochemicalReaction6183 ! Calcium binds calmodulin at the synapse
reactome.biopax:BiochemicalReaction11015 ! S-Farn-Me KRAS4B binds calmodulin
reactome.biopax:BiochemicalReaction11016 ! Calmodulin dissociates KRAS4B from the plasma membrane
reactome.biopax:BiochemicalReaction11291 ! MYLK (MLCK) Active Calmodulin Binding
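For readers without OAK, a rough plain-SQL analogue of "reactions whose label mentions calmodulin" is sketched below. It is not what OAK does internally. It assumes labels are exposed through a `rdfs_label_statement` table or view with `subject` and `value` columns (an assumption) and uses tiny stand-in tables, reusing two labels from the output above plus one made-up row.

```python
import sqlite3


def reactions_mentioning(conn: sqlite3.Connection, term: str) -> list:
    """(id, label) pairs for BiochemicalReactions whose label contains `term`.

    SQLite's LIKE is case-insensitive for ASCII, so "calmodulin" also matches
    "Calmodulin". Any % or _ inside `term` would act as a wildcard.
    """
    sql = """
        SELECT t.subject, l.value
        FROM rdf_type_statement AS t
        JOIN rdfs_label_statement AS l ON l.subject = t.subject
        WHERE t.object = 'biopax:BiochemicalReaction'
          AND l.value LIKE '%' || ? || '%'
        ORDER BY t.subject
    """
    return conn.execute(sql, (term,)).fetchall()


# Tiny stand-in tables; column names are assumptions.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE rdf_type_statement (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE rdfs_label_statement (subject TEXT, predicate TEXT, value TEXT);
INSERT INTO rdf_type_statement VALUES
    ('reactome.biopax:BiochemicalReaction1327', 'rdf:type', 'biopax:BiochemicalReaction'),
    ('reactome.biopax:BiochemicalReaction10284', 'rdf:type', 'biopax:BiochemicalReaction'),
    ('ex:r3', 'rdf:type', 'biopax:BiochemicalReaction');
INSERT INTO rdfs_label_statement VALUES
    ('reactome.biopax:BiochemicalReaction1327', 'rdfs:label', 'Calcium binds calmodulin'),
    ('reactome.biopax:BiochemicalReaction10284', 'rdfs:label', 'Calmodulin activates Cam-PDE 1'),
    ('ex:r3', 'rdfs:label', 'An unrelated reaction');
""")
for row in reactions_mentioning(demo, "calmodulin"):
    print(row)
```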
