Checklist of things for REO prior to importing into go-lego #71

Closed
4 of 7 tasks
cmungall opened this issue Sep 24, 2019 · 21 comments
Labels: software (work here is writing code)

Comments

@cmungall
Member

cmungall commented Sep 24, 2019

@goodb goodb changed the title Checklist of things for REA prior to importing into go-lego Checklist of things for REO prior to importing into go-lego Sep 25, 2019
@goodb
Contributor

goodb commented Sep 25, 2019

@fabregat I wonder if there are any opinions from the Reactome team here? Summing up quickly, we are building an OWL ontology that represents the physical entities used in Reactome. (See #70 for more on why.) This is constructed automatically from a BioPAX export. Do you have any interest in using or contributing to this ontology? Any thoughts on the URIs used or the name?

@goodb
Contributor

goodb commented Sep 25, 2019

@dougli1sqrd could you tell me about robot QC requirements?

@goodb
Contributor

goodb commented Sep 25, 2019

@deepakunni3 what is the expected pattern for adding biolink categories to classes in an OWL ontology?

@cmungall on that issue, is it time to add biolink categories to all of the other ontologies the group maintains?

@goodb
Contributor

goodb commented Sep 25, 2019

@cmungall @balhoff regarding "make it easier to traverse from a REO entity to the relevant uniprot class"

I think we need a generic pattern for encoding what a 'relevant' gene class is in the NEO/go-lego context. 2 ideas to start with:

  1. a property on the relevant gene classes (annotation isCanonicalGene TRUE or subClassOf CanonicalGene or subClassOf CanonicalHumanGene) and then retrieve by running up the class hierarchy until one is found. That would work for REO as it stands as all the entities that could map to uniprot are defined as subclasses of the uniprot genes.
  2. an annotation property like 'canonicalGene' that could be used to link anything to the uniprot (or other default) gene regardless of whether it was a subclass. If we shift to a different architecture for NEO to reduce tbox size, this might be useful. Of course that might also impact the way REO is created in general.

A key first use case for this structure is the generation of uniprot-centric GPAD from GO-CAM models that use Reactome entity ids. See #52 on that.
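As a rough sketch of how the two ideas above differ at lookup time, the Python below walks a toy class hierarchy (idea 1) and reads a direct annotation (idea 2). All identifiers, the dictionary layout, and the function names are hypothetical stand-ins for illustration; nothing here reflects how NEO, REO, or Minerva actually store this information.

```python
from collections import deque

# Hypothetical identifiers, for illustration only.
# child class -> direct superclasses
SUPERCLASSES = {
    "EX:modified_protein_form": ["EX:protein_entity"],
    "EX:protein_entity": ["EX:uniprot_gene_1"],
    "EX:uniprot_gene_1": ["EX:information_biomacromolecule"],
}
# Idea 1: classes flagged as canonical genes.
CANONICAL_CLASSES = {"EX:uniprot_gene_1"}
# Idea 2: a 'canonicalGene'-style annotation, usable without a subclass link.
CANONICAL_GENE = {"EX:complex_1": ["EX:uniprot_gene_1", "EX:uniprot_gene_2"]}


def find_canonical_by_hierarchy(term, superclasses, canonical):
    """Idea 1: breadth-first walk up the hierarchy to the nearest flagged class."""
    queue = deque([term])
    seen = {term}
    while queue:
        current = queue.popleft()
        if current in canonical:
            return current
        for parent in superclasses.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None  # no canonical ancestor found


def find_canonical_by_annotation(term, canonical_gene):
    """Idea 2: direct lookup; returns a list because a set can map to several genes."""
    return canonical_gene.get(term, [])
```

The hierarchy walk only works for entities that are subclasses of a flagged gene class, which is why the annotation route may be the better fit for complexes and sets.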

?

@ukemi

ukemi commented Sep 25, 2019

One little nuance here is that only a subset of UniProtKB identifiers are valid as gene stand-ins. Those are the ones that have been blessed as gene-centric representative proteins (GCRPs). Do the UniProt identifiers that Reactome maps to always come from this set?

@goodb
Contributor

goodb commented Sep 25, 2019

Hi @wdduncan !

@goodb
Contributor

goodb commented Sep 26, 2019

@ukemi I'm not sure how to answer that question about GCRPs. If Reactome referenced a non-GCRP record for a protein, should the REO generating process try to figure out what the appropriate GCRP term is and use that instead of what Reactome asserted? That makes me a little uncomfortable - seems like it ought to be up to the Reactome curators to make that decision.

goodb pushed a commit that referenced this issue Sep 26, 2019
added an annotation property linking from reactome class to default
canonical uniprot, chebi, or ensembl record(s) (plural when it's a set).
goodb pushed a commit to geneontology/minerva that referenced this issue Sep 27, 2019
This work is in reference to geneontology/pathways2GO#71

This allows entity ontologies aside from neo to be used to construct go-cams while maintaining GPAD outputs that adhere strictly to canonical terminologies such as UniProt for human genes. It works by adding the annotation property http://geneontology.org/lego/canonical_record to link new terms (e.g. reactome entities) to canonical terms (e.g. corresponding uniprots). When these annotations are present, the GPAD SPARQL export process begins by converting the model to one with all of the external types replaced by canonical types. The rest of the GPAD export process is then unchanged.
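The type-replacement step described in this commit message can be sketched roughly as follows. This is not the Minerva implementation (which is not shown in this thread); it is a minimal Python illustration over plain `(subject, predicate, object)` tuples. Only the `canonical_record` property IRI comes from the commit message. The function names and the choice to emit one type triple per canonical record for sets are assumptions.

```python
CANONICAL_RECORD = "http://geneontology.org/lego/canonical_record"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def canonical_map(ontology_triples):
    """Collect term -> [canonical terms] from canonical_record annotations."""
    mapping = {}
    for subject, predicate, obj in ontology_triples:
        if predicate == CANONICAL_RECORD:
            mapping.setdefault(subject, []).append(obj)
    return mapping


def canonicalize_model(model_triples, mapping):
    """Replace external types with canonical types; leave other triples as-is.

    Assumption: an entity with several canonical records (e.g. a set)
    gets one rdf:type triple per record.
    """
    result = []
    for subject, predicate, obj in model_triples:
        if predicate == RDF_TYPE and obj in mapping:
            result.extend((subject, predicate, canon) for canon in mapping[obj])
        else:
            result.append((subject, predicate, obj))
    return result
```

Because the rewrite happens before export, the downstream GPAD queries never see the external types, which matches the statement that the rest of the export is unchanged.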
@goodb
Contributor

goodb commented Sep 30, 2019

@cmungall could you elaborate on "rdfs:labels should be uniquified in the same manner used for the rest of NEO" ?

Shame REO is taken. It appears that the owners have abandoned it? http://www.ontobee.org/ontology/REO

I like REACTO - very xmen.

@goodb
Contributor

goodb commented Sep 30, 2019

@cmungall @balhoff @kltm based on lack of response, I think it is up to us to decide on a URI pattern for the ontology currently known as REO. I would be excited if we could support Linked Data style URIs - e.g. resolving to RDF. Though potentially independent entities, the same thought pertains to the go-cam models themselves (including the URIs used within them). go-cams are a great application of the semantic web approach and I would like to see us follow through and deliver them in ways that encourage others to use the RDF/OWL versions of the content as much as possible. Providing Linked Data helps keep things moving in that direction.

OBO library ontologies have an established pattern for URIs. Should we apply something similar for RDF products generated by the group that are not in OBO? How does this relate to rdf.geneontology.org ? Is anyone thinking about this?

@deustp01
Collaborator

deustp01 commented Oct 9, 2019

@ukemi I'm not sure how to answer that question about GCRPs. If Reactome referenced a non-GCRP record for a protein, should the REO generating process try to figure out what the appropriate GCRP term is and use that instead of what Reactome asserted? That makes me a little uncomfortable - seems like it ought to be up to the Reactome curators to make that decision.

Assuming that all SwissProt instances are GCRP and all TrEMBL ones are not, Reactome curators should always use a SwissProt entry in preference to its TrEMBL counterparts, and UniProt is organized to make this easy. So any non-GCRP usage should either be a case where SwissProt curators have not yet assembled a definitive SwissProt record from the available TrEMBL records, or be a mistake. I think our QA catches cases where we curated something before a SwissProt record was available and one has been created since.

So a list of non-GCRP usages found in the Reactome import should be short and will be another instance of useful QA provided by the GO-CAM import process.
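A report like that could be produced with a simple set difference. The sketch below assumes two inputs that are not defined anywhere in this thread: a mapping from Reactome entity ids to the UniProt accessions they reference, and a set of accessions known to be GCRPs. The function name and data shapes are hypothetical.

```python
def non_gcrp_usages(entity_to_accessions, gcrp_accessions):
    """Return {entity id: sorted accessions that are not in the GCRP set}.

    Entities whose accessions are all GCRPs are omitted from the report.
    """
    report = {}
    for entity_id, accessions in entity_to_accessions.items():
        offending = sorted(set(accessions) - set(gcrp_accessions))
        if offending:
            report[entity_id] = offending
    return report
```

An empty result would mean every referenced accession is in the GCRP set; anything else is a candidate for curator review rather than automatic replacement.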

@goodb
Contributor

goodb commented Jan 28, 2020

Going with REACTO with URL
http://purl.obolibrary.org/obo/go/extensions/reacto.owl

supported by physical URLs like:
http://current.geneontology.org/ontology/reacto/reacto.owl
http://release.geneontology.org/2020-01-01/ontology/reacto/reacto.owl

with internal URIs following the GO pattern
http://purl.obolibrary.org/obo/REACTO_R-HSA-72571

Will try to work the create and update process into the standard ontology build makefile.

(unless @balhoff @kltm or @cmungall would like something else).

@balhoff
Member

balhoff commented Jan 28, 2020

My only issue here is that the term URIs will not resolve without registering for an OBO library prefix.

@goodb
Contributor

goodb commented Jan 28, 2020

I don't see this being a full-on OBO ontology. Is there another mechanism?
Otherwise we could make them like http://purl.obolibrary.org/obo/go/extensions/reacto.owl#REACTO_R-HSA-72571 such that they just resolve to the entire ontology.
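To make the difference between the two URI patterns concrete, here is a small Python sketch. The two base strings are taken from the URLs quoted in this thread; the function names are made up for illustration. It relies on the fact that a URI fragment (the part after `#`) is not sent to the server, so a hash-style term URI is fetched as the ontology document itself.

```python
from urllib.parse import urldefrag

ONTOLOGY_IRI = "http://purl.obolibrary.org/obo/go/extensions/reacto.owl"
OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_style_uri(reactome_id):
    """GO/OBO-style term URI; resolving it depends on a registered OBO prefix."""
    return f"{OBO_BASE}REACTO_{reactome_id}"


def hash_style_uri(reactome_id):
    """Term URI expressed as a fragment of the ontology document URL."""
    return f"{ONTOLOGY_IRI}#REACTO_{reactome_id}"


def document_fetched(uri):
    """The URL a client actually requests: the URI with its fragment removed."""
    return urldefrag(uri).url
```

For example, `document_fetched(hash_style_uri("R-HSA-72571"))` gives back `ONTOLOGY_IRI`, whereas the OBO-style URI has no fragment and must resolve on its own.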

@kltm
Member

kltm commented Jan 28, 2020

@goodb The Makefile is fine for me--we can easily test it on a branch of the pipeline. If another build process starts to look more promising, let me know and we can work out the best way to test it out.

@balhoff
Member

balhoff commented Jan 28, 2020

Otherwise we could make them like http://purl.obolibrary.org/obo/go/extensions/reacto.owl#REACTO_R-HSA-72571

I think this might be the way to go, unless someone has a better idea. @cmungall?

@goodb
Contributor

goodb commented Jan 28, 2020

@kltm where should I put the models, assuming that their creation is part of the build process here?

@kltm
Member

kltm commented Jan 28, 2020

@goodb That will take a little workshopping; I think that the most likely spot would be skyhook for the time being. I'm not sure that they are ready for release. We can take a little time to look at how that fits into the pipeline when we need it.

@goodb
Contributor

goodb commented Jan 28, 2020

@kltm when you say 'skyhook' I have no idea what you are talking about...
Okay. I will try to get the ontology part of this working now as it is an immediate, blocking requirement holding up a push of the latest batch of models into noctua-dev. We can return to the pipelining of the models themselves in the future.

@kltm
Member

kltm commented Jan 28, 2020

This will be part of the training that we'll do to get you up to speed on the pipeline.
For now, it is a public intermediate store for pipeline products.
http://skyhook.berkeleybop.org

goodb added a commit that referenced this issue Jan 28, 2020
@goodb
Contributor

goodb commented Feb 20, 2020

Noting here that model validation via shex and owl works a little differently for the reactome models than for the rest because of the use of reacto. Other models currently gather the upper level types for genes and other entities by request to a golr server. These requests do not work for reactome as reacto entities are not present in that server. This is okay for the moment because the reacto ontology contains the required information linking each entity to an upper level class (just as neo did previously for everything else).

@goodb
Copy link
Contributor

goodb commented Apr 18, 2020

I think for the purposes it's going to be used for, REO (not reacto) is done for now.

@goodb goodb closed this as completed Apr 18, 2020

6 participants