-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import/export of SBML <notes> #695
Comments
Hi that is a common request and pop up every few months. Also see the detailed discussion in opencobra/schema#4 . There is some discussion in how notes should be used in SBML. In general I think it would be ok to read it without parsing or modifying the text, so giving it as is. However, if your notes contain anything that is important to annotate your gene/metabolite/reaction this should go into |
@cdiener Thank you for the link; this is exactly what I was looking for. I totally missed #541. I understand that this discussion should be in opencobra/schema#4. |
Either way, simply reading the notes in on our roadmap. @matthiaskoenig is currently working on modernizing our SBML reader 😄 |
The notes reading can be implemented very fast. This is just an additional field to set. |
@matthiaskoenig That would be great! |
Hi all, @matthiaskoenig @draeger @Midnighter @cdiener and other members of cobrapy. This thread looks best to discuss the implementation for adding the complete support for the notes field in cobrapy as part of NRNB #137. As of now, the notes field in cobrapy is supported in a minimal way only, relevant for constraint-based models i.e., structured information from notes in the form
Any other content inside notes is left out and hence that information is lost while parsing. The notes field is a place for storing information intended to be seen by humans only and no machine parseable information should be put here. However, some information like Subsystem, Gene Association, Charge, and Formula, the support of which was not present in SBML earlier, were put in the notes field. Now, when the support for these fields has been added and they are properly defined in SBML, they should be put at their respective places instead of being written again to the notes field. We should respect the "human-only" specification of SBML about the notes field and should follow the SBML specification for this extra information. Here is what I propose for the notes field structure in cobrapy and in the JSON representation:
This will ensure the complete parsing of notes information from SBML to COBRA model and then to JSON and also the refinement of the notes field if it contains any information in the form of key-value pair, which it should not. What do you think? |
From my perspective it is desirable to have code that "upgrades" models (by moving information from notes to the correct fields) in libSBML directly. You can already read a level 2 model and save a level 3 one but specific annotation like GPR associations are not handled in the way that you describe. This is not a cobrapy specific problem, though, as it applies to any older SBML document and affects all tools that use SBML. I'm not sure yet, if it is a good idea to build better notes handling directly into cobrapy. As a starting point, we can also point users that need these capabilities in the right direction. From a cobrapy perspective it could be a plain text field that cobrapy does not touch at all. |
@Midnighter, that sounds more promising. It will be better if this information is parsed by interfacing libraries, like libSBML and JSBML, themselves instead of specific end packages like cobrapy. That way would be more general. But I am not sure if it is possible to cover this implementation in libSBML under this project. Mentors, @draeger @matthiaskoenig, please provide your reviews.
Well, I am not very clear about how much addition or manipulation of notes data is required in cobrapy. If no notes data manipulation is required in cobrapy, then plain text seems the best choice of all. We can then simply put it JSON and other formats too. However, if data addition or manipulation in notes is required, it will be a bit complex to deal it with simple strings.
Any changes made to the notes dictionary will also be updated in XMLNode tree and vice versa too. This will ensure no information loss of notes data while parsing from one format to other and also easy handling of notes field to add or modify its content in cobra model. And in the future, when libsbml adds support for upgrading the notes field itself by putting key-value pairs at their respective positions, we will simply remove the notes dictionary from the cobra model. Only XMLNode tree will then do the work. |
In the discussion so far we talked about how and where to store information. But you are right, @Hemant27031999 that JSBML, and I assume libSBML as well, contain converters from COBRA notes to FBC. This means, it is neither necessary to implement a parser for the notes again, nor a method to save the key-value pairs to a new place. Just the new key-value pair data structure is most likely not yet supported in any of the two libraries. It is possible that a function for this is being added soon. However, it is important to note that neither SBML library will just automatically parse out any specific information from the notes because the default assumption is that that is unstructured human-readable text. Cobrapy or any other end-user library will at least need to call the function for making that conversion happen. About the plain-text in cobrapy's data JSON data structure. This would, indeed, be best. If it comes from SBML, the notes text string might contain some XHTML tags that can just remain as they are. If created in cobrapy, the underlying SBML library should add the required XHTML frame around the text when missing (at least JSBML would do, possibly also libSBML). |
Hi all, and thanks @Hemant27031999 for working on this issue, I agree with what was said by @Midnighter and @draeger, and I will just add a few words as a user. As a user, I really just want import/export features of SBML notes with the possibility to read/modify the text glob within cobrapy. That's all. No fancy conversion. If there is any parsing, I can do it myself. Taking back the example from my first post, as of today (v.0.17.1), the notes attributes of gene, reaction and metabolite objects are still empty dict despite the imported SBML having a notes tag (with XHTML text). A plain-text import may not be a very interesting feature to implement... but really, that would be great 😉 |
I agree with what most said here. Notes should just be read as string without any interpretation or parsing independent of whether they may contain tags or not. The notes parsing we had was to have compatibility with cobra models before SBML version 3. This can stay as legacy support but we should not implement any new notes parsers. |
Is this still valid? |
The PR #988 is still open. Lots of good stuff in there but unfortunately a bit dated by now and hard to review. |
Closing because this is specific to notes. New annotations are tracked in the relevant PRs. |
Problem description
Hi all,
Do you intend to support import/export of
<notes>
from/to SBML files?When a SBML model is parsed with some
<notes>
under, for instance, a<geneProduct>
, those notes are not imported. I would have expected to retrieve the notes withmymodel.genes.mygene.notes
, but the attribute is always empty.I didn't saw in
cobra.io.sbml3.parse_xml_into_model
andcobra.io.sbml3.annotate_cobra_from_sbml
where the<notes>
are imported. I believe they are not; which would explain why they are not available in the python object.Code Sample
Gist to mini_fbc2_notes.xml
Expected Output
Output of
cobra.show_versions()
System Information
OS Linux
OS-release 3.10.0-693.21.1.el7.x86_64
Python 3.6.4
Package Versions
pip 9.0.3
setuptools 39.0.1
cobra 0.11.3
future 0.16.0
swiglpk 1.4.4
optlang 1.4.0
ruamel.yaml 0.14.12
pandas 0.22.0
numpy 1.14.2
tabulate 0.8.2
scipy 1.0.0
matplotlib 2.2.2
The text was updated successfully, but these errors were encountered: