Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import/export of SBML <notes> #695

Closed
bdelepine opened this issue Apr 17, 2018 · 14 comments
Closed

Import/export of SBML <notes> #695

bdelepine opened this issue Apr 17, 2018 · 14 comments

Comments

@bdelepine
Copy link

Problem description

Hi all,

Do you intend to support import/export of <notes> from/to SBML files?

When a SBML model is parsed with some <notes> under, for instance, a <geneProduct>, those notes are not imported. I would have expected to retrieve the notes with mymodel.genes.mygene.notes, but the attribute is always empty.

I didn't saw in cobra.io.sbml3.parse_xml_into_model and cobra.io.sbml3.annotate_cobra_from_sbml where the <notes> are imported. I believe they are not; which would explain why they are not available in the python object.

Code Sample

Gist to mini_fbc2_notes.xml

import cobra

m = cobra.io.sbml3.read_sbml_model("mini_fbc2_notes.xml")

print([x.notes for x in m.genes])
# [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]

print([x.notes for x in m.reactions])
# [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]

print([x.notes for x in m.metabolites])
# [{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]

Expected Output

print([x.notes for x in m.genes])
# [{"This is an important note for humans."}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]

print([x.notes for x in m.reactions])
# [{"This is an important note for humans."}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]

print([x.notes for x in m.metabolites])
# [{"This is an important note for humans."}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}]

Output of cobra.show_versions()

System Information

OS Linux
OS-release 3.10.0-693.21.1.el7.x86_64
Python 3.6.4

Package Versions

pip 9.0.3
setuptools 39.0.1
cobra 0.11.3
future 0.16.0
swiglpk 1.4.4
optlang 1.4.0
ruamel.yaml 0.14.12
pandas 0.22.0
numpy 1.14.2
tabulate 0.8.2
scipy 1.0.0
matplotlib 2.2.2

@cdiener
Copy link
Member

cdiener commented Apr 17, 2018

Hi that is a common request and pop up every few months. Also see the detailed discussion in opencobra/schema#4 . There is some discussion in how notes should be used in SBML. In general I think it would be ok to read it without parsing or modifying the text, so giving it as is. However, if your notes contain anything that is important to annotate your gene/metabolite/reaction this should go into <annotations> and no into <notes>.

@bdelepine
Copy link
Author

@cdiener Thank you for the link; this is exactly what I was looking for. I totally missed #541.

I understand that this discussion should be in opencobra/schema#4.

@cdiener
Copy link
Member

cdiener commented Apr 17, 2018

Either way, simply reading the notes in on our roadmap. @matthiaskoenig is currently working on modernizing our SBML reader 😄

@matthiaskoenig
Copy link
Contributor

The notes reading can be implemented very fast. This is just an additional field to set.
I already implemented an updated parser for SBML
#685
and as soon as I improved the speed a bit more I can implement the notes (not adding additional features at this moment, to get a fair speed comparison).

@bdelepine
Copy link
Author

@matthiaskoenig That would be great!

@Hemant27031999
Copy link
Contributor

Hi all, @matthiaskoenig @draeger @Midnighter @cdiener and other members of cobrapy.

This thread looks best to discuss the implementation for adding the complete support for the notes field in cobrapy as part of NRNB #137.

As of now, the notes field in cobrapy is supported in a minimal way only, relevant for constraint-based models i.e., structured information from notes in the form <p>key: value</p> is read into the Object.notes dictionary when reading the SBML files. On writing the Object.notes, this dictionary is serialized to the SBML notes information in the form

<notes>
   <body>
      <p> key: value </p>
      ....
   </body>
</notes>

Any other content inside notes is left out and hence that information is lost while parsing.

The notes field is a place for storing information intended to be seen by humans only and no machine parseable information should be put here. However, some information like Subsystem, Gene Association, Charge, and Formula, the support of which was not present in SBML earlier, were put in the notes field. Now, when the support for these fields has been added and they are properly defined in SBML, they should be put at their respective places instead of being written again to the notes field. We should respect the "human-only" specification of SBML about the notes field and should follow the SBML specification for this extra information.

Here is what I propose for the notes field structure in cobrapy and in the JSON representation:

  • While reading an SBase object, we must check inside its notes field that whether anything of the form <p>key: value</p> is present inside it or not. If present, we will parse them out and put them at their respective position in cobrapy model, similar to those defined by SBML and new packages, else we will put them in the key-value pair class of annotation and will remove them from the notes string.
  • The remaining notes string will be put as it is as a value to SBase.notesString key in cobrapy and in the JSON too.
  • In order to have the support for adding something to the notes field, we will have the XMLNode tree structure for the notes field, so that if someone wants to add something inside notes, they can follow the tree hierarchy. The XMLNode tree structure of notes will easily be accessed by the SBase.notes field and one can then use it to add any valid XHTML content inside the notes.

This will ensure the complete parsing of notes information from SBML to COBRA model and then to JSON and also the refinement of the notes field if it contains any information in the form of key-value pair, which it should not.

What do you think?

@Midnighter Midnighter reopened this Apr 27, 2020
@Midnighter
Copy link
Member

From my perspective it is desirable to have code that "upgrades" models (by moving information from notes to the correct fields) in libSBML directly. You can already read a level 2 model and save a level 3 one but specific annotation like GPR associations are not handled in the way that you describe. This is not a cobrapy specific problem, though, as it applies to any older SBML document and affects all tools that use SBML.

I'm not sure yet, if it is a good idea to build better notes handling directly into cobrapy. As a starting point, we can also point users that need these capabilities in the right direction. From a cobrapy perspective it could be a plain text field that cobrapy does not touch at all.

@Hemant27031999
Copy link
Contributor

From my perspective it is desirable to have code that "upgrades" models (by moving information from notes to the correct fields) in libSBML directly.

@Midnighter, that sounds more promising. It will be better if this information is parsed by interfacing libraries, like libSBML and JSBML, themselves instead of specific end packages like cobrapy. That way would be more general. But I am not sure if it is possible to cover this implementation in libSBML under this project. Mentors, @draeger @matthiaskoenig, please provide your reviews.

From a cobrapy perspective it could be a plain text field that cobrapy does not touch at all.

Well, I am not very clear about how much addition or manipulation of notes data is required in cobrapy. If no notes data manipulation is required in cobrapy, then plain text seems the best choice of all. We can then simply put it JSON and other formats too. However, if data addition or manipulation in notes is required, it will be a bit complex to deal it with simple strings.
Here is what I think will be best if we need support for handling the internal notes data in cobrapy:

  • We will extract the key-value information out of the notes string and will put it in the form of a dictionary like it is currently done. Warnings will be issued corresponding to the deprecation of their use in the notes field.
  • We will also have the XMLNode tree structure of the notes field, which will be used to add further notes and making other changes to existing notes, and at the time of writing the cobra model to JSON, this XMLNode tree will be converted to the string and will be put as it is in JSON.

Any changes made to the notes dictionary will also be updated in XMLNode tree and vice versa too. This will ensure no information loss of notes data while parsing from one format to other and also easy handling of notes field to add or modify its content in cobra model. And in the future, when libsbml adds support for upgrading the notes field itself by putting key-value pairs at their respective positions, we will simply remove the notes dictionary from the cobra model. Only XMLNode tree will then do the work.

@draeger
Copy link

draeger commented Apr 27, 2020

In the discussion so far we talked about how and where to store information. But you are right, @Hemant27031999 that JSBML, and I assume libSBML as well, contain converters from COBRA notes to FBC. This means, it is neither necessary to implement a parser for the notes again, nor a method to save the key-value pairs to a new place. Just the new key-value pair data structure is most likely not yet supported in any of the two libraries. It is possible that a function for this is being added soon.

However, it is important to note that neither SBML library will just automatically parse out any specific information from the notes because the default assumption is that that is unstructured human-readable text. Cobrapy or any other end-user library will at least need to call the function for making that conversion happen.

About the plain-text in cobrapy's data JSON data structure. This would, indeed, be best. If it comes from SBML, the notes text string might contain some XHTML tags that can just remain as they are. If created in cobrapy, the underlying SBML library should add the required XHTML frame around the text when missing (at least JSBML would do, possibly also libSBML).

@bdelepine
Copy link
Author

Hi all, and thanks @Hemant27031999 for working on this issue,

I agree with what was said by @Midnighter and @draeger, and I will just add a few words as a user.

As a user, I really just want import/export features of SBML notes with the possibility to read/modify the text glob within cobrapy. That's all. No fancy conversion. If there is any parsing, I can do it myself.

Taking back the example from my first post, as of today (v.0.17.1), the notes attributes of gene, reaction and metabolite objects are still empty dict despite the imported SBML having a notes tag (with XHTML text). A plain-text import may not be a very interesting feature to implement... but really, that would be great 😉

@cdiener
Copy link
Member

cdiener commented Apr 29, 2020

I agree with what most said here. Notes should just be read as string without any interpretation or parsing independent of whether they may contain tags or not. The notes parsing we had was to have compatibility with cobra models before SBML version 3. This can stay as legacy support but we should not implement any new notes parsers.

@akaviaLab
Copy link
Contributor

Is this still valid?
I thought that notes are read/written (no parsing), but done.

@Midnighter
Copy link
Member

The PR #988 is still open. Lots of good stuff in there but unfortunately a bit dated by now and hard to review.

@cdiener
Copy link
Member

cdiener commented Nov 4, 2022

Closing because this is specific to notes. New annotations are tracked in the relevant PRs.

@cdiener cdiener closed this as completed Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants