migrated from Trac, where originally posted by lars_h on 16-May-2014 4:28pm
The OM2 standard is not very clear about the model for foreign objects. This could make it impossible to preserve the meaning of a piece of data when converting from one OM encoding X to another OM encoding Y and then back to X again. A symptom of this lack of clarity in the model is probably also that the encoding attribute of an OMFOREIGN element is heuristic rather than an exact piece of information.
It is somewhat striking that different chapters of the OM2 standard work within quite different models of data. When it comes to the catch-all part of "everything foreign", these differences become inconsistencies.
In Chapter 2 (OpenMath objects), the model of the universe seems to be that of formal terms: there is an alphabet with symbols such as application, attribution, foreign, and so on that are used to build composite objects out of simpler ones, and there is a grammar which objects must obey in order to be OpenMath objects.
In Section 3.1 (The XML Encoding), the model of the universe rather seems to be that of text-with-markup. As far as the OM objects go, text may only appear in the body of certain leaf elements, so we effectively have an encoding of formal terms, but in the body of an OMFOREIGN element one will in general expect both text and markup (especially if it contains things like HTML or XHTML+PresentationMathML).
In Section 3.2 (The Binary Encoding), we have yet another model of the universe, which I'm not quite sure what to call. Its non-foreign parts are again close to formal terms, so that is fine. Foreign data is a sequence of octets (if you're speaking RFC), and character-based data formats (including XML) must be UTF-8 encoded. Here, we thus have a model in which foreign data is text.
The round-trip problem is how to have two conversion utilities, one which converts XML-to-binary and the other which converts binary-to-XML, be inverses of each other (modulo irrelevant details such as indentation). It is quite easy (well, at least trivial, in the mathematical sense of the word) to create two such utilities that are injective, but not at all clear how to make one the inverse of the other.
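For concreteness, the desired property can be stated as a small sketch (Python, purely for illustration; xml_to_binary, binary_to_xml, and normalize are hypothetical names for whatever converters one builds, and nothing the standard itself defines):

    # Sketch only: the two converters and the normalizer are assumed to
    # exist and are passed in; the standard specifies the encodings,
    # not this API.
    def round_trips(xml_object, xml_to_binary, binary_to_xml, normalize):
        # True if XML -> binary -> XML gives back the same object,
        # up to irrelevant detail such as indentation (modelled here
        # by the normalize step).
        binary = xml_to_binary(xml_object)
        return normalize(binary_to_xml(binary)) == normalize(xml_object)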
Consider the nonsense example:

    <OMOBJ>
      <OMATTR>
        <OMATP>
          <OMS cd="altenc" name="LaTeX_encoding"/>
          <OMFOREIGN encoding="text/x-latex">
            That's of course the variable $x$.
            % In PresentationMathML, that would be
            % <mi> x </mi>
          </OMFOREIGN>
        </OMATP>
        <OMV name="x"/>
      </OMATTR>
    </OMOBJ>
Though the <mi> and </mi> tags appear in a LaTeX comment, they are markup as far as the XML document is concerned, rather than the character data that was probably intended; that line should have been

    % &lt;mi&gt; x &lt;/mi&gt;
instead; fine so far. But what happens when we convert to the binary encoding? Will the text
    That's of course the variable $x$.
    % In PresentationMathML, that would be
    % <mi> x </mi>
appear raw? One could argue that it should, because that's the way one would expect to encode that silly piece of LaTeX code if originally encoding it using the binary encoding. So XML character entities should be decoded upon conversion. But then how can one tell character data from markup?
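To make the ambiguity concrete, here is a small illustrative sketch (Python; the converter behaviour shown is an assumption made for the sake of argument, not something the standard prescribes). Starting from the flat octet sequence the binary encoding would carry, a binary-to-XML converter has at least two legitimate reconstructions:

    from xml.sax.saxutils import escape

    # The foreign payload as it might sit in the binary encoding:
    # plain UTF-8 octets, per Section 3.2.
    octets = "% <mi> x </mi>".encode("utf-8")
    text = octets.decode("utf-8")

    # Reconstruction 1: treat the payload as character data and escape it.
    as_chardata = '<OMFOREIGN encoding="text/x-latex">%s</OMFOREIGN>' % escape(text)
    # Reconstruction 2: treat the payload as embedded markup, verbatim.
    as_markup = '<OMFOREIGN encoding="text/x-latex">%s</OMFOREIGN>' % text

    print(as_chardata)  # ... % &lt;mi&gt; x &lt;/mi&gt; ...
    print(as_markup)    # ... % <mi> x </mi> ...

Both reconstructions are well-formed XML, and an XML-to-binary converter that decodes character entities (or flattens embedded markup to text) sends both back to the same octet sequence, so it is not injective on OMFOREIGN bodies and no binary-to-XML converter can invert it exactly. Which reconstruction is "right" depends on information the flat octet stream no longer carries.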
The matter becomes even trickier if one considers a hypothetical non-XML encoding T that still supports text-with-markup data. Should XML markup in foreign material be converted to native markup? Should markup be in XML format in the binary encoding, but then be converted to T markup during binary-to-T conversion?
I suspect these matters weren't considered when the standard was written.