Foreign object model; round-trip preservation #148

Closed
jbs1 opened this issue Jul 6, 2016 · 1 comment

jbs1 commented Jul 6, 2016

Migrated from Trac, where it was originally posted by lars_h on 16-May-2014 4:28pm.

The OM2 standard is not very clear about the model for foreign objects. This could make it impossible to preserve the meaning of a piece of data when converting from one OM encoding X to another OM encoding Y and then back to X again. A symptom of this lack of clarity is probably also that the encoding attribute of an OMFOREIGN element is a heuristic hint rather than an exact piece of information.

It is somewhat striking that different chapters of the OM2 standard work within quite different models of data. When it comes to the catch-all part of "everything foreign", these differences become inconsistencies.

In Chapter 2 (OpenMath objects), the model of the universe seems to be that of formal terms: there is an alphabet with symbols such as 'application', 'attribution', 'foreign', and so on that are used to build composite objects out of simpler ones, and there is a grammar which objects must obey in order to be OpenMath objects.

In Section 3.1 (The XML Encoding), the model of the universe rather seems to be that of text-with-markup. As far as the OM objects go, text may only appear in the body of certain leaf elements, so we effectively have an encoding of formal terms, but in the body of an OMFOREIGN element one will in general expect both text and markup (especially if it contains things like HTML or XHTML+PresentationMathML).

In Section 3.2 (The Binary Encoding), we have yet another model of the universe, which I'm not quite sure what to call. Its non-foreign parts are again close to formal terms, so that is fine. Foreign data is a sequence of octets (if you're speaking RFC), and character-based data formats (including XML) must be UTF-8 encoded. Here, we thus have a model in which foreign data is text.

The round-trip problem is how to have two conversion utilities, one which converts XML-to-binary and the other which converts binary-to-XML, be inverses of each other (modulo irrelevant details such as indentation). It is quite easy (well, at least trivial, in the mathematical sense of the word) to create two such utilities that are injective, but not at all clear how to make one the inverse of the other.
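To make the gap between "injective" and "inverse" concrete, here is a minimal Python sketch. The two functions `xml_body_to_octets` and `octets_to_xml_body` are hypothetical policies invented for illustration; neither is prescribed by the standard or taken from any OpenMath library. Each map is injective on its own, yet composing them does not return the original body.

```python
from xml.sax.saxutils import escape


def xml_body_to_octets(body: str) -> bytes:
    """Hypothetical XML-to-binary policy: store the serialized body verbatim as UTF-8."""
    return body.encode("utf-8")


def octets_to_xml_body(payload: bytes) -> str:
    """Hypothetical binary-to-XML policy: treat the octets as plain text and escape it."""
    return escape(payload.decode("utf-8"))


# Serialized body of an OMFOREIGN element, with "<" already written as an entity.
original = "$a &lt; b$"
round_tripped = octets_to_xml_body(xml_body_to_octets(original))

# The ampersand of the entity gets escaped again, so the round trip is not the identity.
print(original)
print(round_tripped)
```

Both converters are defensible in isolation, which is the point: without an agreed model of what the foreign payload is, two implementers can each be reasonable and still disagree.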

Consider the nonsense example

<OMOBJ>
  <OMATTR>
    <OMATP>
      <OMS cd="altenc" name="LaTeX_encoding"/>
      <OMFOREIGN encoding="text/x-latex">
        That's of course the variable $x$.
        % In PresentationMathML, that would be
        % <mi> x </mi>
      </OMFOREIGN>
    </OMATP>
    <OMV name="x"/>
  </OMATTR>
</OMOBJ>

Though the <mi> and </mi> tags appear in a LaTeX comment, they are markup as far as the XML document is concerned, rather than the character data that was probably intended; that line should have been

        % &lt;mi&gt; x &lt;/mi&gt;

instead; fine so far. But what happens when we convert to the binary encoding? Will the text

        That's of course the variable $x$.
        % In PresentationMathML, that would be
        % <mi> x </mi>

appear raw? One could argue that it should, because that's the way one would expect to encode that silly piece of LaTeX code if originally encoding it using the binary encoding. So XML character entities should be decoded upon conversion. But then how can one tell character data from markup?
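The ambiguity can be reproduced with Python's standard `xml.etree.ElementTree` parser. In the sketch below, `body_with_entities_decoded` is a hypothetical conversion policy (an assumption for illustration, not something the standard defines) that emits character data raw and child elements in serialized form. Under that policy, the body containing real markup and the body containing escaped text collapse to the same octets.

```python
import xml.etree.ElementTree as ET

# The same LaTeX comment line, once with real XML markup and once with escaped text.
AS_MARKUP = '<OMFOREIGN encoding="text/x-latex">% <mi> x </mi></OMFOREIGN>'
AS_TEXT = '<OMFOREIGN encoding="text/x-latex">% &lt;mi&gt; x &lt;/mi&gt;</OMFOREIGN>'


def body_with_entities_decoded(xml_source: str) -> bytes:
    """Hypothetical policy: character data is emitted raw, child elements are serialized."""
    element = ET.fromstring(xml_source)
    parts = [element.text or ""]
    for child in element:
        # ET.tostring serializes the child and also appends any tail text after it.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts).encode("utf-8")


# The parser sees a child element in one case and only character data in the other.
markup_children = [child.tag for child in ET.fromstring(AS_MARKUP)]
text_children = [child.tag for child in ET.fromstring(AS_TEXT)]

print(markup_children, text_children)
print(body_with_entities_decoded(AS_MARKUP) == body_with_entities_decoded(AS_TEXT))
```

Since two different XML inputs produce identical octets here, no binary-to-XML converter could recover which one was meant without extra information.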

The matter becomes even trickier if one considers a hypothetical non-XML encoding T that still supports text-with-markup data. Should XML markup in foreign material be converted to native markup? Should markup be in XML format in the binary encoding, but then be converted to T markup during binary-to-T conversion?

I suspect these matters weren't considered when the standard was written.

kohlhase commented Oct 2, 2017

moved to OpenMath/OMSTD#16

kohlhase closed this as completed Oct 2, 2017