Mathtype ole xml to mathml #8

Open
wants to merge 14 commits into
base: master
17 changes: 1 addition & 16 deletions README
Original file line number Diff line number Diff line change
@@ -1,16 +1 @@
This is a library to parse MathType's MTEF binary format. Currently doesn't do very much at all.

My intention is that it should read the MTEF stream and, as a bare minimum, output an XML serialisation of it with each
MathType record mapping fairly directly to an XML element containing subrecords as subelements, and with record options
represented by attributes. As an extension, it could make the XML a bit nicer (e.g. instead of outputting a TMPL element
for the TMPL record, it could output a specific element based on the template selector).

Ultimately, I'd like to be able to convert a binary MTEF stream into presentational MathML. My thinking is that once
the binary MTEF is available in some kind of XML representation, it turns into a question of XSLT which personally I
find more tractable (and it can be tested with XSpec tests).

At the moment, this is my main project for working on in my spare time and on the train to / from where I work.
Unfortunately, I don't have a lot of spare time currently! All contributions and suggestions are welcome, but
my ability to respond is a bit limited at the moment.

My intention is to put this under the Apache License, version 2.0, unless anyone has any serious objections.
Moving forward the work done by Daniel Rendall towards the MathML target ...
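
The two-stage plan in the original README text (serialise MTEF to XML, then use XSLT to reach presentation MathML) could look roughly like the sketch below. It uses only the JDK's built-in javax.xml.transform API rather than the Saxon dependency this PR adds, and the input element names (mtef, char) and the stylesheet are invented for illustration; they are not the library's actual serialisation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltPipelineSketch {
    // Hypothetical stylesheet: wraps the root in <math> and maps an invented <char> element to <mi>.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/mtef'><math><xsl:apply-templates/></math></xsl:template>"
      + "<xsl:template match='char'><mi><xsl:value-of select='@value'/></mi></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the serialised result.
    static String toMathML(String mtefXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(mtefXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toMathML("<mtef><char value='x'/></mtef>"));
    }
}
```

Keeping the mapping in a stylesheet means the MathML rules can be changed and tested without touching the binary parser, which is the tractability argument the README makes.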
5 changes: 5 additions & 0 deletions code/output/expected-MathML/math1.xml
@@ -0,0 +1,5 @@
<math>
<mi>x</mi>
<mo>=</mo>
<mi>α</mi>
</math>
9 changes: 9 additions & 0 deletions code/output/expected-MathML/math2.xml
@@ -0,0 +1,9 @@
<math>
<mi>a</mi><msup><mi>x</mi><mn>2</mn></msup>
<mo>+</mo>
<mi>b</mi><mi>x</mi>
<mo>+</mo>
<mi>c</mi>
<mo>=</mo>
<mn>0</mn>
</math>
8 changes: 8 additions & 0 deletions code/output/expected-MathML/math3.xml
@@ -0,0 +1,8 @@
<math>
<mi>x</mi>
<mo>=</mo>
<mfrac>
<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>
<mrow><mi>c</mi><mo>+</mo><mi>d</mi></mrow>
</mfrac>
</math>
7 changes: 7 additions & 0 deletions code/output/expected-MathML/math4.xml
@@ -0,0 +1,7 @@
<math>
<msup><mi>x</mi><mn>2</mn></msup>
<mo>=</mo>
<msubsup><mi>y</mi><mn>2</mn><mn>3</mn></msubsup>
<mo>=</mo>
<mmultiscripts><mi>z</mi><mprescripts/><none/><mn>2</mn></mmultiscripts>
</math>
12 changes: 12 additions & 0 deletions code/output/expected-MathML/math5.xml
@@ -0,0 +1,12 @@
<math>
<mi>x</mi>
<mo>=</mo>
<mrow>
<mo>(</mo>
<mtable>
<mtr><mtd><mi>a</mi></mtd><mtd><mi>b</mi></mtd><mtd><mi>c</mi></mtd></mtr>
<mtr><mtd><mi>d</mi></mtd><mtd><mi>e</mi></mtd><mtd><mi>f</mi></mtd></mtr>
</mtable>
<mo>)</mo>
</mrow>
</math>
10 changes: 10 additions & 0 deletions code/output/expected-MathML/math6.xml
@@ -0,0 +1,10 @@
<math>
<mi>x</mi>
<mo>=</mo>
<munderover>
<mi>∑</mi>
<mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow>
<mi>N</mi>
</munderover>
<msub><mi>a</mi><mi>i</mi></msub>
</math>
15 changes: 15 additions & 0 deletions code/output/expected-MathML/math7.xml
@@ -0,0 +1,15 @@
<math>
<mi>x</mi>
<mo>=</mo>
<msup>
<mrow>
<mo>[</mo>
<mfrac>
<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>
<mrow><mi>c</mi><mo>+</mo><mi>d</mi></mrow>
</mfrac>
<mo>]</mo>
</mrow>
<mn>2</mn>
</msup>
</math>
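
The expected-MathML files above look like fixtures for comparing converter output. A minimal sketch of such a comparison follows, assuming whitespace between elements should not matter. The class and method names are hypothetical and nothing like this is part of the PR; it uses only the JDK's DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MathMLCompareSketch {
    // Parses an XML string and drops whitespace-only text nodes.
    static Document parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        stripBlankText(doc.getDocumentElement());
        return doc;
    }

    // Removes whitespace-only text nodes so that indentation does not affect equality.
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    // True when both strings describe the same element tree, ignoring layout whitespace.
    static boolean sameMathML(String expected, String actual) throws Exception {
        return parse(expected).getDocumentElement()
            .isEqualNode(parse(actual).getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String expected = "<math>\n<mi>x</mi>\n<mo>=</mo>\n<mi>\u03B1</mi>\n</math>";
        String actual = "<math><mi>x</mi><mo>=</mo><mi>\u03B1</mi></math>";
        System.out.println(sameMathML(expected, actual));
    }
}
```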
25 changes: 14 additions & 11 deletions code/pom.xml
@@ -16,22 +16,14 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>uk.co.danielrendall</groupId>
<artifactId>metaphor</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>

<packaging>jar</packaging>
<name>metaphor</name>
<description>A parser for MathType's MTEF binary format</description>
<url>http://maven.apache.org</url>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
@@ -45,7 +37,6 @@
</dependency>

<!-- Logging -->

<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
@@ -57,7 +48,19 @@
<artifactId>jcl-over-slf4j</artifactId>
<version>1.6.2</version>
</dependency>


<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>1.2</version>
</dependency>

<dependency>
<groupId>net.sf.saxon</groupId>
<artifactId>saxon</artifactId>
<version>8.7</version>
</dependency>

<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
93 changes: 78 additions & 15 deletions code/src/main/java/uk/co/danielrendall/metaphor/App.java
@@ -15,23 +15,86 @@

package uk.co.danielrendall.metaphor;

import uk.co.danielrendall.metaphor.parsers.MTEFParser;
import uk.co.danielrendall.metaphor.records.MTEF;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.Properties;

/**
* Hello world!
*
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.PosixParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.common.io.Files;

/*
* Entry Point to this Library
* @author Daniel Rendall
* @author Murugan Natarajan
*/
public class App
{
public static MTEF parse(InputStream is) throws ParseException, IOException {
PushbackInputStream pis = new PushbackInputStream(is);
// hack - throw away 28 byte OLE header
pis.read(new byte[28]);
return new MTEFParser().parse(pis);
public class App{
private static final Logger log = LoggerFactory.getLogger(App.class);
private Properties config = new Properties();
private File inputFile = null;
private File outputFile = null;

private App() throws IOException, TransformerConfigurationException{
this.config.load(App.class.getResourceAsStream("/cmdln/config.properties"));
}

private boolean doWithArgs(String[] args) throws ParseException, IOException, TransformerException, URISyntaxException{
Options options = new Options();
HelpFormatter formatter = new HelpFormatter();
options.addOption(this.config.getProperty("mathMl.commandLine.input.option"), true, this.config.getProperty("mathMl.commandLine.input.desc"));
options.addOption(this.config.getProperty("mathMl.commandLine.output.option"), true, this.config.getProperty("mathMl.commandLine.output.desc"));
CommandLineParser parser = new PosixParser();
CommandLine cmd = null;
try {
cmd = parser.parse(options, args);
} catch (org.apache.commons.cli.ParseException e) {
log.error(this.config.getProperty("mathMl.commandLine.invalid"), e);
}
if (cmd == null || !cmd.hasOption(this.config.getProperty("mathMl.commandLine.input.option"))){
formatter.printHelp(this.config.getProperty("mathMl.commandLine.help.desc"), options);
return false;
}
String inputPath = cmd.getOptionValue(this.config.getProperty("mathMl.commandLine.input.option"));
String outputPath = this.config.getProperty("mathMl.commandLine.output.default");
if (cmd.hasOption(this.config.getProperty("mathMl.commandLine.output.option"))){
outputPath = cmd.getOptionValue(this.config.getProperty("mathMl.commandLine.output.option"));
}
File inputFile = new File(inputPath);
File outputFile = new File(outputPath);
if(!inputFile.exists()){
log.error(this.config.getProperty("mathMl.commandLine.input.invalid"));
return false;
}
this.inputFile = inputFile;
this.outputFile = outputFile;
return true;
}

public static void main(String[] args)throws org.apache.commons.cli.ParseException, ParseException, IOException, TransformerException, URISyntaxException{
App obj = new App();
if(obj.doWithArgs(args)){
FilesExplorer explorer = new FilesExplorer();
BinFileHandler bin = new BinFileHandler();
MMLConverter converter = new MMLConverter();
explorer.setOutputExtention(".mml");
explorer.setSearchPattern("^([^.]*)\\.bin$");
explorer.collectFilePairs(obj.inputFile, obj.outputFile);
for (Map.Entry<File, File> pair : explorer.getFilePairs().entrySet()){
Files.write(converter.covertBin(bin.parseRequiredContent(Files.toByteArray(pair.getKey()))), pair.getValue());
}
log.info(obj.config.getProperty("mathMl.conversion.success"));
}
}
}
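
FilesExplorer is not shown in this diff, so the way it pairs inputs with outputs is not visible here. As a rough sketch of what the search pattern and output extension set in main imply, the hypothetical helper below maps a matching .bin file name to a .mml name. The class and method names are invented, and the real class may work differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilePairSketch {
    // Same pattern that main passes to setSearchPattern: a dot-free base name followed by ".bin".
    static final Pattern BIN = Pattern.compile("^([^.]*)\\.bin$");

    // Returns the output name for a matching input name, or null when the name does not match.
    static String outputNameFor(String inputName, String extension) {
        Matcher matcher = BIN.matcher(inputName);
        return matcher.matches() ? matcher.group(1) + extension : null;
    }

    public static void main(String[] args) {
        System.out.println(outputNameFor("oleObject1.bin", ".mml")); // oleObject1.mml
        System.out.println(outputNameFor("notes.txt", ".mml"));      // null
    }
}
```

Note that the pattern rejects any name with a dot before the extension, so a file such as a.b.bin would be skipped under this reading.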
@@ -0,0 +1,53 @@
package uk.co.danielrendall.metaphor;

import java.io.IOException;
import java.util.Arrays;

/**
* Extracts the MTEF payload from a MathType OLE .bin file, discarding the header and other non-equation content
* @author Murugan Natarajan
*/
public class BinFileHandler {
public byte[] parseRequiredContent(byte[] bytes) throws ParseException, IOException {
// Discard everything before the MathType OLE header; files written by versions later than MathType 5 have extra content there
Integer[] startKeys = new Integer[]{28, 0, 0, 0, 2, 0}; // First 6 bytes of the header
int startIndex = identifyKeys(startKeys, bytes);
if(startIndex == -1){
throw new ParseException("Expected header start value \"1c00 0000 0200\" not found in this file.");
}
bytes = Arrays.copyOfRange(bytes, startIndex, bytes.length);

// Discard the 28-byte MathType OLE header
if(28 >= bytes.length){
throw new ParseException("No equation data found after the 28-byte MathType OLE header.");
}
bytes = Arrays.copyOfRange(bytes, 28, bytes.length);

// Discard the 512-byte block inside the equation body that starts with the bytes 'E' 0 'q' 0 'u' 0 (seen in files from versions later than MathType 5)
Integer[] unwantedBodyKeys = new Integer[]{69, 0, 113, 0, 117, 0}; // First 6 bytes of the unwanted block
int unwantedBodyIndex = identifyKeys(unwantedBodyKeys, bytes);
if(unwantedBodyIndex >= 0 && unwantedBodyIndex + 512 <= bytes.length){
byte[] frontBytes = Arrays.copyOfRange(bytes, 0, unwantedBodyIndex);
byte[] backBytes = Arrays.copyOfRange(bytes, unwantedBodyIndex + 512, bytes.length);
bytes = new byte[frontBytes.length + backBytes.length];
System.arraycopy(frontBytes, 0, bytes, 0, frontBytes.length);
System.arraycopy(backBytes, 0, bytes, frontBytes.length, backBytes.length);
}
return bytes;
}

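// Returns the index of the first occurrence of keys in bytes, or -1 if it does not occur.
private int identifyKeys(Integer[] keys, byte[] bytes) throws IOException{
for(int start = 0; start + keys.length <= bytes.length; start++){
int i = 0;
// Compare the candidate window byte by byte, treating bytes as unsigned values
while(i < keys.length && keys[i] == (bytes[start + i] & 0xFF)){
i++;
}
if(i == keys.length){
return start;
}
}
return -1;
}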
}
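
The header handling in parseRequiredContent can be illustrated on synthetic data. The sketch below is a standalone approximation, not the PR's code: it assumes the same 6-byte header start (1c 00 00 00 02 00) and 28-byte header length, uses hypothetical names, and returns null where the real method throws ParseException. It searches every start position, so a stray 0x1c immediately before the real header does not hide the match.

```java
import java.util.Arrays;

class HeaderStripSketch {
    static final byte[] HEADER_START = {0x1c, 0, 0, 0, 2, 0};
    static final int HEADER_LENGTH = 28;

    // Index of the first occurrence of pattern in data, or -1 when it does not occur.
    static int indexOf(byte[] data, byte[] pattern) {
        for (int start = 0; start + pattern.length <= data.length; start++) {
            byte[] window = Arrays.copyOfRange(data, start, start + pattern.length);
            if (Arrays.equals(window, pattern)) {
                return start;
            }
        }
        return -1;
    }

    // Returns the bytes after the 28-byte header, or null when no header or no payload is found.
    static byte[] stripHeader(byte[] data) {
        int start = indexOf(data, HEADER_START);
        if (start < 0 || start + HEADER_LENGTH >= data.length) {
            return null;
        }
        return Arrays.copyOfRange(data, start + HEADER_LENGTH, data.length);
    }

    public static void main(String[] args) {
        // Two junk bytes (the second is a stray 0x1c), a 28-byte header, then a 3-byte payload.
        byte[] data = new byte[33];
        data[0] = 7;
        data[1] = 0x1c;
        data[2] = 0x1c; // header starts here
        data[6] = 2;
        data[30] = 5;
        data[31] = 1;
        System.out.println(Arrays.toString(stripHeader(data)));
    }
}
```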