Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some browsers do not support <?xml version="1.1"?> tag #467

Open
kimtuck opened this issue Jan 20, 2016 · 14 comments
Open

Some browsers do not support <?xml version="1.1"?> tag #467

kimtuck opened this issue Jan 20, 2016 · 14 comments

Comments

@kimtuck
Copy link

kimtuck commented Jan 20, 2016

I have some books which contain xml tags specified like this:

<?xml version="1.1"?>

At the present, I'm getting around this by patching the code to replace "1.1" with "1.0". I think eventually we'll deal with this at the time we acquire the books from the publishers, either by asking them to package them with strictly "1.0", or by replacing the version number before we process the books and put them online.

So, I'm not sure whether you want to patch readium to make this change, or leave it as an issue for content providers. If you're leaving it to content providers, you may want to trap the error and log it to the console, so that users will better understand what is going on. I think that, without some sort of patch, the code simply fails without any diagnostics.

@da70
Copy link
Contributor

da70 commented Jan 29, 2016

We ran into this issue too, specifically with Firefox, which was not loading certain books.
We ran the EpubCheck validator on our collection, which returned this error for the books in question (file info snipped):
ERROR(HTM-001): ... : Any publication resource that is an XML-based media type must be a valid XML 1.0 document. XML version found: 1.1.

In our case the books were EPUB version 2. According to OPF 2.0 spec: "An XML Document is a complete and valid XML document as defined in XML 1.0 (Fourth Edition) (http://www.w3.org/TR/2006/REC-xml-20060816/)."

We didn't have the option of going back to the vendor to have the EPUBs corrected, so we did it ourselves. After clearing this error the books loaded.
I've asked the department that does the outsourcing for the books that going forward they require vendor delivered EPUBs to pass EpubCheck without errors before we declare them acceptable.

When I was debugging, I did see console errors that hinted at the problem. Shortly before all console messages stopped, Firefox reported "XML declaration not well-formed" for what I believe was the OPF file. I agree Readium could enhance this error message. In fact, it seemed to fail to process the exception. The final console error was "TypeError: e is undefined" in a file I can't identify because I am looking at a screenshot of the console output, but it is definitely a readium file. Note that we are using a Sep 2, 2015 version of master, which is far behind HEAD now.

@kimtuck
Copy link
Author

kimtuck commented Feb 1, 2016

What I ended up doing was patching the zipReader -- at the point where the file contents have been extracted from the zip file, replace any 1.1 declarations with 1.0 declarations. We will be asking our publishers to send us files with the declaration as 1.0; but a number of my test files were using 1.1 and this was the quickest way to get past this particular issue. If you're interested, I can post the code that I modified.

@da70
Copy link
Contributor

da70 commented Feb 1, 2016

Yes I would like to see the code, just to learn more about Readium. We don't actually need it, though, because we fixed the declarations in our EPUBs. Maybe this is an option for you as well? That way you won't need to deal with patches that would be overwritten or even possibly invalidated when updating Readium from upstream.
It turned out to be very easy. First unzip, then edit, then re-package. Here's what I wrote in our internal wiki about re-packaging:


Zip 3.0 by Info-ZIP is available on Mac OS X on the command-line.

Following instructions from Epub Format Construction Guide - HXA7241 - 2007:

    zip -X0 EpubGuide-hxa7241.epub mimetype
    zip -Xur9D EpubGuide-hxa7241.epub *

The flags being used (see http://www.info-zip.org/mans/zip.html):

  • -9: maximum compression
    
  • -D | --no-dir-entries: do not create entries in the zip archive for directories.
    
  • -u | --update: replace (update) an existing entry in the zip archive if it had been modified more recently.
    
  • -r | --recurse-paths: travel the directory structure recursively.
    
  • -X | --no-extra: do not save extra file attributes.
    

We tested the correctness of this method in various ways:

  • Round-trip zipping and unzipping an exploded EPUB from our website and checking that files/directories are binary identical.
    
  • Opening the resulting EPUB file in Adobe Digital Editions and testing page turning, font size change, TOC, bookmarking, and full-screen mode.
    
  • Validating EPUB file using [epubcheck](https://github.com/IDPF/epubcheck).  No errors were introduced by the zipping process.
    

So that's an option you might want to consider.
Please do feel free to post the code though, as I would be curious to see it.

@danielweck
Copy link
Member

Thank you for this bug report. I am able to reproduce on Windows 10 with Edge, Internet Explorer, Firefox (only Chrome works with version="1.1" in package.opf).

@danielweck
Copy link
Member

Fixed in the feature/DOMParser branch(es), here are the pull requests (all ReadiumJS repositories are affected):

readium/readium-cfi-js#45
readium/readium-shared-js#252
readium/readium-js#134
#483

@kimtuck
Copy link
Author

kimtuck commented Feb 8, 2016

Do you need me to verify the fix, or are you just publishing the test cases in case anyone is interested?

From: Daniel Weck [mailto:[email protected]]
Sent: Monday, February 08, 2016 2:38 PM
To: readium/readium-js-viewer [email protected]
Cc: Robert Hanson [email protected]
Subject: Re: [readium-js-viewer] Some browsers do not support tag (#467)

Live demonstration of the XML v1.1 fix:
http://readium-html.surge.sh/?epub=https%3A%2F%2Fcdn.rawgit.com%2Freadium%2Freadium-js-viewer%2Ffeature%2FDOMParser%2Fepub_content%2Faccessible_epub_3&goto=%7B%22idref%22%3A%22id-id2442754%22%7D

OPF package:
https://github.com/readium/readium-js-viewer/blob/feature/DOMParser/epub_content/accessible_epub_3/EPUB/package.opf#L1

EPUB3 Navigation Document (TOC):
https://github.com/readium/readium-js-viewer/blob/feature/DOMParser/epub_content/accessible_epub_3/EPUB/bk01-toc.xhtml#L1

EPUB3 XHTML Content Documents:
https://github.com/readium/readium-js-viewer/blob/feature/DOMParser/epub_content/accessible_epub_3/EPUB/index.xhtml#L1
https://github.com/readium/readium-js-viewer/blob/feature/DOMParser/epub_content/accessible_epub_3/EPUB/ch01.xhtml#L1


Reply to this email directly or view it on GitHubhttps://github.com//issues/467#issuecomment-181524020.

@danielweck
Copy link
Member

I would appreciate if you could test the "live demo" link in your browsers, thanks! :)
Make sure to open the TOC to see if it all works fine there too.

@matwood
Copy link

matwood commented Sep 20, 2016

We just ran into this same issue. Is the plan to merge this as a permanent work around or should I fix it on our own readium branch?

@danielweck
Copy link
Member

The feature/DOMParser branches (see comment above: #467 (comment) ) implement support for non-XML HTML too, so maybe you only need the XML 1.1 fix?
At any rate, I personally wouldn't mind seeing this merged to the develop branch, but as this was originally implemented to demonstrate the feasibility of EPUB3.1 with relaxed XML serialisation rules (which hasn't happened), I understand why there isn't a strong case to introduce (potentially) disruptive changes to an already complex codebase.
If there is strong evidence that the XML1.1 fix is needed (as now seems to be the case), perhaps we should consider extracting the fix in its own PR? (I have to take a look at the code though...it's been a while)

@danielweck
Copy link
Member

PS: the XML1.1 fix basically boils down to XmlParse.preprocess()
https://github.com/readium/readium-cfi-js/pull/45/files#diff-9909b258b32fb35ba81727d9da6faaceR34

@matwood
Copy link

matwood commented Sep 20, 2016

We have tracked down that our problem book slipped through very early before we forced everything through EPUB check. I'll just go ahead and implement a 1.1 fix in our branch.

The issue is always "it works in iBooks", so failing EPUB check does not mean a whole lot to the authors it seems.

@danielweck
Copy link
Member

@matwood could you please let us know where in your fork's code you patched the XML 1.1 issue? (I am asking because XmlParse.preprocess() is specific to the feature/DOMParser branch ... I remember having to extract and centralize this code in order to fix all occurrences of XML parsing, e.g. spine item HTML documents, navdoc, etc.)

@matwood
Copy link

matwood commented Sep 20, 2016

@danielweck Here is the fix I put in place on BiblioLabs fork.
bibliolabs/readium-js@b5b765b

I added some better logging since cross browser error handling is hard with the parse function. If you want I can cherry pick it into the readium branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants