LAMPe2e: File Formats in the Wild, Wild West - a review of eReaders for (not very) complex layouts

Having looked around at various ebook publishers, some common features emerged. Most supported the same set of distribution channels. All accepted MS Word manuscripts. Most could distribute in Kindle (AZW/Mobi, AZW3), EPUB and PDF formats. And yet all had what seemed like draconian formatting standards. Obviously I wasn't expecting any issues with PDF as a final format, but I did expect that the other, HTML-based formats would have no trouble with a fairly restricted set of HTML with limited CSS, specifically:

H1, H2, H3 and H4 headings
Bold, italic and underlined normal text
A monospace text for code samples
PNG images (anchored as paragraphs)

In addition, explicit page breaks at the top of each chapter would be a nice-to-have.

While Calibre seemed to cope with a reasonable conversion of the original OpenOffice manuscript, there were a lot of artefacts in the generated EPUB, and problems with image sizing. Rather than tinker directly with the EPUB or the OpenOffice file, I exported the file as HTML from OpenOffice and set about cleaning it up. I was able to script most of this, but it still required hand editing to produce sensible, well-formed HTML (and to clean up the non-visible artefacts from OpenOffice, such as unnecessarily interrupted tags). This approach had the desired outcome: converting the HTML to EPUB in Calibre eliminated all the conversion artefacts.
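As an illustration of the kind of scripted clean-up described above, here is a minimal sketch in Python. It is not the script I actually used: the function names and the list of tags are assumptions made for the example, and it only handles attribute-free inline tags, so hand editing would still be needed for anything more involved.

```python
import re

# Assumed set of inline tags that an export may close and immediately reopen.
INLINE_TAGS = ("b", "i", "u", "tt")


def merge_interrupted_tags(html: str) -> str:
    """Collapse runs such as '</b><b>' that split one styled span in two."""
    for tag in INLINE_TAGS:
        # A closing tag, optional whitespace, then the same attribute-free
        # opening tag. The whitespace between them is kept.
        pattern = re.compile(rf"</{tag}>(\s*)<{tag}>", re.IGNORECASE)
        html = pattern.sub(r"\1", html)
    return html


def drop_empty_inline(html: str) -> str:
    """Remove inline elements that contain nothing at all, e.g. '<b></b>'."""
    for tag in INLINE_TAGS:
        html = re.sub(rf"<{tag}></{tag}>", "", html, flags=re.IGNORECASE)
    return html
```

For example, `merge_interrupted_tags("<b>Hello</b><b> world</b>")` returns `<b>Hello world</b>`. A regular expression is a blunt tool for HTML, so it is worth checking the result by eye before converting.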

I now had something I could proofread in an e-reader.

In my research, I saw very few technical ebooks. I soon found out why, and the reason for the restrictive style guides: very few readers are capable of rendering the layout defined in the file, despite the file format explicitly supporting the HTML elements used. In short, they do not properly implement the file format they claim to support, in many cases missing it by a wide margin.

Here is a sample of the programs I tried on my Lenovo Yoga 10 (Android):

Name	Monospace font?	Sans font?	Serif font?	Borders on divs?	Backgrounds?	Notes
Moon+ eReader	no	no	yes	no	sort of	unexpected page breaks injected
Aldiko Reader	no	no	yes	no	no	unexpected reformatting of paragraphs
FB reader	no	yes	no	no	no
eBook reader by Vadim Lopatin	yes	no	yes	no	no
eBooks.com eBook reader	no	no	yes	yes	no
eReader Prestigious	no	no	yes	no	no	random blocks of whitespace, particularly adjacent to imgs
Graphilos epub reader	yes	yes	yes	yes	yes	no margins, poor kerning and difficult navigation
UB reader	yes	yes	yes	yes	yes	does not honour ALIGN=CENTER
NeoSoar eBook reader	yes	yes	yes	yes	yes	does not pre-render previous/next pages, making page turns slow and disconcerting; like UB reader, does not honour ALIGN=CENTER
Gitden Reader	yes	yes	yes	yes	yes	seems to make up its own mind about image sizes, but makes sensible choices; otherwise strong compatibility and MathML support

(I was unable to open books stored on the local filesystem using Scribd or Kindle)

Overall, Gitden Reader stood out as the most capable eReader for rendering layouts, with UB reader a close second.

Although all were able to visually represent different header levels, the problems with fonts, borders and backgrounds meant that I have had to use all three techniques (a monospace font, a border and a background) to have a reasonable chance that code samples will be visually distinct from body text in the rendered document.
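To make the "all three techniques" idea concrete, here is a small sketch in Python that wraps a code sample in markup carrying all three cues at once. The function name, the style values and the choice of a div around a pre element are assumptions for illustration, not the exact markup in my book, and nothing here guarantees that a given reader will honour any of it.

```python
from html import escape

# Hypothetical inline style combining the three cues: a monospace font,
# a border and a background. The colours and sizes are arbitrary.
CODE_STYLE = (
    "font-family: monospace; "
    "border: 1px solid #888888; "
    "background-color: #eeeeee; "
    "padding: 0.5em;"
)


def code_block(source: str) -> str:
    """Return HTML for a code sample that stays distinct from body text
    as long as the reader renders at least one of the three cues."""
    # escape() replaces &, < and > (and quotes) so the sample displays literally.
    return f'<div style="{CODE_STYLE}"><pre>{escape(source)}</pre></div>'
```

The reasoning is redundancy: a reader that ignores the font may still draw the border, and one that ignores both may still paint the background.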

Thursday, 1 January 2015
