Thursday, 1 January 2015

File Formats in the Wild, Wild West - a review of eReaders for (not very) complex layouts

Having looked around at various ebook publishers, I found some common features. Most supported the same set of distribution channels. All accepted MS Word manuscripts. Most could distribute in Kindle (AZW/Mobi, AZW3), EPUB and PDF formats. And yet all had what seemed like draconian formatting standards. Obviously I wasn't expecting any issues with PDF as a final format, but I did expect that the other, HTML-based formats would have no trouble with a fairly restricted set of HTML with limited CSS, specifically:
  • H1, H2, H3 and H4 headings
  • Bold, italic and underlined normal text
  • A monospace text for code samples
  • PNG images (anchored as paragraphs)
In addition, explicit page breaks at the top of each chapter were a nice-to-have.
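One way to check that a manuscript really stays inside a restricted set like this is to audit its tags before conversion. The following is a minimal sketch using Python's standard html.parser module. The allowlist (ALLOWED_TAGS) and the function name unexpected_tags are my own assumptions for illustration, not something any publisher specifies.

```python
from html.parser import HTMLParser

# Assumed allowlist: the elements named in the list above plus basic
# document scaffolding. Adjust it to match your own manuscript.
ALLOWED_TAGS = {
    "html", "head", "title", "body",
    "h1", "h2", "h3", "h4",
    "p", "b", "i", "u",
    "pre", "code", "tt",
    "img", "br",
}


class TagAudit(HTMLParser):
    """Collect every start tag that is not on the allowlist."""

    def __init__(self):
        super().__init__()
        self.unexpected = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method.
        if tag not in ALLOWED_TAGS:
            self.unexpected.add(tag)


def unexpected_tags(html_text):
    """Return a sorted list of tag names found outside the allowlist."""
    audit = TagAudit()
    audit.feed(html_text)
    audit.close()
    return sorted(audit.unexpected)
```

Calling unexpected_tags() on the exported HTML gives a quick list of elements (FONT and SPAN are typical word-processor leftovers) that still need cleaning up.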

While Calibre made a reasonable job of converting the original OpenOffice manuscript, there were a lot of artefacts in the generated EPUB, and problems with image sizing. Rather than tinker directly with the EPUB or OpenOffice file, I exported the file as HTML from OpenOffice and set about cleaning it up. I was able to script most of this, but it still required hand editing to produce sensible, well-formed HTML (and to clean up the non-visible artefacts from OpenOffice, such as <I>unnecessarily</I><I> interrupted</I> <I>tags</i>). This approach had the desired outcome; converting the cleaned HTML to EPUB in Calibre eliminated all the conversion artefacts.
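As an illustration of the kind of clean-up that can be scripted, here is a minimal sketch that joins the interrupted inline tags mentioned above. It is not the script I actually used; the function name merge_split_tags is hypothetical, and it assumes the tags carry no attributes and are limited to B, I and U.

```python
import re

# Matches a closing b/i/u tag followed (after optional whitespace) by a
# reopening of the same tag, e.g. "</I><I>" or "</i> <I>".
_SPLIT_TAG = re.compile(r"</(b|i|u)>(\s*)<\1>", re.IGNORECASE)


def merge_split_tags(html_text):
    """Join runs of the same inline tag, keeping the whitespace between them."""
    previous = None
    # Repeat until nothing changes, in case one merge exposes another.
    while previous != html_text:
        previous = html_text
        html_text = _SPLIT_TAG.sub(r"\2", html_text)
    return html_text


print(merge_split_tags("<I>unnecessarily</I><I> interrupted</I> <I>tags</i>"))
# prints: <I>unnecessarily interrupted tags</i>
```

Note that whitespace between two merged runs ends up inside the tag, which is normally invisible for italics and bold but does extend an underline across the gap.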

I now had something I could proofread in an e-reader.

In my research, I saw very few technical ebooks. I soon found out why, and also the reason for the restrictive style guides: very few readers are capable of rendering the layout defined in the file, despite the file format explicitly supporting the HTML elements. In short, they do not properly implement the file format they claim to support, and in many cases they miss it by a wide margin.

Here is a sample of the programs I tried on my Lenovo Yoga 10 (Android):
| name | Monospace | sans font? | serif font? | borders on divs? | backgrounds | notes |
|---|---|---|---|---|---|---|
| Moon+ eReader | no | no | yes | no | sort of | unexpected page breaks injected |
| Aldiko Reader | no | no | yes | no | no | unexpected reformatting of paragraphs |
| FB reader | no | yes | no | no | no | |
| eBook reader by Vadim Lopatin | yes | no | yes | no | no | |
| eBook reader | no | no | yes | yes | no | |
| eReader Prestigious | no | no | yes | no | no | random blocks of whitespace, particularly adjacent to imgs |
| epub reader | yes | yes | yes | yes | yes | no margins, poor kerning and difficult navigation |
| UB reader | yes | yes | yes | yes | yes | does not ALIGN=CENTER |
| eBook reader | yes | yes | yes | yes | yes | Does not pre-render prev/next pages making turning pages slow and disconcerting. Like UB reader, does not ALIGN=CENTER |
| Gitden Reader | yes | yes | yes | yes | yes | seems to be making its own mind up about image sizes but making sensible choices. Otherwise strong compatibility and MathML support |

(I was unable to open books stored on the local filesystem using Scribd or Kindle.)
Overall, Gitden Reader stood out as the most capable eReader for rendering layouts, with UB reader a close second.
Although all were able to visually represent different header levels, the problems with fonts, borders and backgrounds meant that I have had to use all three techniques (a monospace font, a border and a background) to have a reasonable chance that code samples will be visually different from body text in the rendered document.
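To make the "all three techniques" approach concrete, here is a minimal sketch of a helper that wraps a code sample so that it carries a monospace font, a border and a background at the same time. The inline style values, the DIV/PRE structure and the function name code_block are assumptions for illustration rather than the exact markup in my book.

```python
from html import escape

# Assumed inline style: one declaration per technique, so that a reader
# which ignores one or two of them still shows some visual difference.
CODE_STYLE = (
    "font-family: monospace; "
    "border: 1px solid #888888; "
    "background-color: #eeeeee;"
)


def code_block(source):
    """Wrap a code sample in a <div><pre> that uses all three techniques."""
    return '<div style="{0}"><pre style="font-family: monospace;">{1}</pre></div>'.format(
        CODE_STYLE, escape(source)
    )
```

The sample text is passed through html.escape() so that characters such as < and & in the code do not break the surrounding markup.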