Producing digital books formatted for reading on electronic devices should be simple. But meeting the strict technical standards laid out for EPUB files can be difficult. Here’s our initial experience using the Sigil EPUB editing software.
While it’s possible to export EPUB files from document editors such as MS-Word, LibreOffice and others, the results can be… mixed. That’s an understatement. The EPUB files they spit out can be bloated with a mish-mash of embedded styles, some of which are NOT compatible with the XHTML formatting standards used by e-readers.
Clean XHTML required
The major e-book stores (Kobo, iBooks, Barnes and Noble) validate all EPUB files before they go on sale, and rejections for technical faults in the XHTML are common. This leaves you with two choices: clean up your existing files until they comply, or produce a clean file to begin with.
Using the Sigil EPUB editor to produce a clean file from scratch is one way to do this. Sigil is an open-source program, supported and maintained by community volunteers. While it is an excellent tool, Sigil’s whole design and operation are modelled on web development tools from years gone by. Anyone who mastered Dreamweaver will feel right at home here; anyone less technical will struggle. In our view, Sigil is just not ready for prime time with non-technical users.
The first step is to decide whether to produce EPUB 2.0 files, the older and most widely compatible format, or EPUB 3.0 files, which add some jazzy features supported only on newer e-readers. An EPUB is a self-contained archive (a ZIP file) with a specific structure that holds all the individual files making up your book: pages, style sheets, images and other media.
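Because an EPUB is just a ZIP archive with a few fixed rules, its basic layout can be sketched in a few lines of Python. This is a minimal illustration, not a valid book: the file names under OEBPS are hypothetical (OEBPS is a common folder name, not a requirement) and the page contents are placeholders. What it does show is the fixed part: a `mimetype` entry stored first and uncompressed, and a `META-INF/container.xml` pointing at the package (.opf) file.

```python
import io
import zipfile

# Minimal META-INF/container.xml: tells a reading system where the package (.opf) file is.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton(book_files: dict[str, str]) -> bytes:
    """Return ZIP bytes laid out like an EPUB archive.

    book_files maps archive paths (for example "OEBPS/content.opf") to text content.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The "mimetype" entry must be the first file and must be stored uncompressed.
        archive.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Everything else (pages, style sheets, the .opf and .ncx) can be compressed.
        for path, text in book_files.items():
            archive.writestr(path, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_epub_skeleton({
        "OEBPS/content.opf": "<package/>",       # placeholder content
        "OEBPS/toc.ncx": "<ncx/>",               # placeholder content
        "OEBPS/Text/chapter1.xhtml": "<html/>",  # placeholder content
        "OEBPS/Styles/style.css": "p { margin: 0; }",
    })
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            print(info.filename)
```

Renaming any .epub file to .zip and opening it shows the same kind of structure, which is a handy way to see what an exporter has actually put inside.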
In order to manage this, Sigil uses three main views in the user interface. On the left, a manifest view (the Book Browser) shows all the files and assets in the EPUB. In the centre is the code editor. On the right, the preview window shows approximately what your content will look like in an e-reader. You cannot edit anything in the preview; all the work is done in the edit pane. Sigil is a “what you see is mostly what you get” kind of editor, so you need to check your EPUB in a dedicated e-reader to be confident that it lays out correctly on users’ screens.
Sigil constantly checks your XHTML code as you make changes to the current file, reporting the first error it finds at the top of the preview window. A typical message looks like this:
This page contains the following errors: error on line 26 at column 1: expected '>' Below is a rendering of the page up to the first error.
That has a simple fix: insert the missing > symbol at the position indicated to make valid XHTML markup. Other errors are way more problematic:
This page contains the following errors: error on line 49 at column 8: Opening and ending tag mismatch: p line 0 and body Below is a rendering of the page up to the first error.
First, you have to widen the preview window to read the whole message, because it runs off the right edge of the screen. Second, you have to know what the message means. Third, you have to find the actual source of the error. In this example, we have an opening paragraph tag but no closing paragraph tag:
<p>You will need:
That’s on line 16. However, Sigil only realises a closing tag is missing when it reaches the closing body tag, so it points at line 49, not at where the error is. In a long file, the message and the error location could be hundreds of lines apart.
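The same behaviour can be reproduced with Python’s standard-library XML parser. This is an illustration only: it is not necessarily what Sigil uses internally, and its messages are worded differently. The short page below (hypothetical content) has the unclosed paragraph on line 4, but the parser can only notice the problem when it meets `</body>` on line 9.

```python
import xml.etree.ElementTree as ET

# A small page with the same fault as the example above: <p> is opened but never closed.
PAGE_LINES = [
    '<html xmlns="http://www.w3.org/1999/xhtml">',  # line 1
    "<head><title>Recipe</title></head>",          # line 2
    "<body>",                                      # line 3
    "<p>You will need:",                           # line 4: the real fault
    "<ul>",                                        # line 5
    "<li>flour</li>",                              # line 6
    "<li>eggs</li>",                               # line 7
    "</ul>",                                       # line 8
    "</body>",                                     # line 9: where the parser notices
    "</html>",                                     # line 10
]


def first_parse_error(xhtml: str):
    """Return (line, column) of the first well-formedness error, or None if the markup parses."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as err:
        return err.position
    return None


if __name__ == "__main__":
    broken = "\n".join(PAGE_LINES)
    print(first_parse_error(broken))  # the line reported is 9, not 4
    fixed = broken.replace("<p>You will need:", "<p>You will need:</p>")
    print(first_parse_error(fixed))   # None: the page is now well-formed
```

The practical lesson is to read the reported line as “the point where the parser gave up” and then search backwards for the tag it names (here `p`).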
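Anyone not familiar with editing HTML may find this sort of error hunting intimidating. Furthermore, EPUBs are written in the stricter XHTML, which has syntax differences from the regular HTML used on web pages: for example, every tag must be closed and every attribute value must be quoted.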
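A quick way to see the difference is to feed HTML-style and XHTML-style versions of the same markup to a strict XML parser. The sketch below uses Python’s built-in `xml.etree.ElementTree`; the snippets are made-up examples, and passing this test only means the markup is well-formed XML, which is one part of XHTML compliance rather than the whole of it.

```python
import xml.etree.ElementTree as ET

# Pairs of (HTML-style, XHTML-style) markup. Browsers forgive the first; an XML parser does not.
EXAMPLES = {
    "void element left open": ("<p>Line one<br>Line two</p>", "<p>Line one<br/>Line two</p>"),
    "unquoted attribute value": ("<p class=intro>Hello</p>", '<p class="intro">Hello</p>'),
    "tag case mismatch": ("<P>Hello</p>", "<p>Hello</p>"),
}


def is_well_formed(markup: str) -> bool:
    """True if the markup parses as XML (well-formedness only, not full XHTML validity)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for label, (html_style, xhtml_style) in EXAMPLES.items():
        print(f"{label}: HTML-style parses={is_well_formed(html_style)}, "
              f"XHTML-style parses={is_well_formed(xhtml_style)}")
```

Each of these HTML-style habits is common in hand-written or exported web pages, which is why pasting ordinary HTML into an EPUB so often produces errors.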
Getting content into Sigil
Any HTML file can be imported into Sigil, which will attempt to convert it to XHTML and validate it.
If you start from scratch, use a template XHTML file like the one Sigil opens by default. This contains the bare minimum of valid header and metadata. Crucially, you will also need a style sheet to control the layout of fonts, spacing, lists, tables, graphics placement and so on. An EPUB style sheet is a standard CSS (Cascading Style Sheets) file.
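For orientation, here is roughly what such a minimal page contains, generated by a small Python function. This is a sketch, not a copy of Sigil’s own template: it assumes an EPUB 3-style page (an EPUB 2 page would normally declare the XHTML 1.1 DOCTYPE instead), and the function name `xhtml_page` and the style sheet path `../Styles/style.css` are hypothetical. The `<link>` element in the head is what attaches the CSS file to the page.

```python
import xml.etree.ElementTree as ET
from html import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xhtml_page(title: str, body_markup: str, css_href: str) -> str:
    """Return a minimal EPUB 3-style XHTML page that links one external style sheet.

    body_markup is inserted as-is, so it must already be well-formed XHTML.
    """
    return f"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="{XHTML_NS}" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <title>{escape(title)}</title>
  <link href="{escape(css_href)}" rel="stylesheet" type="text/css"/>
</head>
<body>
{body_markup}
</body>
</html>
"""


if __name__ == "__main__":
    page = xhtml_page("Chapter 1",
                      "<h1>Chapter 1</h1>\n<p>It was a dark and stormy night.</p>",
                      "../Styles/style.css")
    ET.fromstring(page.encode("utf-8"))  # raises ParseError if the page is not well-formed
    print(page)
```

Escaping the title and the style sheet path matters because a stray `&` or `<` in either would make the whole page invalid.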
Once in Sigil, you can split your file into chapters or sections, add a cover page (not the same as your marketing cover), add your images and other files, create a table of contents and adjust the layout of your epub.
For compatibility with EPUB readers, your XHTML HAS to be compliant. This is where EPUBs exported from programs such as MS-Word, LibreOffice and others often fail. Your EPUB file also has to have integrity, with proper linking between its files and markup that is valid XHTML. While Sigil takes care of a lot of that for you, it does not catch everything.
The difficulty is that you just don’t know how strictly each of the e-book stores validates EPUB files; they all have their own programs running their own interpretation of XHTML. You would think there’d be one universal interpretation of XHTML. You’d be wrong.
Two key items in an EPUB are:
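- the package file (.opf), which holds the manifest, an inventory of all the files and assets in the EPUB, plus the spine that sets their reading order
- the navigation file (.ncx), the table of contents that links chapter titles to the files they point at.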
Certain edits that add or remove assets, or change the linking relationships between them, can create errors in one or both of these key files.
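To make “integrity” concrete, the sketch below cross-checks an .opf file against the list of files in the archive: every manifest entry must exist, the spine may only refer to declared ids, and every content file should be listed. The helper `check_package` is hypothetical and only covers these cross-references; it does not inspect the .ncx contents, the EPUB 3 navigation document or the metadata, and it is no substitute for a full validator.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def check_package(opf_xml: str, opf_path: str, archive_names: set[str]) -> list[str]:
    """List manifest and spine problems in an .opf file.

    opf_path is the .opf file's location inside the archive; archive_names are
    all the paths in the EPUB's ZIP file.
    """
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_path)  # manifest hrefs are relative to the .opf file
    problems = []
    ids = set()
    listed = set()

    # 1. Every manifest entry must point at a file that is really in the archive.
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        ids.add(item.get("id"))
        path = posixpath.normpath(posixpath.join(base, unquote(item.get("href", ""))))
        listed.add(path)
        if path not in archive_names:
            problems.append(f"manifest lists missing file: {path}")

    # 2. The spine (reading order) may only refer to ids declared in the manifest.
    spine = root.find("opf:spine", OPF_NS)
    if spine is not None:
        toc_id = spine.get("toc")  # EPUB 2: the manifest id of the .ncx file
        if toc_id is not None and toc_id not in ids:
            problems.append(f"spine toc refers to unknown id: {toc_id}")
        for ref in spine.findall("opf:itemref", OPF_NS):
            if ref.get("idref") not in ids:
                problems.append(f"spine refers to unknown id: {ref.get('idref')}")

    # 3. Content files that the manifest forgot (container files are excluded).
    for name in sorted(archive_names):
        if (name.endswith("/") or name == "mimetype" or name == opf_path
                or name.startswith("META-INF/")):
            continue
        if name not in listed:
            problems.append(f"file not listed in manifest: {name}")
    return problems
```

In use, `archive_names` would come from `zipfile.ZipFile(...).namelist()` and `opf_path` from the `full-path` attribute in `META-INF/container.xml`.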
Validation exists at several levels in Sigil. From the Tools menu, you can validate your style sheet with the W3C CSS validation service. There is also an option to Reformat HTML, which can be benign or catastrophic depending on the state of your files. Mend all HTML files attempts to fix your markup; Mend and prettify all HTML files fixes the markup and also tries to lay it out cleanly with indents in the code editor. Neither is selective: these are cluster bombs that hit all the HTML files in your EPUB at once.
XHTML validation in Sigil is provided by a plug-in called Flight Crew. After installing the plug-in into Sigil, you can run validation on demand. Flight Crew lists its results in a validation pane across the bottom of the Sigil window. While Flight Crew is invaluable for checking your EPUB, its error messages mix Sigil-specific jargon with XHTML terminology.
Flight Crew is good, but fallible. It picks up the common errors, but a clean result does not guarantee that your file will be accepted by the e-book stores.
For a second level of checking, it is best to use another tool such as the online EPUB Validator (beta) at http://validator.idpf.org. In our example, EPUB Validator picked up a basic error in the style sheet that Flight Crew missed: a stray quote mark at the end of a style declaration, background: #f5f5f5;”
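Stray quote marks like this often arrive when CSS passes through a word processor that “smartens” quotes. A rough pre-check can be scripted before uploading to a validator; the helper below, `suspicious_quote_lines`, is hypothetical and purely heuristic. It flags any line containing a curly quote or an odd number of straight quotes, so it will raise false alarms (for example an apostrophe inside a `content` string) and it does not handle multi-line comments or escaped quotes.

```python
import re

# Typographic quotes that word processors substitute for straight ones.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"
COMMENT = re.compile(r"/\*.*?\*/")  # single-line /* ... */ comments only


def suspicious_quote_lines(css: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs containing a curly quote or an unbalanced straight quote.

    A rough heuristic: multi-line comments and escaped quotes inside strings are not handled.
    """
    hits = []
    for lineno, line in enumerate(css.splitlines(), start=1):
        code = COMMENT.sub("", line)  # ignore apostrophes in comments
        unbalanced = code.count('"') % 2 == 1 or code.count("'") % 2 == 1
        if unbalanced or any(q in code for q in CURLY_QUOTES):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    sample = 'body {\n  background: #f5f5f5;\u201d\n  font-family: "Georgia", serif;\n}\n'
    for lineno, line in suspicious_quote_lines(sample):
        print(f"line {lineno}: {line}")
```

A check like this only narrows the search; the style sheet still needs to go through a proper validator afterwards.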
EPUB Validator picks up errors that the smaller Flight Crew plug-in misses. So far, this two-pass validation has got all of our e-books through the checks made by Kobo.
Formatting and validation are just the tip of the Sigil iceberg. This is a capable but specialist tool with a difficult, complex interface and a steep learning curve. Despite its many features, it still cannot handle every intricacy of EPUB XHTML formatting. In the right hands it can produce quality EPUB files, but as an author tool, it misses the mark by a mile.