I converted a picture book to EPUB this week. I thought it would be super easy, but I ended up needing several tools to get it right. The next one will be much simpler (right?), but for now I'll document the various tools I used or tried. My source file was a 156-page PDF with one illustration per page. I work in Mac OS X 10.6, but several of these tools are available (or have close equivalents) for other operating systems.
I heard Kovid Goyal, the author of Calibre (open source), speak at InDesign Secrets Live in May 2010. What impressed me most was his attitude toward ebook consumption: you should be able to read any ebook on any device. He built Calibre so that he could convert ebooks to different formats and read them on his preferred devices. This is a tool made by a reader for other readers, publishers be damned! Naturally, it was my first thought when Kevin proposed this week's project.
However, Goyal cautions users that PDF does not generally convert well to most ebook formats. This is not entirely surprising, given PDF's emphasis on the page and the preservation of design versus most ebook formats' emphasis on reflowability. I gave it a shot anyway and ended up with a massive file that produced 683 errors on validation. Oy.
I did a little reading about converting comics to EPUB and found a tutorial that starts with separate image files (named to reflect the image sequence). You zip those files, rename the ZIP to CBZ (a compressed comic file type), and then use Calibre to convert from CBZ to EPUB. That got me a little closer, but I was still seeing some errors.
As best I can tell, Calibre does not let you edit the contents of the EPUB file directly. It offers tools for adding metadata, but you can't open up the file for troubleshooting from the Calibre interface. Much as I like this utility, it's definitely geared toward consumption rather than creation.
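For anyone who wants to script the zip-and-rename step, here is a minimal Python sketch. It is not taken from the tutorial I followed. The function name `build_cbz`, the folder name, and the list of image extensions are my own assumptions, and it relies on the page images having zero-padded names (page_001.jpg, page_002.jpg, ...) so that alphabetical order matches reading order.

```python
import zipfile
from pathlib import Path

# Assumed set of image types; adjust to match your own page files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def build_cbz(image_dir, cbz_path):
    """Zip the images found in image_dir into a CBZ, in filename order.

    Returns the list of image names in the order they were added.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    # A CBZ is an ordinary ZIP archive with a different extension.
    # The images are already compressed, so store them as-is.
    with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as archive:
        for image in images:
            archive.write(image, arcname=image.name)
    return [image.name for image in images]


# Hypothetical usage:
# build_cbz("pages", "picture-book.cbz")
```

The resulting CBZ can be added to Calibre and converted from there. Calibre also ships a command-line converter, `ebook-convert`, which takes an input file and an output file, if you prefer to stay in Terminal.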
The next tool I tried was Sigil, also open source. Sigil is a WYSIWYG EPUB editor offering multiple views (pure WYSIWYG, split WYSIWYG/code, pure code). This editor was immediately appealing because the split interface would help me understand the link between code and display. Sigil's instructions for converting from PDF specify exporting an HTML version from Adobe. You then open that HTML file in Sigil and save it as an EPUB file. Simple, right? Well, it is, but the EPUB file created this way was displaying like... an EPUB. That's to say, the content was flowing to fit my display (I was testing in Adobe Digital Editions, and on an iPhone and iPad). This is exactly the behavior you want for a text-based EPUB, but not what you want from a picture book. My images were running together in the display.
But Sigil also has the ability to insert "chapter breaks" -- essentially, forced page breaks. So I tried this. I added a chapter break for each and every image in the file, all 156 of them, and re-saved my EPUB. I opened the file in Digital Editions and on my iOS devices to find that my 156-"page" document had turned into a 700+ page document, with several blank pages between content pages, and with those content pages showing only a small corner of the full image. It was a mess.
I noticed, though, that creating the "chapter breaks" divided the content, which was previously in one XHTML file, into many XHTML files -- one file per "chapter." I also noticed that each image element was wrapped in a paragraph element. My very limited HTML experience made me wonder whether the paragraph elements were causing the styling problems. I opened some other EPUB files in Sigil (samples of picture books -- I used Phone Disk to get them from my Apple devices to my desktop) and noticed that they used div elements instead of paragraphs. So I figured I should do the same.
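The paragraph-to-div swap can also be scripted. This is only a rough sketch of the idea, not what I actually ran: it assumes each image sits alone inside a `<p>` element, that the pages are `.xhtml` files in an already unpacked EPUB folder, and the function names and regular expression are mine. A regular expression is a blunt tool for markup, so check the results on a copy first.

```python
import re
from pathlib import Path

# Matches a paragraph whose only content is a single <img> element.
# The (\s[^>]*)? part allows attributes on <p> without matching <pre>.
P_WRAPPED_IMG = re.compile(
    r"<p(\s[^>]*)?>\s*(<img\b[^>]*>)\s*</p>",
    re.IGNORECASE,
)


def unwrap_images(xhtml):
    """Replace <p><img .../></p> with <div><img .../></div>."""
    return P_WRAPPED_IMG.sub(r"<div>\2</div>", xhtml)


def convert_folder(folder):
    """Apply unwrap_images to every .xhtml file under folder.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(folder).rglob("*.xhtml")):
        original = path.read_text(encoding="utf-8")
        updated = unwrap_images(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path.name)
    return changed
```

Paragraphs that contain text are left alone, so captions or title pages should survive untouched.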
TextWrangler, StuffIt, Terminal
Enter TextWrangler and its glorious multi-file search and replace function. Before I could start work in TextWrangler, though, I needed to unpack the EPUB file. An EPUB is essentially a ZIP archive of multiple files with a different extension, but OS X's double-click-to-expand feature doesn't work on EPUBs, so I used StuffIt Expander to unpack the file. TextWrangler made short work of the paragraph-to-div replacement, and it also allowed me to open up the other files inside the EPUB. Besides XHTML, there are CSS, OPF, image, XML, and mimetype files. By this point, I had also come across Liz Castro's Fixed Layouts Miniguide, so I decided to implement her instructions (which also required creating another XML file and tweaking the CSS file). Again, these are things I couldn't do from Sigil, which (as best I can tell) only shows you and lets you edit the XHTML files. Now that I was working on the files directly, though, I could no longer merely save as EPUB. Instead, I followed Liz Castro's instructions for using Terminal to recompile the EPUB.
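The unpack and recompile steps can be sketched in Python as well. This is not Liz Castro's Terminal method, just an equivalent built on the standard `zipfile` module, with function names of my own choosing. The one rule it has to respect is that the `mimetype` file goes into the archive first and uncompressed; everything else can be compressed normally. Keep the output file outside the folder you are packing.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, target_dir):
    """Extract every file in the EPUB (a ZIP archive) into target_dir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(target_dir)


def pack_epub(source_dir, epub_path):
    """Zip source_dir back into an EPUB and return the archived names.

    The mimetype file is written first and stored without compression;
    all other files are added afterwards with normal compression.
    """
    source = Path(source_dir)
    names = ["mimetype"]
    with zipfile.ZipFile(epub_path, "w") as archive:
        archive.write(source / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            name = path.relative_to(source).as_posix()
            # Skip folders, the mimetype file already added, and Finder debris.
            if not path.is_file() or name == "mimetype" or path.name == ".DS_Store":
                continue
            archive.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
            names.append(name)
    return names


# Hypothetical usage:
# unpack_epub("picture-book.epub", "picture-book")
# ...edit the files...
# pack_epub("picture-book", "picture-book-fixed.epub")
```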
I mentioned the process of validation earlier. I'll address those tools in another post.