Sunday, February 13, 2011

A K3 web browser enhancement I'd like to see: Save To Kindle

The K3's WebKit browser is a very useful complement to Kindle's core reading functionality. While not as nimble or functional as many other mobile browsers, it's great for reading a broad range of web content. One feature that makes it particularly pleasant is 'Article Mode', which strips out complex layout and styling and lets you focus on the main content of an article.

To take Article Mode to the next level, I'd like to see a 'Save To Kindle' menu option when viewing in Article Mode. This would save the contents as an HTML file to Kindle's 'documents' folder for offline reading. You could then resize text, annotate, share to Twitter/FB, and use Text To Speech, while retaining the article's hyperlinks. The article could also be copied to a computer later and imported into a word processing document for re-use.

Kindle can read simple HTML files directly, without conversion, if the file is given a '.txt' extension. The HTML behind the content shown in Article Mode qualifies as 'simple'. At most, href values that jump to other locations within the same web page (many articles have a sort of TOC of links to subtopics) would need to be sanitized so that they jump to the location in the saved item instead of launching the web browser.
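
As a rough illustration of that href clean-up, here is a minimal Python sketch. It assumes the article HTML and the URL it was loaded from are both available; the function name `localize_same_page_links` and the example URLs are made up for illustration, and the regular expression only handles quoted href attributes, not every form HTML allows.

```python
import re
from urllib.parse import urldefrag, urljoin

# Matches href="..." or href='...' attribute values (quoted forms only).
HREF_RE = re.compile(r'href=(["\'])(.*?)\1', re.IGNORECASE)


def localize_same_page_links(html: str, page_url: str) -> str:
    """Rewrite links that point back into page_url as fragment-only links."""
    base, _ = urldefrag(page_url)

    def rewrite(match: re.Match) -> str:
        quote, href = match.group(1), match.group(2)
        # Resolve relative links against the page, then split off the fragment.
        target, fragment = urldefrag(urljoin(base, href))
        if fragment and target == base:
            return f"href={quote}#{fragment}{quote}"
        return match.group(0)  # external or fragment-less link: leave untouched

    return HREF_RE.sub(rewrite, html)


# Hypothetical example page and links.
sample = (
    '<a href="http://example.com/post.html#part2">Part 2</a> '
    '<a href="http://other.example/x">Elsewhere</a>'
)
cleaned = localize_same_page_links(sample, "http://example.com/post.html")
```

Links to other sites are left alone, so they would still open in the browser as usual.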

This would be a low-cost, high-return enhancement that many Kindle users would come to appreciate.

Tuesday, February 8, 2011

Inside the new Kindle 'page numbers' feature

The new prerelease software for the Kindle 3 (v 3.1) has a feature called 'Real Page Numbers':
Real Page Numbers -- Our customers have told us they want real page numbers that match the page numbers in print books so they can easily reference and cite passages, and read alongside others in a book club or class. We've already added real page numbers to tens of thousands of Kindle books, including the top 100 bestselling books in the Kindle Store that have matching print editions and thousands more of the most popular books. Page numbers will also be available on our free "Buy Once, Read Everywhere" Kindle apps in the coming months. If a Kindle book includes page numbers, press the Menu key in an open Kindle book to display page numbers.
[For a more complete description, click here.]
Page numbering corresponds to a specific print edition, as identified by the print edition's ISBN number.

I was curious about the implementation, so I downloaded "The Girl Who Kicked the Hornet's Nest" from my Amazon Archive and had a look.

I noticed that there is now a sidecar file with ".apnx" file extension. Hmm, could this have something to do with page numbers? As in 'Amazon Page Number indeX'?

Indeed, viewing the file in a hex viewer confirms this suspicion:

At the top, you can see a string table/dictionary (this one is for 'The Girl Who Kicked the Hornet's Nest'):

{"contentGuid" : "78a941d9", "asin" : "B0031YJFCQ", "codeType" : "EBOK", "fileRevisionId" : "1"} - {"pageMap" : "(1, a, 1)", "asin" : "030726999X"}

So we see both the Amazon ASIN and print edition ISBN here.
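
If you would rather not squint at a hex viewer, a short Python sketch like the one below can pull those readable blocks out of the file. It makes no claim about the actual binary layout: it simply scans for brace-delimited runs of printable ASCII and tries to parse each as JSON. The function name and the sample bytes are invented for illustration; the two dictionaries mirror the strings shown above.

```python
import json
import re

# A '{', then printable ASCII characters other than braces, then '}'.
JSON_OBJECT_RE = re.compile(rb"\{[^{}\x00-\x1f\x7f-\xff]*\}")


def extract_json_headers(data: bytes) -> list[dict]:
    """Return every flat JSON object found embedded in a binary blob."""
    headers = []
    for match in JSON_OBJECT_RE.finditer(data):
        try:
            headers.append(json.loads(match.group(0).decode("ascii")))
        except json.JSONDecodeError:
            continue  # looked like JSON but was not; skip it
    return headers


# Made-up stand-in for a sidecar file: binary padding around two JSON blocks.
sample = (
    b"\x00\x01\x00\x01"
    b'{"contentGuid": "78a941d9", "asin": "B0031YJFCQ", "codeType": "EBOK"}'
    b"\x00\x20\x00\x00"
    b'{"pageMap": "(1, a, 1)", "asin": "030726999X"}'
    b"\x00\x00\x01\x2c"
)
headers = extract_json_headers(sample)
```

For a real file you would pass in the bytes read from disk, for example `Path("book.apnx").read_bytes()` with whatever the sidecar is actually called.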

This is followed by an array of 16-byte values which appear to represent a sequence of numbers arranged in ascending order. I'm guessing that each of these defines an offset to the position that corresponds to the start of a given physical page number. The number of 16-byte values seems to be very close to the number of page numbers in the book (there are a few additional rows of bytes that precede the presumed 'page map' as such, and may have some special significance).
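
If that guess is right, turning a reading position into a page number is just a search over a sorted list. The Python sketch below shows the idea with a binary search. The offsets are invented, the function name is hypothetical, and nothing here is taken from Amazon's actual implementation.

```python
from bisect import bisect_right


def page_for_position(page_starts: list[int], position: int, first_page: int = 1):
    """Return the page number containing position, or None before the first page.

    page_starts must be sorted ascending; entry i is where page first_page + i begins.
    """
    index = bisect_right(page_starts, position) - 1
    if index < 0:
        return None  # position lies before the first mapped page
    return first_page + index


# Invented offsets for a four-page "book".
starts = [100, 450, 900, 1300]
print(page_for_position(starts, 460))   # falls on the second page
print(page_for_position(starts, 5000))  # anything past the last offset maps to the last page
```

A lookup like this returns nothing for positions before the first offset and reports the last page for everything after the final offset, however long that tail is.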

In the book I looked at, material before page "1" (such as i, ii, iii, iv etc.) does not display a page number. (Wonder if that's a limitation of Amazon's page mapping scheme, or just what they did for this particular book?) I'd also note that the last page number (in this case '563') was applied to content that almost certainly spreads over more than one physical page, and indeed, is assigned to material not in the physical book. In this case, the ebook edition puts the copyright page at the end, as well as a cover image; these should not have been labeled as page '563'. Okay, so it is not perfect, at least in this case.

Presumably this scheme also works with Topaz format books, a requirement Amazon would need to take on, and it's something they can do after material is submitted to them for publishing.

It's not clear how self-published books can get page numbers, since 'locations' don't exist until you bake the .azw file. Hopefully Amazon will clarify this for its KDP ('Kindle Direct Publishing') users.

I noticed that there are also two other file extensions associated with Kindle Store books now (not just those with page numbers):
.ea - an XML file that contains the data for the 'Customers who bought this book also bought' and 'More by this author' lists that now show up after the last page of the book, including the ASIN so it can jump to the title's Kindle Store page.
.phl - an XML file that identifies the position offset of each popular highlight in the book, and the frequency number for each. That's probably been there for a while, since the popular highlights feature was introduced for K2/DX.
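
To check which of these sidecar files accompany a given book, something like the following Python sketch would do. It assumes each sidecar sits next to the book and shares its base file name, which is an assumption for this example and not something verified across books; the function name and the example path are hypothetical.

```python
from pathlib import Path

# The three sidecar extensions discussed in this post.
SIDECAR_EXTENSIONS = (".apnx", ".ea", ".phl")


def find_sidecars(book_path: Path) -> dict[str, bool]:
    """Report which sidecar files exist alongside book_path.

    Assumes a sidecar is named like the book with only the extension swapped.
    """
    return {ext: book_path.with_suffix(ext).exists() for ext in SIDECAR_EXTENSIONS}


# Hypothetical usage with the Kindle mounted as a USB drive:
# find_sidecars(Path("/Volumes/Kindle/documents/Some Book.azw"))
```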

I was curious as to when these files show up or are updated, so I turned off wireless, connected to my computer and deleted all three. Page numbers went away, the .ea lists went away (leaving only the 'tweet' and 'rate' links), and the popular highlights went away - as expected. Then I did 'Sync and Check for Items'. Still nothing. Finally, I removed the book and added it again. Everything's back!

So to take advantage of the new 'real pages' feature, it appears you must remove the item and download it again.

I trust this has been as educational for you as it has been for me!

Monday, February 7, 2011

'Send to Kindle' extension for Chrome browser

I've often wanted a simple way to send a web page to my Kindle for off-line reading. Many people like Instapaper for this, but it requires signing up for an account, and does more than I need it to.

'Send to Kindle' runs only on the Chrome browser, so you can stop reading unless you are willing to switch. I use a Mac most of the time, and had been more or less lazily running Safari, which has certainly improved quite a bit over the last year or two, but is not without its quirks (well, what would a browser be without quirks?). I also use Firefox, but mostly in a development and testing context (Selenium, Firebug, ePub viewer etc.), not for everyday browsing. I'm also a minimalist when it comes to plug-ins, so while Firefox probably has the largest selection of any browser, that's not high on my list of requirements.

It didn't take much for me to switch over to Chrome, which I've been playing with for a number of months and have been very impressed with. It seems faster and more robust, is at the forefront of HTML5 adoption, and I like the simplicity, the integrated search+address bar, built-in translation, and security features (and many more features that I've yet to explore). Since it is a WebKit-based browser like Safari, I felt more confident that the web sites I typically visit would continue to work as well as they did in Safari.

So once I discovered what 'Send to Kindle' could do for me, I was ready to switch to Chrome.

Once you have installed Chrome, installing Send To Kindle is simple:
- choose 'Extensions' from the Window menu
- click 'Get more extensions...' link
- search for 'Kindle'; 'Send to Kindle' should be listed. (there are at least 2 other Chrome extensions that appear to do the same thing; I reserve the right to prefer one of those once I've had a chance to try them!)
- click to install - a box with a checkmark should appear next to the address bar
- right click on the icon and select Options
- follow the instructions to complete configuration (includes adding '' to your Kindle's 'approved email' list, specifying the Kindle email address to send to, etc.)
- now when you want to send a web page to Kindle, click on the Send2Kindle icon (you can configure '1-click', otherwise you'll see a preview before clicking on 'Send')
If your Kindle is listening on Whispernet, it will soon receive an azw-format rendering of the web page.
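
The configuration steps above suggest that delivery happens by email to your Kindle address. I don't know how the extension itself is implemented, but the general idea can be sketched in Python as below. The function name and every address are placeholders, and the sketch only builds the message; it does not send anything.

```python
from email.message import EmailMessage


def build_kindle_email(sender: str, kindle_address: str, title: str, html: str) -> EmailMessage:
    """Build (but do not send) an email carrying a web page as an HTML attachment."""
    msg = EmailMessage()
    msg["From"] = sender          # must be on the Kindle's 'approved email' list
    msg["To"] = kindle_address    # the Kindle's own email address
    msg["Subject"] = title
    msg.set_content(f"Web page: {title}")
    # Attach the page itself; the file name here is derived from the title.
    msg.add_attachment(html, subtype="html", filename=f"{title}.html")
    return msg


# Placeholder addresses and content, for illustration only.
message = build_kindle_email(
    "me@example.com",
    "my-kindle@example.com",
    "Sample Page",
    "<html><body><p>Hello, Kindle.</p></body></html>",
)
```

Actually sending the message would need an SMTP server and credentials, which are left out here.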

The extension is under active development. I sent a couple of suggestions; one was already implemented, and the developer will implement the other in the next iteration.

UPDATE (10Feb2011): Another option for Chrome users is 'Later On Kindle'. It's very similar to 'Send To Kindle', but adds the option of sending a PDF that you're viewing in the browser to your Kindle (with an option to attempt 'convert' to azw). It seems to be a little more aggressive in terms of cleaning up web page formatting, which may or may not be to your liking. I'd like to see it include the originating URL as a link in the resulting ebook, so it is easier to go back and look at images, etc. that get stripped out.

UPDATE (16Feb2011): 'Send To Kindle' has been updated and the two issues I had have been fixed. Still, I would install both extensions, as each offers features the other lacks.

It would be really nice if this function were added to Kindle's web browser for any page that can be viewed in Article Mode. Instead of needing to send it wirelessly, they could just save the HTML to the documents folder as a .txt file (assuming they cannot 'cook' an .azw file on the fly). Kindle renders such HTML with basic formatting.
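
The save step itself would be trivial. As a minimal Python sketch, assuming the Article Mode HTML is available as a string: the function name, the title-to-file-name rule, and the folder argument are all invented for illustration.

```python
import re
from pathlib import Path


def save_article_as_txt(html: str, title: str, documents_dir: Path) -> Path:
    """Write article HTML into documents_dir under a '.txt' file name."""
    # Keep only characters that are safe in file names; fall back to a default.
    safe_name = re.sub(r"[^A-Za-z0-9 _-]+", "", title).strip() or "article"
    target = documents_dir / f"{safe_name}.txt"
    target.write_text(html, encoding="utf-8")
    return target


# Hypothetical usage, with the Kindle's documents folder as the destination:
# save_article_as_txt(article_html, "Some Article", Path("/Volumes/Kindle/documents"))
```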

UPDATE (20Feb2011): 'Send To Kindle' is also under development as an extension for Safari and Firefox, with plans to support images and deal better with 'formatted text.' At this point I'm using Later On Kindle only to send PDFs.