Yet another reverse engineering blog

Showing posts with label drm. Show all posts

Wednesday, December 12, 2007

Mobipocket books on Kindle

We've known for some time that Amazon's AZW files are actually Mobi files, but Amazon didn't share the Kindle's Mobi PID, which is what you need in order to buy encrypted Mobi books for the Kindle from other vendors.
Well, I've discovered the algorithm used to generate the PID and was able to use it on Fictionwise, but there was another catch. AZW files have a flag set in the DRM info which is not present in books bought from other vendors. After fixing that, I could read the book on Kindle.

The linked archive includes two Python scripts:

kindlepid.py generates the Mobi PID from the Kindle's serial number. You can then add this PID at a Mobipocket vendor's site and redownload your books with the Kindle's PID enabled. It's possible that some vendors will refuse this PID, as it has an asterisk in place of the traditional dollar sign (Fictionwise works fine).
kindlefix.py "fixes" a Mobi book so that it can be read on the Kindle. The book must already be enabled for the Kindle's PID (which you also need to pass to the script). The script outputs the fixed book with an .azw extension.

Kindle Mobipocket tools 0.1 0.2
Mirror

Friday, November 23, 2007

Embiid Publishing

Short History

Embiid Publishing was an early e-publishing company which started back in 2000. They published some midlist SF and Romance titles, the most famous probably being the Liaden series by Sharon Lee and Steve Miller. They offered nice prices ($5 on average) and free sampler bundles. At first their books were Windows-only; later they started to offer the Rocket format and a book reader for Palm OS. In 2006 the company closed its doors, leaving customers with books they could not convert.

The Reader

The Windows reader program could read two formats: UBK and EBK. The former was only slightly scrambled and could be read by any copy of the reader. The latter was encrypted with a personalized key and could only be read by the personalized reader executable, which was downloadable with the first purchase.
The Reader was written in Delphi and had pretty basic functionality: changeable font, bookmarks, navigation.



File format details

A pseudo-C description of the file header looks like the following:
struct EmbiidFile {
    /* 00 */  int32  file_seed;       // the seed for decrypting header fields
    /* 04 */  char   type[5];         // file type (encrypted w/ file_seed)

    #define FTYPE_UBK "Valid"         // non-personalized (text encrypted with file_seed)
    #define FTYPE_EBK "EBook"         // personalized (text encrypted with user_seed)

    /* 09 */  uint32 cover_off;       // offset of the cover image (jpeg image)
    /* 0D */  uint32 cover_len;       // length of the cover image data
    /* 11 */  byte   version;         // format version (encrypted w/ file_seed)

    #define CURRENT_VERSION 1

    /* 12 */  char   title[50];       // book title (encrypted w/ file_seed), space-padded
    /* 44 */  char   author[954];     // book author (encrypted w/ file_seed), space-padded
    /* 3FE */ uint16 nchapters;       // number of chapters
    /* 400 */ uint32 chap_lens[256];  // chapter lengths
    /* 800 */ char   book_text[];     // text of the book. UBK: encrypted with file_seed, EBK: encrypted with user_seed
};
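
To make the layout concrete, here is a rough Python sketch of reading that header with the struct module. It assumes the fields are little-endian and packed without padding (the offsets above only add up that way); the names HEADER, EmbiidHeader and parse_header are mine, not anything from the reader. The encrypted fields (type, version, title, author) are returned raw and still need to be decrypted with file_seed.

```python
import struct
from typing import NamedTuple, Tuple

# Assumed layout: little-endian, no padding, 0x800 bytes in total.
HEADER = struct.Struct("<i5sIIB50s954sH256I")


class EmbiidHeader(NamedTuple):
    file_seed: int               # stored in the clear
    type_raw: bytes              # still encrypted with file_seed
    cover_off: int
    cover_len: int
    version_raw: int             # still encrypted with file_seed
    title_raw: bytes             # still encrypted with file_seed
    author_raw: bytes            # still encrypted with file_seed
    nchapters: int
    chap_lens: Tuple[int, ...]   # always 256 slots


def parse_header(raw: bytes) -> EmbiidHeader:
    """Split the first 0x800 bytes of a book file into header fields."""
    if len(raw) < HEADER.size:
        raise ValueError("file is too short to contain a full header")
    fields = HEADER.unpack_from(raw, 0)
    return EmbiidHeader(*fields[:8], chap_lens=fields[8:])
```

The book text then starts at offset HEADER.size, i.e. 0x800.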


The encryption XORs the data with a 1024-byte array. The array is initialized from the seed using a pseudo-random number generator. Here's pseudocode for its generation:
float a = seed/1000.0;
for (int i=1; i<0x400; i++)
{
    float b = int(a/127773);
    float c = a - b*127773;
    a = c*16807 - b*2836;
    if (a<0) a += 2147483647;
    xor_buf[i] = int(a/2147483647*256)&0xFF;
}
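
The same generator as a small Python sketch. It follows the pseudocode literally, including the loop starting at index 1, so entry 0 simply stays zero here. One assumption to keep in mind: Python floats are double precision, and I haven't checked whether the original's float width ever changes a result. The name make_xor_buf is made up for this example.

```python
def make_xor_buf(seed: int) -> bytearray:
    """Build the 1024-byte XOR table from a seed, per the pseudocode."""
    xor_buf = bytearray(0x400)           # index 0 is never written
    a = seed / 1000.0
    for i in range(1, 0x400):
        b = float(int(a / 127773))       # int() truncates toward zero
        c = a - b * 127773
        a = c * 16807 - b * 2836
        if a < 0:
            a += 2147483647
        xor_buf[i] = int(a / 2147483647 * 256) & 0xFF
    return xor_buf
```

The constants 16807, 127773 and 2836 are the ones used by the well-known Park-Miller "minimal standard" generator.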


The decryption uses the file offset of the data to index the array, and it leaves unchanged any byte that would decrypt to 0x1A (the EOF character):

xor_val = xor_buf[file_offset%0x400];
val_out = val_in^xor_val;
if (val_out==0x1A)
    val_out = val_in;
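
Applied to a whole run of bytes, the rule above could look like this in Python. This is only a sketch: xor_decrypt is a hypothetical name, and the XOR table is passed in so the function doesn't care which seed produced it.

```python
def xor_decrypt(data: bytes, file_offset: int, xor_buf: bytes) -> bytes:
    """Decrypt bytes that start at file_offset within the book file."""
    out = bytearray(len(data))
    for n, val_in in enumerate(data):
        val_out = val_in ^ xor_buf[(file_offset + n) % 0x400]
        if val_out == 0x1A:              # would become EOF: keep the byte as is
            val_out = val_in
        out[n] = val_out
    return bytes(out)
```

For the book text of a UBK file, data would be everything from offset 0x800 on, file_offset would be 0x800, and the table would come from file_seed.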


While the file_seed is stored directly in the file, user_seed is calculated as the Adler-32 checksum of a 128-byte user ID, which is stored directly in the personalized EmbiidReader.exe.

t1 = 1;
sum = 0;
i = 0;
do {
    t1 = (t1 + user_id[i++]) % 0xFFF1;
    sum = (t1 + sum) % 0xFFF1;
}
while (i < 0x80);
user_seed = (sum<<16) | t1;
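
Since this is plain Adler-32, a Python version can be checked against zlib.adler32. The sketch below spells the loop out anyway to mirror the pseudocode; user_seed_from_id is my own name for it, and finding the 128-byte user ID inside the executable is not covered here.

```python
import zlib

MOD_ADLER = 0xFFF1  # 65521, the Adler-32 modulus


def user_seed_from_id(user_id: bytes) -> int:
    """Compute user_seed from the 128-byte user ID."""
    if len(user_id) != 0x80:
        raise ValueError("user ID must be exactly 128 bytes")
    t1, total = 1, 0
    for byte in user_id:
        t1 = (t1 + byte) % MOD_ADLER
        total = (total + t1) % MOD_ADLER
    return (total << 16) | t1


# Sanity check against the standard library on a dummy ID.
dummy_id = bytes(range(0x80))
assert user_seed_from_id(dummy_id) == zlib.adler32(dummy_id)
```

In practice zlib.adler32(user_id) alone gives the same value.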


The text of the book uses a small subset of HTML tags for formatting, but the paragraphs are delimited by newlines, not <br> or <p> tags.
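
One simple way to turn such text into a standalone HTML page is to wrap every non-empty line in a <p> element. The sketch below does just that; it is my own illustration (paragraphs_to_html is a made-up name), and it deliberately does not escape anything, on the assumption that the formatting tags already in the text should pass through untouched.

```python
def paragraphs_to_html(text: str, title: str = "") -> str:
    """Wrap newline-delimited paragraphs in <p> tags inside a minimal page."""
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    body = "\n".join(f"<p>{p}</p>" for p in paragraphs)
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```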

Here's a small Python script to convert an Embiid book to HTML. A valid EmbiidReader.exe is necessary to decrypt personalized books.
Google Pages
You will need Python to run it.
Place your books, EmbiidReader.exe and embiid.py in the same directory and run from the command prompt:
embiid.py <book.ebk>
You should get a <book.html> file with decoded text.