Dec 31 2007

Yahoo! messenger archive file format

Rajiv @ 8:46 pm UTC

The first step in getting my Yahoo! messenger (YMessenger) conversations into Windows Desktop Search is to decode the conversations stored in YMessenger archive files.

If you enable message archiving, YMessenger saves all the conversations with your friends in the C:\Program Files\Yahoo!\Messenger\Profiles\${userid}\Archive\Messages\ directory, in files with the extension .dat. The menu option Contacts -> Message Archive shows all the archived conversations grouped by user.

Figuring out the format of this file was thrilling … but probably not as challenging as the codes faced by codebreakers in the real world. (Simon Singh’s The Code Book did leave a lasting impression on me … as did his other books!)

If you consider the equation encoded-message = code(original-message), the real codebreakers have access to only the encoded message. They have to figure out the code and in the process figure out the original message. The biggest leverage I had was that I had to figure out only the code, given both the encoded-message and the original-message.

For example, I could send myself the message “a” and look at the contents of the .dat file. Then I could send myself the message “aa” and look at the contents of the .dat file again. Then I could send the message “b”, followed by “bb”, followed by “ab”, looking at the changes to the .dat file after every step.

Each message (from you to your buddy or vice versa) in the .dat file is represented by a Record. Every Record holds the timestamp of when the message was sent, a flag saying whether it was from you or from your buddy, and the formatted (bold, italic etc.) message. Your profile name and the name of your buddy can be derived from the name of the .dat file and the name of its parent directory respectively.

The format for a Record is:

  • The first int (four bytes) represents the number of seconds since the Unix epoch (Jan 1, 1970), the same epoch Java uses
  • The second int … I don’t know what it is
  • The third int indicates the direction: zero means the message is from you to your buddy, non-zero means it is from your buddy to you
  • The fourth int (msgLen) represents the length in bytes of the encoded formatted-message. There is no encoding/encryption up to this point.
  • The next msgLen bytes represent the encoded formatted-message
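
The field list above can be sketched in Java roughly as follows. This is only an illustration and not the code from Parser.java: the class name RecordSketch and its field names are made up here, and the little-endian byte order is an assumption, since the description above does not say how the ints are laid out. It reads exactly the five fields listed and nothing else.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for one Record; names are illustrative only.
class RecordSketch {
    final long timestampSeconds; // first int: seconds since Jan 1, 1970
    final int unknown;           // second int: meaning not known
    final boolean fromBuddy;     // third int: non-zero means the buddy sent it
    final byte[] encodedMessage; // msgLen bytes, still encoded

    RecordSketch(long timestampSeconds, int unknown, boolean fromBuddy, byte[] encodedMessage) {
        this.timestampSeconds = timestampSeconds;
        this.unknown = unknown;
        this.fromBuddy = fromBuddy;
        this.encodedMessage = encodedMessage;
    }

    // Reads one Record starting at the buffer's current position.
    // ASSUMPTION: the ints are little-endian; the post does not state the byte order.
    static RecordSketch read(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        long timestamp = buf.getInt() & 0xFFFFFFFFL; // treat the int as unsigned
        int unknown = buf.getInt();
        boolean fromBuddy = buf.getInt() != 0;
        int msgLen = buf.getInt();
        if (msgLen < 0 || msgLen > buf.remaining()) {
            throw new IllegalArgumentException("Implausible msgLen: " + msgLen);
        }
        byte[] message = new byte[msgLen];
        buf.get(message); // copies the next msgLen bytes
        return new RecordSketch(timestamp, unknown, fromBuddy, message);
    }
}
```

Wrapping the bytes of a .dat file with ByteBuffer.wrap(...) and calling RecordSketch.read(...) once per Record would be one way to use it; whether anything sits between two Records is not covered by the list above, so treat that part as untested.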

The formatting of the message is indicated by some special tokens in the .dat file. These font-attribute tokens always start with 0x1B5B (the ASCII bytes ESC and “[”). So if you type the message “this is NOT acceptable” with NOT in bold, the formatted message would be “this is {[0x1B5B][0x31]}NOT{[0x1B5B]x[0x31]} acceptable” (where {} indicate one token and [] indicate bytes shown as hex values instead of ASCII characters). {[0x1B5B][0x31]} is a token that indicates begin bold and {[0x1B5B]x[0x31]} indicates end bold.

There are tokens for (begin and end of) bold, italic and underline. Then there are tokens that mark the beginning of custom and standard (palette) colors. The end of a color is indicated by a token that marks the beginning of a standard color: black! The only peculiarity I could notice was that they use HTML-like tags (instead of tokens starting with 0x1B5B) when you send messages with a color gradient.

The formatted-message is then encoded before saving into the record. One of the challenges with a web-desktop application is that while storing the encrypted data on the desktop, what key do we use for encryption? The key used for encryption has to be different for each user, should not be guessable by other users and should not be stored on the PC. One option could be to use the password of the user as the key. But, whenever the user changes the password the archive has to be decrypted with the old password and encrypted with the new one.

The alternative solution is to have an autogenerated key stored per user on the website. An authenticated user can download his key and decrypt the archive. Changing the password does not change this key.

YMessenger uses the simple XOR cipher to encrypt the messages. The key used for the cipher is highly guessable: your user-id! Every byte of the formatted-message is XOR’ed with a byte from the user-id, cycling back to the start of the user-id when it runs out. For example if your message was “Hello World!” and your user-id was “doofy”, then the encrypted bytes would be:
[H^d][e^o][l^o][l^f][o^y][ ^d][W^o][o^o][r^f][l^y][d^d][!^o]

The beauty of the XOR cipher is that if encoded-message = xor-cipher(original-message, key) then original-message = xor-cipher(encoded-message, key).
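
A minimal Java sketch of that repeating-key XOR is shown here, using the “Hello World!” / “doofy” example from above. The class and method names (XorSketch, xor) are invented for this illustration and are not taken from Parser.java; the sketch also assumes the key index restarts at the first byte of the user-id for every message, which is what the example suggests.

```java
import java.nio.charset.StandardCharsets;

// Illustrative repeating-key XOR; the same call both encodes and decodes.
class XorSketch {
    static byte[] xor(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            // Byte i of the data is XOR'ed with byte (i mod key length) of the key.
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] message = "Hello World!".getBytes(StandardCharsets.US_ASCII);
        byte[] userId = "doofy".getBytes(StandardCharsets.US_ASCII);

        byte[] encoded = xor(message, userId);
        byte[] decoded = xor(encoded, userId); // applying the same key again undoes it

        System.out.println(new String(decoded, StandardCharsets.US_ASCII));
    }
}
```

The same property explains why knowing both the original and the encoded message is such a big advantage: XOR-ing the two together, xor(encoded, message), gives back the key bytes repeated over the length of the message.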

Parser.java documents the file format in more detail. Main.java demonstrates how one can use the Parser to convert the .dat files to HTML or plain text format.

How did you spend your new year’s eve?! :D


Dec 21 2007

Addicted to search

Rajiv @ 11:39 am UTC

For as long as I can remember, I have been too lazy to use my fingers to type and my brain to remember things. When I used to work on Linux, I relied heavily on the locate command to search for and open files: vi `locate math.h`. This was something I missed a lot in Windows. Finally I started using Launchy as a replacement for locate.
Locating source files using Launchy

Searching through mails also worked pretty well when I was using Evolution on Linux. But search in Outlook sucks, especially if you are using IMAP. The amazing Lookout plugin for Outlook was my saviour. Unfortunately, it had its own problems. It used to crash my Outlook 2K often; and once Microsoft bought them, there was no hope of getting things fixed. Microsoft has been pushing its own Windows Desktop Search instead of Lookout. Though not as fast as Lookout … it is the compromise solution I have been using for the sake of stability.

I know there are other desktop search products out there, including the one from Google. But the thing I like about Windows Desktop Search is that I can do Outlook operations on the search results (like forwarding the mails or moving them into folders). Now I am so addicted to search that I move mails to folders only when they have huge attachments. Otherwise, it is pretty much the Gmail model. I only use the Inbox.yyyy and sent-mail.yyyy folders actively (yyyy being the year). I am considering setting up a rule in Outlook to save sent mail in Inbox; that way I would have only Inbox.yyyy folders and I could do a threaded view of the conversations (i.e. Gmail’s “All Mail” label)!

Apart from email, our other significant communication medium at work is Yahoo! messenger. I archive all my conversations and refer to them very often. The unfortunate side effect of this is that some conversations that start on email are concluded on chat, and six months later when I search mails, I do not find the mail with the conclusion. Over the years I have wised up: I search conversations on email and follow up the search in ymessenger. Unfortunately, search in ymessenger also sucks! You have to do a manual search based on the timestamp of the email conversation and the people involved.

YMessenger saves conversations in the C:\Program Files\Yahoo!\Messenger\Profiles\${userid}\Archive\Messages\ directory, in files with the extension .dat. It would be nice to have Windows Desktop Search (WDS) index these files and show my conversations in the results when I search for communications. I can think of a few approaches to achieve this:

  • Convert ymessenger archives to Outlook mailbox format (.pst) and let WDS index it
  • Convert ymessenger archives to RSS and import the RSS into Outlook using RSS Popper; once the messages are in Outlook, WDS will index them
  • Convert the ymessenger archive files (.dat) to html format and have WDS index these. Probably the easiest integration, but the limitation is that I will not be able to run searches of the type “customer requirements from:myYahooBuddy date:last month”
  • WDS supports plugging-in IFilters to search new file types. I could implement IFilter to index the ymessenger archive files (.dat).

All of these presume there is some API to decode the content in the ymessenger archive files (.dat). The search is on!


Jun 02 2004

Surviving windoze …

Rajiv @ 6:18 am UTC

Over the past couple of years I have learned some tricks to survive on windows …

Keyboard shortcuts

  • Ctrl+Esc: Same as Win
  • Ctrl+Shift+Esc: Brings up the task manager. [It is faster than Ctrl+Alt+Del and then choosing task manager.]
  • Win+D: Toggle show desktop
  • Win+E: Start explorer
  • Win+F: Start find in files
  • Win+M: Minimize all windows
  • Win+Shift+M: Restore minimized windows
  • Win+R: Run a program
  • F2: Rename a file/dir in explorer
  • F3: Find in the selected dir of explorer

Know thy cmd

  • You can start one using Win+R cmd
  • start <dir-name>: opens an explorer pointing to that directory
  • start <file-name>: opens the file with its default handler. For example start <src.zip> will open the src.zip file in IZArc if it is your default zip file handler
  • F7: View command history
  • F8: Complete the command based on history
  • Right click on the title bar, choose properties, in the options tab switch on the “QuickEdit Mode” and “Insert Mode”
  • Making a selection on the cmd [by dragging the mouse over the text in the cmd when “QuickEdit Mode” is on] will make the scrolling stop. The application might also stop if it cannot write any more data to its outputstream, making it look like a “hung” application.
  • You can enable file/dir name completion in cmd with the tab key

File name completion in cmd

I always missed the tab completion feature of *nix shells in windows. Long ago Sachin told me how to achieve the same in windows. Do a Win+R regedit. In the tree navigate to: HKEY_CURRENT_USER/Software/Microsoft/Command Processor. On the right panel you will see CompletionChar. Double click on it and set the value to hex 9. Next time you start a cmd, you can use the tab character for file/directory name completion.

Adding a “command” menu item to explorer folders

Wouldn’t you want a right-click menu item in explorer that opens a cmd with its directory already set to the selected folder?!

Screen grab showing a new menu item called command to Explorer's popup menu.

Do a Win+R regedit. In the tree navigate to: HKEY_CLASSES_ROOT/Folder/shell. Right-click on the shell folder and choose New > Key. Give it the name cmd. The name you give is the name of the menu item. Select the cmd folder and choose New > Key. Give it the name command [has to be command ... nothing else]. Select the command folder and double click on the (Default) item in the right hand panel. Set the value to cmd. Next time you start explorer, right click on any folder and you will get the cmd menu item. Choosing the menu item will open a command prompt with its directory set to the selected folder in explorer.

Favourite command line utilities …

There is nothing like it if you can install cygwin. However if you are constrained for time/bandwidth, try the 3MB download from http://unxutils.sourceforge.net. It has all the utilities like tail, diff, and compression tools like gzip/bzip2 … natively compiled on windows.

Things I still badly miss …

One thing I really still miss is the locate command. Honestly, I haven’t tried the windows indexing service, so I have no clue how good a replacement it is. Nor have I tried tools like X1 or OmniMea. I have also heard that Google’s “puffin project” is going to bring Google to the desktop. There seems to be some action regarding this even in the M$ Longhorn camp.