Gravatar – Identity Theft and Abuse

June 13th, 2008
Tech types: the first part of this post is a non-technical description of how Gravatar works. If you’re down wit’ dat you can skip straight over to the good stuff.

The blogosphere has been adopting a new meme over the last while: the cross-site avatar, as implemented by Gravatar, MonsterId and others. What this amounts to is a small image displayed next to a blog comment, which is meant to socially identify the person making the comment. The point is that it lets you pick a consistent identity for your conversations across the blogosphere.

Gravatars

The key to success for something like this is that it has to be easy to make work on just about any blog or other social site. If it is difficult to implement its uptake will be low and the whole thing doomed to failure. Gravatar certainly is very easy to work with, mainly due to its underlying design, and is therefore well positioned to succeed.
Gravatar’s design is built around the email address that most blogs require you to provide when commenting, and it uses this as the commentator’s unique identifier. The process is as follows: you register your email address with Gravatar and pick one or more images that you want Gravatar to dish up when you comment on an enabled blog or other site. When you then provide that email address for a comment, the blog site sends it off to Gravatar to get the image you picked.
In order to protect the commentator’s email address from spam harvesters the blog site mangles the entered email address using a commonly available one-way hash algorithm named MD5.
A one-way hash works like a very reliable tumble dryer; you stick some data[1] into it and it scrambles it up in a seemingly random way to produce an unrecognisable tangle of stuff which it spits out the other end. What makes a one-way hash special is that, from any given piece of data, it will always produce the same scrambled output, and that output (a) is overwhelmingly (99.999999%) likely to be unique and (b) cannot feasibly be unscrambled.
The result is that you can, with great certainty, take an email address and produce a string of text that is unique to it. You can also protect that email address from being collected for spam by never using the actual address, just its hash value – a good thing.
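For the tech types, the hashing step described above can be sketched in a few lines of Python. This is a minimal illustration, not Gravatar’s actual code: the function names, the base URL and the trim-and-lowercase normalisation are assumptions made for the example.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Return the MD5 hex digest of a normalised email address."""
    # Assumption: trim whitespace and lowercase before hashing, so that
    # " JoeTaylor@Hotmail.com " and "joetaylor@hotmail.com" give the same id.
    normalised = email.strip().lower()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def avatar_url(email: str, base: str = "https://www.gravatar.com/avatar/") -> str:
    """Build the image URL a blog could embed next to a comment."""
    # Assumption: the base URL is illustrative. The point is that only the
    # hash, never the address itself, ends up in the page.
    return base + gravatar_hash(email)


if __name__ == "__main__":
    print(avatar_url("joetaylor@hotmail.com"))
```

The same address always produces the same 32-character identifier, which is exactly what makes it usable as a cross-site identity, and also what makes it worth attacking.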

The power of One-way Hashing

So, in order to provide a simple implementation that can quickly be adopted by the blogosphere, Gravatar uses MD5 to hash the commentator’s email address and produce a unique identifier – how can this lead to identity theft and abuse?
Well, one-way hashes are still susceptible to brute force attacks. The brute force approach to cracking a one-way hash is to simply start guessing values and then hashing them. If the hashes match you know that you’ve guessed the right value – easy. A brute force attack is typically streamlined by using a dictionary, on the assumption that the hashed data is likely to contain actual human-readable words. And because MD5 is relatively simple and fast to compute, it is more susceptible to brute force attacks than other, more modern, algorithms. Some common techniques can realistically crack an MD5-hashed email address on a PC. What’s more, there are several online databases which collectively crack and store MD5 hashes. Several million hashes have already been cracked and are publicly available to search. Once a hash has been cracked by one of these databases it instantly becomes available to the entire internet.
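To see why a shared database of cracked hashes matters, here is a rough sketch of the idea in Python. The guesses and helper names are made up for illustration; a real database would hold millions of entries rather than three.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup(guesses):
    """Hash every guess once and index the original values by their hash."""
    return {md5_hex(guess): guess for guess in guesses}


# Hypothetical guesses that somebody, somewhere, has already hashed.
table = build_lookup(["password", "letmein", "joetaylor@hotmail.com"])

# Once a value is in the table, "unscrambling" its hash is a single lookup.
leaked = md5_hex("joetaylor@hotmail.com")
print(table.get(leaked))

# A value nobody has guessed yet is simply absent from the table.
print(table.get(md5_hex("some value nobody guessed")))
```

The hash itself is never reversed. The work of guessing is done once, and every later lookup is effectively free.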
Of course, the more random the data that you put into the hash the more difficult it is to guess[2]. And this is where Gravatar has its first fatal flaw: email addresses are not very random at all. Consider your average email address: it starts out with one or more legible words (say, joetaylor), followed by an @ character and then a valid, registered domain name (say, hotmail.com). Therefore, if you want to crack Gravatar hashes, you can limit your dictionary[3] to relatively human-readable strings followed by an @ and a valid domain name.
In short, a Gravatar identifier (the hash) is likely to be one of the easiest MD5 hashes to crack. This means that, from your Gravatar identity, a Bad Person© could recover your email address without even breaking a sweat.
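A toy version of that structured dictionary attack might look like the following Python sketch. The word lists are tiny and hypothetical, as are the function names; a real attacker would plug in lists of common names and popular mail domains.

```python
import hashlib
from itertools import product


def md5_hex(text: str) -> str:
    """MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def crack_email_hash(target_hash, local_parts, domains):
    """Try every local-part@domain combination until one hashes to the target."""
    for local, domain in product(local_parts, domains):
        candidate = f"{local}@{domain}"
        if md5_hex(candidate) == target_hash:
            return candidate
    return None  # the address was not in our dictionary


# Hypothetical, deliberately tiny dictionaries.
first_names = ["joe", "ann"]
surnames = ["taylor", "smith"]
local_parts = [first + last for first, last in product(first_names, surnames)]
domains = ["gmail.com", "hotmail.com"]

# The hash as it would appear in a page next to a blog comment.
target = md5_hex("joetaylor@hotmail.com")
print(crack_email_hash(target, local_parts, domains))
```

With these lists there are only 4 x 2 = 8 candidates to try. Scaling the lists up makes the search bigger, but it stays vastly smaller than guessing arbitrary strings, which is the whole problem.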

So where’s the identity theft? Well, if I have your email address and nickname from one blog comment it means that I can post other comments using those and reasonably impersonate you – flame war city. Secondly, if I get involved in a flame war with you I can recover your email address from your comment and abuse you directly via email – bad vibe city. To illustrate this point I’ve harvested some Gravatars from across the internet and faked up a bit of a flame war in the comments below. Have a look, it’s a lot of fun!

A second flaw in Gravatar’s scheme is that it encourages its users to use a high-value email address[4] to identify themselves. This means that once your Gravatar email has been cracked you are likely to be exposed for a long time. And Gravatar makes no effort to warn its users of the potential for their email addresses to be cracked. It should, at the very least, encourage its users to use throw-away email addresses and allow them to rotate those.

The final flaw in Gravatar’s scheme is that it does not require enabled sites to request your permission[5] before trying to load up a Gravatar image for your comment. So even if you are not registered with Gravatar your email address will still be hashed (against your will) and displayed for harvesting and abuse.

Where does this leave Gravatar as a global identifier for blog comments? I understand that a combination of email address and MD5 was used as a means to encourage fast adoption of the idea across the blogosphere. But it is irresponsible to open up all commentators (whether registered with Gravatar or not) to a totally viable form of identity theft and personal abuse, both by not allowing them to opt out of Gravatar and by not warning them of the dangers of using a high-value email address to identify themselves.
Gravatar, if you’re listening, do something about it! You owe it to the blogosphere.

[1] a laundry basket full of socks etc.
[2] which is why many sites require you to pick a password containing uppercase and lowercase letters and numbers etc.
[3] the set of data from which you randomly guess
[4] one that you’re not very likely to want to change
[5] through a checkbox with a disclaimer or some similar means