Introduction
As you might have heard (or not), YouPorn Chat had a huge information leak on February 21st, 2012. One of their servers served a directory with all registration log files from the last couple of years (http://chat.youporn.com/tmp). According to their blog post, this chat server is not run by the YouPorn guys but by the YouPorn chat guys. Nevertheless, I assume that many passwords are shared between the chat service and YouPorn in general, as well as with the users' other accounts. The files were world-readable and could be downloaded by anyone.
Some Swedish guys at flashback.org discussed what they could do with the log files. They discovered that the logs contained the registration and account creation details of all users from 2008 to 2012, and they shared them with the world. Soon thereafter Anders Nilsson published an analysis of all passwords on his blog. His analysis shows the top passwords along with some statistics about the different passwords. Not surprisingly, the top 10 passwords are 123456, 123456789, 12345, 1234, password, qwert, 12345678, 1234567, 123, and 111 111.
But analyzing passwords is only the first step. When I read about the password breach on Twitter I thought that we could do more with the available data, and I got my hands on a copy of the logs. The log files show detailed information about username, password, email address, country of origin, date of birth, and user id. So let's play!
Log format
The log format is really simple and consists of logged registration attempts and server responses. A registration attempt is logged in the following format:
<user_register.php: 2010-01-11 00:00:03 POST username=MyFunnyUsername email=Mail@Foo email_confirm=Mail@Foo password=1234 password_confirm=1234 country=US msisdn= isyp=0 isPremiumChat= dob=1990-08-17 sub1=1 sub2=1 is3g= >
I guess that username, email, password, country, and dob (date of birth) are self-explanatory. The server responds with either an error or a new user id. A registration is unsuccessful if the email or password confirmation does not match or if the username is already taken. An error is reported with a message like the following:
<REPLY username =fucking whore status =207 err_msg =202 >
A successful registration returns the status OK and a new user id:
<REPLY username =LaraDWT28SL status = OK user_id =3565583 >
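To make the two line formats above concrete, here is a minimal parsing sketch in Python. It is not the script used for this post, just an illustration built from the sample lines shown here. The names `parse_line`, `ATTEMPT_RE`, `REPLY_RE`, `FIELD_RE`, and `USER_ID_RE` are made up for this example, and the sketch assumes every record sits on a single line and that no field value contains something that looks like ` key=`.

```python
import re

# Registration attempt: date, time, then key=value fields up to the closing '>'.
ATTEMPT_RE = re.compile(
    r"<user_register\.php: (?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} POST (?P<fields>.*)>"
)
# Server reply: the username may contain spaces, so match lazily up to ' status ='.
REPLY_RE = re.compile(
    r"<REPLY username =(?P<username>.*?) status =\s*(?P<status>\S+)(?P<rest>.*)>"
)
# A value runs until the next ' key=' or the end of the field string.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?= \w+=|\s*$)")
USER_ID_RE = re.compile(r"user_id =\s*(\d+)")


def parse_line(line):
    """Return a dict for an attempt or reply line, or None if the line is unknown."""
    line = line.strip()
    m = ATTEMPT_RE.fullmatch(line)
    if m:
        fields = dict(FIELD_RE.findall(m.group("fields").strip()))
        return {"kind": "attempt", "date": m.group("date"), **fields}
    m = REPLY_RE.fullmatch(line)
    if m:
        uid = USER_ID_RE.search(m.group("rest"))
        return {
            "kind": "reply",
            "username": m.group("username"),
            "status": m.group("status"),
            # Error replies carry no user_id, so this stays None for them.
            "user_id": int(uid.group(1)) if uid else None,
        }
    return None


reply = parse_line("<REPLY username =LaraDWT28SL status = OK user_id =3565583 >")
print(reply["status"], reply["user_id"])
```

Lines that match neither pattern come back as `None`, which is one way records with weird formatting can get lost during an import.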
After downloading the files I had to parse them using some script foo. A dirty little Python script {link file} did the trick. As I wanted to do some heavy number crunching and did not want to spend days reevaluating the same data over and over again, I imported all the logs directly into a not so small MySQL database. The database has the following layout:
CREATE TABLE accounts (
  date DATE,
  username varchar(128),
  password varchar(128),
  email varchar(128),
  dob DATE,
  country varchar(2),
  userid INT DEFAULT -1,
  INDEX (date),
  INDEX (username),
  INDEX (password),
  INDEX (email),
  INDEX (dob),
  INDEX (country),
  INDEX (userid)
) ENGINE=InnoDB;
The complete import, i.e., parsing, formatting, MySQL import, and index generation, took a couple of hours. Due to weird formatting I lost some accounts during the import, so the numbers are a lower bound on the true totals.
Analysis
With so much raw data (5290696 registration attempts led to 1202040 unique user accounts) it is hard to work with text files only, so the MySQL database was a good place to start. One of the most interesting analyses is the password analysis. Anders already published a breakdown of the passwords on his blog and the full results on pastebin. I assume he filtered the raw data for the raw passwords. Using a database has the advantage that I can select more detailed combinations of data. In the following analysis I will look at country-specific details, registration attempts, email addresses, and the age distribution of the YouPorn users.
By country
The country with the most registrations is the US (27%), followed by Germany (12%), the UK (9%), Italy (5%), the Philippines (4%), Canada (3%), France (3%), India (3%), Australia (2%), and Mexico (2%). The graph shows a pie chart of the 20 countries with the most registrations.
If we look at registration attempts, the picture is a little different. The log files contain a total of 4,088,656 registration attempts and 1,202,039 successful registrations, so on average a user needed about 3.4 attempts to register successfully. Typing captchas with one hand must be hard (pun intended).
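For the record, the ratio works out as follows. This is just the arithmetic on the two totals quoted above; the variable names are made up for this snippet.

```python
# Totals as quoted in the text above.
attempts = 4_088_656
registrations = 1_202_039

# Average number of attempts per successful registration.
attempts_per_registration = attempts / registrations
# Share of attempts that ended in a new account.
success_rate = registrations / attempts

print(f"{attempts_per_registration:.2f} attempts per registration")
print(f"{success_rate:.1%} of attempts succeeded")
```

In other words, fewer than one in three registration attempts ended with a new account.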
The number of registration attempts seems to scale with the number of registrations in each country. No country has significantly more failed registration attempts than another. India, Indonesia, and the Philippines have a slightly higher number of registration attempts than the other countries. The table shows the number of registered accounts and the number of registration attempts.
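A per-country comparison of this kind could be computed with a single GROUP BY query. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for the MySQL database, a trimmed table with just the country and userid columns, and a handful of invented rows. It also assumes that every row is one registration attempt and that a userid of -1 (the column default in the layout above) marks an attempt that did not produce an account.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (country TEXT, userid INTEGER DEFAULT -1)")

# Invented sample rows: (country, userid); -1 is assumed to mean a failed attempt.
rows = [
    ("US", 1001), ("US", 1002), ("US", -1), ("US", -1), ("US", -1),
    ("DE", 2001), ("DE", -1),
    ("IN", 3001), ("IN", -1), ("IN", -1), ("IN", -1),
]
conn.executemany("INSERT INTO accounts (country, userid) VALUES (?, ?)", rows)

# Count all attempts and the successful registrations per country.
query = """
SELECT country,
       COUNT(*) AS attempts,
       SUM(CASE WHEN userid <> -1 THEN 1 ELSE 0 END) AS registered
FROM accounts
GROUP BY country
ORDER BY registered DESC, country
"""
per_country = conn.execute(query).fetchall()

for country, attempts, registered in per_country:
    print(country, attempts, registered, f"{attempts / registered:.1f}")
```

The last column is the number of attempts per successful registration, which is the figure to compare across countries. The query itself sticks to standard SQL, so it should carry over to MySQL with little or no change.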
Age distribution
The age distribution graph shows the fraction of total registrations per year of birth. The average porn registrant is 31.04 years old, with a tendency toward getting younger. The most common age is 24, which accounts for 7.96% of all registrations. The two peaks in the graph are around age 32 (year of birth 1980) with 5.72% of total registrants and age 24 (1988) with 7.96%. Above 30, the older people get, the less likely they are to register.
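The numbers behind such a graph boil down to counting registrations per year of birth. Here is a rough sketch of that computation with a few invented dates of birth. The function name `birth_year_distribution` is made up, and the sketch simplifies age to 2012 minus the year of birth (matching the age/year pairs quoted above), so it will not reproduce an average computed from full dates.

```python
from collections import Counter
from datetime import date


def birth_year_distribution(dobs, ref_year=2012):
    """Return (fraction per birth year, mean age, most common age).

    Age is approximated as ref_year minus year of birth (a simplification).
    Malformed dates make date.fromisoformat raise ValueError.
    """
    years = [date.fromisoformat(d).year for d in dobs]
    counts = Counter(years)
    total = len(years)
    fractions = {year: counts[year] / total for year in sorted(counts)}
    mean_age = sum(ref_year - year for year in years) / total
    most_common_year = counts.most_common(1)[0][0]
    return fractions, mean_age, ref_year - most_common_year


# Invented sample data in the dob format used by the logs (YYYY-MM-DD).
sample = ["1988-03-01", "1988-11-20", "1980-06-15", "1990-08-17"]
fractions, mean_age, most_common_age = birth_year_distribution(sample)
print(fractions)
print(mean_age, most_common_age)
```

On the real data, obviously fake birth dates would have to be filtered or at least kept in mind when reading the result.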
The graph shows a steep rising edge around age 20, followed by a slow drop-off. The question remains whether younger registrants just enter fake birth dates or do not register at all. Apparently the website did not impose any age restrictions, as the years of birth range from 1908 to 2007. Unfortunately real work calls right now, so stay tuned for more results!