Introduction
As you might have heard (or not), YouPorn Chat had a huge information
leak on February 21st, 2012. One of their servers served a directory with
all registration log files from the last couple of years
(http://chat.youporn.com/tmp). According to their blog post, this chat
server is not run by the YouPorn guys but by the YouPorn chat guys.
Nevertheless, I assume that there is a large overlap between the
passwords used for the chat service, for YouPorn in general, and for
other accounts. The files were world-readable and could be downloaded
by anyone. Some Swedish guys at flashback.org discussed what they could
do with the log files. They discovered that the logs contained the
registration and account creation details of all users from 2008 to
2012, and they shared them with the world. Soon thereafter Anders
Nilsson published an analysis of all passwords on his blog. His analysis
shows the top passwords and some statistics about the different
passwords. Not surprisingly, the top 10 passwords are 123456, 123456789,
12345, 1234, password, qwert, 12345678, 1234567, 123, and 111 111. But
analyzing passwords is only the first step.
When I read about the password breach on Twitter, I thought that we could
do more with the available data, so I got my hands on a copy of the
logs. The log files contain the username, password, email address,
country of origin, date of birth, and user id of each registration. So
let's play!
Log format
The log format is really simple and consists of logged registration
attempts and server responses. A registration attempt is logged in the
following format:
<user_register.php: 2010-01-11 00:00:03
POST
username=MyFunnyUsername
email=Mail@Foo
email_confirm=Mail@Foo
password=1234
password_confirm=1234
country=US
msisdn=
isyp=0
isPremiumChat=
dob=1990-08-17
sub1=1
sub2=1
is3g=
>
The fields username, email, password, country, and dob (date of birth)
are self-explanatory. The server responds either with an error or with
a new user id.
A registration is unsuccessful if the email or the password does not
match its confirmation field or if the username is already taken. A
failed registration is logged with a numeric status and error code:
<REPLY username =fucking whore
status =207
err_msg =202
>
A successful registration contains the status OK and a new user id:
<REPLY username =LaraDWT28SL
status = OK
user_id =3565583
>
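To illustrate how the two record types above can be taken apart, here is a minimal sketch in Python. It is not the script used for this analysis. The names parse_records, RECORD_RE, and REQUEST_RE are made up for the example, and it assumes that every record starts with "<" at the beginning of a line and ends with a line that holds only ">", as in the samples.

```python
import re

# Assumption: records look like the samples above, i.e. "<" opens a record
# at the start of a line and a line holding only ">" closes it.
RECORD_RE = re.compile(r"^<(.*?)^>[ \t]*$", re.MULTILINE | re.DOTALL)
REQUEST_RE = re.compile(
    r"user_register\.php:\s*(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})"
)


def parse_records(text):
    """Yield one dict per log record found in text (hypothetical helper)."""
    for match in RECORD_RE.finditer(text):
        lines = match.group(1).splitlines()
        header, body = lines[0].strip(), lines[1:]
        if header.startswith("REPLY"):
            record = {"type": "reply"}
            # The first key/value pair of a reply shares a line with "REPLY".
            body = [header[len("REPLY"):]] + body
        else:
            record = {"type": "request"}
            stamp = REQUEST_RE.match(header)
            if stamp:
                record["date"], record["time"] = stamp.groups()
        for line in body:
            # Split at the first "=" only, so values may contain "=".
            key, sep, value = line.partition("=")
            if sep:  # lines without "=", such as "POST", are skipped
                # Note: strip() also trims spaces at the edges of passwords.
                record[key.strip()] = value.strip()
        yield record
```

Calling list(parse_records(open(path).read())) on one log file, with path being wherever the file was saved, gives a list of request and reply dicts in log order.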
After downloading the files I had to parse them using some script foo. A
dirty little Python script did the trick. As I wanted to do some heavy
number crunching and did not want to spend days reevaluating the same
data over and over again, I imported all the logs directly into a
not-so-small MySQL database. The database has the following layout:
CREATE TABLE accounts (
  date DATE,              -- date of the registration attempt
  username varchar(128),
  password varchar(128),
  email varchar(128),
  dob DATE,               -- date of birth
  country varchar(2),     -- two-letter code, e.g. US
  userid INT DEFAULT -1,  -- user id from the server reply
  INDEX (date),
  INDEX (username),
  INDEX (password),
  INDEX (email),
  INDEX (dob),
  INDEX (country),
  INDEX (userid)
) ENGINE=InnoDB;
The complete import, i.e., parsing, formatting, MySQL import, and
index generation, took a couple of hours. Due to weird formatting I lost
some accounts during the import, so the numbers are a lower bound on the
actual totals.
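The import step can be sketched roughly as follows. This is an illustration under stated assumptions, not the import I actually ran: Python's built-in sqlite3 module stands in for MySQL so that the example runs without a server, the record dicts with a "type" key are a hypothetical intermediate format, and the pairing logic assumes that each reply directly follows the request it answers, which may not hold everywhere in the real logs.

```python
import sqlite3

# Assumption: sqlite3 replaces MySQL here; the columns mirror the layout above.
SCHEMA = """
CREATE TABLE accounts (
    date TEXT, username TEXT, password TEXT, email TEXT,
    dob TEXT, country TEXT, userid INTEGER DEFAULT -1
)
"""


def import_records(conn, records):
    """Insert one row per registration request and return the row count.

    records is an iterable of dicts with a "type" key ("request" or
    "reply"). A reply that carries a user_id marks the pending request
    for the same username as successful; otherwise userid stays -1.
    """
    rows = []
    pending = None
    for rec in records:
        if rec["type"] == "request":
            if pending is not None:
                rows.append(pending)  # earlier request never got a reply
            pending = [rec.get("date"), rec.get("username"),
                       rec.get("password"), rec.get("email"),
                       rec.get("dob"), rec.get("country"), -1]
        elif pending is not None and rec.get("username") == pending[1]:
            if "user_id" in rec:
                pending[6] = int(rec["user_id"])
            rows.append(pending)
            pending = None
    if pending is not None:
        rows.append(pending)
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Keeping failed attempts in the same table with userid set to -1 makes it possible to count attempts and successful registrations with a single query later on.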
Analysis
With so many raw records (5,290,696 registration attempts led to 1,202,040
unique user accounts) it is hard to work with text files only, so the
MySQL database was a good choice to start with. One of the most
interesting analyses is the password analysis. Anders already published
a breakdown of the passwords on his blog and the full results on
pastebin. I assume he filtered the raw data for the passwords only.
Using a database has the advantage that I can select more detailed
combinations of data. In the following analysis I will look at
country-specific details, registration attempts, email addresses, and
the age distribution of the YouPorn users.
By country
The country with the most registrations is the US (27%), followed by
Germany (12%), the UK (9%), Italy (5%), the Philippines (4%), Canada
(3%), France (3%), India (3%), Australia (2%), and Mexico (2%). The
graph shows a pie chart of the 20 countries with the most registrations.
If we look at registration attempts, the picture is a little
different. The log files contain a total of 4,088,656 registration
attempts and 1,202,039 successful registrations, so on average a user
tried to register more than 3.4 times before he/she was successful.
Typing captchas with one hand must be hard
(pun intended).
The number of registration attempts seems to scale with the number of
registrations in each country. No country has significantly more failed
registration attempts than any other. India, Indonesia, and the
Philippines have a slightly higher number of registration attempts than
the other countries. The table shows the number of registered accounts
and the number of registration attempts.
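A per-country breakdown like this boils down to one grouped query. The following is a sketch rather than the exact query behind the table: it assumes that failed attempts keep the default userid of -1, as the table layout suggests, and it uses Python's sqlite3 module as a stand-in for MySQL. The names QUERY and attempts_per_country are made up for the example.

```python
import sqlite3

# Assumption: a row whose userid is still -1 is a failed attempt.
QUERY = """
SELECT country,
       COUNT(*) AS attempts,
       SUM(CASE WHEN userid <> -1 THEN 1 ELSE 0 END) AS registered
FROM accounts
GROUP BY country
ORDER BY registered DESC
"""


def attempts_per_country(conn):
    """Return (country, attempts, registered, attempts per registration)."""
    result = []
    for country, attempts, registered in conn.execute(QUERY):
        # Countries without a single successful registration get no ratio.
        ratio = attempts / registered if registered else None
        result.append((country, attempts, registered, ratio))
    return result
```

A ratio that stays close to the overall average for every country is what "scales with the number of registrations" means here.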
Age distribution
The age distribution graph shows the fraction of total registrations per
year of birth. The average porn registrant is 31.04 years old, with a
tendency toward younger registrants: the most common age is 24, which
accounts for 7.96% of all registrations. The two peaks in the graph are
around the ages 32 (year of birth 1980) with 5.72% of total registrants
and 24 (1988) with 7.96%. Above the age of 30, the older people get, the
less likely they are to register.
The graph shows a steep rising edge around age 20 that drops off
slowly. The question remains whether younger registrants just enter
fake birth dates or do not register at all. Apparently the website did
not impose any age restrictions, as the years of birth range from 1908
to 2007.
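The age figures follow from the years of birth by simple arithmetic. Here is a small sketch of that calculation, assuming ages are measured relative to 2012, the year of the leak (which matches 1988 mapping to age 24 above). The function name age_stats is made up for the example, and the input is just a list of birth years.

```python
from collections import Counter

REFERENCE_YEAR = 2012  # assumption: ages are relative to the year of the leak


def age_stats(birth_years, reference_year=REFERENCE_YEAR):
    """Return (mean age, most common age, share of the most common age).

    birth_years must not be empty.
    """
    ages = [reference_year - year for year in birth_years]
    counts = Counter(ages)
    mode_age, mode_count = counts.most_common(1)[0]
    mean_age = sum(ages) / len(ages)
    return mean_age, mode_age, mode_count / len(ages)
```

The mean is pulled up by the long tail of older registrants, which is why it sits well above the most common age.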
Unfortunately real work calls right now, stay tuned for more results!