You’ve heard about the Netflix Prize, right? (If you haven’t, the short version is: Netflix is offering a $1 million prize to anyone who can come up with a substantial improvement to their recommendations engine.)
I’m especially interested because 1) I am a former Netflix employee*, and I loved my work there; 2) I am a longtime Netflix customer (since before I worked there, in fact) and a heavy user of the ratings and recommendations features; 3) I am a data geek. I love this type of problem. I wish I had the skills to participate in this challenge, but instead, I’m watching from the sidelines.
So I’ve been perusing the forum for a few minutes to see what the contestants are talking about, and I happened upon a brilliant digression by one Benji Smith, who explores the most-loved, most-hated, and most-contested movie titles in the database through intelligent analysis. Here’s an excerpt:
Now, where is ‘Miss Congeniality’? Evidently, she’s number 171 on the most-loved list. But…Huh? What does that mean? How can a movie be #195 on the most-hated list and also be #171 on the most-loved list? Who’s to blame?
Standard deviation, I’m looking in your direction.
To get a look at the movies that are both universally loved and universally hated (by different subgroups of people, of course), let’s write a query that amplifies standard deviation and de-amplifies population, pointing out the sources of contention in our dataset.
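For the fellow data geeks, here is a rough Python sketch of what a score like that might look like. To be clear, this is my own guess at the idea and not Benji's actual query: the formula (standard deviation raised to a power, multiplied by the log of the rating count), the function names, and the toy ratings are all made up for illustration.

```python
import math
from statistics import pstdev


def contention_score(ratings, std_power=2.0):
    """Hypothetical contention score for one movie's list of star ratings.

    Assumption (not from the original forum post): "amplify" standard
    deviation by raising it to std_power, and "de-amplify" population by
    using the log of the rating count instead of the raw count.
    """
    if len(ratings) < 2:
        return 0.0
    return pstdev(ratings) ** std_power * math.log10(len(ratings))


def rank_by_contention(ratings_by_title):
    """Return titles ordered from most contested to least contested."""
    return sorted(
        ratings_by_title,
        key=lambda title: contention_score(ratings_by_title[title]),
        reverse=True,
    )


# Invented ratings on a 1-5 star scale, purely for illustration.
toy_ratings = {
    "Polarizing": [1, 5, 1, 5, 1, 5, 5, 1],
    "Consensus": [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    "Mixed": [2, 3, 4, 3, 2, 4, 3, 3],
}

if __name__ == "__main__":
    for title in rank_by_contention(toy_ratings):
        print(title, round(contention_score(toy_ratings[title]), 2))
```

The love-it-or-hate-it title floats to the top even though it has fewer ratings than the one everybody agrees on, which is the whole point: the spread of opinion matters more than the sheer number of opinions.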
If this sort of thing looks fun to you, clicky the linky and go see what movies came back from his results. It is data geekery at its finest.
(Edited to add: I emailed Benji Smith to let him know I was talking about him, and he suggested adding a link to benjismith.net, so we can all go read his entertaining essays. Go! Enjoy!)
* I was the Content Manager, circa 2000-2001. I oversaw all content on the web site, its relationships within the database, its timely entry on the site, how it got sourced, etc. It was a super-fun job.