Data geekery, movie style

You’ve heard about the Netflix prize, right? (If you haven’t, the short version is: Netflix is offering a $1 million prize to anyone who can come up with a substantial improvement to their recommendations engine.)

I’m especially interested because 1) I am a former Netflix employee*, and I loved my work there; 2) I am a longtime Netflix customer (since before I worked there, in fact) and a heavy user of the ratings and recommendations features; 3) I am a data geek. I love this type of problem. I wish I had the skills to participate in this challenge, but instead, I’m watching from the sidelines.

So I’ve been perusing the forum for a few minutes to see what the contestants are talking about, and I happened upon a brilliant digression by one Benji Smith about exploring the most-loved, most-hated, and most-contested movie titles in the database through intelligent analysis. Here’s an excerpt:

Now, where is ‘Miss Congeniality’? Evidently, she’s number 171 on the most-loved list. But…Huh? What does that mean? How can a movie be #195 on the most-hated list and also be #171 on the most-loved list? Who’s to blame?

Standard deviation, I’m looking in your direction.

To get a look at the movies that are both universally loved and universally hated (by different subgroups of people, of course), let’s write a query that amplifies standard deviation and de-amplifies population, pointing out the sources of contention in our dataset.
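
For the curious, here is a rough sketch of what that kind of ranking might look like in Python. It is not Benji’s actual query (the excerpt doesn’t include one). The scoring formula (standard deviation squared, multiplied by the log of the number of ratings), the names `contention_score` and `most_contested`, and the toy ratings are all assumptions made up for illustration.

```python
import math
from statistics import pstdev


def contention_score(ratings, sd_power=2.0):
    """Hypothetical score: amplify the spread, damp the rating count.

    Raising the standard deviation to a power exaggerates disagreement,
    while taking the log of the count keeps hugely popular titles from
    winning on volume alone. Both choices are assumptions.
    """
    if len(ratings) < 2:
        return 0.0
    spread = pstdev(ratings)  # population standard deviation
    return (spread ** sd_power) * math.log(len(ratings))


def most_contested(ratings_by_title, top_n=10):
    """Return (title, score) pairs, most contested first."""
    scored = [
        (title, contention_score(ratings))
        for title, ratings in ratings_by_title.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up star ratings (1-5); not taken from the Netflix dataset.
sample = {
    "Polarizing Movie": [1, 5, 1, 5, 1, 5, 1, 5],
    "Consensus Movie": [4] * 16,
    "Mildly Mixed Movie": [3, 4, 3, 4, 3, 4, 3, 4],
}

ranking = most_contested(sample)
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

With this toy data, the love-it-or-hate-it title lands on top even though the consensus title has twice as many ratings, which is the general effect the excerpt describes.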

If this sort of thing looks fun to you, clicky the linky and go see what movies came back from his results. It is data geekery at its finest.

(Edited to add: I emailed Benji Smith to let him know I was talking about him, and he suggested adding a link to benjismith.net, so we can all go read his entertaining essays. Go! Enjoy!)

* I was the Content Manager, circa 2000-2001. I oversaw all content on the web site, its relationships within the database, its timely entry on the site, how it got sourced, etc. It was a super-fun job.


3 Comments

  1. hitchhiker
    Posted October 12, 2006 at 9:58 am

    wow!

  2. jaxn
    Posted October 12, 2006 at 2:11 pm

    I have another programmer friend who I am working on this with. We wrote a recommendation system for another ecommerce company, but I don’t think our chances of winning are all that great.

    However, we have a registered team, and if you think you have some valuable insight to offer maybe you could join our team.

    -Jackson

  3. kateo
    Posted October 12, 2006 at 2:17 pm

    Aww, Jackson, that’s sweet of you to offer! I’d love to. Unfortunately:

    Current and former employees [...] are ineligible to enter and participate in the Contest or be awarded or retain any Contest Prize.

    But thanks! I’ll definitely be interested to hear how you do with it.