Applying the Ising model to another data set

The Audioscrobbler data set is a really cool one from a few years ago; back then, Audioscrobbler had not yet been rolled into last.fm, but it had about the same functionality that last.fm has now. Basically, it’s a little plugin for iTunes et al. that keeps track of all the artists you listen to. The listening habits of several thousand people were collected and distributed under a Creative Commons license.

After some normalization/cleanup, we end up with, for each user, the set of artists that user is liable to listen to.
This is the sort of co-occurrence statistic that Ising models are good at capturing. The Ising model contains a matrix of parameters which indicate the correlations between artists, that is, how much a user listening to one artist raises the likelihood that the same user also listens to the other.
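To make "a matrix of parameters" a bit more concrete, here is a minimal sketch assuming the common 0/1 parameterization, where x_i = 1 means the user listens to artist i and P(x) ∝ exp(Σ_i h_i x_i + Σ_{i<j} J_ij x_i x_j). Under that assumption, the log-odds that a user listens to artist i, given everything else, is h_i + Σ_j J_ij x_j, so a positive J_ij multiplies the odds of listening to artist i by exp(J_ij) when the user also listens to artist j. The names h, J, log_potential and conditional_prob are illustrative, not taken from the post.

```python
import numpy as np


def log_potential(x, h, J):
    """Unnormalized log-probability h.x + sum_{i<j} J_ij x_i x_j.

    Assumes J is symmetric with a zero diagonal, so 0.5 * x^T J x
    counts each pair (i, j) exactly once.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    return float(h @ x + 0.5 * x @ J @ x)


def conditional_prob(x, i, h, J):
    """P(x_i = 1 | all other x_j): a logistic function of the local field."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    field = h[i] + J[i] @ x - J[i, i] * x[i]  # exclude any self-coupling
    return 1.0 / (1.0 + np.exp(-field))


# Two hypothetical artists with a positive coupling of 2.0
h = np.array([-1.0, -1.0])
J = np.array([[0.0, 2.0], [2.0, 0.0]])
print(conditional_prob([0, 1], 0, h, J))  # listens to artist 1: about 0.731
print(conditional_prob([0, 0], 0, h, J))  # doesn't: about 0.269
```

Note that the conditional only needs one row of J, which is also what makes the pseudo-likelihood style of fitting cheap.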

Because this is a rather high-dimensional problem (several thousand artists means millions of pairwise parameters), we can employ a combined L1 + L2 (elastic-net) penalty; what we end up learning is a relatively sparse parameter matrix, which is often easier to interpret.
With some magic (cough cough) we can learn this parameter matrix fairly quickly. I thought I’d post some of the correlations between artists here for your {be/a}musement.
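The post doesn't say what the "magic" is, so what follows is only a minimal sketch of one standard way to get a sparse Ising matrix with an L1 + L2 penalty, not the method used here: node-wise (pseudo-likelihood) logistic regression. Each artist's 0/1 column is regressed on all the other columns with an elastic-net penalty, solved by proximal gradient descent (ISTA), and the two estimates of each coupling (from artist i's regression and from artist j's) are averaged. The function names, penalty strengths and iteration count are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1: shrink every entry toward zero by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_node(X, i, l1=0.01, l2=0.01, n_iter=500):
    """Elastic-net logistic regression of column i of X on all other columns.

    Minimizes mean logistic loss + l1*||w||_1 + (l2/2)*||w||^2 with ISTA;
    the intercept b (the field h_i) is left unpenalized.
    """
    n, p = X.shape
    y = X[:, i]
    Z = np.delete(X, i, axis=1)
    w = np.zeros(p - 1)
    b = 0.0
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part
    Z1 = np.column_stack([Z, np.ones(n)])
    L = 0.25 * np.linalg.norm(Z1, 2) ** 2 / n + l2
    step = 1.0 / L
    for _ in range(n_iter):
        r = sigmoid(Z @ w + b) - y          # gradient of the loss w.r.t. the logits
        grad_w = Z.T @ r / n + l2 * w       # smooth part: logistic loss + L2 penalty
        grad_b = r.mean()
        w = soft_threshold(w - step * grad_w, step * l1)  # L1 handled by the prox step
        b -= step * grad_b
    return b, w


def fit_sparse_ising(X, l1=0.01, l2=0.01, n_iter=500):
    """Return fields h and a symmetric, zero-diagonal coupling matrix J."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    W = np.zeros((p, p))
    h = np.zeros(p)
    for i in range(p):
        h[i], W[i, np.arange(p) != i] = fit_node(X, i, l1, l2, n_iter)
    # Each coupling is estimated twice (from node i and from node j); average them
    J = 0.5 * (W + W.T)
    return h, J
```

Here X is a users-by-artists 0/1 matrix. On real listening data you would want to tune l1 and l2 rather than trust these defaults, and a sparse-matrix implementation would be needed at the scale of several thousand artists.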

Now, the actual parameter matrix covers several thousand artists. Here, I’m selecting the 10 artists with the highest total correlation, summed over all other artists. You might say that these are the artists which tug most fiercely on other artists (the most cliquey artists, if you will). For each of these 10 artists, I show the 5 most highly correlated artists.
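The selection itself is simple once you have the matrix. Here is a small sketch assuming that "total correlation" means the sum of the absolute off-diagonal couplings in an artist's row (summing only the positive couplings would be an equally defensible reading) and that "most highly correlated" means the largest positive couplings. The function name top_cliques and its arguments are illustrative.

```python
import numpy as np


def top_cliques(J, names, n_artists=10, n_neighbors=5):
    """Pick the artists with the largest total coupling and list each one's strongest partners."""
    J = np.asarray(J, dtype=float)
    off = J - np.diag(np.diag(J))           # ignore any diagonal entries
    strength = np.abs(off).sum(axis=1)      # total coupling per artist
    hubs = np.argsort(-strength, kind="stable")[:n_artists]
    table = []
    for i in hubs:
        row = off[i].copy()
        row[i] = -np.inf                    # never list an artist as its own neighbor
        nbrs = np.argsort(-row, kind="stable")[:n_neighbors]
        table.append((names[i], [names[j] for j in nbrs]))
    return table


names = ["A", "B", "C", "D"]
J = [[0, 3, 1, 0], [3, 0, 2, 0], [1, 2, 0, 0.5], [0, 0, 0.5, 0]]
print(top_cliques(J, names, n_artists=2, n_neighbors=2))
# [('B', ['A', 'C']), ('A', ['B', 'C'])]
```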

The results make pretty good sense; it’s actually kind of disturbing how predictable people’s musical tastes are. And for some reason the main cliques at the top of the list are all either metal bands or the sort of indie bands likely to populate The O.C. soundtracks =). I should point out that if you go further down the list, you eventually find a few other cliques such as trip hop (Portishead, Massive Attack, Lamb, Tricky, et al. [note to self: how cool would “et al.” be as a band name?]), 80s rock with remarkable staying power (Aerosmith, Bon Jovi, Guns N’ Roses), wuss rock (Counting Crows, DMB, Goo Goo Dolls), and just plain bad music (3DD, Hoobastank, Staind, Nickelback).

Artist: …is most correlated with (top 5)
Metallica: Iron Maiden, Megadeth, Pantera, Slayer, Nightwish
In Flames: Dark Tranquillity, Soilwork, Children of Bodom, Arch Enemy, Dimmu Borgir
The Arcade Fire: The Fiery Furnaces, Broken Social Scene, The Go! Team, Bloc Party, Stars
Nightwish: Within Temptation, Sonata Arctica, Blind Guardian, Stratovarius, Therion
Rammstein: Nightwish, Apocalyptica, KoЯn, Marilyn Manson, Metallica
Belle and Sebastian: The Magnetic Fields, Neutral Milk Hotel, Yo La Tengo, Elliott Smith, Camera Obscura
Iron Maiden: Judas Priest, Iced Earth, Helloween, Manowar, Bruce Dickinson
Elliott Smith: Iron & Wine, The Decemberists, Bright Eyes, Sufjan Stevens, Belle and Sebastian
Bright Eyes: Rilo Kiley, Death Cab for Cutie, Desaparecidos, Cursive, The Good Life
Death Cab for Cutie: The Postal Service, Bright Eyes, The Shins, Rilo Kiley, Cursive

2 responses to “Applying the Ising model to another data set”

  1. Matt

    Hi,

    I’ve been looking for a large dataset to test an algorithm I’ve developed for MRFs. Do you have anything with more details on how you dealt with Audioscrobbler, or some sample code?

    Thanks,

    Matt
