Monthly Archives: August 2009

The mysteries of twitterburg

The other day I paid a nice visit to Alex and Yan. We got around to talking about how bit.ly (a link shortener) can be used to track things on twitter. Anyhow, I’m sure they will blow you away with their analysis soon enough, but I thought I’d post some results from a really simple analysis.

The cool thing about bit.ly is that there’s an API that allows us to find out how many clickthrus there were on each link. This makes basic website analytics available to everyone and gives us the ability to start looking at what drives traffic. So we can try to figure out what motivates people to click on links posted on twitter: content, network, or something else?

Here’s what I did: I took the last 3200 tweets by theonion and extracted all the bit.ly links therein (there were about 1200). I then got the number of clicks for each of the links, as well as relevant metadata, through the bit.ly API. There’s a tiny bit of noise there, but here’s what it looks like when I plot the number of clicks (as measured by bit.ly) versus the date when the link was tweeted:

[Figure: theonion clickthrus by date]
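
For anyone who wants to try the link-extraction step themselves, here is a minimal sketch in Python. It is not the code used for this post: the regular expression, the function name, and the sample tweets are all made up for illustration, and the click lookup through the bit.ly API is deliberately left out.

```python
import re

# Assumed shape of a bit.ly short link: the domain followed by a short
# alphanumeric hash. Real links may differ; adjust the pattern as needed.
BITLY_RE = re.compile(r"https?://bit\.ly/([A-Za-z0-9_-]+)")

def extract_bitly_hashes(tweets):
    """Return the unique bit.ly hashes in a list of tweet texts,
    in order of first appearance."""
    found = {}
    for text in tweets:
        for link_hash in BITLY_RE.findall(text):
            found.setdefault(link_hash, None)  # dicts keep insertion order
    return list(found)

# Made-up tweets, purely for illustration.
sample_tweets = [
    "Headline one http://bit.ly/abc123",
    "A tweet with no link at all",
    "Repeat of headline one http://bit.ly/abc123 plus a new one http://bit.ly/Xyz9",
]
hashes = extract_bitly_hashes(sample_tweets)
print(hashes)
```

Each hash would then be passed to whatever click-count lookup the bit.ly API offers.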

You can see how phenomenally theonion’s twitter account has taken off in the last year, eventually settling into this weird cyclical pattern, and we currently seem to be in one of its valleys. (I don’t really have a good explanation for why that pattern is occurring.) But what’s also phenomenal is how closely clicks tend to track the mean. That is, there isn’t a whole lot of variance at any given time. I’d guess that there is a set of regular readers who click on pretty much everything that theonion posts. And while there is an ebb and flow of regular readers, it’s not like within some time slice there are a few articles which really take off (“go viral”) and a bunch which languish. This is totally strange to me; my intuition based on diggs is that there’d be a Pólya-urn, rich-get-richer type of distribution for link clickthrus, but there doesn’t appear to be.
This is also strange to me because followers of this account are basically treating it like an RSS feed of Onion articles, which makes me wonder: why are they using twitter at all?
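
To make the contrast concrete, here is a small simulation sketch comparing the two pictures: a Pólya-urn, rich-get-richer process versus a pool of regular readers who each click on most links. Everything in it is an assumption chosen for illustration (50 links, about 100 clicks per link on average, an 80% click probability); none of it is fitted to the theonion data.

```python
import random
import statistics

def polya_urn_clicks(n_links, n_clicks, rng):
    """Rich-get-richer: each new click lands on a link with probability
    proportional to (clicks that link already has + 1)."""
    counts = [0] * n_links
    for _ in range(n_clicks):
        weights = [c + 1 for c in counts]
        i = rng.choices(range(n_links), weights=weights)[0]
        counts[i] += 1
    return counts

def regular_reader_clicks(n_links, n_readers, p_click, rng):
    """Each reader independently clicks each link with probability p_click."""
    return [
        sum(rng.random() < p_click for _ in range(n_readers))
        for _ in range(n_links)
    ]

def coeff_of_variation(xs):
    """Standard deviation relative to the mean; higher means more spread."""
    return statistics.pstdev(xs) / statistics.mean(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
urn = polya_urn_clicks(n_links=50, n_clicks=5000, rng=rng)
regular = regular_reader_clicks(n_links=50, n_readers=125, p_click=0.8, rng=rng)

print("urn CV:    ", round(coeff_of_variation(urn), 3))
print("regular CV:", round(coeff_of_variation(regular), 3))
```

Under the urn process a handful of links soak up most of the clicks, so the spread around the mean is large; under the regular-reader process every link ends up close to the mean, which is the pattern the plot above seems to show.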

I broke down the data a few other ways to see if I could tease out other trends. I tried breaking it down by time of day. As expected, posting stops at night and begins to pick up again at noon GMT = 8 am Eastern. But there isn’t a huge amount of variation based on when the urls get tweeted: once it gets into people’s queue it seems that they’ll get around to it eventually.
[Figure: theonion clickthrus broken down by time of day]
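
A time-of-day breakdown like this one can be sketched as follows. This is an illustrative reconstruction, not the original analysis code: the function name and the click numbers are invented, and a fixed UTC-4 offset stands in for US Eastern time (which matches "noon GMT = 8 am Eastern" in summer but ignores daylight-saving changes).

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-4 offset (Eastern daylight time), no DST handling.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))

def mean_clicks_by_hour(records, tz=EASTERN_DAYLIGHT):
    """records: iterable of (timezone-aware UTC datetime, clicks).
    Returns {local hour: mean clicks per link tweeted in that hour}."""
    buckets = defaultdict(list)
    for posted_utc, clicks in records:
        buckets[posted_utc.astimezone(tz).hour].append(clicks)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}

# Made-up click counts, purely for illustration.
records = [
    (datetime(2009, 8, 3, 12, 0, tzinfo=timezone.utc), 900),
    (datetime(2009, 8, 3, 12, 30, tzinfo=timezone.utc), 1100),
    (datetime(2009, 8, 3, 20, 15, tzinfo=timezone.utc), 1000),
]
by_hour = mean_clicks_by_hour(records)
print(by_hour)
```

Tweets posted at noon GMT land in the 8 am bucket, so the hourly means can be read directly in Eastern time.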

Finally, I tried breaking it down by day of the week. Not much news to report here. There are fewer tweets on Saturday and Sunday (although it sort of picked up on those days during July). And there isn’t any significant difference in terms of number of clickthrus per link on any given day of the week.
[Figure: theonion clickthrus broken down by date and day of the week]
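
The day-of-week comparison involves two separate quantities, how many links get tweeted on each day and how many clicks each link gets, and it helps to keep them apart. A rough sketch, with a hypothetical function name and invented numbers:

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_summary(records):
    """records: iterable of (date tweeted, clicks).
    Returns {weekday: (number of links, mean clicks per link)}."""
    clicks_by_day = defaultdict(list)
    for day, clicks in records:
        clicks_by_day[WEEKDAYS[day.weekday()]].append(clicks)
    return {d: (len(v), sum(v) / len(v)) for d, v in clicks_by_day.items()}

# Made-up data: two links on a Monday, one on a Saturday.
records = [
    (date(2009, 8, 3), 1000),
    (date(2009, 8, 3), 1200),
    (date(2009, 8, 8), 1100),
]
summary = weekday_summary(records)
print(summary)
```

In this toy data the weekend has fewer links but the same mean clicks per link, which is the shape of the result described above.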

So there you have it. theonion has basically co-opted twitter as a news feed. Its readers faithfully read (or at least click on) the posted bit.ly links, and any content or network effects seem to average out in the end.

Major thanks to Eytan for introducing me to the bit.ly API and lots of pro-tips on navigating/understanding the twitterverse.

LDA for the masses (who use R)

Long time no post. I’ve been busy with lots of stuff: writing my thesis, renaming this blog to pleasescoopme.com, and other stuff which I’ll post soon enough. Another thing I’ve been working on is an R package that implements collapsed Gibbs samplers (written in C) for some of the models I’ve been using: latent Dirichlet allocation (LDA), the mixed-membership stochastic blockmodel (MMSB), and supervised LDA (sLDA). It’s still somewhat experimental but I’ve found it to be immensely useful already. Here are some included demos to show off what you can already do out of the box (plots made with the fantastic ggplot2 package):

You too can make all these pretty pictures by downloading the package here. Then simply run ‘R CMD INSTALL lda_1.0.tar.gz’ to install the package and you’ll be ready to go! All of you out there who work with these models, or want to start working with these models, give it a shot and gimme any feedback you have. I hope to improve things and add more models in some upcoming releases.
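
For readers who haven't seen a collapsed Gibbs sampler for LDA before, here is a bare-bones sketch of the idea in Python. To be clear, this is not the package's code (which is R with the samplers in C) and it does not mirror the package's interface; the function name, the argument names, and the hyperparameter defaults (alpha, eta) are all assumptions made for illustration.

```python
import random

def lda_collapsed_gibbs(docs, n_topics, vocab_size,
                        alpha=0.1, eta=0.1, n_iter=50, seed=0):
    """docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic counts, topic_word counts, topic assignments)."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token's current assignment out of the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | everything else) is proportional to
                # (n_dk + alpha) * (n_kw + eta) / (n_k + V * eta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + eta)
                    / (topic_total[k] + vocab_size * eta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, assignments

# Tiny made-up corpus: word ids 0-1 in some documents, 2-3 in others.
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 1, 1, 0], [3, 2, 2, 3]]
doc_topic, topic_word, assignments = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=4)
print(doc_topic)
```

The "collapsed" part is that the topic proportions and topic-word distributions are integrated out, so the sampler only ever touches the count tables and the per-token topic assignments.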
