May 8, 2010

A new home for Facebook data team publications

Brendan (who will be joining us this summer, natch!) pointed out yesterday that we don’t have a repository for the papers we’ve published here at Facebook. We moved fast. Now you can find them by clicking on the “Papers” tab of the Facebook Data page. Happy reading everyone!

May 7, 2010

Oh god, now there’s another video of me online

Recently I got to participate on a panel / give a talk as a part of the NAE Seattle Grand Challenges Summit. Let me thank Ed Lazowska for putting together such a great panel — Alon Halevy, Larry Smarr and Catharine van Ingen. I think I got a contact high just from being around such awesome researchers.

Anyhow, a video has surfaced of my talk. I would recommend against watching it, unless you want to see me nebbish my way through a five minute talk.

There’s also some more coverage here.

April 28, 2010

Slides from some recent talks

Recently I had the honor of being invited to give a couple of talks in the Boston area, one at NESCAI and one at NESS. I had a great time and the feedback from the audiences was awesome. A shout out to Jeff of search engine cafe is in order. I also want to especially thank David/Sameer and Edo for inviting me and for putting together such great programs!

I have uploaded the slides for these talks here.

I’m also going to be on a panel for NAE’s Grand Challenges Summit next Monday. If you’re going to be in the Seattle area, stop by and say hi!

March 31, 2010

Using jjplot to explore tipping behavior

In this post, I’ll show off some recent changes to jjplot that we think are really cool. To help motivate these changes, I’ll walk through them using the tips dataset included with the reshape package.

  • Improved faceting along multiple dimensions. This shows a scatter plot of how much males and females tip on each day of the week, along with best-fit lines. The black, dashed line shows the best fit across all data points. Points/lines are otherwise colored by day. I’ll leave it to you to guess why the slope is higher for men on Saturday, but lower on Sunday.

    jjplot(tip ~ (abline() : group(fit(), by = day:sex) +
                  point(alpha = 0.5)) : color(day) +
             abline(lty = "dashed") : fit() + total_bill,
           data = tips,
           facet.y = day, facet.x = sex)
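For anyone curious what the grouped fit amounts to, here is a minimal base R sketch of the same idea: one least-squares line per day/sex cell plus one line over all the data. It does not use jjplot at all, and `toy` and `fit_by_group` are names I made up for illustration; the rows in `toy` are invented stand-ins that only borrow the column names of tips.

```r
# Made-up rows with the same column names as tips; not real data.
toy <- data.frame(
  total_bill = c(10, 20, 30, 12, 24, 36, 15, 25, 35, 11, 22, 33),
  tip        = c(1.5, 3.0, 4.5, 1.0, 2.5, 3.5, 2.0, 3.5, 5.5, 1.0, 2.0, 3.5),
  sex        = rep(c("Female", "Male"), each = 6),
  day        = rep(rep(c("Sat", "Sun"), each = 3), times = 2)
)

# Fit tip ~ total_bill separately within each day/sex cell.
# Returns a matrix with one row per cell and intercept/slope columns.
fit_by_group <- function(df) {
  cells <- split(df, list(df$day, df$sex), drop = TRUE)
  coefs <- t(sapply(cells, function(cell) {
    coef(lm(tip ~ total_bill, data = cell))
  }))
  colnames(coefs) <- c("intercept", "slope")
  coefs
}

group_fits  <- fit_by_group(toy)                    # one line per panel
overall_fit <- coef(lm(tip ~ total_bill, data = toy))  # the dashed line
```

Passing the real tips data frame to `fit_by_group` should give the per-panel slopes that the colored lines in the plot correspond to.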

  • New stats/geoms such as area/density. Here we’ll make a density plot of the tip fraction, that is, the tip amount over the total bill. The black density shows the overall density, while each overlaid density shows the density just for points in that panel.

    jjplot(~ area() : group(density(), by = day:sex) : color(day, alpha = 0.5) +
             area() : group(density(), by = day) +
             I(tip / total_bill),
           data = tips,
           facet.y = day, facet.x = sex,
           xlab = "tip fraction",
           ylab = "")
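The tip fraction and its densities can also be computed by hand. The sketch below uses only base R's `density()`; the `toy` data frame and its `panel` column are invented for illustration and are not the real tips data.

```r
# Made-up rows; only the column names tip and total_bill mirror tips.
toy <- data.frame(
  total_bill = c(10, 20, 30, 12, 24, 36, 15, 25, 35, 11, 22, 33),
  tip        = c(1.2, 3.4, 4.5, 1.0, 2.5, 3.5, 2.0, 3.5, 5.5, 1.0, 2.0, 3.5),
  panel      = rep(c("Sat.Female", "Sun.Male"), each = 6)
)

# Tip fraction: the tip amount over the total bill.
tip_fraction <- toy$tip / toy$total_bill

# Kernel density over all rows (the analogue of the black density).
overall_density <- density(tip_fraction)

# One kernel density per panel (the analogue of the colored overlays).
panel_density <- lapply(split(tip_fraction, toy$panel), density)
```

To look at the result, `plot(overall_density)` followed by `lines(panel_density[[1]])` overlays one panel's density on the overall one.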

  • Custom geoms/stats. We want to make it easier for the community to augment the system. Right now, the syntax is still sort of opaque and we’re working on it, but you can already get a custom stat just by naming your function jjplot.stat.*. For example, below we define a new kmeans stat. We then cluster the points and draw a best-fit line for each cluster.

    jjplot.stat.kmeans <- function(state, K, use.y = FALSE) {
      if (use.y) {
        km <- kmeans(cbind(state$data$x, state$data$y), K)
      } else {
        km <- kmeans(state$data$x, K)
      }
      state$data$cluster <- factor(km$cluster)
      state
    }

    jjplot(tip ~ point() +
             abline() : group(fit(), cluster) : kmeans(3) +
             total_bill,
           data = tips)
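If you want to see what this stat stack computes without going through jjplot, here is a rough base R equivalent: cluster on x with `kmeans()`, then fit a line within each cluster. The function name `cluster_and_fit`, the `nstart` setting, and the made-up vectors are my own assumptions for the sketch, not part of jjplot.

```r
set.seed(1)  # kmeans picks random starting centers

# Cluster the x values into K groups, then fit y ~ x within each cluster.
cluster_and_fit <- function(x, y, K) {
  km <- kmeans(x, centers = K, nstart = 25)
  cluster <- factor(km$cluster)
  fits <- lapply(split(data.frame(x = x, y = y), cluster), function(d) {
    coef(lm(y ~ x, data = d))
  })
  list(cluster = cluster, fits = fits)
}

# Made-up bills in three well-separated ranges, with made-up tips.
bill <- c(10, 11, 12, 13, 30, 31, 32, 33, 60, 61, 62, 63)
tip  <- c(1.4, 1.7, 1.9, 2.0, 4.4, 4.8, 4.7, 5.1, 9.2, 9.0, 9.5, 9.4)

res <- cluster_and_fit(bill, tip, K = 3)
```

Clustering on both coordinates, as `use.y = TRUE` does in the stat above, would just mean passing `cbind(x, y)` to `kmeans()` instead of `x`.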

  • Coloring on derived statistics. You may have noticed in the earlier examples that the color syntax has changed. We figured color should be kind of like sort — it’s a pseudo-statistic which can be inserted anywhere in a statistics stack. This means that it becomes easy to color based off of derived statistics. In this example, we make the previous plot much more useful by coloring the fits and points according to the assigned cluster.

    jjplot(tip ~ (point() +
                  abline() : group(fit(), cluster)) : color(cluster) : kmeans(3) +
             total_bill,
           data = tips)

Let us know what you think! P.S. A release on CRAN is coming very soon…

March 22, 2010

ePluribus: Ethnicity on Social Networks

is the name of the paper I wrote with Lars, Itamar, and Cameron. It will appear at this year’s ICWSM. You may commence bating those breaths.