Daily Archives: March 31, 2010

Using jjplot to explore tipping behavior

In this post, I’ll show off some recent changes to jjplot that we think are really cool. To help motivate these changes, I’ll walk through them using the tips dataset included with the reshape package.

  • Improved faceting along multiple dimensions. This shows a scatter plot of how much males and females tip on each day of the week, along with best-fit lines. The black, dashed line shows the best fit across all data points. Points and lines are otherwise colored by day. I’ll leave it to you to guess why the slope is higher for men on Saturday, but lower on Sunday.

    jjplot(tip ~ (abline() : group(fit(), by = day:sex) +
                  point(alpha = 0.5)) : color(day) +
                 abline(lty = "dashed") : fit() +
                 total_bill,
           data = tips,
           facet.y = day, facet.x = sex)
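To make the grouping concrete, here is a minimal base-R sketch of what the plot computes, assuming fit() is an ordinary least-squares fit of y on x (that is my reading, not something stated above). The `toy` data frame is a made-up stand-in for tips so the snippet runs without extra packages.

```r
# Toy stand-in for the tips data; the values are synthetic, for illustration only.
set.seed(1)
toy <- data.frame(
  total_bill = runif(40, 5, 50),
  day = rep(c("Sat", "Sun"), each = 20),
  sex = rep(c("Female", "Male"), times = 20)
)
toy$tip <- 1 + 0.15 * toy$total_bill + rnorm(40, sd = 0.5)

# One fit across all points (the dashed black line).
overall <- coef(lm(tip ~ total_bill, data = toy))

# One fit per day:sex combination (the colored line in each panel).
groups <- split(toy, interaction(toy$day, toy$sex, drop = TRUE))
per_group <- t(sapply(groups, function(d) coef(lm(tip ~ total_bill, data = d))))

print(overall)
print(per_group)
```

Each row of `per_group` holds the intercept and slope for one panel, which is the pair of numbers you would compare between, say, men on Saturday and men on Sunday.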

  • New stats/geoms such as area/density. Here we’ll make a density plot of the tip fraction, that is, the tip amount divided by the total bill. The black density shows the overall density, while each overlaid density shows the density just for the points in that panel.

    jjplot(~ area() : group(density(), by = day:sex) : color(day, alpha = 0.5) +
             area() : group(density(), by = day) +
             I(tip / total_bill),
           data = tips,
           facet.y = day, facet.x = sex,
           xlab = "tip fraction",
           ylab = "")
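For readers who want to see the numbers behind the picture, the following base-R sketch computes the same kind of grouped densities with stats::density(). It is only an approximation of what the density() stat does, and `toy`, `dens`, `by_day` and `by_panel` are illustrative names with synthetic data, not part of jjplot.

```r
# Toy stand-in for the tips data; the values are synthetic, for illustration only.
set.seed(2)
toy <- data.frame(
  total_bill = runif(80, 5, 50),
  day = rep(c("Sat", "Sun"), each = 40),
  sex = rep(c("Female", "Male"), times = 40)
)
toy$tip <- toy$total_bill * runif(80, 0.08, 0.25)
toy$frac <- toy$tip / toy$total_bill  # tip fraction

# Evaluate every density on the same grid so the curves can be overlaid.
rng <- range(toy$frac)
dens <- function(x) density(x, from = rng[1], to = rng[2], n = 256)

# Density per day (mirrors by = day) and per panel (mirrors by = day:sex).
by_day <- lapply(split(toy$frac, toy$day), dens)
by_panel <- lapply(split(toy$frac, interaction(toy$day, toy$sex, drop = TRUE)), dens)
```

Sharing one grid across groups is a choice made here for easy comparison; a plotting system could just as well let each group pick its own range.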

  • Custom geoms/stats. We want to make it easier for the community to augment the system. Right now, the syntax is still sort of opaque and we’re working on it, but you can already get a custom stat just by naming your function jjplot.stat.*. For example, below we define a new kmeans stat. We then cluster the points and draw a best-fit line for each cluster.

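    jjplot.stat.kmeans <- function(state, K, use.y = FALSE) {
      if (use.y) {
        km <- kmeans(cbind(state$data$x, state$data$y), K)
      } else {
        km <- kmeans(state$data$x, K)
      }
      # Attach the cluster assignment as a new column for later stats/geoms.
      state$data$cluster <- factor(km$cluster)
      state  # assumed: a stat hands the updated state back to jjplot
    }

    # x variable assumed to be total_bill, as in the first example.
    jjplot(tip ~ point() +
           abline() : group(fit(), cluster) : kmeans(3) +
           total_bill,
           data = tips)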
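Outside of jjplot, the same pipeline (cluster on x, then fit a line per cluster) can be sketched in base R as follows. This is a rough equivalent under the assumption that fit() is a least-squares line; `toy` and `fits` are illustrative names and the data is synthetic.

```r
# Toy stand-in for the tips data; the values are synthetic, for illustration only.
set.seed(3)
toy <- data.frame(total_bill = runif(60, 5, 50))
toy$tip <- 1 + 0.15 * toy$total_bill + rnorm(60, sd = 0.5)

# Step 1: cluster on x only, as the stat does when use.y = FALSE.
K <- 3
km <- kmeans(toy$total_bill, centers = K)
toy$cluster <- factor(km$cluster)

# Step 2: one least-squares line per cluster.
fits <- lapply(split(toy, toy$cluster),
               function(d) coef(lm(tip ~ total_bill, data = d)))

print(table(toy$cluster))
print(fits)
```

Note that kmeans() starts from random centers, so cluster labels can change between runs unless the seed is fixed.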

  • Coloring on derived statistics. You may have noticed in the earlier examples that the color syntax has changed. We figured color should be kind of like sort: it’s a pseudo-statistic which can be inserted anywhere in a statistics stack. This makes it easy to color based on derived statistics. In this example, we make the previous plot much more useful by coloring the fits and points according to the assigned cluster.

    # x variable assumed to be total_bill, as in the first example.
    jjplot(tip ~ (point() +
                  abline() : group(fit(), cluster)) : color(cluster) : kmeans(3) +
           total_bill,
           data = tips)

Let us know what you think! P.S. A release on CRAN is coming very soon…


