Using jjplot to explore tipping behavior

In this post, I’ll show off some recent changes to jjplot that we think are really cool. To help motivate these changes, I’ll walk through them using the tips dataset included with the reshape package.

  • Improved faceting along multiple dimensions. This shows a scatter plot of how much males and females tip on each day of the week, along with best-fit lines. The black dashed line shows the best fit across all data points. Points and lines are otherwise colored by day. I’ll leave it to you to guess why the slope is higher for men on Saturday, but lower on Sunday.

    jjplot(tip ~ (abline() : group(fit(), by = day:sex) +
                  point(alpha = 0.5)) : color(day) +
                 abline(lty = "dashed") : fit() + total_bill,
           data = tips,
           facet.y = day, facet.x = sex)
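    If you don't have jjplot installed, the numbers behind this plot can be roughly sketched in base R: one least-squares line per day:sex panel, plus one line across all points. The `toy` data frame below is made up as a stand-in for tips (it is not the real dataset), and `panel_fits` and `overall_fit` are names chosen just for this illustration.

```r
# Made-up stand-in for the tips data (NOT the real dataset):
# 15 rows for every day:sex combination.
set.seed(1)
toy <- expand.grid(day = c("Thur", "Fri", "Sat", "Sun"),
                   sex = c("Female", "Male"),
                   rep = 1:15)
toy$total_bill <- runif(nrow(toy), 5, 50)
toy$tip <- 1 + 0.1 * toy$total_bill + rnorm(nrow(toy), sd = 0.5)

# One least-squares fit per day:sex panel (the colored lines).
panels <- split(toy, interaction(toy$day, toy$sex, drop = TRUE))
panel_fits <- t(sapply(panels, function(d) coef(lm(tip ~ total_bill, data = d))))
colnames(panel_fits) <- c("intercept", "slope")

# One fit across all points (the dashed black line).
overall_fit <- coef(lm(tip ~ total_bill, data = toy))
```

    Comparing the `slope` column of `panel_fits` with the slope in `overall_fit` is the numeric version of comparing each colored line with the dashed one.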

  • New stats/geoms such as area/density. Here we’ll make a density plot of the tip fraction, that is, the tip amount over the total bill. The black density shows the overall density for each day (both sexes pooled), while each overlaid colored density shows the density for just the points in that panel.

    jjplot(~ area() : group(density(), by = day:sex) : color(day, alpha = 0.5) +
             area() : group(density(), by = day) +
             I(tip / total_bill),
           data = tips,
           facet.y = day, facet.x = sex,
           xlab = "tip fraction",
           ylab = "")
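    As a rough base R sketch of what the two density layers compute, the snippet below estimates a kernel density of the tip fraction per day:sex panel and again per day. The `toy` data frame is made up for illustration (not the real tips data), and `density()` is base R's default kernel estimator, which may differ from what jjplot's `density()` stat does internally.

```r
# Made-up stand-in for the tips data (NOT the real dataset).
set.seed(1)
toy <- expand.grid(day = c("Thur", "Fri", "Sat", "Sun"),
                   sex = c("Female", "Male"),
                   rep = 1:15)
toy$total_bill <- runif(nrow(toy), 5, 50)
toy$tip <- 1 + 0.1 * toy$total_bill + rnorm(nrow(toy), sd = 0.5)

# Tip fraction: tip amount over the total bill.
toy$frac <- toy$tip / toy$total_bill

# Kernel density within each day:sex panel (the colored areas).
panel_dens <- lapply(split(toy$frac, interaction(toy$day, toy$sex)), density)

# Kernel density within each day, pooling both sexes (the black areas).
day_dens <- lapply(split(toy$frac, toy$day), density)
```

    Each element is a standard `density` object, so `plot(day_dens[["Sat"]])` followed by `lines(panel_dens[["Sat.Male"]])` gives a quick single-panel version of the overlay.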

  • Custom geoms/stats. We want to make it easier for the community to augment the system. Right now, the syntax is still sort of opaque and we’re working on it, but you can already get a custom stat just by naming your function jjplot.stat.*. For example, below we define a new kmeans stat. We then cluster the points and draw a best-fit line for each cluster.

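    # Custom stat: assign each point to one of K k-means clusters.
    jjplot.stat.kmeans <- function(state, K, use.y = FALSE) {
      if (use.y) {
        # Cluster on both coordinates.
        km <- kmeans(cbind(state$data$x, state$data$y), K)
      } else {
        # Cluster on x only.
        km <- kmeans(state$data$x, K)
      }
      # Expose the assignment as a new column for later stats and geoms.
      state$data$cluster <- factor(km$cluster)
      state
    }

    jjplot(tip ~ point() +
                 abline() : group(fit(), cluster) : kmeans(3) +
                 total_bill,
           data = tips)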
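    For comparison, the same two steps (cluster on x, then fit one line per cluster) can be sketched in base R without jjplot. The `toy` data frame is made up for illustration (not the real tips data), and `cluster_fits` is a name chosen just for this sketch.

```r
# Made-up stand-in for the tips data (NOT the real dataset).
set.seed(1)
toy <- data.frame(total_bill = runif(120, 5, 50))
toy$tip <- 1 + 0.1 * toy$total_bill + rnorm(120, sd = 0.5)

# Cluster on x only, as the stat above does when use.y = FALSE.
km <- kmeans(toy$total_bill, centers = 3)
toy$cluster <- factor(km$cluster)

# One best-fit line (intercept and slope) per cluster.
cluster_fits <- lapply(split(toy, toy$cluster),
                       function(d) coef(lm(tip ~ total_bill, data = d)))
```

    Because k-means starts from random centers, the cluster labels can change between runs unless the seed is fixed.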

  • Coloring on derived statistics. You may have noticed in the earlier examples that the color syntax has changed. We figured color should be kind of like sort — it’s a pseudo-statistic which can be inserted anywhere in a statistics stack. This means that it becomes easy to color based off of derived statistics. In this example, we make the previous plot much more useful by coloring the fits and points according to the assigned cluster.

    jjplot(tip ~ (point() +
                  abline() : group(fit(), cluster)) : color(cluster) : kmeans(3) +
                 total_bill,
           data = tips)
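    One way to picture a statistics stack is as a chain of functions that each take a state and return an updated state, following the same convention as the custom stat above. The sketch below is only an illustration of that idea, not jjplot's actual internals: `stat_kmeans`, `stat_color`, and the `stack` list are hypothetical names, and the right-to-left order is an assumption based on `kmeans(3)` having to run before `color(cluster)` can see the cluster column.

```r
# Hypothetical stats (NOT jjplot functions): each takes a state, a list
# holding a data frame, and returns the updated state.
stat_kmeans <- function(state, K) {
  state$data$cluster <- factor(kmeans(state$data$x, K)$cluster)
  state
}

stat_color <- function(state, column) {
  # Map each level of the chosen column to one palette entry.
  lev <- as.integer(state$data[[column]])
  state$data$color <- rainbow(nlevels(state$data[[column]]))[lev]
  state
}

# Made-up x values standing in for total_bill.
set.seed(1)
state <- list(data = data.frame(x = runif(60, 5, 50)))

# Stack written left to right like `color(cluster) : kmeans(3)`,
# then applied right to left so the cluster column exists first.
stack <- list(
  function(s) stat_color(s, "cluster"),
  function(s) stat_kmeans(s, 3)
)
for (stat in rev(stack)) state <- stat(state)
```

    Since the color step only needs a column name, it can follow any stat that adds a column, which is the sense in which coloring on derived statistics comes for free.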

Let us know what you think! P.S. A release on CRAN is coming very soon…

14 responses to “Using jjplot to explore tipping behavior”

  1. Shane

    Jonathan,

    This is all very neat.

    Out of curiosity, are the performance benefits of jjplot derived from memoization, or are there other things at work here?

    Also, why is it called jjplot? Shouldn’t it be called either ggplot3 or ffplot? 🙂

    Shane

  2. It’s not ffplot or ggplot3 because it’s not a successor to ggplot2. I should also note that jjplot is more fun to say and it has nothing to do with Jon being a narcissist.

  3. Hehe, it probably has a lot to do with narcissism…

    As for the performance, I don’t think I understand ggplot2’s code enough to assess what it’s spending its time on. FWIW, here’s a quick Rprof summary for a 10000 point scatter plot:

    ggplot2 (1.643 s)

                       self.time self.pct total.time total.pct
    ".Call.graphics"        0.22     15.5       0.26      18.3
    "deparse"               0.12      8.5       0.12       8.5
    ".Call"                 0.08      5.6       0.10       7.0
    "cbind"                 0.06      4.2       0.22      15.5
    "inherits"              0.06      4.2       0.18      12.7
    "[<-.data.frame"        0.06      4.2       0.08       5.6
    "llply"                 0.04      2.8       0.50      35.2
    "list_to_array"         0.04      2.8       0.38      26.8
    "match.fun"             0.04      2.8       0.06       4.2
    "vector"                0.04      2.8       0.06       4.2

    jjplot (0.361 s)

                       self.time self.pct total.time total.pct
    ".Call.graphics"        0.32       80       0.32        80
    "gc"                    0.06       15       0.06        15
    "!"                     0.02        5       0.02         5
    "system.time"           0.00        0       0.40       100
    ".subplot"              0.00        0       0.34        85
    "drawGrob"              0.00        0       0.34        85
    "grid.draw.grob"        0.00        0       0.34        85
    "grid.draw"             0.00        0       0.34        85
    "jjplot"                0.00        0       0.34        85
    "recordGraphics"        0.00        0       0.34        85

  4. Update?! I just ran across jjplot, made my way to the Rforge site, and saw that commits are still happening (less than 30 minutes ago, in fact!). Any word on future or immediate prospects for jjplot?

    “As for the performance, I don’t think I understand ggplot2’s code enough to assess what it’s spending its time on.”
    See those calls to llply and list_to_array? Those are plyr functions. I’ve noticed that ggplot2 can really bog on tasks that plyr would bog on, and this is why. Splitting something a thousand-and-one ways gives you a thousand-and-one function calls and a thousand-and-one objects that need to get glued back together. I love plyr’s semantics, but sometimes the backend is inconvenient…

    • Thanks for the tips on ggplot2. Hadley tells me that there have been some improvements in ggplot2’s performance so I should re-evaluate at some point.

      And yes, jjplot is still being developed. It’s pretty stable and I’ve been using it day to day without problem (the occasional commit is usually me trying to fix an immediate bug). I encourage you to give it a whirl!

  5. is jjplot still being developed??

    rgds
    A

  6. Where can I get documentation on using jjplot? The docs on googlecode are way out of date

    • I moved this to r-forge at some point (it is here: https://r-forge.r-project.org/R/?group_id=835). The docs haven’t been updated unfortunately but with any luck the demo should still work and serve as a good starting point for learning how to use it. I will add a TODO to add more docs!

      • Thanks, got it. I get an error during the demo though, and not sure what it means:

        > ## Heatmaps with tile.
        > ## Also shows off themes, and axis parameters
        > jjplot(Sepal.Width ~ point() +
        + tile(border = NA) : color(z) :
        + group(density2d(n = 32), by = Species) + (Petal.Width + 0.25),
        + data = iris,
        + theme = jjplot.theme("bw",
        + x.axis.type = "exact",
        + y.axis.type = "exact"),
        + facet.x = Species)
        Error in jjplot.stat.density2d(n = 32, state = list(data = list(Sepal.Length = c(5.1, :
        could not find function "bandwidth.nrd"

        PS: Say hi to Cameron.

      • Ah, I guess you need to require(MASS) for that one to work.

      • Also clearly jjplot needs some love =/.
