I’ve been on a bit of a R tear lately. Today you should see a new version of the R lda package. This version has lots of fixes including a working mmsb demo with the latest version of ggplot2, corrected RTM code, improved likelihood reporting, better documentation, and much more. Grab it from CRAN today! Special thanks to the following people for bug reports/feature requests (sorry if I forgot anyone):
- Edo Airoldi
- Jordan Boyd-Graber
- Khalid El-Arini
- Roger Levy
- Solomon Messing
- Joerg Reichardt
One of the new features is a method to make sLDA predictions on response variables conditioned on documents. In the demo accompanying the package, I fit an sLDA model to a corpus of political blogs tagged as being either liberal or conservative. With this fitted model, I can now use the new predict method to predict the political bent of each of the blogs within a continuous space. The density plot of these predictions is given below, broken down by the the original conservative/liberal label (color of shading).
I like how there’s some bimodality for each contingency — a moderate group and a more extreme group. The model also predicts a heavy tail of super-conservative blogs. There is a real notable bump down by -3. I dunno if this represents reality; it’s probably worthwhile to do more extensive model checking.