Your favorite package for running topic models in R has been updated! This release not only has bug fixes and more utility functions, it also adds two new models:
- The Networks Uncovered by Bayesian Inference (NUBBI) model, which discovers connections between entities in free text (run demo(nubbi); note that for licensing reasons I could not include the data for this demo in the package);
- The Relational Topic Model (RTM), which discovers patterns that account for both document content and the connections between documents (run demo(rtm)).
And because it’s on CRAN, everyone (including Windows users) can install it by simply executing install.packages("lda"). Please install it, play with it, and let me know if you find any bugs.
Where can I get the NYT data set so that I can test the demo? Thanks!
You need to acquire it from the LDC (Linguistic Data Consortium).
Hi Jonathan,
Thanks for publishing this; it looks very cool. I have a question about the RTM implementation: training doesn’t seem to return any value for the regression weights beta. In the demo, you just pass the scalar 3 in this slot (in the call to predictive.link.probability). Does the code not support learning this parameter? Is it important?
Thanks!
The parameter isn’t learned at the moment (AFAIK it can’t be collapsed, so you have to resort to EM). I’m actually going to upload an updated version of the package soon, and I might include this feature. Stay tuned.
Hey, I’ve been really curious to mess around with Nubbi for ages, and wanted to check with you about two things. (And sorry for attaching this to such an old blog post…)
I’ve got the LDC NYT corpus, but it doesn’t seem to be naturally formatted for use with NUBBI (e.g. the data doesn’t have pair contexts picked out, it is all XML, etc.). Do you have a guide or reference for getting it prepped?
Thanks for sharing such a valuable …. I need to make a bubble chart in R like the first chart. Could you let me know which R package I should use? Thanks.
Plots were made using ggplot2. The documentation for that package is quite good and has lots of examples:
http://docs.ggplot2.org/current/
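As a starting point, a bubble chart in ggplot2 is just a scatter plot with a variable mapped to point size. The sketch below is not the code behind the original chart. It assumes your data sits in a data frame with x and y positions, a numeric column that sets the bubble size, and an optional grouping column. The names bubbles, weight and group, and the values themselves, are hypothetical placeholders.

```r
library(ggplot2)

# Hypothetical example data: one row per bubble.
# `weight` sets the bubble size; `group` sets the colour.
bubbles <- data.frame(
  x      = c(1, 2, 3, 4, 5),
  y      = c(2, 4, 3, 5, 1),
  weight = c(10, 40, 25, 60, 15),
  group  = c("a", "b", "a", "b", "a")
)

p <- ggplot(bubbles, aes(x = x, y = y, size = weight, colour = group)) +
  # Semi-transparent points so overlapping bubbles stay visible.
  geom_point(alpha = 0.6) +
  # Scale bubble *area* (not radius) with `weight`, so large values
  # are not visually exaggerated; zero maps to zero area.
  scale_size_area(max_size = 15) +
  theme_bw()

print(p)
```

Mapping size to area via scale_size_area() is usually the right choice for bubble charts, because readers compare bubbles by area rather than by diameter.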