How to read 16,700 journal articles: studying German Studies with topic models
31 March 2012
Topic models
(NB: This is a partial selection. I’ve tried to focus on material of interest to those working in the human and social sciences.)
- Blei, David. “Introduction to Probabilistic Topic Models.”
- Mimno, David. “Computational Historiography in a Century of Classics Journals”
- Hall et al. “Studying the History of Ideas Using Topic Models.” (Hall, Jurafsky, and Manning 2008)
- Hall, David. “Studying the History of Ideas Using Topic Models.” MS Thesis. (Hall 2008)
- Chang et al. (2009) “Reading Tea Leaves: How Humans Interpret Topic Models”
- Wallach, Hanna, et al. (2009) “Rethinking LDA: Why Priors Matter.”
- Wallach, Hanna (2008) “Structured Topic Models for Language.”
- Political science: work by Justin Grimmer; Arthur Spirling’s paper on US treaties
- Block and Newman (2011) “What, Where, When, and Sometimes Why: Data Mining Two Decades of Women’s History Abstracts”
Software implementing LDA
- topicmodels (R package)
- MALLET (Java software)
Text Analysis
- Unix for Poets
- Lecture 1 and Lecture 2 from Cosma Shalizi’s data mining course.
- Introduction to Information Retrieval (Manning, Raghavan, and Schütze) (open-access)
Probability
- Hacking, Ian. An Introduction to Probability and Inductive Logic
- Grinstead and Snell. Introduction to Probability (open-access)
LDA
- Edwin Chen’s Introduction to Latent Dirichlet Allocation
- Gregor Heinrich (2004) “Parameter estimation for text analysis.”
- Griffiths and Steyvers (2004) “Finding scientific topics.”
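Griffiths and Steyvers (2004) and Heinrich (2004) both describe fitting LDA by collapsed Gibbs sampling: each word token's topic is resampled in turn from its conditional distribution given all other assignments, which is proportional to (n_dk + α)(n_kw + β) / (n_k + Vβ). The sketch below is a small, unoptimized illustration of that update in plain Python, assuming symmetric priors. The function names `lda_gibbs` and `topic_word_probs` and the default values α = 0.1, β = 0.01 are hypothetical choices for illustration; this is not the code used by MALLET or the topicmodels package.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical teaching sketch).

    docs: list of documents, each a list of integer word ids 0..V-1.
    alpha, beta: symmetric Dirichlet priors (illustrative defaults).
    Returns (n_dk, n_kw, z): doc-topic counts, topic-word counts,
    and the final topic assignment of every token.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    n_dk = [[0] * n_topics for _ in docs]                    # tokens in doc d with topic k
    n_kw = [[0] * vocab_size for _ in range(n_topics)]      # tokens of word w with topic k
    n_k = [0] * n_topics                                     # total tokens with topic k

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other z, words).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


def topic_word_probs(n_kw, beta=0.01):
    """Point estimate of each topic's word distribution from the counts."""
    vocab_size = len(n_kw[0])
    return [[(c + beta) / (sum(row) + vocab_size * beta) for c in row] for row in n_kw]


if __name__ == "__main__":
    # Toy corpus: word ids 0-2 and 3-5 tend to co-occur in different documents.
    docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
    n_dk, n_kw, z = lda_gibbs(docs, n_topics=2, n_iter=100)
    for k, row in enumerate(topic_word_probs(n_kw)):
        top = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:3]
        print("topic", k, "top word ids:", top)
```

On a corpus the size of 16,700 articles, a pure-Python loop like this would be far too slow; it is meant only to make the update rule in the readings concrete.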
Bayesian Statistics
- Peter Hoff (2009) A First Course in Bayesian Statistical Methods
- Kadane. Principles of Uncertainty (open-access)
Machine Learning
References
Block, Sharon, and David Newman. 2011. “What, Where, When, and Sometimes Why: Data Mining Two Decades of Women’s History Abstracts.” Journal of Women’s History 23: 81–109. http://muse.jhu.edu/journals/journal_of_womens_history/v023/23.1.block.html.
Chang, J., J. Boyd-Graber, S. Gerrish, C. Wang, and D. M. Blei. 2009. “Reading Tea Leaves: How Humans Interpret Topic Models.” http://umiacs.umd.edu/%7Ejbg/docs/nips2009-rtl.pdf.
Griffiths, Thomas L., and Mark Steyvers. 2004. “Finding Scientific Topics.” Proceedings of the National Academy of Sciences 101 (suppl. 1): 5228–5235. doi:10.1073/pnas.0307752101. http://www.pnas.org/content/101/suppl.1/5228.abstract.
Hall, David. 2008. “Studying the History of Ideas Using Topic Models.” Stanford University. http://symsys.stanford.edu/theses/thesis1.pdf.
Hall, David, Daniel Jurafsky, and Christopher D. Manning. 2008. “Studying the History of Ideas Using Topic Models.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 363–371. Honolulu, Hawaii: Association for Computational Linguistics.
Heinrich, Gregor. 2004. “Parameter estimation for text analysis.” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.1327.
Hoff, Peter D. 2009. A First Course in Bayesian Statistical Methods. Springer.