Last month I ran a workshop on text analysis in Python for a computational text analysis group we started in Duke Sociology (see workshop GitHub page). It was a follow-up to a workshop I gave for the UCSB Broom Center for Demography last spring. The tutorial covers the basics of parsing text with spaCy and using matrices to manipulate document representations for analysis.
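To give a sense of what “using matrices to manipulate document representations” can look like, here is a minimal sketch rather than the workshop's actual code. It assumes a made-up three-sentence corpus and naive whitespace tokenization (in the workshop, spaCy handles the parsing), then builds a document-term count matrix and compares rows with cosine similarity. The names docs, count_row, and cosine are my own for illustration.

```python
import math
from collections import Counter

# Toy corpus; these sentences are invented for illustration only.
docs = [
    "the senate passed the bill",
    "the house passed the budget",
    "cats chase mice",
]

# Naive whitespace tokenization (an assumption; a real pipeline would use a parser).
tokenized = [doc.split() for doc in docs]

# A fixed, sorted vocabulary gives every row the same column order.
vocab = sorted({tok for toks in tokenized for tok in toks})

def count_row(tokens):
    """Return one row of the document-term matrix: a count for each vocab term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]

# Document-term matrix: one row per document, one column per vocabulary term.
dtm = [count_row(toks) for toks in tokenized]

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

sim_01 = cosine(dtm[0], dtm[1])  # the two legislative sentences share words
sim_02 = cosine(dtm[0], dtm[2])  # no shared words at all
print(round(sim_01, 3), sim_02)
```

Once documents are rows in a matrix like this, most of the later steps (weighting, comparing, clustering) become operations on rows and columns.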
I had two thoughts while creating this workshop: (1) Roughly 60% of the work in most text analysis projects is the same from one project to the next, so the key is to come up with a system and design pattern that works for you. (2) There aren’t that many new tools for text analysis in the social sciences. Most of the algorithms we’ve picked up are simply more efficient or fancier (read: Bayesian) versions of very old algorithms. Now I’ll elaborate.
Continue reading “Intro to Text Analysis”
I created a public GitHub repo to share a cleaned version of the US National Security Strategy (NSS) documents in plain text. Each presidential administration since 1987 is required to produce at least one document per term, so you can easily compare the documents by administration or party. By adding them to a public repo, I’m hoping to make them easier to use for text analysis demos. Use the download_nss() function in the example script to download and read all or some of the NSS documents into Python.
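If you would rather work from local copies, the sketch below shows one way to read a folder of plain-text files into a dictionary. This is not the repo's download_nss() function, and I am not reproducing its signature here. The helper names read_plain_texts and select, the folder name nss_texts, and the idea that file names contain something you can filter on are all assumptions for illustration.

```python
from pathlib import Path

def read_plain_texts(folder):
    """Read every .txt file in `folder` into a dict keyed by file name (no extension)."""
    texts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        texts[path.stem] = path.read_text(encoding="utf-8")
    return texts

def select(texts, keyword):
    """Keep only the documents whose name contains `keyword` (case-insensitive)."""
    return {name: text for name, text in texts.items()
            if keyword.lower() in name.lower()}

if __name__ == "__main__":
    # "nss_texts" is a hypothetical local folder holding the downloaded files.
    docs = read_plain_texts("nss_texts")
    print(len(docs), "documents loaded")
```

Keeping the documents in a dict keyed by name makes it easy to pull out a subset before any parsing happens.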
The choice of NSS documents was motivated by one of my all-time favorite articles, co-authored by my former advisor John Mohr, Robin Wagner-Pacifici, and Ronald Breiger. In addition to the documents analyzed in that piece, I also copied and pasted the text of the 2017 Trump NSS to add the newest document.
Mohr, J. W., Wagner-Pacifici, R., & Breiger, R. L. (2015). Toward a computational hermeneutics. Big Data & Society, (July–December), 1–8. (link)