Last month I led a workshop on text analysis in Python for a computational text analysis group we started in Duke Sociology (see the workshop GitHub page). It was a follow-up to a workshop I gave for the UCSB Broom Center for Demography last spring. The tutorial covers some basics of parsing text with spaCy and of using matrices to manipulate document representations for analysis.
I had two thoughts while creating this workshop: (1) around 60% of the work in most text analysis projects is the same, so the key is to come up with a system and design pattern that works for you. (2) There aren’t that many new tools for text analysis in the social sciences. Most algorithms we’ve picked up are simply more efficient or fancier (read: Bayesian) versions of very old algorithms. Now I’ll elaborate.