Fieldwork in Colombia

I spent the summer of 2017 with my colleague Marcelle Cohen living in Colombia and studying the conflict and peace process there. Our objective was to explore how political discourse, as cultural practice, creates entrenched ideologies and contentious politics, and how those discourses relate to other populist movements happening around the world. From a methodological perspective, I’m interested in how interview data can be used in tandem with computational text analysis and quantitative network methods. We performed interviews with politicians and diplomats, attended political rallies in Bogotá and in more rural communities, and made connections with local peace organizations and universities. Our interviews allow us to give agency to the political elite and to understand discourse at its point of production, as it is embedded in political institutions. Ultimately I had a great experience that allowed me to test the lenses of cultural and political theory, learn about qualitative methods, and dive deeper into the political culture of Colombia.

I took this photo after the last disarmament event at a FARC camp in rural Colombia. The scene after the event felt like a foreshadowing of post-accords politics.

This article is more about my meta-impressions; see the academic presentation Political Culture in Colombia for more depth.

Continue reading “Fieldwork in Colombia”


Summer Mentorship – Topic Modeling Secretary of Education News

This summer I had the opportunity to work with sociology undergraduate student Emma Kerr as part of her summer research internship with the UCSB IGERT program. Emma proposed a project investigating whether news coverage of Betsy DeVos was more focused on her personal life or on her policy initiatives relative to coverage of other Secretaries of Education. The summer program is designed to introduce big data and network science to students with interdisciplinary backgrounds. Emma had taken a computational sociology class at UCSB with John Mohr, working on Twitter analysis, and really enjoyed it, so I thought she would be a good fit for the program.

Continue reading “Summer Mentorship – Topic Modeling Secretary of Education News”

Word2Vec Server Python Library

Word2Vec models that have been pre-trained on large corpora are invaluable because they pack semantic and contextual information into a lookup table of only a few million word vectors. They tend to perform well on synonym and analogy tests at around 300 dimensions and can be applied to a number of machine learning tasks. The challenge with these large models is that they take a long time to load into memory when your program starts, and the lookups are memory- and compute-intensive enough that you may not want to run them on your desktop computer. I’ve written a Python library called word2vecserver that allows one to load a pre-trained model onto a server and use the client library to request vector representations or analogy tests from another computer.

Word2VecServer GitHub Page

To use the library, download the pre-trained Google News file and load it into memory using Gensim.
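A minimal sketch of that loading step with Gensim's KeyedVectors; the filename below is the standard GoogleNews-vectors-negative300 distribution, and the path and sanity-check queries are illustrative rather than part of the word2vecserver API:

```python
from gensim.models import KeyedVectors

# Pre-trained Google News vectors (300 dimensions, ~3 million words and phrases);
# adjust the path to wherever you downloaded the file.
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

# Sanity checks: nearest neighbors and a classic analogy test
print(model.most_similar('science', topn=5))
print(model.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
```

Once the model is resident in the server process, client requests only pay the cost of the lookup rather than the load.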

I’ll add updates as I begin to use it in different contexts. Feel free to modify it as needed; if you make useful commits I’ll accept them!

 

Comparative Semantic Analysis: Operational Strategies

Intuitive demonstrations of Word2Vec, like synonym generation and analogy tests, provide compelling evidence that semantic representations are not only possible but also meaningful. While these models hold many opportunities for machine learning on text data, little work has gone into exploring texts on a smaller scale. If one is interested in comparing how texts use concepts in different contexts, small and contextually sparse corpora may be sufficient. In this work I propose several methodological advancements for comparative semantic analysis and look at some of the biggest challenges that have yet to be addressed.
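As a minimal illustration of that kind of comparison (not the operational strategies proposed in the piece itself), the sketch below trains a small Gensim Word2Vec model on each of two Gutenberg texts bundled with nltk and compares the nearest neighbors of the same word in each space; the texts, query word, and hyperparameters are purely illustrative.

```python
from gensim.models import Word2Vec
import nltk
from nltk.corpus import gutenberg

# Corpus and sentence-tokenizer data needed on the first run
nltk.download('gutenberg')
nltk.download('punkt')

def train_model(fileid):
    """Train a small Word2Vec model on a single (contextually sparse) Gutenberg text."""
    sentences = [[w.lower() for w in sent] for sent in gutenberg.sents(fileid)]
    # gensim >= 4.0 uses vector_size; older releases call this parameter size
    return Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

# Two illustrative corpora
model_austen = train_model('austen-emma.txt')
model_melville = train_model('melville-moby_dick.txt')

# Compare how each text situates the same concept among its nearest neighbors
for label, model in [('Austen', model_austen), ('Melville', model_melville)]:
    print(label, model.wv.most_similar('time', topn=5))
```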

Continue reading “Comparative Semantic Analysis: Operational Strategies”

Word2Vec for Comparative Semantic Spaces

I’ve recently become interested in Word2Vec as a way to represent semantic relationships between words in a corpus. In particular, I’m interested in making comparisons between corpora: how do different texts organize concepts differently? Here I attempt to sketch a theoretical basis for Word2Vec, drawing from early structural linguistics and sociology. Then I examine some basic results from training a Word2Vec model on the Gutenberg texts built into the nltk Python library. Might this approach have utility for understanding how authors organize different concepts in a text?
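For reference, a minimal version of that training setup with Gensim and the nltk Gutenberg corpus might look like the following; the hyperparameters and the query word are illustrative rather than the exact ones used in the post.

```python
from gensim.models import Word2Vec
import nltk
from nltk.corpus import gutenberg

# Corpus and sentence-tokenizer data needed on the first run
nltk.download('gutenberg')
nltk.download('punkt')

# Tokenized, lowercased sentences from all of the Gutenberg texts bundled with nltk
sentences = [[w.lower() for w in sent] for sent in gutenberg.sents()]

# Small model; gensim >= 4.0 uses vector_size (older releases call it size)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

# Nearest neighbors as a rough synonym check
print(model.wv.most_similar('whale', topn=5))
```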

Continue reading “Word2Vec for Comparative Semantic Spaces”