Summer Mentorship – Topic Modeling Secretary of Education News

This summer I had the opportunity to work with sociology undergraduate student Emma Kerr as part of her summer research internship with the UCSB IGERT program. Emma proposed a project investigating whether news coverage of Betsy DeVos focused more on her personal life than on her policy initiatives, relative to coverage of other Secretaries of Education (SoE). The summer program is designed to introduce big data and network science to students with interdisciplinary backgrounds. Emma had taken a computational sociology class at UCSB with John Mohr working on Twitter analysis and really enjoyed it, so I thought she would be a good fit for the program.

Using a Python scraper written by another intern, Emma collected all of the NYT news articles matching the search terms “Betsy DeVos”, “Margaret Spellings”, and “Arne Duncan” and saved them as text files. The research questions were a) what topics were discussed about Betsy DeVos, b) whether there was more coverage of DeVos’ personal attributes or her policy initiatives, and c) how this compares with previous Secretaries of Education Margaret Spellings and Arne Duncan. Because these questions are about the quantity of content in news articles, I thought topic modeling might be a good way to start. I wrote a couple of Python scripts to do Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) and gave them to Emma to run on the data. These scripts output a spreadsheet with two tables: one describing the content of each topic and the other describing the topic content of each document.

Emma ran the topic modeling algorithms on the datasets with n=40 topics, then summarized and labeled each topic and classified it as either ‘personal’ or ‘policy’. With these labels she was able to quantify the types of topics covered in articles about Betsy DeVos, the amount of personal versus policy coverage, and how those quantities compare with the previous Secretaries of Education. She processed the results of the NMF topic modeling in Excel and ran some statistical analysis that could be visualized and used to answer the research questions quantitatively.

I had a great time working with Emma. It was nice to learn how she was thinking about the results of text analysis tools from a sociological perspective. It took some back and forth before we were able to settle on research questions and methods that were a good match. It is commonly accepted in the social sciences that the questions should drive the methods, but I would argue that this is neither realistic nor the only way to produce rigorous results. The methods we have access to are limited both by the expertise we’ve developed as researchers and by the accessibility of tools, particularly in computational analysis. All researchers must either remain within a small range of research questions or continually adjust their research questions so that the available methods and results can answer them. This mentorship relationship was an exciting opportunity for me to go through this dialectic process with Emma. Ultimately I think we were able to come up with a good match that helped us answer important questions about news coverage of Betsy DeVos.

The result of her hard work was the poster – Kerr, Emma- IGERT Final Poster! If you have any questions, email Emma directly at emmakerrcares[at]

(poster provided by Emma Kerr with permission)
