EasyText Python Package

I just published a Python package called “EasyText” that was funded as part of a UCSB undergraduate instructional development grant with John W. Mohr. The project came about as a follow-up of Dr. Mohr’s Introduction to Computational Sociology course I helped with in Spring 2016 (more about that). This project was created with the goal of bringing a broad range of text analysis tools into a single interface, particularly one that can be run from the command line or using a minimal amount of Python code. Try it out using pip: “pip install easytext” (see PyPi page).

The command line interface is particularly focused on generating spreadsheets that students can then view and manipulate in a spreadsheet program like Excel or LibreOffice. Students can perform interpretive analysis by going between EasyText output spreadsheets and the original texts, or feed the output into a quantitative analysis program like R or Stata. The program supports features for simple word counting, noun phrase detection, Named Entity Recognition, noun-verb pair detection, entity-verb detection, prepositional phrase extraction, basic sentiment analysis, topic modeling, and the GloVe word embedding algorithm.

Continue reading “EasyText Python Package”

Advertisements

My DocTable Python Package

I recently created my first Python library, called DocTable, a library which I’ve been using for most of my text analysis projects. I’ve uploaded it to PyPi, so anyone can install it using the command “pip install doctable”. You can think of DocTable as a class-based interface for working with tables of data.

Using DocTable, the general flow of my text analysis projects involves (1) parsing and formatting raw data and metadata into a DocTable, (2) preprocessing the text into token lists at the document or sentence level, (3) storing the processed Python objects back into the DocTable, and (3) querying the DocTable entries for ingestion into word embedding, topic modeling, sentiment analysis, or some other algorithm. This process makes storing large-ish datasets fairly easy, combining the ease of working with Pandas DataFrames with the advanced query and storage capabilities of sqlite.

Continue reading “My DocTable Python Package”