Word2Vec Server Python Library

Word2Vec models that have been pre-trained on large corpora are invaluable because they pack a great deal of semantic and contextual information into a lookup dictionary of only a few million words. They tend to perform well on synonym and analogy tests at around 300 dimensions, and can be applied to a number of machine learning applications. The challenge with these large models is that they take a long time to load into memory when your program starts, and the lookups are computationally expensive enough that you may not want to run them on your desktop computer. I’ve written a Python library called word2vecserver that lets you load a pre-trained model onto a server and use the client library to request vector representations or analogy tests from another computer.

Word2VecServer GitHub Page

To use the library, download the pre-trained Google News file and load it into memory using Gensim.
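As a rough illustration of the loading step, here is a minimal sketch that uses Gensim directly rather than word2vecserver itself. The helper names `load_google_news` and `demo`, the file name, and the `limit` value are my own assumptions for the example; substitute the path where you saved the download.

```python
import os


def load_google_news(path, limit=None):
    """Load pre-trained word2vec vectors from a binary file using Gensim.

    `limit` optionally caps how many of the most frequent words are read,
    which shortens load time and reduces memory use.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f"Pre-trained vector file not found: {path}")
    # Imported here so the path check runs even if Gensim is not installed yet.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)


def demo(path="GoogleNews-vectors-negative300.bin"):
    """Hypothetical usage: the default file name is an assumption."""
    vectors = load_google_news(path, limit=200000)
    # Vector representation of a single word.
    print(vectors["computer"].shape)
    # Analogy test: king - man + woman.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```

Loading the full file can take minutes and several gigabytes of memory, which is the cost the server is meant to pay once on your behalf.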

I’ll add updates as I begin to use it in different contexts. Feel free to update as needed – if you make useful commits I’ll accept them!



Developing Your Research Process

Many articles have been posted about this topic; I just wanted a quick reference for my peers who are interested. I provide links to the UCSB resources, but the advice should generalize to other universities.

Motivation: A good workflow makes it much easier to develop good research questions and understand bodies of previous work. Once you become familiar with all the tools, developing proposals and writing papers will be much less stressful, and you can devote your mental energy to thinking about your actual research.
