Word2Vec Server Python Library

Word2Vec models that have been pre-trained on large corpora are invaluable because they contain all of the semantic and contextual information in a lookup dictionary of only a few million words. They tend to perform well on synonym and analogy tests at around 300 dimensions, and can be applied to a number of machine learning applications. The challenge with these large models is that they take a long time to load into memory when your program starts and the lookup algorithms are intense to the point where you may not want to run them on your desktop computer. I’ve written a python library called word2vecserver that allows one to load a pre-trained model onto a server and use the client library to make requests for vector representations or analogy tests from another computer.

Word2VecServer GitHub Page

To use the library, download the pre-trained Google News file and load it into memory using Gensim.

I’ll add updates as I begin to use it in different contexts. Feel free to update as needed – if you make useful commits I’ll accept them!

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s