Ben Schmidt

Rejecting the gender binary: a vector-space operation Oct 29 2015

My last post provided a general introduction to the new word embedding of language (WEMs), and introduced an R package for easily performing basic operations on them. It was geared mostly towards people in the Digital Humanities community. This post looks more closely at a single word2vec model I’ve trained, on about 14 million reviews of faculty members from ratemyprofessors.com,

To be precise: it is a 500-dimensional skip-gram model with window of about 12 on lowercased, punctuation-free text using the original word2vec C code. I’ve then heavily culled the vocabulary to remove words that usually appear uppercased, on the assumption that they are proper nouns.

The point of this one is to provide a more concrete exploration of how these models can help us think about gendered language. I hope it will be interesting even to people who aren’t interesting in training a machine learning model themselves; there’s code in here, but it’s freely skippable.

Posts with tag RateMyProfessor