Semantic Text Analysis with Word Embeddings
Lisa Baer, University of Guelph
Word embeddings are a relatively new method that has been embraced by digital humanists as a means of understanding language in context. Word embedding models (WEMs) use vector space modeling to create a multi-dimensional “physical” space that houses the words of a collected corpus, making it possible to measure the distance between words within that corpus. These distances are then analyzed to characterize the deeper meaning of a particular word based on the semantic features of its closest neighbours. Unlike topic modeling, which uses vector spaces to classify and categorize documents, word embeddings use vector spaces to focus on the meanings of the words inside a text corpus. This course provides participants with a hands-on introduction to word embeddings, using Python along with the NLTK and CLTK packages for text processing and the gensim package for building the embedding model. Participants are encouraged to bring their own texts for analysis.
This course is for anyone interested in using semantic text analysis to understand and contextualize how a particular word or term was understood and used at a given point in recorded history. While no prior programming experience is necessary, some familiarity with Python may prove useful in the text preprocessing stages.