Semantic Text Analysis

Instructor: Mark Graves

Classroom: McLaughlin Library Room 120A

Semantic text analysis tools facilitate the extraction of meaning from texts. Although examining texts for keywords or word frequencies is valuable, one can extract more meaningful information by creating a mathematical or computational model of a text’s semantics and then examining that model for insights into the text’s meaning. The course will introduce semantic analysis in the context of text analysis, discuss methods for preparing and analyzing texts, explain algorithms for semantic analysis, and demonstrate both simple programs and best-practice software with hands-on exercises. Students will work with example historical texts to learn to use semantic text analysis software in Python, and well-prepared students may bring their own texts for analysis.

The workshop is oriented toward those who wish to understand texts using digital methods, can use basic web tools (like the Google Ngram Viewer), and are interested in extending those techniques using simple programs to enable focused scholarship. I will introduce and explain how to use the Python packages NLTK and gensim to write simple programs, of at most a few lines of code, that can perform analyses not readily accessible in existing user-friendly tools. Software developers have written hundreds, if not thousands, of freely available text processing packages, but they rarely have an incentive to make them easily usable by non-programmers. The workshop helps close the gap between the basic programming skills one learns in an introductory course (or tutorial) and the specific skills needed to use someone else’s program for a digital humanities project.
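To give a sense of what "a few lines of code" can look like, here is a minimal sketch of a word-frequency count with NLTK. The sample sentence is invented for illustration and is not one of the workshop texts; the sketch assumes NLTK is installed and uses a regular-expression tokenizer so that no additional NLTK data needs to be downloaded.

```python
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Illustrative sample text (an assumption, not a workshop text).
text = "The ship sailed at dawn. The crew watched the ship leave the harbor."

# Split on runs of word characters and lowercase each token.
tokenizer = RegexpTokenizer(r"\w+")
tokens = [token.lower() for token in tokenizer.tokenize(text)]

# Count how often each token occurs and list the two most frequent.
frequencies = FreqDist(tokens)
top_words = frequencies.most_common(2)
print(top_words)
```

A count like this is the kind of keyword-level result that web tools already provide; the semantic techniques in the workshop start from the same tokenized text but build a model on top of it.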

The Natural Language Toolkit (NLTK) packages over a hundred corpora and lexical resources with dozens of tools for processing text, and gensim packages several sophisticated algorithms for semantic analysis. The workshop will focus on semantic analysis techniques, and on latent semantic analysis (LSA) in particular, which can be used to compute semantic similarity between texts, has been shown to have psychological plausibility, and is a precursor to more sophisticated techniques. Lectures will provide an overview of the material; the conceptual, philosophical, and mathematical foundations of the algorithms; and best practices for analysis. Demonstrations will show simple Python programs performing text processing and analysis as well as interfaces to semantic analysis tools. Students will have the opportunity to perform simple exercises within a software environment installed on their own computers to reinforce new concepts, analyze texts with semantic analysis software to learn how to use it, and explore software outputs to initiate discussions about how to interpret analysis results.

Intended Audience:

As a prerequisite, students should have basic programming experience equivalent to what is taught in an introductory programming course or covered in online tutorials (such as https://docs.python.org/3/tutorial/, sections 1-7, or the Basic modules of https://www.learnpython.org/) and should plan to bring a laptop for the hands-on exercises. For extended exercises, students may analyze one of several provided texts or may optionally arrange with the instructor before the workshop to prepare and use a particular text of their choice. At the end of the workshop, students will not only be able to use the covered tools to analyze additional texts of their choosing but will also have the foundation to begin learning to use hundreds of additional text analysis programs for their own scholarship.

