Topic Modeling and Historic Texts

Indexes (along with introductions) are the very best parts of books when doing research. However, primary sources, especially handwritten ones, do not typically carry such easily searchable components that label the book for the reader or researcher. This often makes the work harder while simultaneously forcing one to focus on the text as a whole.

A prevalent theme throughout the texts for this week was the question of finding a middle ground between rigid, generalizing tools such as topic modeling and the extremely parsed textual work that some scholars engage in. I think, and the authors seemed to hint, that this middle ground is reachable by not drawing conclusions based solely on results produced from digitally searching the text. Together, the weaknesses of the researcher and the tool cancel each other out while the strengths combine.

Take, for instance, the need to label the groupings of words brought up by topic modeling. This is mentioned by both Blevins and Witmore. The program does not care about meaning, only relationships, as Rhody argued, and so the researcher must provide meaning to the categories of words. Blevins lists some of the topic categories that emerged from his search. Some, such as gardening and midwifery, were immediately apparent. However, there were also two sets that both seemed to mean housework. The differences here may simply be the choice of words made by Martha Ballard on a particular day, or how the program is internally constructed, or perhaps they could point to a richer understanding of colonial ideas of work, the home, and the organization of chores. This is where the researcher becomes vital. Without the meaning that person can provide, such digital tools become meaningless in any real way.

Furthermore, on a practical level, topic modeling and other digital tools can catch things that researchers may have missed or found insignificant at the time. They can make finding specific types of events easy in a document as large as Martha Ballard’s diary that spans over 27 years. That being said, they can miss the more nuanced and critical elements of research. Is the handwriting messy or neat? Does it change? Is one particular page stained? Textability makes for easy and massive addressability, but knowing what to address is the expertise of the researcher.

Student in the World History M.A. program with interests in the intersections of imperialism and gender in the Atlantic World.

