LING 575: NLP for Cultural Analytics

Winter 2023
University of Washington, Department of Linguistics

Canvas: Link

Description

Surveys tools, frameworks, and skills needed to apply natural language processing methods to applications in the humanities and social sciences, with a focus on the analysis of large digital text corpora, including social media, literature, and historical documents. Topics will include data collection, text processing and machine learning techniques, data visualization, and ethical considerations.

Readings

Week 1: Introduction

How to (seriously) read a scientific paper

Reading Machines: Toward an Algorithmic Criticism (available on Canvas)
Stephen Ramsay (2011)

Seven ways humanists are using computers to understand text
Ted Underwood (2015)

Like Two Pis in a Pod: Author Similarity Across Time in the Ancient Greek Corpus
Grant Storey and David Mimno (2020)

Week 2: Sentiment & Affect

Emotion and Reason in Political Language
Gloria Gennaro and Elliott Ash (2021)

Quantifying Intimacy in Language
Jiaxin Pei and David Jurgens (2020)

An Analysis of Emotions and the Prominence of Positivity in #BlackLivesMatter Tweets
Anjalie Field, Chan Young Park, Antonio Theophilo, and Yulia Tsvetkov (2022)

Week 3: Patterns Over Time

Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers
Sandeep Soni, Lauren F. Klein, and Jacob Eisenstein (2021)

Friendships, Rivalries, and Trysts: Characterizing Relations between Ideas in Texts
Chenhao Tan, Dallas Card, and Noah A. Smith (2017)

Biodiversity is not declining in fiction
Andrew Piper (2022)

Week 4: Social Biases

A Framework for the Computational Linguistic Analysis of Dehumanization
Julia Mendelsohn, Yulia Tsvetkov, and Dan Jurafsky (2020)

Social Bias Frames: Reasoning about Social and Power Implications of Language
Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi (2020)

Race, Writing, and Computation: Racial Difference and the US Novel, 1880–2000
Richard Jean So, Hoyt Long, and Yuancheng Zhu (2019)

Week 5: Online Communities

The Goodreads “Classics”: A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism
Melanie Walsh and Maria Antoniak (2021)

Explain like I am a Scientist: The Linguistic Barriers of Entry to r/science
Tal August, Dallas Card, Gary Hsieh, Noah A. Smith, and Katharina Reinecke (2020)

Seekers, Providers, Welcomers, and Storytellers: Modeling Social Roles in Online Health Communities
Diyi Yang, Robert E. Kraut, Tenbroeck Smith, Elijah Mayfield, and Dan Jurafsky (2019)

Week 6: Data Ethics

Data Is the New What? Popular Metaphors & Professional Ethics in Emerging Data Culture
Luke Stark and Anna Lauren Hoffmann (2019)

Ethical and Privacy Considerations for Research Using Online Fandom Data
Brianna Dym and Casey Fiesler (2020)

Semantics derived automatically from language corpora contain human-like biases
Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan (2017)

Week 7: Politics & Public Opinion

Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
Justin Grimmer and Brandon Stewart (2013)

(Mis)alignment Between Stance Expressed in Social Media Data and Public Opinion Surveys
Kenneth Joseph, Sarah Shugars, Ryan J. Gallagher, Jon Green, Alexi Quintana Mathé, Zijian An, and David Lazer (2021)

Challenges and Opportunities in Information Manipulation Detection: An Examination of Wartime Russian Media
Chan Young Park, Julia Mendelsohn, Anjalie Field, and Yulia Tsvetkov (2022)

Week 8: Narrative

Modeling Reportable Events as Turning Points in Narrative
Jessica Ouyang and Kathleen McKeown (2015)

“Let Your Characters Tell Their Story”: A Dataset for Character-Centric Narrative Understanding
Faeze Brahman, Meng Huang, Oyvind Tafjord, Chao Zhao, Mrinmaya Sachan, and Snigdha Chaturvedi (2021)

Narrative Paths and Negotiation of Power in Birth Stories
Maria Antoniak, David Mimno, and Karen Levy (2019)

Week 9: Reproducibility & Evaluation

Evaluating the Stability of Embedding-based Word Similarities
Maria Antoniak and David Mimno (2018)

Datasheets for Datasets
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford (2018)

Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict
Burt L. Monroe, Michael P. Colaresi, and Kevin M. Quinn (2008)

Week 10: TBD