Text analysis for comparative politics

Political Analysis

Political Analysis chronicles the exciting developments in the field of political methodology, with contributions to empirical and methodological scholarship outside the diffuse borders of political science. It is published on behalf of The Society for Political Methodology. Political Analysis is ranked #5 out of 157 journals in Political Science by 5-year impact factor, according to the 2012 ISI Journal Citation Reports.

By R. Michael Alvarez, Christopher Lucas, Richard A. Nielsen, Margaret E. Roberts, Brandon M. Stewart, Alex Storer, and Dustin Tingley
September 16th 2015

Introduction

By R. Michael Alvarez

Text has long been an important, but difficult to use, source of data for social scientists. When I wrote my Ph.D. thesis, for example, I sat for weeks with abstracts from the New York Times, finding newspaper articles relating to past presidential campaigns and content-analyzing those articles to determine whether they had substantive or “horse-race” information in them. At the time, most scholars analyzed text like this manually, and although vast amounts of text were available for study, very little of that information became data amenable to sophisticated quantitative analysis.

How the world has changed! Because of vast improvements in computational capabilities (data accessibility, storage, and analytic power), tools and methods for the automated analysis of text have proliferated. Some of the most innovative new tools and methods are being developed by social scientists, and in recent years we have seen many important papers on the analysis of text published in Political Analysis.

One of the important developments in this area is the Structural Topic Model. This methodology for analyzing text has recently seen rapid development, and I asked the authors of “Computer-Assisted Text Analysis for Comparative Politics” (Christopher Lucas, Richard A. Nielsen, Margaret E. Roberts, Brandon M. Stewart, Alex Storer, and Dustin Tingley) to discuss their paper and its contribution to the field in a bit more detail. Their essay is below.

* * * * *

Text Analysis for Comparative Politics

By Christopher Lucas, Richard A. Nielsen, Margaret E. Roberts, Brandon M. Stewart, Alex Storer, and Dustin Tingley

Every two days, humans produce more textual information than the combined output of humanity from the dawn of recorded history up through the year 2003. Much of this text is directly relevant to questions in political science. Governments, politicians, and average citizens regularly communicate their thoughts and opinions in writing, providing new data from which to understand the political world and suggesting new avenues of study in areas that were previously thought intractable. However, to access the value in this textual data, we need methods for principled, systematic analysis.

Preparing textual data for analysis presents unique challenges, particularly for comparativists working with non-English text. Though statistical methods for text analysis are often language agnostic, tools for preprocessing the texts are not. We provide software packages to help users preprocess and translate text in multiple languages, accompanied by an overview of the various steps necessary to prepare textual data for analysis. Because the Structural Topic Model (STM) allows users to incorporate document metadata into the analysis, investigators can treat the language in which the document was written as a variable and can model systematic differences in topical content across languages.
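To make the preprocessing step concrete, here is a minimal sketch in Python of turning raw documents into a document-term count matrix while keeping each document's language as metadata. It is not the authors' software and it does not fit a topic model. The toy corpus, the tiny stopword lists, and the function names tokenize and build_dtm are all assumptions made for illustration. The regex tokenizer also assumes whitespace-delimited words, so a language such as Chinese would need a dedicated word segmenter instead.

```python
import re
from collections import Counter

# Hypothetical toy corpus: each document keeps its language as metadata.
DOCS = [
    {"lang": "en", "text": "The government denied the asylum request."},
    {"lang": "es", "text": "El gobierno negó la solicitud de asilo."},
]

# Tiny illustrative stopword lists (assumed); real lists are far longer.
STOPWORDS = {
    "en": {"the", "a", "of"},
    "es": {"el", "la", "de"},
}


def tokenize(text, lang):
    """Lowercase, split into word tokens, drop language-specific stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS.get(lang, set())]


def build_dtm(docs):
    """Return (vocab, rows, langs): term counts per document plus language metadata."""
    counts = [Counter(tokenize(d["text"], d["lang"])) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    langs = [d["lang"] for d in docs]
    return vocab, rows, langs


vocab, rows, langs = build_dtm(DOCS)
for lang, row in zip(langs, rows):
    print(lang, dict(zip(vocab, row)))
```

The langs list lines up row by row with the count matrix, which is the shape of input a model with document-level covariates would need: one row of counts and one metadata value per document.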

As a proof of concept, we examined thousands of social media posts in Arabic and Chinese in June 2013 about Edward Snowden. Our analysis reveals that Chinese posts about Snowden during this time period are more likely to address issues of hypocrisy in US foreign policy, with posts suggesting that the United States violates the human rights of its citizens while simultaneously advocating for better human rights protection abroad. By contrast, Arabic posts were more likely to deal with the issue of asylum, addressing the question: where will Snowden go next?

Work on Structural Topic Models continues to evolve. A forthcoming book chapter, “Navigating the Local Modes of Big Data: The Case of Topic Models,” addresses model stability in the STM and provides support for the use of a deterministic initialization strategy based on spectral methods. A recent working paper, “Matching Methods for High-Dimensional Data with Applications to Text,” demonstrates how the STM machinery can be used to facilitate causal inference from observational data where the pre-treatment confounders are documents. Two new software packages have been released on CRAN: stmBrowser and stmCorrViz, which provide interactive visualizations of STM models. The core software package stm has also been updated to increase speed and introduce numerous new features described in the papers above. The papers, software, and vignette detailing how to get started are available at structuraltopicmodel.com.

These methods open the world of text analysis to scholars of international relations and comparative politics. The possibilities are many, and we demonstrate but a few.

R. Michael Alvarez is a professor of political science at Caltech, and along with Jonathan N. Katz is the co-editor of Political Analysis. His research interests include methodology and statistics, electoral politics, voting behavior, and the technology of elections.

Christopher Lucas is a graduate student in the Department of Government at Harvard University where he researches the politics of cybersecurity and cyberterrorism; the political economy of immigration, trade, and finance; causal inference and experimental design; and the analysis of audio, video, and text as data.

Rich Nielsen is an Assistant Professor in the Department of Political Science at the Massachusetts Institute of Technology where he studies and teaches on Islam, political violence, human rights, economic development, and research design.

Margaret Roberts is an Assistant Professor in the Department of Political Science at the University of California, San Diego. Her research focuses on political communication, Chinese politics, and computational social science.

Brandon Stewart is an Assistant Professor in the Department of Sociology at Princeton University where he works on developing new statistical methods with a focus on applications in computational social science.

Alex Storer is a Research Consultant at the Stanford University Graduate School of Business. His interests include open data, automated extraction and organization of historical documents, and data visualization.

Dustin Tingley is Professor of Government in the Department of Government at Harvard University. Dustin’s research interests include international relations, statistical methodology, and experimental social science.
