Oxford University Press's
Academic Insights for the Thinking World

Using web search data to study elections: Q&A with Alex Street

Social scientists have made important contributions towards improving the conduct and administration of elections. A paper recently published in Political Analysis continues that tradition, and introduces the use of web search data to the study of public administration and public policy.

This paper, written by Alex Street, Thomas A. Murray, John Blitzer, and Rajan S. Patel, is titled “Estimating Voter Registration Deadline Effects with Web Search Data”, and it was recently published as an open access paper in Political Analysis. The authors use web searches for “voter registration” just before and just after registration deadlines for the 2012 US presidential election to gauge questions about, and interest in, voter registration. As the authors note in their paper’s abstract, “Combining web search data with evidence on the timing of registration for 80 million Americans, we model the relationship between search and registration. Extrapolating this relationship to the post-deadline period, we estimate that an additional 3-4 million Americans would have registered in time to vote, if deadlines had been extended to Election Day.”
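The general idea in the abstract, fitting the relationship between search and registration before the deadline and then carrying it forward, can be illustrated with a deliberately simplified sketch. It assumes a straight-line fit by ordinary least squares, which is a stand-in chosen for illustration and not the model used in the paper. The function names and all numbers are hypothetical toy values.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("search volume must vary to fit a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_registrations(pre_search, pre_registrations, post_search):
    """Fit on pre-deadline days, then predict registrations that might have
    occurred on post-deadline days given the observed search volume."""
    slope, intercept = fit_line(pre_search, pre_registrations)
    # Registrations cannot be negative, so clip predictions at zero.
    return [max(0.0, intercept + slope * s) for s in post_search]


# Toy, made-up daily values (not data from the paper).
pre_search = [10, 20, 30, 40]           # search index before the deadline
pre_registrations = [100, 200, 300, 400]  # registrations on the same days
post_search = [50, 25]                  # search index after the deadline

predicted = extrapolate_registrations(pre_search, pre_registrations, post_search)
print(sum(predicted))  # total counterfactual registrations in the toy example
```

Summing the predicted values over the post-deadline days gives the kind of counterfactual total the abstract describes, though a real analysis would need to account for uncertainty and for differences across states.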

This paper is important for two reasons.

First, on the particular question of pre-election registration deadlines and voter participation, the authors introduce and test an innovative way to model the potential effects of those deadlines on voter turnout. This has long been an important question for social scientists, from the 1980 book Who Votes?, by Raymond E. Wolfinger and Steven J. Rosenstone, to the 2013 book Who Votes Now?, by Jan E. Leighley and Jonathan Nagler. Work in this field has typically relied upon post-election voter surveys, and some recent studies published in Political Analysis have begun to ask methodological questions about this long-used approach for studying the turnout implications of pre-election voter registration deadlines (Glynn and Quinn 2011, Keele and Minozzi 2013).

Second, this paper demonstrates the importance of web search data as a tool for social scientific research. Only a few studies so far have used web search data, and this paper both establishes the rationale for using these data to study important social science problems and validates their use. I anticipate that we will see increasing use of web search data in the near future, as other researchers realize the potential of this new type of data. Recently I sent a series of questions about this paper to the lead author, Alex Street. Below are the questions, and his answers.

“Keyboard” by Life-Of-Pix. CC0 via Pixabay.

One of the central contributions of your paper is your use of data from Google searches. How does the search data that you used differ from the publicly available Google Trends data?

The Trends site allows anyone to study how the frequency of web queries such as “register to vote” varies across time and space. The data reach back to 2004 and are available for many countries and, increasingly, for smaller areas such as states or cities. The actual number of queries on a given topic is sensitive, since it can reveal how Google algorithms work, which might allow people to manipulate search results. So the data made available on the Trends site are normalized, relative to the total number of queries in a given location and time period. The Trends site focuses on popular searches. In our case, “register to vote” was not always among the most popular searches, especially in less populous states in the period several months before the election. In order to include the small states, we used internal Google data rather than the Trends data for this paper, though the numbers we used are very similar to what is publicly available.
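As a rough illustration of the normalization described in this answer, the sketch below divides the number of queries on a topic by the total number of queries in the same place and period, then rescales so the busiest period equals 100. The rescaling step, the function name, and the counts are assumptions made for illustration; this is not Google's actual procedure.

```python
def normalize_search_interest(topic_counts, total_counts):
    """Convert raw query counts into a relative interest index.

    Each period's topic count is divided by that period's total query
    count, and the resulting shares are rescaled so the peak equals 100.
    """
    if len(topic_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")
    shares = [t / n if n else 0.0 for t, n in zip(topic_counts, total_counts)]
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0.0 for _ in shares]
    return [100 * s / peak for s in shares]


# Made-up counts for three periods in one location.
topic = [50, 200, 100]         # queries on the topic
total = [10000, 20000, 40000]  # all queries in the same periods

index = normalize_search_interest(topic, total)
print([round(v, 1) for v in index])
```

In this toy example the third period has more topic queries than the first but a lower index, because total search traffic grew faster than interest in the topic. A relative index of this kind shows changes in attention without revealing the raw query counts.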

What are the primary benefits of using Google search data? What can you learn from these data that you cannot learn otherwise? And what are the weaknesses of these data?

One huge advantage is that the Google data are user-generated. Social scientists often rely on surveys. But surveys are expensive and thus tend to be quite limited in size, which is a problem for comparisons across time and space. Perhaps even more important, when researchers decide the questions to ask, we risk imposing our own agenda, rather than following the interests of the public. Web search data allow us to observe people following their own interests. For example, one long-standing worry about democracy is that most people don’t know much about politics. Surveys have allowed us to confirm that. But there are also plenty of people who do manage to learn about politics. As scholars we still have a lot to learn about how people go about acquiring political information, on the (relatively rare) occasions when they do so. User-generated web search data provide new opportunities for us to study what people are interested in, and how events, laws or contextual factors shape their interests.

One big limitation for researchers is that a private company holds the original data. Google is cautious about what they make publicly available, because they worry about manipulation of search results or violating the privacy of their users. We were lucky to be able to access internal Google data for this paper, so that we could cover all 50 states. The Google authors summarized data from their archives, and made the aggregated data available to the other authors. The aggregated data were the basis of our analysis, and we also posted those data with the journal so that others can check our results. But not everyone has this kind of opportunity. This issue is much wider than our paper. As Gary King has pointed out, one of the open questions about “big data” in the social sciences is whether researchers can set up a framework for working with the governments and companies that hold almost all of the original data. For that to happen, we will have to allay concerns over privacy and commercial interests.

What were the origins of this research project? How did you and your co-authors get the idea to use web search data to study election administration?

This paper is the result of brotherly collaboration. Three of the authors are related by marriage. We were inspired by the work of the fourth author, who helped set up a system for using Google searches to provide real-time measures of the spread of flu epidemics. We wanted to learn more about how Google searches are related to other behavior, such as voting. Eventually we hit on the idea of not just correlating web searches with voter registration numbers, but also using post-deadline web searches to make counterfactual predictions about the effects of early voter registration deadlines.

Beyond election administration, what other applications do you see for Google search and Google Trends data, and for the methodology you use in your paper, especially in political science and public policy?

I expect scholars to exploit variation across time and space. The sheer volume of the underlying search data means that we can study daily changes in public interests across regions, states or cities. For instance, Cindy Kam and colleagues have used Google Trends data to assess the importance of perceived candidate viability, by tracking how web search volume changes as candidates rise or fall in the polls. Another opportunity is to measure how information-seeking changes across administrative borders or deadlines. If we see a sudden change at the border or after the deadline, that can help us understand how policies affect the information environment. Even more broadly, we may be able to study big social changes in unprecedented detail. For example, we could study migration on a weekly or daily scale by tracking the spread of new words or languages across countries, states or cities. Social scientists are still getting used to the availability of this kind of data, but I am excited to see where our imagination takes us.

Image Credit: “Google” by Simon. CC0 via Pixabay.
