Introduction from Michael Alvarez, co-editor of Political Analysis:
Questions about data access, research transparency and study replication have recently become heated in the social sciences. Professional societies and research journals have been scrambling to respond; for example, the American Political Science Association established the Data Access and Research Transparency committee to study these issues and to issue guidelines and recommendations for political science. At Political Analysis, the journal that I co-edit with Jonathan N. Katz, we require that all of the papers we publish provide replication data, typically before we send the paper to production. These replication materials get archived at the journal’s Dataverse, which provides permanent and easy access to these materials. Currently we have over 200 sets of replication materials archived there (more arriving weekly), and our Dataverse has seen more than 13,000 downloads of replication materials.
Due to the interest in replication, data access, and research transparency in political science and other social sciences, I’ve asked a number of methodologists who have been front-and-center in political science with respect to these issues to provide their thoughts and comments about what we do in political science, how well it has worked so far, and what the future might hold for replication, data access, and research transparency. I’ll also be writing more about what we have done at Political Analysis.
The first of these discussions is a set of reflections from Nathaniel Beck, Professor of Politics at NYU, who is primarily interested in political methodology as applied to comparative politics and international relations. Neal is a former editor of Political Analysis, chairs our journal’s Advisory Board, and is now heading up the Society for Political Methodology’s own committee on data access and research transparency. Neal’s reflections provide some interesting perspectives on the importance of replication for his research and teaching efforts, and shed some light more generally on what professional societies and journals might consider for their policies on these issues.
Research replication in social science: reflections from Nathaniel Beck
Replication and data access have become hot topics throughout the sciences. As a former editor of Political Analysis and the chair of the Society for Political Methodology’s Data Access and Research Transparency (DA-RT) committee, I have been thinking about these issues a lot lately. But here I simply want to share a few recent experiences (two happy, one at this moment less so) which have helped shape my thinking on some of these issues. I note that in none of these cases was I concerned that the authors had done anything wrong, though of course I was concerned about the sensitivity of results to key assumptions.
The first happy experience relates to an interesting paper on the impact of having an Islamic mayor on educational outcomes in Turkey by Meyerson published recently in Econometrica. I first heard about the piece from some students, who wanted my opinion on the methodology. Since I am teaching a new (for me) course on causality, I wanted to dive more deeply into the regression discontinuity design (RDD) as used in this article. Coincidentally, a new method for doing RDD was presented at the recent (2014) meetings of the Society for Political Methodology by Rocio Titiunik. I wanted to see how her R code worked with interesting comparative data. All recent Econometrica articles are linked to both replication and supplementary materials on the Econometrica web site. It took perhaps 15 minutes to make sure that I could run Stata on my desktop and get the same results as in the article. So thanks to both Meyerson and Econometrica for making things so easy.
I gained from this process, getting a much better feel for real RDD data analysis so I can say more to my students than “the math is correct.” My students gain by seeing a first-rate application that interests them (not a toy, and not yet another piece on American elections). And Meyerson gains a few readers who would not normally peruse Econometrica, and perhaps more cites in the ethnicity literature. And thanks to Titiunik for making her R code easily accessible.
The second happy experience was similar to the first, but also opened my eyes to my own inferior practice. At the same Society meetings, I was the discussant on a paper by Grant and Lebo on using fractional integration methods. I had not thought about such methods in a very long time, and believed (based on intuition and no evidence to the contrary) that using fractional integration methods led to no changes in substantive findings. But clearly one should base arguments on evidence and not intuition. I decided to compare the results of a fractional integration study by Box-Steffensmeier and Smith with the results of a simpler analysis. Their piece had a footnote saying the data were available through the ICPSR (excellent by the standards of 1998). Alas, on going to the ICPSR web site I could not find the data (noting that lots of things have happened since 1998 and who knows if my search was adequate). Fortunately I know Jan so I wrote to her, and she kindly replied that the data were on her Dataverse at Harvard. A minute later I had the data and was ready to try to see if my intuitions might indeed be supported by evidence.
This experience made me think: could someone find my replication data sets? For as long as I can remember (at least back to 1995), I always posted my replication data sets somewhere. Articles written until 2003 sent readers to my public ftp site at UCSD. But UCSD has changed the name and file structure of that server several times since 2003, and for some reason they did not feel obligated to keep my public ftp site going (and I was not worried enough about replication to think of moving that ftp site to NYU). Fortunately I can usually find the replication files if anyone writes me, and if I cannot, my various more careful co-authors can find the data. But I am sure that I am not the only person to have replication data on obsolete servers. Thankfully Political Analysis has required me to put my data on the Political Analysis Dataverse so I no longer have to remember to be a good citizen. And my resolution is to get as many replication data sets from old pieces as I can onto my own Harvard Dataverse. I will feel less hypocritical once that is done. It would be very nice if other authors emulated Jan!
The possibly less happy outcome relates to the recent article in PNAS on a Facebook experiment on social contagion. The authors, in a footnote, said that replication data were available by writing to the authors. I wrote twice, giving them a full month, but heard nothing. I then wrote to the editor of PNAS, who informed me that the lead author had been on vacation and was also overwhelmed with responses to the article. I am promised that the check is in the mail.
What editor wants to be bothered by fielding inquiries about replication data sets? What author wants to worry about going on vacation (and forgetting to set a vacation message)? How much simpler the world would have been for the authors, editor, and me, if PNAS simply followed the good practice of Political Analysis, the American Journal of Political Science, the Quarterly Journal of Political Science, Econometrica, and (if rumors are correct) soon the American Political Science Review of demanding that authors post, either on the journal web site or the journal Dataverse, all replication materials before an article is actually published? Why doesn’t every journal do this?
A distant second best is to require authors to post their replication materials on their personal website. As we have seen from my experience, this often leads to lost or non-working URLs. While the simple solution here is the Dataverse, surely at a minimum authors should provide a standard Digital Object Identifier (DOI), which should persist even as machine names change. But the Dataverse solution does this, and so much more, that it seems odd in this day and age that not all journals use it. And we can all be good citizens and put our own datasets that predate replication standards on our own Dataverses. All of this is as easy as (and maybe easier than) maintaining private data web pages, and one can rest easy that one’s data will be available until either Harvard goes out of business or the sun burns out.
Featured image: BalticServers data center by Fleshas CC-BY-SA-3.0 via Wikimedia Commons.