At the Conference on Open Access Scholarly Publishing in Paris last month, Claudio Aspesi, Senior Analyst at Sanford C. Bernstein, raised an uncomfortable question. Did the continuing financial health of traditional publishers like Elsevier indicate that open access had “failed”? According to Aspesi, “Expectations that OA will address the serial costs crisis are fading away.”
Is Aspesi right? Has open access failed? I certainly don’t think so – but that doesn’t mean the job is done…
When we launched BioMed Central in 2000, the goal was a simple and positive one – it wasn’t to fix library budgets or to destroy the businesses of existing publishers, but to introduce a new form of publishing that would help researchers communicate their findings more effectively by challenging the established notion that published results “belonged” to the publisher.
The success of the open access movement is demonstrated most clearly by the extent to which it is simply no longer acceptable to the world’s largest science funders for the results of their funding to end up trapped indefinitely behind publisher paywalls.
For example, the pilot for open access to research outputs in the EU’s 7th Framework program (FP7) has become, with Horizon 2020 (the successor to FP7), a mandatory policy covering all funded research. If this is failure, it is a strange-looking form of failure, with the results of the full €80 billion of H2020 funding now set to be made publicly available. This is in addition to similar mandatory public access policies already in force or announced for the major UK research funders, all US government funded research, two of the largest funders in China, and many others.
Of course, existing publishers have found ways to adapt to funders’ expectations of open access, and even to grow their businesses thanks to the additional funding made available to cover “Gold” open access fees. This isn’t such a bad thing; traditional publishers are responsible for many good journals. As more of them convert to a fully open access model (Nature Communications being one recent example), we are seeing a ratchet mechanism at work which is progressively shifting an ever higher fraction of the scientific literature to being open access immediately on publication and in authoritative final form.
While some bemoan the fees associated with “Gold” open access publishing, the model has the powerful advantage of providing a funding mechanism which scales with the increasing volume of research funding, unlike library budgets. By making the costs of publishing visible to authors, it also has the potential to eventually save costs by creating a more efficient market for publishing services (though this does depend on authors showing at least some degree of price sensitivity).
As for the perceived failure of open access to knock the incumbents from their perch, this seems a curious metric of success. We don’t regard Airbnb as a failure because hotel chains still exist, nor is the continued existence of national airline carriers seen as a fundamental failure of the budget airline model. In both cases, the landscape has been profoundly transformed by the new model, and existing players are having to adapt, improve, and refocus their offering to compete.
Perhaps the most important success of the open access movement is not what it has already achieved, but the foundation it provides for further improvements to scientific communication.
Open access to open data
Establishment of open access to research articles as a basic norm has paved the way for the EU pilot of Open Research Data within H2020, seeking to ensure that not only the published article but the underlying data resulting from scientific research should be routinely made available in a form that facilitates reuse and further analysis.
The need for improved access to research data is not a new idea. Funders such as the Wellcome Trust and NSF already require grantees to specify Data Management Plans to indicate how the results of funded research will be made accessible. Unfortunately these plans often aren’t worth the (virtual) paper they are printed on. With suitable data standards and data management infrastructure either absent or not widely used, obtaining a copy of the underlying data from a particular research project can still be tortuous or even impossible.
Even if the data can be obtained, often the metadata and experimental details available are insufficient to make the experimental results reliably reproducible. With luck, the lab member who carried out the experiment may still be around and may be able (with considerable effort) to dig out such information, but the more time that has elapsed since publication, the more likely it is that the exact details of what was done will have vanished forever.
To address this problem, it is not enough to change the way science is published; we also need to look upstream at how scientific experiments are carried out, and how the results are analysed and prepared for publication.
Improving the scientific method
In computational research, significant steps have been made towards improving the situation. The Galaxy platform allows researchers to share genomic analysis pipelines in a form which can be readily reused by other researchers, and the journal GigaScience (published by BioMed Central in collaboration with the BGI) has embraced this approach by making available a public Galaxy server to ensure that data and analyses associated with the articles it publishes are readily accessible. More generally, there is strong enthusiasm amongst computational researchers for the use of new tools such as Docker to allow arbitrary ‘experimental setups’ for in silico research to be shared efficiently.
At the lab bench, sharing full experimental details and data descriptions is more challenging due to the wide range of data types and experimental descriptions which need to be represented. The NIH’s Big Data to Knowledge (BD2K) initiative, which recently announced its first round of funding awards, shows that this challenge is now receiving serious attention. The use by journals such as Scientific Data and GigaScience of ISA-Tab (Investigation, Study, Assay Tabular format) as a general high-level metadata standard also shows promise, though there is some way to go before we have tools in place to make preparing data for publication in such formats a pleasure rather than a chore.
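To make the idea of a high-level metadata hierarchy concrete, here is a minimal Python sketch of an Investigation containing Studies, which in turn contain Assays, flattened into tab-separated rows. It is only an illustration of the general shape: the class names, field names, column headers and example values are all hypothetical, and this is not the actual file layout defined by the Investigation, Study, Assay Tabular format.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    name: str
    measurement: str
    data_file: str  # where the raw data for this assay lives


@dataclass
class Study:
    identifier: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def to_tabular(investigation: Investigation) -> str:
    """Flatten the hierarchy into tab-separated text, one row per assay.

    The column names are illustrative assumptions, not a real standard.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["investigation", "study", "assay", "measurement", "data_file"])
    for study in investigation.studies:
        for assay in study.assays:
            writer.writerow([
                investigation.title,
                study.identifier,
                assay.name,
                assay.measurement,
                assay.data_file,
            ])
    return buffer.getvalue()


# Hypothetical example: one investigation, one study, two assays.
example = Investigation(
    title="Example investigation",
    studies=[
        Study(
            identifier="S1",
            assays=[
                Assay("A1", "gene expression", "a1_counts.tsv"),
                Assay("A2", "metabolite profile", "a2_peaks.tsv"),
            ],
        )
    ],
)

table = to_tabular(example)
print(table)
```

Each row of the resulting table carries its full context (which investigation and study an assay belongs to), which is the property that makes this kind of description useful to someone trying to reuse the data later.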
At Riffyn, we are working with academic and industrial partners to develop cloud-based tools which will help researchers design their experiments up front using a friendly modern user interface, and to capture data from those experiments in a way which retains a connection to the experimental design and context, with the goal of making results more reliable and reproducible, and helping to rapidly distinguish genuine insights in the data from artefacts and noise. The long term goal is to ensure that making data well-described and reusable isn’t an afterthought or a tedious additional step, but is at the heart of the experimental process.
Any attempt to improve the way science is done will only succeed through collective effort and widespread adoption of shared standards. We hope to see a whole ecosystem of tools and standards emerge which will support the smooth flow of data and accompanying descriptive metadata all the way from experimental design, through data capture, to analysis, visualization, authoring and publication, retaining as much as possible of the provenance information and structure at every stage.
This will not be easy, but the building blocks do seem to be falling into place. The Force11 conference (spawned from the earlier Beyond the PDF workshops), which next takes place in Oxford in January 2015, has created a useful framework for such collaboration. This weekend (25-26 October 2014) at the Mozilla Festival in London, the science track will offer a data-driven authoring workshop. Tool providers including Authorea, WriteLaTeX, Papers, Pensoft and F1000 will run demos and tutorials, and will discuss how best such tools can be made to work together and integrate smoothly with lab data systems and publisher platforms, so that ultimately data can flow freely and in meaningful form through the entire process. If you can, please join us there to work with us on the next chapter of the open access success story.
The opinions and other information contained in this blog post and comments do not necessarily reflect the opinions or positions of Oxford University Press.