The Oxford DNB at 10: new research opportunities in the humanities

September 2014 marks the tenth anniversary of the publication of the Oxford Dictionary of National Biography. Over the next month, a series of blog posts will consider aspects of the ODNB’s online evolution in the decade since 2004. Here the literary historian David Hill Radcliffe considers how the ODNB online is shaping new research in the humanities.

The publication of the Oxford Dictionary of National Biography in September 2004 was a milestone in the history of scholarship, not least for crossing from print to digital publication. Before this moment a small army of biographers, myself among them, had worked almost entirely from paper sources, including the stately volumes of the first, Victorian ‘DNB’ and its twentieth-century print supplement volumes. But the Oxford DNB of 2004 was conceived from the outset as a database and published online as web pages, not paper pages reproduced in facsimile. In doing away with the page image as a means of structuring digital information, the online ODNB took an important step that scholarly monographs and articles might do well to emulate.

Database design has changed dramatically since 2004, shifting from the relational model of columns and rows, to semi-structured data used with XML technologies, to the looser, graph-based forms used for linking data across repositories. The implications of these developments for the future of the ODNB remain to be seen, but there is every reason to believe that its content will increasingly be accessed in forms other than the traditional biographical essay. Essays are not going away, of course. But they will be supplemented by the tables, charts, maps, and graphs made possible by linked data. Indeed, the ODNB has been moving in this direction since 2004, adding thousands of curated links between individuals (recorded in biographical essays) and the social hierarchies and networks to which they belonged (presented in thematic list and group entries), as well as further links from those individuals to content by or about them held in archives, museums, or galleries worldwide.
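To make the three stages of database design concrete, here is one fact held in each shape. This is a minimal illustrative sketch, not a description of how the ODNB stores its data: the table, element, and property names (`person`, `birth`, `ex:birthYear` and so on) are hypothetical, and only the fact itself (the poet Byron was born in 1788) is real.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One fact (the poet Byron's birth year, 1788) held in three shapes.
# All table, element, and property names are hypothetical, not the ODNB's schema.

# 1. Relational: a fixed table of columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, born INTEGER)")
conn.execute("INSERT INTO person VALUES (?, ?, ?)", ("byron", "George Gordon Byron", 1788))
row = conn.execute("SELECT name, born FROM person WHERE id = ?", ("byron",)).fetchone()

# 2. Semi-structured: XML elements that can nest and vary from record to record.
record = ET.fromstring(
    '<person id="byron"><name>George Gordon Byron</name><birth year="1788"/></person>'
)
xml_born = int(record.find("birth").get("year"))

# 3. Linked data: subject / property / object statements with no fixed table shape.
triples = {
    ("ex:byron", "ex:name", "George Gordon Byron"),
    ("ex:byron", "ex:birthYear", "1788"),
}
triple_born = int(next(o for s, p, o in triples if s == "ex:byron" and p == "ex:birthYear"))

print(row)          # ('George Gordon Byron', 1788)
print(xml_born)     # 1788
print(triple_born)  # 1788
```

The difference shows up when a new kind of fact arrives (say, a school attended): the relational form needs its table altered, while the XML record takes a new element and the linked-data form simply takes another triple.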

Online, the ODNB offers scholars the opportunity to select, group, and parse information not just at the level of the article but in more detailed ways, and this is where computational matters get interesting. I currently use the ODNB online as a resource for a digital prosopography attached to a collection of documents called ‘Lord Byron and his Times’, tracking relationships among more than 12,000 Byron contemporaries mentioned in nineteenth-century letters and memoirs; of these, a remarkable 5,000 have entries in the ODNB. The traditional object of prosopography was to collect small amounts of information about large numbers of persons, using patterns to draw inferences about slenderly documented lives. But when computation is involved, a prosopography can be used with linked data to parse large amounts of information about large numbers of persons. As a result, one can attend to particularities, treating individuals as members of a group or social network without reducing them to the uniformity of a class identity. Digital prosopography thus returns us to something like the nineteenth-century liberalism that inspired Sir Leslie Stephen’s original DNB (1885–1900).
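One way to picture the network side of such a prosopography is a co-mention graph: two people are linked when the same document names them, and each link keeps the documents that support it, so that individuals are not collapsed into a class. The sketch below is hypothetical throughout: the document IDs and their contents are invented, and it assumes that the hard work of identifying who is named in each document has already been done.

```python
from collections import defaultdict
from itertools import combinations

# Invented sample: each (hypothetical) document lists the people it mentions,
# already resolved to identifiers.
documents = {
    "letter_1": ["byron", "hobhouse", "murray"],
    "letter_2": ["byron", "moore"],
    "memoir_1": ["moore", "murray", "byron"],
}

def build_network(docs):
    """Map each unordered pair of people to the documents that mention both."""
    edges = defaultdict(list)
    for doc_id, people in docs.items():
        # Sorting gives each pair one canonical key, e.g. ("byron", "moore").
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)].append(doc_id)
    return edges

def neighbours(edges, person):
    """People co-mentioned with `person`, with the documents as evidence."""
    result = {}
    for (a, b), docs in edges.items():
        if person == a:
            result[b] = docs
        elif person == b:
            result[a] = docs
    return result

edges = build_network(documents)
print(neighbours(edges, "byron"))
# {'hobhouse': ['letter_1'], 'murray': ['letter_1', 'memoir_1'], 'moore': ['letter_2', 'memoir_1']}
```

Keeping the supporting documents on each link, rather than a bare count, is what lets a reader move from the pattern back to the particular letter or memoir.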

The key to finding patterns in large collections of lives and documents, the evolution of technology suggests, is to atomize the data. As a writer of biographies I would select from documentary sources, collecting the facts of a life, and translating them into the form of an ODNB essay. Creating a record in a prosopography involves a similar kind of abstraction: working from (say) an ODNB entry, I abstract facts from the prose, encoding names and titles and dates in a semi-structured XML template that can then be used to query my archive, comprising data from previous ODNB abstractions and other sources. For instance: ‘find relationships among persons who corresponded with Byron (or Harrow School classmates, or persons born in Nottinghamshire, etc.) mentioned in the Quarterly Review.’ An XML prosopography is but a step towards recasting the information as flexible, concise, and extensible semantic data.
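A query like the one above can be sketched against a small semi-structured archive. This is a minimal sketch under loudly stated assumptions: the XML template (`person`, `school`, `correspondent`, `mentionedIn`) is hypothetical, not the author's actual schema, and the people are placeholders rather than real Byron contemporaries. It selects persons who corresponded with Byron and are mentioned in the Quarterly Review, then looks for one kind of relationship among them (a shared school).

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical records abstracted from biographical prose; element names are invented.
ARCHIVE = """
<prosopography>
  <person id="p1">
    <name>Person A</name>
    <school>Harrow</school>
    <correspondent ref="byron"/>
    <mentionedIn>Quarterly Review</mentionedIn>
  </person>
  <person id="p2">
    <name>Person B</name>
    <school>Harrow</school>
    <correspondent ref="byron"/>
    <mentionedIn>Quarterly Review</mentionedIn>
    <mentionedIn>Edinburgh Review</mentionedIn>
  </person>
  <person id="p3">
    <name>Person C</name>
    <school>Eton</school>
    <correspondent ref="byron"/>
    <mentionedIn>Edinburgh Review</mentionedIn>
  </person>
</prosopography>
"""

def matching(root, correspondent, periodical):
    """People who corresponded with `correspondent` and are mentioned in `periodical`."""
    return [
        p for p in root.iter("person")
        if any(c.get("ref") == correspondent for c in p.findall("correspondent"))
        and any(m.text == periodical for m in p.findall("mentionedIn"))
    ]

def shared_schools(people):
    """Pairs within the result set who attended the same school."""
    pairs = []
    for a, b in combinations(people, 2):
        school = a.findtext("school")
        if school and school == b.findtext("school"):
            pairs.append((a.get("id"), b.get("id"), school))
    return pairs

root = ET.fromstring(ARCHIVE.strip())
hits = matching(root, "byron", "Quarterly Review")
print([p.findtext("name") for p in hits])  # ['Person A', 'Person B']
print(shared_schools(hits))                # [('p1', 'p2', 'Harrow')]
```

Swapping the filter (birthplace instead of correspondent, say) changes only the predicate in `matching`, which is the practical payoff of atomizing facts into fields.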

While human readers can easily tell whether the character string ‘Oxford’ refers to the place, the university, or the press, this is a challenge for computation, as is distinguishing ‘Byron’ the poet from ‘Byron’ the admiral. One can attack this problem with algorithms that compare adjacent strings, by encoding strings by hand to disambiguate them, or by combining the two. Digital ODNB essays are good candidates for semantic analysis because their structure is predictable and they are dense with significant names of persons, places, events, and relationships that can be used for data-linking. One translates character strings into semantic references, groups the references into relationships, and expresses the relationships in machine-readable form.
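The ‘compare adjacent strings’ approach can be sketched as a context-window heuristic: look at the words around each occurrence of ‘Oxford’ and count cue words associated with each sense. The cue lists and sense labels below are hypothetical and far cruder than any real entity-disambiguation system; the sketch only illustrates the idea that neighbouring words carry the evidence.

```python
import re

# Hypothetical cue words for each sense; a real system would curate or learn these.
CUES = {
    "Oxford (university)": {"university", "college", "matriculated", "degree", "fellow"},
    "Oxford University Press": {"press", "published", "printed"},
    "Oxford (city)": {"born", "lived", "street", "city", "moved"},
}

def disambiguate(text, term="Oxford", window=4):
    """Label each occurrence of `term` by counting cue words within `window` tokens."""
    tokens = re.findall(r"[A-Za-z]+", text)
    labels = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        context = {t.lower() for t in tokens[max(0, i - window): i + window + 1]}
        scores = {sense: len(cues & context) for sense, cues in CUES.items()}
        best = max(scores, key=scores.get)  # ties go to the sense listed first
        labels.append(best if scores[best] > 0 else "unresolved")
    return labels

print(disambiguate("He matriculated at Oxford in 1805. His poems were published by the Oxford press."))
# ['Oxford (university)', 'Oxford University Press']
print(disambiguate("She was born in Oxford."))
# ['Oxford (city)']
```

Occurrences with no cue words come back as "unresolved", which is where hand encoding would take over in a combined approach.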

A popular model for expressing semantic data is the ‘triple’: a statement in the form subject / property / object, which describes a relationship between the subject and the object (for example, the tree / is in / the quad). The model is powerful because it can describe anything, and its statements can be yoked together to create new statements. For example, ‘Lord Byron wrote Childe Harold’ and ‘John Murray published Childe Harold’ are both triples. Once their components are translated into semantically disambiguated, machine-readable URIs (Uniform Resource Identifiers), computation can recognize that both statements concern the same Childe Harold and infer that ‘John Murray published Lord Byron.’
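The Byron and Murray inference can be sketched with triples held as plain tuples and a single hand-written rule. The URIs under `http://example.org/` and the property names (`wrote`, `published`, `publishedWorkBy`) are hypothetical placeholders, not an established vocabulary; the point is only that once both statements share the same URI for Childe Harold, joining them is mechanical.

```python
# Hypothetical URIs; a real project would use shared, published vocabularies.
EX = "http://example.org/"
BYRON = EX + "person/byron"
MURRAY = EX + "person/john-murray"
CHILDE_HAROLD = EX + "work/childe-harold"
WROTE = EX + "prop/wrote"
PUBLISHED = EX + "prop/published"
PUBLISHED_WORK_BY = EX + "prop/publishedWorkBy"

# The two statements from the text, as (subject, property, object) triples.
graph = {
    (BYRON, WROTE, CHILDE_HAROLD),
    (MURRAY, PUBLISHED, CHILDE_HAROLD),
}

def infer_publisher_of_author(triples):
    """Rule: if A wrote W and P published W, then P published work by A."""
    inferred = set()
    for author, prop, work in triples:
        if prop != WROTE:
            continue
        # Join on the shared object: the same work URI in both statements.
        for publisher, prop2, work2 in triples:
            if prop2 == PUBLISHED and work2 == work:
                inferred.add((publisher, PUBLISHED_WORK_BY, author))
    return inferred

new_triples = infer_publisher_of_author(graph)
```

The inferred statement uses its own property, `publishedWorkBy`, so that ‘published a work’ and ‘published works by a person’ are not conflated. In an RDF toolkit the same rule would more likely be written as a SPARQL CONSTRUCT query than as a hand-rolled loop.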

Now imagine the contents of the ODNB expressed not as 60,000 biographical essays but as several billion such statements. In fact, this is far from unthinkable, given the nature of the material and progress being made in information technology. The result is a wonderful back-to-the-future moment, with Leslie Stephen’s Victorian DNB wedded to Charles Babbage’s calculating machine: the simplicity of the triple and the power of finding the relations embedded within such statements. Will the fantasies of positivist historians finally be realized? Not likely; while computation is good at questions of ‘who’, ‘what’, ‘where’, and ‘when’, it is not so good at ‘why’ and ‘how’. Biographers and historians are unlikely to find themselves out of a job anytime soon. On the contrary, once works like the ODNB are rendered machine-readable and cross-query-able, scholars will find more work on their hands than they know what to do with.

So the publication of the ODNB online in September 2004 will be fondly remembered as a liminal moment when humanities scholarship crossed from paper to digital. The labour of centuries of research was carried across that important threshold, recast in a medium enabling new kinds of investigation the likes of which—ten years on—we are only beginning to contemplate.

