How little we (can) know about the history of the English language

As a historical linguist, I devote much of my teaching, research, and even recreational time to trying to understand what English is today and how it got here. Some things are easily noticed: that Beowulf uses words (“æþelingas”), morphemes (“monegum”), and graphs (þ, æ, ð) that English no longer has. Or that later English, because of its orthography and word order, often looks more remote than it really is. “Whan that Aprill with his shoures soote,” the opening of Chaucer’s Canterbury Tales, is not much more than “When April, with its sweet showers.”

In examples like these, it’s easy to distinguish what English was from what it is. But many cases are less clear, and one of the reasons for this is that, as perhaps with all historical inquiry, the farther back in time one goes, the less substantive the evidence becomes: that is, the fewer the attested examples of written or spoken English. Some evidence hangs by a thread. The 3182 lines of just one poem (Beowulf), for example, constitute about 10% of the entire corpus of Old English poetry. If one were to graph the number of surviving examples of English against each successive year, that graph would show a steady increase in extant material from the date of the Beowulf manuscript (around 1000) until about the year 1600; a significant rise at that point due to increases in literacy and printed documents and an expansion of the kinds of works (such as personal letters) that begin to survive in abundance; and a precipitous rise after 1900, due to new media and the spread of English as a global language. For the first part of this graph the line might incline at about 20 degrees, for the second part at about 45 degrees, and for the last part, when messages, texts, and digital files of all kinds offer accessible data in the cloud, at perhaps 60 degrees.

Historical English data are skewed by more than survival rates, however. The kinds of English that survive also vary significantly from one era to another. Prior to 1400 the data set of English (as my Beowulf example suggests) is comparatively small; before 1700 much of it is rhetorically crafted in some way, whether by legal figures, theologians, administrators, novelists, or poets; before 1800 it is mostly composed by males; and before 1900 it is virtually all written and offers little evidence of the actual language of entire groups, such as children or second language learners.

Preservation of unambiguously spoken English, then, is very much a recent phenomenon. The earliest extant English oral data occur in Thomas Edison’s 1877 recording of “Mary Had a Little Lamb,” and after that recordings are sporadic for at least a half century, with many of the earliest being stylized exercises in political oration or poetry reading and all of these (with the possible exception of a disputed recording of Queen Victoria) being by males. The primary reason for any widening of the linguistic record is of course technology, and because of this a true proliferation of genuine oral data is even more recent. The preservation of casual speech, as ubiquitous and inexpensive as it may be today, is the most recent development of all.

To make up for gaps in the record, historical linguists, like all historians, rely on reconstructions based on what they know of the language’s larger history or on documented processes of change for other languages. While methods of reconstruction have been refined for hundreds of years, they remain fundamentally tied to a very partial historical record: one without speech or children for most of its history, without women for much of it, without ephemeral examples for nearly all of it, without much documentation of the early contact varieties that developed globally in the age of exploration, and so forth. Ultimately, any language history is not a history of what happened linguistically but a history of what survived, or what linguists believe themselves able to reconstruct. And an account like this cannot explain the whole of the language because so much of the language’s history does not survive.

Which shows that in the history of English, historiography is as important as grammar, usage, and speakers. It is historiographic frames that put these pieces together into a cohesive narrative that then can be used to explain still other linguistic pieces. Significantly, whether through heuristics like genealogy, social criticism, usage, or language families, different historiographies will identify and categorize the same data in different ways, each of which sustains its own objectives. This means, in essence, that writing the history of English (or any language) is not so much a matter of connecting the pre-arranged dots of some paint-by-numbers picture as it is of laying out those dots and assigning an order to them.

