Oxford University Press's
Academic Insights for the Thinking World

OED updates

Waaaay back in December 2006 (remember those good old days?) the Online Oxford English Dictionary added an astounding 2,600 new and revised words. Below is a full report by John Simpson, the OED’s Chief Editor, on significant words, and statistics about pronunciations, quotations, and etymology in the recent update.

Plotting the effect of revision

John Simpson, Chief Editor, OED

pomander – prajnaparamita

Pomander to prajnaparamita represents the twenty-eighth sequential instalment of revised text to be published since the OED went online with one thousand entries from M to mahurat (‘an auspicious moment for beginning an enterprise’) in March 2000.

Pomander to prajnaparamita contains 2,658 main entries (the equivalent range of OED2 contained 1,937). 208 of these entries are entirely new to the dictionary, and a further 191 are terms which used to be nested within other related entries in OED2. The range covers 203 pages in OED2 and it is estimated that the revised range, if printed to the same triple-column format, would span at least 350 pages.

The breakdown by part of speech for main entries in the range is as follows: nouns (1,793); adjectives (759); verbs (251); adverbs (103); interjections (18); combining forms (11); phrases (4); prepositions (4); prefixes (1).

The single prefix is the substantial entry for post-. Related main entries form a significant part of this publication batch (from postabdomen to postzygapophysis).

The 2,658 main entries contain 8,216 subsenses (including nested compounds, etc.), giving an average of just over three subsenses per entry. 283 of these subsenses are entirely new to the dictionary.
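For readers who like to check the sums, the headline figures so far reduce to a few lines of arithmetic. This is only an illustrative sketch: the variable names are invented here, and the counts are the provisional ones quoted in this report.

```python
# Counts quoted in the report (all provisional).
main_entries_oed3 = 2658   # main entries in the revised range
main_entries_oed2 = 1937   # main entries in the equivalent OED2 range
subsenses_oed3 = 8216      # subsenses in the revised range
new_subsenses = 283        # subsenses entirely new to the dictionary

# Average number of subsenses per main entry.
avg_subsenses = subsenses_oed3 / main_entries_oed3

# Growth in main entries relative to the equivalent OED2 range.
entry_growth_pct = (main_entries_oed3 - main_entries_oed2) / main_entries_oed2 * 100

# Share of subsenses that are entirely new.
new_subsense_pct = new_subsenses / subsenses_oed3 * 100

print(f"{avg_subsenses:.2f} subsenses per entry")
print(f"{entry_growth_pct:.1f}% more main entries than OED2")
print(f"{new_subsense_pct:.1f}% of subsenses are new")
```

The average comes out at about 3.09, which is the ‘just over three’ mentioned above.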

6,242 main and nested terms are entered (and, except in those cases where the meaning of a compound or derivative is self-evident, defined) in this revised range. (Further statistical references relate specifically to the current range; all figures should be taken as provisional.)

Words and definitions

Any substantial range of the alphabet contains many important English words. The present range is no exception, and some of the most significant include: pomp, pompous, pond, ponder, pony, pool, poor, pop, popular, population, porch, pore, pork, port, porter, portion, pose, position, positive, possess, possession, possessive, possible, post, posture, pot, potato, potential, potion, pouch, pounce, pound, pour, poverty, powder, power, practical, practice, practise, pragmatic, prairie, praise.

The work of revising a ‘large’ OED entry, as are many of these, is quite different from the work involved in smaller entries, where the editor is less concerned with the overall structure of the entry and its relationship with derivatives and associated words. A glance at any of these entries in the revised range will reveal the issues.

Within the range, almost any entry demonstrates something of interest. Here are a few suggestions for further reading online:

portentious: formerly a single sense (‘pretentious, pompous; portentous’), with quotations from 1857 until 1975, but now with three meanings presented in separate sections (the earliest of which is new to the dictionary) illustrated from 1549, ?c1550, and 1859 respectively.

post n.3 and post n.5 give substantial information on sense development in French and Italian, which makes it possible to see how the English uses concerned with postal services and soldiers’ postings came about.

pomegranate has a new sense, as well as an etymological note which addresses the origin of pommy and pom n.2.

poodle (as well as having earlier attestations taking the term in English back from 1825 to 1773) now contains a note explaining that the word could have been around much earlier as the name of a dog during the English Civil War.

potassium: early 20th century description now replaced by modern definition.

potato: describes more fully how the word spread through various European languages and how both the word and the plants came to Britain.

practical is noteworthy for a number of reasons. Firstly, the new documentation (such as the earliest reference from ?a1425 rather than 1570) now justifies chronologically the ‘logical’ ordering of , both as regards the overall branch structure and individual senses within this. Secondly, the new definitions take the reader through these developing senses in a convincing sequence. Thirdly, the etymology ties this up by deriving the term from post-classical Latin practicalis (13th century, but 14th century in a British source), whereas OED1 had to derive it from French practique plus English -al. Modern dictionaries seem unfamiliar with the late Latin word.

Spellings, currency, and types of use

OED2 includes 2,557 variant spellings, whereas that figure has now risen to 6,008. As a result, readers with problematic spellings are much more likely to be taken to the appropriate entry online. The increase marks one area where the revision policy seeks to be more inclusive.

One indicator of how the language has changed since this range was originally published in 1907 is the number of words or senses now marked as obsolete. OED2 (largely reflecting OED1 practice) marks 372 main entries as obsolete, whereas the new figure is 507. The percentage rises further at subsense level, with 1,035 subsenses marked as obsolete in OED2 and 1,618 in OED3. The statistics raise a number of questions: was OED1 less inclined to label a term obsolete than it might have been; do today’s editors have better evidence with which to make the decision; are more of our older words dying out than we knew?

Another change occurs in systematic labelling (by subject, region, register, etc.). OED2 shows 3,788 ‘labels’ in this section of the text, whereas the revised entries demonstrate a substantial rise to 8,395. Does this simply reflect today’s more comprehensive labelling policy; or are editors nowadays also more likely to compartmentalize?

Within the sphere of labelling, we are much more likely to label a term as ‘colloquial’ than were the editors of OED2 (175 as opposed to 53). Much the same can be said of the label ‘slang’ (162 as opposed to 108). Is this evidence of the growing informality of language?

And what about our historical perspective? OED3 labels 396 entries in this range ‘historical’, as against only 55 in OED2. Are we more likely nowadays to use words in a historical context? Or has our sense of ‘history’ changed? Note too that we label 57 subsenses as ‘archaic’ (as opposed to 32 in the equivalent range before).
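The scale of these changes is easier to compare when each pair of figures is expressed as a ratio. The sketch below does only that; the dictionary keys are shorthand invented for this illustration, and the counts are those quoted in this section.

```python
# (OED2 count, revised count) for each figure quoted in this section.
counts = {
    "variant spellings": (2557, 6008),
    "obsolete main entries": (372, 507),
    "obsolete subsenses": (1035, 1618),
    "labels (all kinds)": (3788, 8395),
    "colloquial": (53, 175),
    "slang": (108, 162),
    "historical": (55, 396),
    "archaic": (32, 57),
}

# Ratio of the revised count to the OED2 count.
ratios = {name: revised / oed2 for name, (oed2, revised) in counts.items()}

# Print the largest increases first.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24}{ratio:>5.1f}x")
```

On these figures the ‘historical’ label shows much the largest rise (about sevenfold), ‘colloquial’ roughly triples, and ‘slang’ grows by half.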

Pronunciations, quotations, and etymology

The number of pronunciation forms given in OED3 rises from 1,333 to 2,142. The increase in the number of illustrative quotations is also steep: from 22,604 in OED2 to 37,034 in OED3. This represents earliest attestations, quotations of intermediate date, postdatings, and quotations for new words and subsenses.

On the issue of antedatings, the present revised range contains 1,027 antedated main entries (41.6% of all entries), and a remarkable 4,072 antedated subsenses (51.3%). The significant increase is partially due to the availability of online historical corpora, but also to the OED’s own systematic reading programmes.

The distribution of first usages within this range is also worth looking at closely. The table below shows how many senses (and then words) entered English in each of the listed time frames, according to OED evidence. Note the low figure for Old English (few words in Old English begin with the letter p), and the drop in the 18th century (a real fact of language use or a curiosity of OED’s data, or maybe the figure for the 17th century is anomalously high?).

Period            senses   words
Old English           44      27
Middle English       779     348
1500-99              870     343
1600-99            1,141     444
1700-99              859     257
1800-99            2,556     688
1900-              1,967     551
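The table can be cross-checked against the totals given earlier: its two columns sum to the 8,216 subsenses and 2,658 main entries reported for the range. A minimal sketch, with the figures copied from the table and the per-period share of words added purely as an illustration:

```python
# First-use counts by period, copied from the table: (senses, words).
first_uses = {
    "Old English": (44, 27),
    "Middle English": (779, 348),
    "1500-99": (870, 343),
    "1600-99": (1141, 444),
    "1700-99": (859, 257),
    "1800-99": (2556, 688),
    "1900-": (1967, 551),
}

total_senses = sum(senses for senses, _ in first_uses.values())
total_words = sum(words for _, words in first_uses.values())

# Each period's share of all words first recorded in the range.
for period, (senses, words) in first_uses.items():
    share = words / total_words * 100
    print(f"{period:<15}{senses:>6}{words:>6}{share:>7.1f}%")

print(f"{'Total':<15}{total_senses:>6}{total_words:>6}")
```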

The revised data contains a much stronger system of etymological tagging than did previous versions of the data available to editors. As a result of this, we can see that out of 2,658 main entries, 820 are borrowed (wholly or partly) from another language (= 31% of the total).


In addition, 103 main entries are borrowed (wholly or partly) from personal or place names, and 70 further entries are (wholly or partly) calqued on models in foreign languages.

Within this revised range the number of borrowings from various languages is as follows:

Latin 347
French 265
Latin and French jointly 47
Greek 44
German 29
Italian 22
Spanish 16
Dutch 15
Portuguese 9
Other 65 (in no case more than 5)


It is also possible to examine the data by time period. The following chart shows activity in the 16th century (351 main entries, of which 125 (36%) are borrowings) contrasted with the twentieth century (538 entries, of which 56 (10%) are borrowings):

[Chart: borrowings among main entries, 16th century contrasted with the twentieth century]
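The borrowing percentages quoted in this section follow directly from the counts. The short sketch below reproduces them; the function name and the period labels are invented for illustration.

```python
def borrowing_share(borrowed, total):
    """Return borrowed entries as a percentage of all entries."""
    return borrowed / total * 100


# (borrowed main entries, all main entries), as quoted in the report.
periods = {
    "whole range": (820, 2658),
    "16th century": (125, 351),
    "20th century": (56, 538),
}

for label, (borrowed, total) in periods.items():
    print(f"{label}: {borrowing_share(borrowed, total):.0f}% borrowed")
```

Rounded to whole numbers, the three shares are 31%, 36%, and 10%, matching the figures in the text.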

There has been some curiosity in the past about the relative number of quotations accorded celebrated authors in the OED. In the present range 979 authors are cited over five times; but 2,587 authors are cited between two and five times, and 7,380 are cited once only. The strength of the OED’s evidence nowadays derives from the great mix of texts cited, not from citing consistently from a small number of authors.

Below is a list of the most-cited authors in the current range:

William Shakespeare 151 occurrences
Walter Scott 118
Geoffrey Chaucer 110
Charles Dickens 87
John Lydgate 80
Philemon Holland 75
John de Trevisa 72
Ben Jonson 56
William Caxton 56
John Milton 55

As ever, Shakespeare leads the field, with Scott and Chaucer in close pursuit. However, concentration on the most-cited authors does injustice to the mass of authors cited occasionally in the dictionary.

The revision of the OED is not an activity driven by statistics, but the figures that arise as a result of the editorial revision of the text are fascinating for the view they give of the dictionary and, more importantly, of the language. We do not necessarily know yet how to interpret some of the statistics that arise. Often, several factors go towards the creation of a single statistic. But if you don’t want to follow this arithmetical line, just read the entries as entries, and see what they tell you about the language.
