Ammon Shea recently spent a year of his life reading the OED from start to finish. Over the next few months he will be posting weekly blog posts about the insights, gems, and thoughts on language that came from this experience. His book, Reading the OED, has been published by Perigee, so go check it out in your local bookstore. In the post below, Ammon looks at how many words are in the English language.
I was contacted recently by a journalist who is writing a story about the claim that the size of the English language is feverishly approaching one million words. This claim has been promulgated with varying degrees of self-righteousness over the past few years by a fellow who seems to be armed with little more than a purported algorithm and an inflated sense of importance. The notion of ‘one million words of English’ has been debunked by others who are far more educated than I (Ben Zimmer, Jesse Sheidlower, Geoffrey Nunberg), so I’ll not waste time in addressing it, except to note that calling it clap-trap would be unfair to the clap.
Yet it does make me wonder – why is it that we are so often interested in putting a number on something as inherently unquantifiable as vocabulary? The journalist who was writing the story was an educated and interesting person, and seemed genuinely curious about the subject. I wish that I’d had better answers for her, but I can think of few things about which I have less knowledge than how many words our language has, and how many of them I (or anyone else) might know. I might as well be asked to state how many memories a person has.
I don’t mean to imply that this is not an area that deserves serious study; it obviously does, and there is a great deal of research done by linguists on many types of language acquisition (the phrase ‘vocabulary size’ yields approximately 11,000 results on Google Scholar). But there are so many awkward examples of people trying to come up with a sure-fire measurement for things such as the exact size of an average person’s vocabulary, or the total number of words in a language; the one commonality in these attempts seems to be a marked disparity in the numbers they arrive at.
There appears to be a certain degree of difficulty in ascertaining what a word is, at least for purposes of counting (is ‘set’ one word or hundreds, do regional variants count as additional words, are obsolete terms to be counted?). Similarly, there seems to be no easy way to judge what it means to say that someone ‘knows’ what a word means – do they have to be able to define it, to know how to spell it, or can they pull a Potter Stewart, and simply ‘know it when they see it’?
I was thinking of all this today as I rode my bicycle through New York City traffic, trying to decide whether the word ‘what’ would be one word or many, and how I would define it if I were so queried. I accomplished nothing except to give myself a headache, a near miss with a large truck, and a resolve to leave the counting of words to professionals, who are smart enough to stay away from algorithms.