Dictionary droids write definitions untouched by human hands
By Dennis Baron
There’s a new breed of dictionary, untouched by human hands. The New York Times reports that teams of programmers have developed software that automates the making of dictionaries, eliminating the need for human lexicographers, who may favor some words and neglect others. These new dictionary droids comb the web, selecting words in context, defining them automatically based on that surrounding context, and tabulating the definitions and citations for subscribers to consult online. And they do it all faster than you can say Google.
The web has made possible a democratizing of the dictionary. There are no editors with their annoying biases to stand in the way, so with just a couple of clicks users can see words in their natural habitat and choose exactly which one best suits their purpose. To paraphrase the old New Yorker cartoon, on the internet, everybody’s a lexicographer.
No human dictionarian sifts through the massive online corpus to figure out the various senses and connotations of each word, its history, etymology, or pronunciation. This leaves users free to do the job of lexicography themselves. They can even assign a word to any part of speech they want, or make up a new part of speech entirely if they like. There are no usage labels warning that a particular word might not be national, current, or reputable, or that some readers might find it stuffy or offensive. And there’s no grammar nazi shaking a minatory finger and muttering, “dictionary droid ain’t a word.” I just used dictionary droid online. It will soon be collected by a dictionary droid. Ergo, dictionary droid is a word. And if you don’t know what dictionarian or minatory means, you can find them in the OED, a dictionary compiled by all-too-fallible humans.
What would the old lexicographers think about the web’s new dictionary droids? Back in the eighteenth century, Dr. Johnson’s ’net was “any thing reticulated or decussated, at equal distances, with interstices between the intersections.” That definition sounds like it was created by a droid, and if Johnson actually had to define internet today, he’d probably come up with something equally convoluted.
The nineteenth-century lexicographer Noah Webster had his own word quirks. Webster preferred bridegoom to bridegroom because it comes from the Old English word guma, meaning ‘man,’ not groom, which refers to ‘someone charged with caring for horses,’ and he wanted to respell deaf as deef, to reflect how it was pronounced by his fellow New Englanders. So I imagine Webster would have changed lots of the spellings he found online and taken out all the dirty words, which is what he did when he translated the Bible after he finished making dictionaries. Finally, James Murray, the first editor of the Oxford English Dictionary, would probably give up the 3×5 slips on which he wrote each word, together with a context illustrating it, and make a PowerPoint stack for every word instead.
Above: Dr. Johnson’s definition of network, from his Dictionary of the English Language (1755). Below: Noah Webster’s definition of bridegoom, from An American Dictionary of the English Language (1828). In 1833 Webster published his translation of the Bible, which used euphemisms instead of dirty words, “language which cannot be uttered in company without a violation of decorum,” so that women and children could read the scriptures without blushing.
New technologies give rise to the fear that they’ll render human workers obsolete. Computer-driven robots build our cars, and the ranks of autoworkers have diminished. But we still need people to figure out how to make the kinds of cars that drivers will want to buy. Newspapers downsize as readers get their news online. But just because someone uploads an eyewitness video from their phone doesn’t mean we don’t need professional journalists to gather facts, conduct interviews, and actually report a breaking story or interpret its events in retrospect. So it is with dictionaries.
Lexicographers today benefit greatly from the massive databases of words-in-context that the web provides, and all the major dictionary makers, along with other language researchers, are hard at work figuring out how the web can help them better understand the history and current state of language.
Computers can sift and sort all this word data in nanoseconds. They can pull out of an online corpus, for example, every use of the word the, together with the words that surround it, plus metadata about the source text (magazine, novel, television show, website, Tweet, phone call; when and where it was published or uttered; who might have written or said it, and to whom). But while they’re great at pattern recognition, computers don’t deal well with lexical nuance. We still need human lexicographers to evaluate the data gathered by the dictionary droids and interpret it for dictionary users, amateurs who appreciate the convenience of clicking on a word for its meaning, but don’t want to assume the role of professional word nerd for themselves.
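The kind of extraction described above is what corpus linguists call a keyword-in-context concordance. Here is a minimal Python sketch of the idea, assuming a tiny in-memory corpus. The names `Citation`, `concordance`, and `SAMPLE_DOCS`, along with the metadata fields `source` and `year`, are invented for illustration and do not come from any real dictionary software.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Citation:
    """One use of a word, with its neighbors and source metadata."""
    keyword: str
    left: List[str]    # up to `window` words before the keyword
    right: List[str]   # up to `window` words after the keyword
    source: str        # hypothetical metadata field (e.g. "blog", "tweet")
    year: int          # hypothetical metadata field


def concordance(docs: List[Dict], keyword: str, window: int = 2) -> List[Citation]:
    """Collect every occurrence of `keyword` with its surrounding words."""
    hits = []
    for doc in docs:
        # Crude tokenizer: lowercase, keep runs of letters and straight apostrophes.
        tokens = re.findall(r"[a-z']+", doc["text"].lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                hits.append(Citation(
                    keyword=keyword,
                    left=tokens[max(0, i - window):i],
                    right=tokens[i + 1:i + 1 + window],
                    source=doc["source"],
                    year=doc["year"],
                ))
    return hits


# Made-up sample corpus, purely for demonstration.
SAMPLE_DOCS = [
    {"text": "The web is full of words, and the droids collect them.",
     "source": "blog", "year": 2011},
    {"text": "Look up the word.", "source": "tweet", "year": 2011},
]

if __name__ == "__main__":
    for hit in concordance(SAMPLE_DOCS, "the"):
        print(" ".join(hit.left), f"[{hit.keyword}]", " ".join(hit.right),
              f"({hit.source}, {hit.year})")
```

Run against the sample corpus, `concordance(SAMPLE_DOCS, "the")` returns three citations, each carrying its neighboring words and source metadata. Deciding what the word actually means in each of them is the part the sketch leaves to a human reader.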
The web is full of the kind of linguistic data that makes real lexicographers drool, so crunching all those online words in the service of dictionary-making is a worthy task. And most dictionarians recognize the importance of publishing their dictionaries on the web, because online is now where readers go to look up words. But although writing algorithms to automate the process of defining words, creating entire dictionaries untouched by human hands, might save on labor costs, it’s not likely to give dictionary users the word histories, the accurate definitions, or the other kinds of lexical guidance that they really need. Of course the real downside to online dictionaries, both those generated by web-crawling software and those created by professional lexicographers, is that you can’t use them to press flowers or have them double as booster seats when small children come to dine.
James Murray, first editor of the Oxford English Dictionary, shown here in his study at Hogwarts. Were Murray making dictionaries today, he would probably give up the 3×5 slips on which he wrote each word, together with a context illustrating it, and make a PowerPoint stack for every word instead.
This article originally appeared on The Web of Language.
Dennis Baron is Professor of English and Linguistics at the University of Illinois. His book, A Better Pencil: Readers, Writers, and the Digital Revolution, looks at the evolution of communication technology, from pencils to pixels. You can view his previous OUPblog posts or read more on his personal site, The Web of Language. Keep up with Professor Baron on Twitter: @DrGrammar.