By Dennis Baron
A computer at Carnegie Mellon University is reading the internet and learning from it in much the same way that we humans learn language and acquire knowledge: by soaking it all up and figuring it out in our heads.
People’s brains work better some days than others, and eventually we will all run out of steam, but the creators of NELL, the Never Ending Language Learner, want it to run forever, getting better every day in every way, until it becomes the largest repository imaginable of all that’s e’er been thought or writ.
Since the first “electronic brains” began to appear in the late 1940s, it has been the goal of computer engineers and the occasional mad scientist to fashion machines that think and learn like people do. Or at least machines that perform functions analogous to some aspects of human thought, and which also self-correct by analyzing their mistakes and doing better next time around.
Setting out to create an infinite and immortal database is a big task: there’s a lot for NELL to learn in cyberspace, and a whole lot more that has yet to be digitized. But since NELL was activated a few months ago, it has learned over 440,000 separate things with an accuracy of 74%, which, to put it in terms that any Carnegie Mellon undergraduate can understand, is a C. In contrast, I have no idea how to count what I’ve learned since my own brain went online, and no idea how many of the things that I know are actually correct, which suggests that all I’ve got on my cerebral transcript is an Incomplete.
NELL’s programmers seeded it with some facts and relations so that it had something to start with, then set it loose on the internet to look for more. NELL sorts what it finds into categories like mountains, scientists, writers, reptiles, universities, web sites, or sports teams, and into relations like “teamPlaysSport,” “bookWriter,” and “companyProducesProduct.”
NELL also judges the facts it finds, promoting some of them to the higher category of “beliefs” if they come from a single trusted source, or if they come from multiple sources that are less reliable. According to the researchers, “More than half of the beliefs were promoted based on evidence from multiple [i.e., less reliable] sources,” making NELL more of a rumor mill than a trusted source. And once NELL promotes a fact to a belief, it stays a belief: “In our current implementation, once a candidate fact is promoted as a belief, it is never demoted,” a process that sounds more like religion than science.
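To make that promotion rule concrete, here is a toy sketch in Python. It is not NELL’s actual code: the trust labels, the hypothetical source names, and the threshold of two less reliable sources standing in for “multiple” are all assumptions made for illustration. What it does try to capture is the one-way street the researchers describe: a candidate fact becomes a belief on the word of one trusted source or of several weaker ones, and nothing in the class can turn a belief back into a mere candidate.

```python
from dataclasses import dataclass, field

# Hypothetical trust labels; the researchers' own scheme for rating sources is not described here.
TRUSTED = "trusted"
LESS_RELIABLE = "less_reliable"


@dataclass
class KnowledgeBase:
    # Each candidate fact, e.g. ("ketchup", "condiment"), maps to the (source, trust) pairs supporting it.
    candidates: dict = field(default_factory=dict)
    # Promoted facts. Nothing ever removes an entry from this set.
    beliefs: set = field(default_factory=set)
    # Assumed threshold for "multiple" less reliable sources.
    min_weak_sources: int = 2

    def add_evidence(self, fact, source, trust):
        """Record that `source` supports `fact`, then check whether to promote it."""
        self.candidates.setdefault(fact, set()).add((source, trust))
        self._maybe_promote(fact)

    def retract_evidence(self, fact, source):
        """Drop one source's support. A fact that is already a belief stays a belief."""
        remaining = {(s, t) for s, t in self.candidates.get(fact, set()) if s != source}
        self.candidates[fact] = remaining

    def _maybe_promote(self, fact):
        if fact in self.beliefs:
            return  # once a belief, always a belief: there is no demotion path
        evidence = self.candidates[fact]
        trusted = {s for s, t in evidence if t == TRUSTED}
        # Counting distinct source names, so one blog repeating itself is still one source.
        weak = {s for s, t in evidence if t == LESS_RELIABLE}
        if trusted or len(weak) >= self.min_weak_sources:
            self.beliefs.add(fact)


# Usage: two hypothetical blogs agree that metaphors are tools, and the rumor becomes a belief.
kb = KnowledgeBase()
kb.add_evidence(("metaphor", "tool"), "blog_1", LESS_RELIABLE)
kb.add_evidence(("metaphor", "tool"), "blog_2", LESS_RELIABLE)
kb.retract_evidence(("metaphor", "tool"), "blog_1")
print(("metaphor", "tool") in kb.beliefs)  # True: beliefs are never demoted
```

Withdrawing one blog’s support leaves the belief untouched, which is exactly the never-demoted behavior quoted above. A real system would also want a way to un-learn, but this sketch, like NELL at the time, has none.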
Sometimes NELL makes mistakes: the computer incorrectly labeled “right posterior” as a body part. NELL proved smart enough to call ketchup a condiment, not a vegetable, a mislabeling that we owe to the “great communicator,” Ronald Reagan. But its human handlers had to tell NELL that Klingon is not an ethnic group, despite the fact that many earthlings think it is. Alex Trebek would be happy to know that, unlike Sean Connery, NELL has no trouble classifying therapists as a “profession,” but the computer trips up on the rapists, which it thinks could possibly be “awardtrophytournament” (confidence level, 50%).
NELL knows that cookies are a “baked good,” but that led the computer to assume that persistent cookies and internet cookies are also baked goods. That’s not surprising, since it still hasn’t learned what metaphors are: NELL is only 87.5% confident that metaphors are “tools” (plus, according to NELL, there’s a 50-50 chance that metaphors are actually “book writers”).
Told by its programmers that Risk is a board game, NELL predicts with 91.4% confidence that security risk is also a board game. NELL knows that a number is a character, but then incorrectly classifies the plural, numbers, as a character trait (93.8% confidence). The computer is also 99.9% confident that business is an academic field, which may be reassuring to those in the b-school if not to those small business owners worrying about the continuation of the Bush tax cuts.
Most recently, NELL learned that “grain products” is also a “baked good” and that anti-American cleric Muqtada al Sadr is a “terrorist organization.” But the Bill of Rights proves a stumper: NELL, with weak confidence, calls the First Amendment a musical instrument, classifies the Second Amendment as a “hobby,” and is completely unwilling to confess any knowledge of the Fifth Amendment at all.
But NELL’s programmers weren’t at all surprised that they needed to make some minor tweaks to get the computer back on track, since, as they put it, “One might expect a nonnative reader of English to make similar mistakes.” In their view, NELL is only human.
It remains to be seen exactly how lifelike NELL’s language learning really is. For one thing, the computer is reading its input, while most human language learners acquire language by listening and talking. Putting our love or fear of anthropomorphic computers aside for the moment, it’s clear that while NELL may have a bigger and more accurate memory than any human, it’s still a long way from being able to parse a question like “What has four wheels and flies?” Children learning language find that riddle both easy and funny; machines find it neither.
Dennis Baron is Professor of English and Linguistics at the University of Illinois. His book, A Better Pencil: Readers, Writers, and the Digital Revolution, looks at the evolution of communication technology, from pencils to pixels. You can read his previous posts on OUPblog or read more on his personal site, The Web of Language.