By Kees van Deemter
Alan Turing’s work was so important and wide-ranging that it is difficult to think of a more broadly influential scientist in the last century. Our understanding of the power and limitations of computing, for example, owes a tremendous amount to his work on the mathematical concept of a Turing Machine. His practical achievements are no less impressive. Some historians believe that the Second World War would have ended differently without his contributions to code-breaking. Yet another part of his work is the Turing test — Turing’s answer to a momentous question: What’s essential about human intelligence?
The inspiration for the Turing test came from a conversation game in which one player (the deceiver) tries to fool another player (the detective) about the deceiver’s gender. To win, a male deceiver would need to answer the detective’s questions in a way that suggests that he, the deceiver, is female. It doesn’t suffice for the deceiver to answer direct questions about gender. He should also show good knowledge about feminine topics and get properly upset over male chauvinism. What’s more, he should use turns of phrase that are typical of women. All this without overdoing it, of course.
Turing realised that this conversation game could be turned on its head if the role of the deceiver is played by a computer, not a person. The task for a computer deceiver is to fool the detective into believing that the deceiver is a person of flesh and blood. As in the original game, the computer can win by thinking like a human. Now suppose that, playing this modified game, a computer were able to fool human detectives into believing it to be human. (The deceiver wins if the detectives are unable to tell computer from human correctly more often than would be expected by chance.) Surely, so Turing argued, this would mean that the computer has managed to think like a human. Hence, if this happened, one would have to conclude that the computer displays real human thinking; the makers of the deceiver program would have captured human intelligence. The link between the Turing Test and intelligence has often been questioned, but the idea of the Test itself is very much alive.
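To make the "no better than chance" criterion concrete, here is a minimal sketch of how one might check it. It assumes each detective verdict is an independent yes/no guess with a 50% chance of being right, and it uses a conventional 5% significance level. Both assumptions, and the function names, are illustrative choices made here, not part of Turing's proposal.

```python
from math import comb


def p_value_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of at least `correct` right verdicts out of `trials`
    if the detectives were merely guessing (one-sided exact binomial test)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def deceiver_wins(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """The deceiver wins when the detectives' score is statistically
    indistinguishable from guessing (alpha = 0.05 is an assumed threshold)."""
    return p_value_above_chance(correct, trials) >= alpha


# Detectives right in 52 of 100 conversations: consistent with guessing.
print(deceiver_wins(52, 100))
# Detectives right in 70 of 100 conversations: they can tell the difference.
print(deceiver_wins(70, 100))
```

Under these assumptions, a computer whose detectives are right only 52 times in 100 would count as a winner, while one whose detectives are right 70 times in 100 would not.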
Natural Language Generation (NLG) systems are computer programs that convert numerical or symbolic information into ordinary language. Weather forecasting, medical decision support, and other applications are starting to use systems of this kind. How should NLG programs be tested? No single method has all the answers, but human behaviour is still a gold standard to which many of these systems aspire. As in the Turing Test, researchers try to make their NLG systems produce text that resembles human-written text, partly because they believe that this may be the shortest route to making them effective. More and more often, NLG systems are tested in international evaluation contests that focus on one or more particular aspects of language use.
One of the most important challenges for NLG is to let computers talk in a human way about numbers. Numbers play an important role in many areas, including the medical domain. When nurses write about a patient (producing a shift report, for instance), they have many numbers at their disposal (body temperature, oxygen saturation rates, etc.). However, they frequently suppress these numbers, replacing them with terms that are qualitative and vague. Instead of citing concrete oxygen saturation figures, they simply write “The SATS have remained OK”, for example. When talking about episodes of decreased heart rate, they throw in words like “temporary,” “prolonged,” and “significant.” Interestingly, doctors suppress numbers even more than nurses. Computational NLG systems in this area, by contrast, tend to stick with the numbers, producing stilted bits of text like the following: “By 10:40 SaO2 had decreased to 87. As a result, Fraction of Inspired Oxygen (FIO2) was set to 36%. SaO2 increased to 93.” The challenge is to do better, emulating human writers.
Unfortunately, the writings of doctors are rather difficult to mimic. The challenge is not just to decide when numerical information is useful. (Texts written by doctors contain numbers too, though fewer.) The hardest challenge for the NLG system is to “interpret” the numbers and this can involve difficult judgment calls, deciding whether a certain pattern of numbers should be summarized as “OK,” for instance, and deciding whether an episode of slow heart rhythm is merely “temporary” or “prolonged.” The medics’ texts are not dumbed-down versions of the computer-generated ones. They are highly sophisticated, despite their apparent simplicity. It will take research in NLG years before its computer programs stand a chance at winning a Turing Test in this area.
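A toy sketch may help show why these judgment calls are hard. The naive approach below turns numbers into words with fixed cut-offs. The thresholds (90% saturation, 60 seconds) and the function names are invented purely for illustration; they are not clinical guidance and are not taken from any real NLG system.

```python
# Hypothetical cut-offs, invented for illustration only; not clinical guidance.
OK_SATURATION_PERCENT = 90
PROLONGED_SECONDS = 60


def describe_saturation(readings: list[int]) -> str:
    """Summarise a series of oxygen saturation readings in words."""
    if not readings:
        raise ValueError("need at least one reading")
    lowest = min(readings)
    if lowest >= OK_SATURATION_PERCENT:
        return "The SATS have remained OK"
    # Fall back to a number when the pattern is not clearly fine.
    return f"SATS dropped to {lowest}"


def describe_episode(duration_seconds: int) -> str:
    """Label an episode of slow heart rate by its duration alone."""
    return "prolonged" if duration_seconds >= PROLONGED_SECONDS else "temporary"


print(describe_saturation([93, 95, 94]))  # every reading at or above the cut-off
print(describe_saturation([93, 87, 93]))  # one reading below the cut-off
print(describe_episode(59))               # just under the cut-off
print(describe_episode(60))               # just at the cut-off
```

With a rule like this, a single second separates “temporary” from “prolonged.” A human writer would also weigh the patient's history, what else was going on at the time, and who will read the report, which is the kind of judgment a fixed threshold cannot capture.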
Kees van Deemter is a Reader in Computing Science at the University of Aberdeen. He is interested in getting computers to speak or write, and in the logical, linguistic, and philosophical issues that this raises. His book, Not Exactly: In Praise of Vagueness, puts the spotlight on vague and qualitative concepts, viewing them from a variety of angles and making a highly technical literature easily accessible to a wide audience. It explores how vague and qualitative concepts play a role in all areas of life, including even the exact sciences, where they are mostly unwelcome; how vague concepts fit into our current understanding of language and logic; the practical applications; and when and why vague language can be effective. Find out more about Not Exactly: In Praise of Vagueness. Kees van Deemter is also the author of approximately 120 peer-reviewed research publications.
OUPblog is celebrating Alan Turing’s 100th birthday with blog posts from our authors all this week. Read our previous posts on Alan Turing including: “Maurice Wilkes on Alan Turing” by Peter J. Bentley, “Turing : the irruption of Materialism into thought” by Paul Cockshott, “Alan Turing’s Cryptographic Legacy” by Keith M. Martin, and “Turing’s Grand Unification” by Cristopher Moore and Stephan Mertens.