It is fair to say that the field of linguistics is hardly ever in the news. That is not the case for language itself and all things to do with language—from word of the year announcements to countless discussions about grammar peeves, correct spelling, or writing style. This has changed somewhat recently with the proliferation of Large Language Models (LLMs), and in particular since the release of OpenAI’s ChatGPT, the best-known language model. But does the recent, impressive performance of LLMs have any repercussions for the way in which linguists carry out their work? And what is a Language Model anyway?
At heart, all an LLM does is predict the next word given a string of words as a context —that is, it predicts the next, most likely word. This is of course not what a user experiences when dealing with language models such as ChatGPT. This is on account of the fact that ChatGPT is more properly described as a “dialogue management system”, an AI “assistant” or chatbot that translates a user’s questions (or “prompts”) into inputs that the underlying LLM can understand (the latest version of OpenAI’s LLM is a fine-tuned version of GPT-4).
“At heart, all an LLM does is predict the next word given a string of words as a context.”
An LLM, after all, is nothing more than a mathematical model in terms of a neural network with input layers, output layers, and many deep layers in between, plus a set of trained “parameters.” As the computer scientist Murray Shanahan has put it in a recent paper, when one asks a chatbot such as ChatGPT who was the first person to walk on the moon, what the LLM is fed is something along the lines of:
Given the statistical distribution of words in the vast public corpus of (English) text, what word is most likely to follow the sequence “The first person to walk on the Moon was”?
That is, given an input such as the first person to walk on the Moon was, the LLM returns the most likely word to follow this string. How have LLMs learned to do this? As mentioned, LLMs calculate the probability of the next word given a string of words, and it does so by representing these words as vectors of values from which to calculate the probability of each word, and where sentences can also be represented as vectors of values. Since 2017, most LLMs have been using “transformers,” which allow the models to carry out matrix calculations over these vectors, and the more transformers are employed, the more accurate the predictions are—GPT-3 has some 96 layers of such transformers.
The illusion that one is having a conversation with a rational agent, for it is an illusion, after all, is the result of embedding an LLM in a larger computer system that includes background “prefixes” to coax the system into producing behaviour that feels like a conversation (the prefixes include templates of what a conversation looks like). But what the LLM itself does is generate sequences of words that are statistically likely to follow from a specific prompt.
It is through the use of prompt prefixes that LLMs can be coaxed into “performing” various tasks beyond dialoguing, such as reasoning or, according to some linguists and cognitive scientists, learn the hierarchical structures of a language (this literature is ever increasing). But the model itself remains a sequence predictor, as it does not manipulate the typical structured representations of a language directly, and it has no understanding of what a word or a sentence means—and meaning is a crucial property of language.
An LLM seems to produce sentences and text like a human does—it seems to have mastered the rules of the grammar of English—but at the same time it produces sentences based on probabilities rather on the meanings and thoughts to express, which is how a human person produces language. So, what is language so that an LLM could learn it?
“An LLM seems to produce sentences like a human does but it produces them based on probabilities rather than on meaning.”
A typical characterisation of language is as a system of communication (or, for some linguists, as a system for having thoughts), and such a system would include a vocabulary (the words of a language) and a grammar. By a “grammar,” most linguists have in mind various components, at the very least syntax, semantics, and phonetics/phonology. In fact, a classic way to describe a language in linguistics is as a system that connects sound (or in terms of other ways to produce language, such as hand gestures or signs) and meaning, the connection between sound and meaning mediated by syntax. As such, every sentence of a language is the result of all these components—phonology, semantics, and syntax—aligning with each other appropriately, and I do not know of any linguistic theory for which this is not true, regardless of differences in focus or else.
What this means for the question of what LLMs can offer linguistics, and linguists, revolves around the issue of what exactly LLMs have learned to begin with. They haven’t, as a matter of fact, learned a natural language at all, for they know nothing about phonology or meaning; what they have learned is the statistical distribution of the words of the large texts they have been fed during training, and this is a rather different matter.
As has been the case in the past with other approaches in computational linguistics and natural language processing, LLMs will certainly flourish within these subdisciplines of linguistics, but the daily work of a regular linguist is not going to change much any time soon. Some linguists do study the properties of texts, but this is not the most common undertaking in linguistics. Having said that, how about the opposite question: does a run-of-the-mill linguist have much to offer to LLMs and chatbots at all?
Featured image: Google Deepmind via Unsplash (public domain)