There’s been a lot of hype recently about the emergence of technologies like ChatGPT and the effects they will have on science and society. Linguists have been especially curious about what highly successful large language models (LLMs) mean for their business. Are these models unearthing the hidden structure of language itself or just marking associations for predictive purposes?
In order to answer these sorts of questions we need to delve into the philosophy of what language is. For instance, if Language (with a big “L”) is an emergent human phenomenon arising from our communicative endeavours, i.e. a social entity, then AI is still some ways off approaching it in a meaningful way. If Chomsky, and those who follow his work, are correct that language is a modular mental system innately given to human infants and activated by miniscule amounts of external stimulus, then AI is again unlikely to be linguistic, since most of our most impressive LLMs are sucking up so many resources (both in terms of data and energy) that they are far from this childish learning target. On the third hand, if languages are just very large (possibly infinite) collections of sentences produced by applying discrete rules, then AI could be super-linguistic.
In my new book, I attempt to find a middle ground or intersection between these views. I start with an ontological picture (meaning a picture of what there is “out there”) advocated in the early nineties by the prominent philosopher and cognitive scientist, Daniel Dennett. He draws from information theory to distinguish between noise and patterns. In the noise, nothing is predictable, he says. But more often than not, we can and do find regularities in large data structures. These regularities provide us with the first steps towards pattern recognition. Another way to put this is that if you want to send a message and you need the entire series (string or bitmap) of information to do so, then it’s random. But if there’s some way to compress the information, it’s a pattern! What makes a pattern real, is whether or not it needs an observer for its existence. Dennett uses this view to make a case for “mild realism” about the mind and the position (which he calls the “intentional stance”) we use to identify minds in other humans, non-humans, and even artifacts. Basically, it’s like a theory we use to predict behaviour based on the success of our “minded” vocabulary comprising beliefs, desires, thoughts, etc. For Dennett, prediction matters theoretically!
If it’s not super clear yet, consider a barcode. At first blush, the black lines of varying length set to a background of white might seem random. But the lines (and spaces) can be set at regular intervals to reveal an underlying pattern that can be used to encode information (about the labelled entity/product). Barcodes are unique patterns, i.e. representations of the data from which more information can be drawn (by the way Nature produces these kinds of patterns too in fractal formation).
“The methodological chasm between theoretical and computational linguistics can be surmounted.”
I adapt this idea in two ways in light of recent advances in computational linguistics and AI. The first reinterprets grammars, specifically discrete grammars of theoretical linguistics, as compression algorithms. So, in essence, a language is like a real pattern. Our grammars are collections of rules that compress these patterns. In English, noticing that a sentence is made up of a noun phrase and verb phrase is such a compression. More complex rules capture more complex patterns. Secondly, discrete rules are just a subset of continuous processes. In other words, at one level information theory looks very statistical while generative grammar looks very categorical. But the latter is a special case of the former. I show in the book how some of the foundational theorems of information theory can be translated to discrete grammar representations. So there’s no need to banish the kinds of (stochastic) processes often used and manipulated in computational linguistics, as many theoretical linguists have been wont to do in the past.
This just means that the methodological chasm between theoretical and computational linguistics, which has often served to close the lines of communication between the fields, can be surmounted. Ontologically speaking, languages are not collections of sentences, minimal mental structures, or social entities by themselves. They are informational states taken from complex interactions of all of the above and more (like the environment). On this view, linguistics quickly emerges as a complexity science in which the tools of linguistic grammars, LLMs, and sociolinguistic observations all find a homogeneous home. Recent work on complex systems, especially in biological systems theory, has breathed new life into this interdisciplinary field of inquiry. I argue that the study of language, including the inner workings of both the human mind and ChatGPT, belong within this growing framework.
For decades, computational and theoretical linguists have been talking different languages. The shocking syntactic successes of modern LLMs and ChatGPT have forced them into the same room. Realising that languages are real patterns emerging from biological systems gets someone to break the awkward silence…
Featured image by Google DeepMind Via Unsplash (public domain)