A few days ago, linguist Rachael Tatman published an intriguing question on Twitter:
Genuine question for folks who think that large language models understand language in some meaningful way: do you also think that spell checkers know what you meant?— Rachael Tatman (@rctatman) September 7, 2022
The intention of the tweet is clear: GPT-3 doesn’t have an understanding of the meaning of language, and neither does a spellchecker.
However, one word caught my attention: “meant”. Does GPT-3 have an understanding not of language, but of what you meant? And what about your spellchecker: does it know what you meant?
What follows is merely a stream of thought, touching on some philosophical questions and, mostly, a subjective opinion. But I do think it’s important to think about the implications of what the technological advances of the past decades actually mean for us, and for our understanding of certain words. Bear that in mind, and then let’s talk meaning!
What does “Meaning” mean?
Before answering the question of whether or not some tool “knows what you meant”, it is important to think about what the word “meant” refers to here. There are two ways in which the word “meaning” can be understood: in a general sense, and in a specific sense.
A general understanding of meaning is the meaning of the symbols we use in language. For example, the sentence “He doesn’t see the forest for the trees” has two symbols in it: forest, and tree. “Forest” is just an agglomeration of “trees”, and “tree” is a symbol for something we can see. Using such symbols saves us the hassle of having to describe the tree’s colors, the fact that trees have branches (which would raise the question: what is a branch?), et cetera. There is a third symbol in there, since the sentence itself is an idiom: whenever we say that someone “doesn’t see the forest for the trees”, we don’t literally mean that they can’t “see” forests; it is a metaphor for a person who is so lost in details that they miss the bigger picture.
That is the general meaning of the term “to mean”. However, there is another meaning which is important for us here: meaning in a specific, contextual sense. When we talk about meaning in a contextual sense, we don’t need to know what a term means in general. As an example, when we write about “someone being a looser”, we may immediately notice that the word is actually spelled “loser”, and if we write about a somewhat “loser screw”, we might quickly see that what we actually meant was a “looser screw”. At no point did we need to know the symbolic meaning of “loser” or “looser”; we just need to know that “loser” is a noun and “looser” is an adjective. Likewise, if someone tells us they’re going to buy an “aubergine”, we may be able to tell they’re British – since an American would say “eggplant” – without knowing what either word refers to.
The dimension that matters here is scoping: of course no computer has a general understanding of language. But many algorithms actually do have an understanding of what we mean, even though it is strictly scoped. A spellchecker, for example, has a very simple job: find typos and suggest the most likely correct replacement. In this – very limited – sense, a spellchecker does actually know what we meant!
GPT-3, on the other hand, doesn’t. And the reason is not that it’s “dumber” or less complex than a spellchecker – on the contrary. The reason is rather that there is rarely a scope attached to LLMs (large language models) such as GPT-3 or BERT. When you ask whether such a model “knows what we meant” in a very general sense, you define the scope of your meaning only by formulating a question. You may ask “what do you think about the weather?” or state “I think Robert De Niro did a bad acting job in The Irishman”. Both of these require knowledge of the meaning of what you just said. However, since language-generating models (especially those out in the open) are normally not scoped, it is a hit-or-miss game whether they will generate something that you interpret (!) as sensible.
And it is here that LLMs generally fall short. As long as the scope of those models is generic, without them having an understanding of language in general, they won’t know what we meant, because what we meant can be literally anything. Spellcheckers, on the other hand, have a clear, well-defined use case, and hence I think that, in the narrower sense of the word, we can indeed state that they really know what we meant.
Think about it: imagine you meant to write “asphyxiation” but accidentally wrote “asphixyation”. You probably had to squint to see the typo in the second one, right? My spellchecker immediately underlined the second word and suggested the correct spelling, without me having to spot the typo myself. In that sense, it knew what I meant: I meant to write asphyxiation.
You may now see an interesting edge case with my example above: both “loser” and “looser” are valid English words, so in this case, my spellchecker actually doesn’t know what I meant. The problem is that the spellchecker is scoped not to wrong words in general, but to wrongly typed words. However, even the detection of loser vs. looser is possible with simple algorithms: all you need to do is extract the grammar of the sentence, and then you will see a mismatch between the grammatical function of the adjective “looser” and the required grammatical construct, a noun. After this detection, finding the word we meant requires basically the same algorithm that a regular spellchecker uses. Microsoft Word, for example, added that functionality a few years ago (the squiggly blue lines whenever you make a grammatical error).
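The grammar-level check can be sketched just as simply. The following is a toy illustration, not a real parser: the hand-written part-of-speech lexicon and the article-based heuristic are my own assumptions, standing in for the proper grammatical analysis a tool like Word performs.

```python
# Toy sketch of a grammar-scoped check: with a tiny hand-written lexicon
# of parts of speech, flag a known adjective sitting where a noun belongs.
POS = {"loser": "noun", "looser": "adjective", "screw": "noun"}

def flag_mismatches(sentence: str) -> list[str]:
    """Flag adjectives used right after an article with no noun following,
    i.e. positions where the grammar demands a noun."""
    words = sentence.lower().strip(".!?").split()
    flagged = []
    for i, word in enumerate(words):
        if POS.get(word) != "adjective":
            continue
        prev_is_article = i > 0 and words[i - 1] in {"a", "an", "the"}
        next_is_noun = i + 1 < len(words) and POS.get(words[i + 1]) == "noun"
        if prev_is_article and not next_is_noun:
            flagged.append(word)  # e.g. "looser" where "loser" was meant
    return flagged

print(flag_mismatches("He is such a looser"))     # ['looser']
print(flag_mismatches("We need a looser screw"))  # [] – adjective fits here
```

Once the mismatch is flagged, suggesting “loser” is again just the closest-match step a regular spellchecker already performs.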
Why does all of this matter?
The last question is now: why even bother? I think Rachael’s question points towards a serious problem in current NLP research: excited by the extreme advancements in the development of language models ever since the fateful “Attention Is All You Need” paper in 2017, research has shifted from “How do we solve this peculiar problem?” to “How can we solve problems in general?”. What got lost along the way is a certain sense of care that carves out the huge problems you run into as soon as you try to generalize. Add to this the fact that most of the prominent models out there, such as GPT-3, are being developed by for-profit corporations that have an incentive to sell us things, not an incentive to perform rigorous evaluations of their pet projects.
In sociology, the 20th century is known as the century of the “grand theories”. With Talcott Parsons or Niklas Luhmann, it featured several renowned scholars who all tried – in some way or another – to “explain it all”. However, explaining everything also means that you have to generalize to a large extent. And by generalizing, you also increase the number of situations where you are simply wrong. There is a reason Robert K. Merton advocated for what he called “middle-range theories”, that is, theories that generalize a little, but not too much, in order to strike a balance between the two goals of “I can explain more than just my own little case study” and “I am most definitely right”. Science is pretty boring if all you can explain is the Balinese cockfight. But science can also become wrong if you claim that Hitler and Stalin were “literally the same person” because of your ridiculously abstract theory.
By scoping your research you ensure that you explain all those cases that are within your scope, but exclude cases where your data just doesn’t allow you any judgment. And what counts for social science also counts for language models. If you clearly state what a tool can do, and what it can’t, it is reasonable to state that it “knows what you meant (in this specific case)”. But if you have no idea what it can and cannot do, then you also cannot know whether it knows what you meant.
And in this – attention, scoping! – very specific case, GPT-3 is dumber than your good old spellchecker.