Some words are much more frequent than others. For example, in a sample of almost 18 million words from published texts, the word can occurs about 70,000 times, while souse occurs only once. But can doesn’t just occur more frequently – it’s also much more ambiguous. That is, it has many possible meanings. Can sometimes refers to a container for storing food or drink (‘He drinks beer straight from the can’), but it also doubles as a verb about the process of putting things in a container (‘I need to can this food’), and as a modal verb about one’s ability or permission to do something (‘She can open the can’). It even occasionally moonlights as a verb about getting fired (‘Can they can him for stealing that can?’), and as an informal noun for prison (‘Well, it’s better than a year in the can’).
This multiplicity of possible uses raises a question: how do can, souse and other words each end up with the particular numbers of meanings they have? The answer may lie in fundamental, competing forces that shape the evolution of languages.
Remarkably, the relationship between word frequency and word ambiguity goes well beyond can and souse. In 1945, the linguist George Kingsley Zipf noticed that frequent English words are, on average, more ambiguous than less frequent words. This pattern is surprisingly robust across languages – so much so that it is sometimes described as Zipf’s meaning-frequency law. A leading explanation for this phenomenon – first proposed by Zipf himself – begins with the premise that languages have in some sense evolved to make communication more efficient. According to this view, language change is (roughly) analogous to biological evolution: just as biological species are shaped over successive generations by the demands of their environments, languages are constrained by the needs of the people who use them. In particular, languages may evolve to minimise the effort required to convey information.
At first blush, this theory might seem obvious. Presumably, languages would not change in ways that make them unusable – and who wouldn’t prefer a more efficient language over a less efficient one? Yet the picture is complicated by the fact that communication is a two-way street. Communication – whether spoken, signed or written – requires a producer (the person trying to convey a message) and a comprehender (the person trying to understand it). And, critically, what’s efficient for language producers isn’t always what’s most efficient for comprehenders.
All else being equal, a producer likely prefers utterances that are shorter and easier to say: why go to the trouble of saying 10 words where one would do? This experience is likely familiar to all of us. Rather than enumerating every minute detail of an event, perhaps we use a simpler but vaguer expression, such as ‘He was there today.’ Of course, this places the burden of inference on the comprehender, who – depending on how much they already know about the situation – might prefer a more precise formulation: ‘Simon, my ex-boyfriend, came into the coffee shop where I work today.’
Zipf argued that these competing interests would manifest in the very structure of the lexicon. A producer’s ideal language (in terms of the effort required) would simply be a single word. In this language, the word ba could communicate everything from ‘A coffee, please’ to ‘The capital of France is Paris.’ As one might expect, such an arrangement would demand much of comprehenders: every linguistic encounter would effectively be an exercise in mind-reading. (Certainly, this essay would be very challenging to read – though perhaps easier to write – if I substituted every word with the word ba.) A comprehender’s ideal language, by contrast, is one in which every meaning is conveyed by a different word, minimising the possibility of confusion. Combined, the opposing forces created by the needs of speakers and comprehenders – forces that Zipf called unification and diversification, respectively – lead to trade-offs. Languages, then, must reach a compromise.
This is where Zipf’s meaning-frequency law comes in. According to Zipf, this law is a product of that compromise. We have many more words than ba, which partially satisfies the comprehender’s need for clarity. But many of those words – especially the most frequent ones – can be used to express more than one meaning, which benefits producers. Put another way: the tension between diversification and unification produces Zipf’s meaning-frequency law.
Yet this explanation is incomplete. Does the compromise between unification and diversification imply that these forces are equal in strength – or does one pressure exert a stronger pull than the other?
Some language scientists have argued that certain aspects of language structure, such as grammar, are shaped primarily by a producer-centric pressure to make things easier to say. Given the effort it takes to produce language – a person must ultimately translate the concepts they wish to convey into a complex series of motor commands – it stands to reason that producers will take the easy option where possible, and that grammars would evolve in ways that ensure an easy option is typically available. For example, grammatical alternations give speakers the freedom to begin a sentence with different referents (eg, either ‘The noise startled the boy’ or ‘The boy was startled by the noise’) depending on which one is more mentally salient to a speaker at any given time.
Less is known, however, about the lexicon. Finding out whether one pressure is dominant with regard to word meanings requires a neutral baseline. That is, we need a sense of how many meanings each word should theoretically have in the absence of a producer-centric or comprehender-centric pressure. Once this expectation is established, one can compare it with real data – how many meanings each word actually has. If frequent words such as can have more meanings than the baseline expectation, it suggests that a producer-centric pressure is stronger. And if words such as can have fewer meanings than you would expect them to, it suggests that a comprehender-centric pressure is stronger. This was the logic guiding recent work that I conducted with my collaborator Benjamin Bergen.
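The comparison logic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name dominant_pressure, the frequency threshold and the tuple layout are hypothetical, the baseline values are assumed to be supplied from elsewhere, and a simple average of the gaps between actual and expected meaning counts stands in for whatever statistical modelling a real analysis would use.

```python
from statistics import mean


def dominant_pressure(words, min_frequency):
    """Compare actual meaning counts with a neutral baseline for frequent words.

    words: iterable of (frequency, observed_meanings, expected_meanings) tuples.
    expected_meanings is assumed to come from some neutral baseline computed elsewhere.
    """
    # Gap between how many meanings each frequent word has and how many it "should" have.
    gaps = [obs - exp for freq, obs, exp in words if freq >= min_frequency]
    if not gaps:
        raise ValueError("no words meet the frequency threshold")
    average_gap = mean(gaps)
    if average_gap > 0:
        return "producer-centric"      # frequent words have MORE meanings than expected
    if average_gap < 0:
        return "comprehender-centric"  # frequent words have FEWER meanings than expected
    return "no clear winner"


# Hypothetical numbers for illustration only; these are not data from the study.
sample = [(70_000, 5, 6.5), (12_000, 3, 3.4), (1, 1, 0.8)]
print(dominant_pressure(sample, min_frequency=1_000))  # prints: comprehender-centric
```

Only words at or above the threshold are compared, because the question concerns whether frequent words such as can carry more or fewer meanings than the baseline predicts.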
The most important step was figuring out how to determine a neutral baseline that does not privilege either a producer’s or a comprehender’s perspective. An excellent candidate approach is to assign an expected number of meanings to each word on the basis of the word’s phonotactic probability. Each language has rules about which sounds can start and end a word, which sounds can occur in which sequence, and so on. For example, modern English words are not allowed to begin with the onset mb–, but words in Swahili are. Because of these patterns (or phonotactics), within any given language, some words are more probable than others: they contain sequences of sounds that are more commonly found in words of that language.
The phonotactic probability of a word can be calculated using a Markov model, which draws on all the words in a given language to estimate how likely each sound is to follow the sounds that precede it, and therefore which sequences of sounds are most and least likely in that language. From there, calculating the number of meanings a word should have under neutral conditions was straightforward: we multiplied its phonotactic probability by the total number of meanings available to words of that length.
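The baseline calculation can be sketched in Python as follows. Several simplifications are assumptions made for this sketch rather than details taken from the study: letters stand in for phonemes, the Markov model is a bigram model (each sound depends only on the one before it) with no smoothing, and phonotactic probabilities are normalised within each word length so that the expected meanings for that length add up to the total available. The function names and the toy lexicon with its meaning counts are hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict

BOUNDARY = "#"  # marks the start and end of every word


def train_bigram_model(lexicon):
    """Count how often each sound follows each other sound across the lexicon.

    Letters stand in for phonemes here (an assumption made for this sketch).
    """
    counts = defaultdict(Counter)
    for word in lexicon:
        symbols = [BOUNDARY, *word, BOUNDARY]
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    return counts


def phonotactic_probability(word, counts):
    """Probability of a word as the product of its sound-to-sound transition probabilities."""
    symbols = [BOUNDARY, *word, BOUNDARY]
    log_p = 0.0
    for prev, nxt in zip(symbols, symbols[1:]):
        total = sum(counts[prev].values())
        if total == 0 or counts[prev][nxt] == 0:
            return 0.0  # unseen transition; this sketch applies no smoothing
        log_p += math.log(counts[prev][nxt] / total)
    return math.exp(log_p)


def expected_meanings(lexicon, meanings_by_length):
    """Neutral baseline: share out each length's meanings in proportion to phonotactic probability."""
    counts = train_bigram_model(lexicon)
    words_by_length = defaultdict(list)
    for word in set(lexicon):
        words_by_length[len(word)].append(word)

    expected = {}
    for length, words in words_by_length.items():
        probs = {w: phonotactic_probability(w, counts) for w in words}
        norm = sum(probs.values())  # normalise within this word length (assumption)
        for w, p in probs.items():
            expected[w] = meanings_by_length[length] * p / norm
    return expected


if __name__ == "__main__":
    # Invented meaning counts for a toy lexicon; not real data.
    observed = {"can": 5, "cat": 1, "cap": 2, "tan": 1,
                "souse": 1, "house": 3, "mouse": 2}
    totals = Counter()
    for word, n in observed.items():
        totals[len(word)] += n  # total meanings available to words of each length

    baseline = expected_meanings(list(observed), totals)
    for word in sorted(observed):
        print(f"{word:>6}: observed {observed[word]}, expected {baseline[word]:.2f}")
```

Words built from common sound sequences (such as can, which shares its opening and ending with several other toy words) receive a larger share of the meanings for their length, which is exactly the neutral expectation that observed meaning counts are then compared against.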
Using this procedure, we discovered that frequent words such as can – despite already being quite ambiguous due to their number of meanings – often had fewer meanings than the baseline predicted. This pattern generalised across the entire English lexicon, and to the other languages we tested: Dutch, German, French, Japanese and Mandarin. In each language, frequent words – though ambiguous – were less ambiguous than one would expect on the basis of their phonotactics. This is most consistent with a comprehender-centric pressure winning out. In this case, it seems that while producers and comprehenders were forced to compromise, comprehenders walked away with a slightly better deal.
From one perspective, this finding makes perfect sense. If frequent words were too ambiguous – if can had 100 different possible meanings – then comprehenders would constantly encounter an overwhelming amount of ambiguity, so much so that it might impede communication altogether. Yet it’s important to note that this result was not obvious from the get-go. It runs counter to other theories about why languages look the way they do. As noted earlier, producing language has challenges of its own, which is why some researchers argue that grammar is producer-centric. For similar reasons, one might expect these difficulties to result in a lexicon that privileges speakers: a small number of words that are very easy to retrieve and produce, each loaded with many meanings. This makes it all the more striking that a pressure to avoid ambiguity is favoured in the design of human lexica.
Moving forward, language scientists can try to replicate this result in a larger sample of languages, including those from language families such as Niger-Congo or Austronesian. They can also ask how this pressure relates to previously observed examples of ambiguity avoidance. For example, recent work found evidence of ambiguity avoidance in historical sound change. Sometimes, different sounds in a language ‘merge’ over time, meaning that they are no longer treated as distinct (for example, cot and caught are now pronounced the same way in some English dialects). Yet, according to Andrew Wedel and colleagues, mergers are less likely when they would create many homophones in a language – a prime example of how ambiguity avoidance might shape processes of language change.
Further, there are still deep questions in the field of language evolution about exactly how these large-scale language changes come about at the local level. Concretely, how do effects at the level of individual communicative interactions bubble up to affect the very structure of the lexicon? In the case of what we studied, one speculative possibility is that language comprehenders struggle to understand a word when it has too many possible meanings besides the intended one. Noticing these misunderstandings, speakers might gradually switch to different words to express that meaning. If the comprehension errors are sufficiently systematic, these effects might be observed across many different interactions and individuals, thereby suppressing the ambiguity of the original word.
Languages are dynamic creatures. They change over time, sometimes in ways that appear inscrutable. Yet work on language evolution has revealed that these changes are often systematic. Communication systems are shaped in fundamental ways by competing forces that we observe in everyday conversation, including a speaker’s desire to say something simply and a comprehender’s desire to avoid ambiguity. As such, languages reflect a long history of trade-offs and compromises.