In 3.6 of King Lear, four characters take shelter in a hovel: one is mad, one is pretending to be mad, one is pretending to be someone else, and the other is a professional fool. The result is somewhat chaotic:
EDGAR Frateretto calls me, and tells me Nero is an angler in the lake of darkness. Pray, innocent, and beware the foul fiend.
FOOL Prithee, nuncle, tell me whether a madman be a gentleman or a yeoman.
LEAR A king, a king!
FOOL No, he’s a yeoman that has a gentleman to his son; for he’s a mad yeoman that sees his son a gentleman before him.
LEAR To have a thousand with red burning spits
Come hissing in upon ’em!
EDGAR Bless thy five wits.
KENT O, pity! Sir, where is the patience now
That you so oft have boasted to retain?
EDGAR My tears begin to take his part so much
They mar my counterfeiting.
LEAR The little dogs and all,
Tray, Blanch, and Sweetheart—see, they bark at me.
There is a weird poetry in the disconnected chatter of the scene that has rightly been celebrated. But it is also a very skilful dramatization of the ways in which a conversation can go wrong. While it is possible to make sense of individual speeches, they only fleetingly cohere into purposeful sequences of talk. The Fool manages, briefly, to draw Lear into an exchange about madmen, only for the old man’s mind to turn immediately back to the fantasy of punishing his daughters. It is often unclear who is speaking to whom, or how much of the foregoing conversation each character has heard. Kent’s question, for example, directly follows a remark from Edgar, but is clearly addressed to Lear. (In performance, the two actors might bring this confusion out by talking over one another.) To add a final layer of complexity, some of what is said may not be heard by anyone else present, as when Edgar momentarily drops his guard to make a meta-conversational comment (presumably in an aside) on the difficulty of staying in character as Poor Tom.
With characteristic aplomb, then, Shakespeare has anticipated—by a good four hundred years—exactly what happens when more than three people try to chat informally via Zoom. The kind of interaction that would be relatively straightforward in person becomes torturously difficult. Everything takes longer. Everything requires more effort. Without careful attention to what linguists call “turn-taking,” things quickly descend into chaos.
Why this should be the case is not immediately obvious. If we can hear and see our interlocutors, if the connection is good and the lag minimal, why is it so much harder to string together rapid sequences of talk? The best way to answer that question is to turn it on its head. Properly understood, even the simplest conversation is an astonishing feat of interpersonal coordination. The remarkable thing is not that turn-taking so frequently goes wrong on Zoom, but that it ever goes right at all.
It is an observable fact that speakers are able to coordinate transitions between turns at talk to within a fraction of a second. Average response time in conversation is around 200 milliseconds. This is surprising because language production is comparatively slow—some 600 milliseconds from conception to articulation, even for a single word. Somehow the other participants appear to know in advance exactly when the current speaker will stop, what she will have said when she does so, and which of them should speak next.
To explain how this is possible, conversation analysts have come up with an awkwardly named but brilliantly useful concept. A “transition-relevance place” is any point at which the current speaker might plausibly have finished. The end of a sentence, obviously, or some other less emphatic point of syntactical completion. But also, potentially, the punchline of a joke, or even just the moment, part way through a turn, at which the sense of the whole becomes clear. Two things matter about transition-relevance places. The first is that they are projectable: it is possible to hear them coming. The second is that they are optional: the occurrence of a transition-relevance place does not necessitate a change of speaker any more than the occurrence of an exit necessitates that I come off the motorway. As the exit approaches, the possibility of my coming off becomes relevant (hence the name) but I can still choose not to take it.
A single turn may thus contain a series of transition-relevance places at which no transition occurs. Unlike a letter, or a WhatsApp message, the turn at talk is telescopic. Its length is the product of a fragile process of incremental expansion that might have stopped when it didn’t and needn’t have stopped when it did. And clustered around these potential stopping points is a series of micro-negotiations about whether this next exit is the one we will finally take. It is possible, of course, to make such things explicit: “I’ll stop there and hand over to Mike.” Most of the time, however, the exchange of turns is negotiated in ways that are largely subconscious. Intonation, gaze direction, gesture, and facial expression all play a part. An intake of breath or a tilt of the head can be enough to suggest that a new speaker is ready to launch. A glance upward can be enough to show that the current speaker is not yet done.
What Zoom does is to filter out much of this layer of subconscious communication. We cannot tell who anyone else is looking at, nor sense the tiny adjustments of body and face that would ordinarily help us to coordinate the exchange of turns. If you combine that with even a tiny lag, the whole exquisitely calibrated system begins to malfunction. And thus Zoom turns us all to fools and madmen.
Feature image by Tim Gouw