Humans are very good at reading from start to finish and gathering enough information to understand the overall story a text tells, but they are very bad at keeping track of the details of language in use across many texts. Computers, in contrast, are very good at this level of detail, through counting and more advanced statistical analyses, while being very bad at understanding the story told in a text or a collection of texts. One such statistical approach, collocational analysis, identifies lexical combinations that appear together with notable frequency, and it can be used to find patterns (or the lack thereof) in a collection of texts such as all of Shakespeare’s plays.
A collocational relationship is a statistical measure of whether two words co-occur more often than chance would predict. In any given text or collection of texts, co-occurrence measures the probability of word1 occurring in a text that has word2, divided by the probability of word2 occurring in a text that has word1. The resulting score falls on a scale from 0 to 1: a score of 0 means that word1 never appears next to word2, and a score of 1 means that word1 always appears next to word2.
While this metric may not be fully indicative of Shakespeare’s use of language, it can suggest how, for example, Shakespeare discusses men and women in a variety of contexts in his plays. Gendered nouns, in particular, provide a set of convenient binaries for a range of registers, such as man/woman (unmarked), lord/lady (higher in register), and knave/wench (lower in register). These terms are all generally understood to be semantic equivalents in the Historical Thesaurus of the Oxford English Dictionary.
Using one method of computing collocational relationships included in the Wordhoard software package for Shakespeare’s plays, you can see which words only appear with one from each binary (lord/lady etc.), and how likely you are to encounter them.
The Dice Coefficient Test
One test, the Dice Coefficient test (see the Wordhoard Help Files describing collocational analysis), triangulates nicely with three other statistical tests included in the software package. Because these other collocational measures show very similar results to the Dice Coefficient Test, one can infer that the patterns reported here are quite unlikely to be an artifact of the statistical measurement used. Even so, the collocational relationships described below suggest ways in which Shakespeare uses gendered nouns; they cannot be considered absolute and conclusive facts about his use of them.
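To make the idea concrete, here is a minimal sketch of the standard Dice coefficient, 2 × (contexts containing both words) ÷ (contexts containing word1 + contexts containing word2). It is an illustration under stated assumptions, not a description of how Wordhoard computes its scores: the function name `dice_coefficient`, the choice of a short span of words as the unit of "context", and the toy contexts are all invented for this example.

```python
def dice_coefficient(contexts, word1, word2):
    """Dice score for two words over a list of tokenised contexts.

    Each context is a list of words (for example, a line or a short span).
    The score is 2 * |both| / (|word1| + |word2|), which lies between 0 and 1.
    """
    # Indices of the contexts in which each word appears at least once.
    with_word1 = {i for i, ctx in enumerate(contexts) if word1 in ctx}
    with_word2 = {i for i, ctx in enumerate(contexts) if word2 in ctx}

    total = len(with_word1) + len(with_word2)
    if total == 0:
        return 0.0  # neither word occurs anywhere
    return 2 * len(with_word1 & with_word2) / total


# Toy contexts, invented for illustration; these are not lines from the plays.
contexts = [
    ["sweet", "lady"],
    ["sweet", "lady", "come"],
    ["my", "lord"],
    ["sweet", "lord"],
]

print(dice_coefficient(contexts, "sweet", "lady"))
print(dice_coefficient(contexts, "my", "lady"))
```

In this toy data, ‘sweet’ and ‘lady’ share two of their contexts, so they score well above zero, while ‘my’ and ‘lady’ never share a context and score 0. Counting contexts rather than raw adjacent pairs keeps the score within the 0 to 1 range.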
Comparisons across each binary pair are therefore possible: man compared to woman, lord compared to lady, and knave compared to wench. If a collocate is found for man but not for woman, it is considered unique to man. Comparisons across register within a gender are also possible: one can compare the unique collocates for woman with those for lady and those for wench. Table 1 explores unique collocations for each binary pair and across register.
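The notion of a unique collocate amounts to a set difference, which can be sketched as follows. The helper name `unique_collocates` and the small word sets are hypothetical and serve only to show the operation; they are not Wordhoard output, and the shared collocate ‘old’ is invented.

```python
def unique_collocates(collocates_a, collocates_b):
    """Return the collocates found only for A and those found only for B."""
    a, b = set(collocates_a), set(collocates_b)
    return a - b, b - a


# Invented illustrative sets, not results from the corpus.
man_collocates = {"good", "young", "old"}
woman_collocates = {"weak", "old"}

only_man, only_woman = unique_collocates(man_collocates, woman_collocates)
print(sorted(only_man))    # collocates of man that woman lacks
print(sorted(only_woman))  # collocates of woman that man lacks
```

The same helper covers comparisons across register: passing the collocates of woman and of lady, for instance, returns what is unique to each.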
As Table 1 shows, woman is associated with largely negative adjectives, whereas the adjectives associated with lady are overwhelmingly positive and highly complimentary. Moreover, these negative collocates for woman are predominantly native to English (‘fat’, ‘foolish’, ‘mad’, ‘waxen’, and ‘weak’), whereas the positive collocates for higher-register lady are all Latinate in root, with examples such as ‘sovereign’, ‘beauteous’, ‘virtuous’, and ‘honourable’. In contrast, man is found alongside largely positive terms (‘good’, ‘proper’, ‘young’, ‘honourable’) or as part of recognizable phrases (‘no man’, ‘poor men’, ‘dead man’). Meanwhile, lord’s unique collocates include highly frequent function words (including ‘my’, ‘of’, ‘what’, ‘if’, ‘and’, ‘you’, ‘will’), which serve as the building blocks of most language, alongside two more recognizable phrases: ‘lord cardinal’ and ‘dear lord’.
There are very few examples listed for wench, not because there is a huge overlap between wench and knave but because these are the only unique collocations available. The words used to describe knave, by contrast, are overwhelmingly negative adjectives (‘lousy’, ‘cuckoldy’, ‘lazy’, ‘rascally’) that readily combine into recognizable phrases about knaves. The example of ‘kitchen wench’, a particularly misogynistic phrase, appears only twice in the corpus: once in Comedy of Errors and once in Romeo & Juliet.
Lexical realizations in Shakespeare’s corpus
Although each pair of binary search terms can be considered semantic equivalents, they have quite different lexical realizations in Shakespeare’s corpus. The high Latinate phrasing suggested by the collocates for lady is unavailable to the lower-status woman and wench under the same methodology. Lord is the only noun investigated that has a strong relationship with words that are very common in any English text. Meanwhile, man and knave show highly stratified descriptions of men across register: man is more likely to describe qualities of men, whereas knave is more likely to be supplementary with regard to the low-status individual in question. This practice suggests that social class is stratified at the level of socio-pragmatic lexical relationships in ways that are not visible to linear readers of Shakespeare’s plays, quite contrary to what a reader of the plays may expect.
Image Credit: “Ophelia” by John William Waterhouse. Public Domain via WikiArt.