To Quote or Not to Quote

How Repeated Citation Makes Shakespeare Legible (or Not)

data: JSTOR Labs | texts: Folger Library | visualizations: Derek Miller


Here you will find all of the dialogue from all of the plays by William Shakespeare.

But this version of Shakespeare's plays is different from those you usually find online. On this site the plays are deliberately hard to read. That's because I've displayed them so that they reflect not simply the texts but rather how people cite Shakespeare's words in their own writing.

Using an API built by JSTOR Labs, I gathered the number of times every line from every play has been cited in JSTOR's journal collection. (JSTOR Labs explains in detail how they created their dataset. I restricted my search to 85% similarity and a match length of 20 characters.) With that data, I've calculated the average number of citations per line, normalized those results between 0 and 1, and then made the underlying text on this site more or less clear (I'll call this "fuzzifying"), as a function of the normalized citations per line calculation.
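As a rough illustration of that calculation (not the site's actual code), the sketch below computes citations per line, rescales the results to the range 0 to 1, and converts each score to a blur amount. The min-max rescaling, the function names, the `max_blur_px` parameter, and the sample counts are all assumptions made for the example; the project description only says the results were "normalized between 0 and 1."

```python
def citations_per_line(line_counts):
    """Average citations per line for a list of per-line citation counts."""
    return sum(line_counts) / len(line_counts) if line_counts else 0.0


def normalize(values):
    """Min-max rescale a dict of name -> value into the range 0..1 (assumed method)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every item is equally cited, so show them all as fully legible.
        return {name: 1.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def blur_radius(score, max_blur_px=4.0):
    """Hypothetical mapping: score 1.0 is perfectly sharp, score 0.0 is blurriest."""
    return round((1.0 - score) * max_blur_px, 2)


# Made-up counts for illustration only (not JSTOR data).
plays = {
    "Play A": [12, 0, 3, 9],
    "Play B": [1, 0, 0, 1],
    "Play C": [4, 2, 0, 2],
}

rates = {title: citations_per_line(counts) for title, counts in plays.items()}
scores = normalize(rates)
blurs = {title: blur_radius(score) for title, score in scores.items()}

for title in plays:
    print(title, rates[title], round(scores[title], 3), blurs[title])
```

In this toy data the most-cited play gets a score of 1 and no blur, while the least-cited play gets a score of 0 and the maximum blur, which mirrors how the most-cited title on the site appears fully legible.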

To better understand this process, look at the menu on the left. I ran this operation first on every play in the Shakespeare canon. If you glance at the list of play titles, you can see the results. Hamlet stands out clearly and firmly: it has the highest citations per line of all the plays and thus is fully legible. Other highly cited (and legible) plays include The Merchant of Venice and Macbeth. Meanwhile, Love's Labor's Lost and The Two Noble Kinsmen are barely legible. (The plays are listed here in one of the common compositional chronologies.)

Then I iterated this process at each level of the text. What does that mean? Well, if you click on a title, you'll see a submenu for that play. Each act is fuzzified to represent the normalized citations per line for that act relative to other acts in the play. And each scene within each act is fuzzified to represent the normalized citations per line for that scene relative to other scenes in that act. Finally, if you select a scene itself, you'll see the text for that scene, with each line fuzzified relative to the normalized citation count for lines in that scene. (The play texts are all courtesy of the Folger Library's Digital Texts, which JSTOR Labs used for their data query.) If you want to see the whole play at once, just click the play's title.
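The level-by-level idea can be sketched as follows: each item is rescaled only against its siblings (acts against the other acts in the play, scenes against the other scenes in the act, lines against the other lines in the scene). This is a minimal sketch under assumptions: the nested-dictionary layout, the function names, the min-max rescaling, and the toy counts are hypothetical and are not taken from the site's code.

```python
def normalize(values):
    """Min-max rescale a list of numbers into 0..1 (assumed method)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rate(counts):
    """Citations per line for a flat list of per-line citation counts."""
    return sum(counts) / len(counts) if counts else 0.0


def fuzzify_play(play):
    """play: dict of act -> dict of scene -> list of per-line citation counts."""
    acts = list(play)
    # Acts are compared with the other acts in the same play.
    act_rates = [rate([c for scene in play[a].values() for c in scene]) for a in acts]
    result = {"acts": dict(zip(acts, normalize(act_rates))), "scenes": {}, "lines": {}}
    for a in acts:
        scenes = list(play[a])
        # Scenes are compared only with the other scenes in the same act.
        scene_rates = [rate(play[a][s]) for s in scenes]
        result["scenes"][a] = dict(zip(scenes, normalize(scene_rates)))
        for s in scenes:
            # Lines are compared only with the other lines in the same scene.
            result["lines"][(a, s)] = normalize(play[a][s])
    return result


# Made-up counts for illustration only.
toy_play = {
    "Act 1": {"Scene 1": [0, 2, 10], "Scene 2": [1, 1]},
    "Act 2": {"Scene 1": [0, 0, 1, 0]},
}

levels = fuzzify_play(toy_play)
print(levels["acts"])
print(levels["scenes"])
print(levels["lines"])
```

Because every level is rescaled independently, a line cited only once can still appear perfectly sharp if it is the most-cited line in a rarely cited scene, as with the third line of the toy Act 2, Scene 1.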

Use the options box in the lower right to change the calculation that fuzzifies the play title list. The default is citations per line. But you can also fuzzify by the total number of citations, the percentage of a play's lines that are cited at least once, or the percentage of all citations that come from each play.
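To make the difference between these four options concrete, here is a small sketch that computes each of them from per-line citation counts. The function name, the dictionary keys, and the sample numbers are illustrative assumptions, not the site's own code or data.

```python
def play_metrics(plays):
    """plays: dict of title -> list of per-line citation counts."""
    grand_total = sum(sum(counts) for counts in plays.values())
    metrics = {}
    for title, counts in plays.items():
        n_lines = len(counts)
        total = sum(counts)
        lines_cited = sum(1 for c in counts if c > 0)
        metrics[title] = {
            # Default option: average citations per line.
            "citations_per_line": total / n_lines if n_lines else 0.0,
            # Raw total, which favors long plays.
            "total_citations": total,
            # Share of the play's lines cited at least once.
            "pct_lines_cited": 100 * lines_cited / n_lines if n_lines else 0.0,
            # Share of all citations in the corpus that point to this play.
            "pct_of_all_citations": 100 * total / grand_total if grand_total else 0.0,
        }
    return metrics


# Made-up counts for illustration only (not JSTOR data).
plays = {
    "Play A": [12, 0, 3, 9],
    "Play B": [1, 0, 0, 1],
    "Play C": [4, 2, 0, 2],
}

metrics = play_metrics(plays)
for title, values in metrics.items():
    print(title, values)
```

In this toy data, Play A and Play C tie on the percentage of lines cited (three of four lines each) but differ sharply on the other three measures, which is why switching options can change which titles look sharp.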

View charts summarizing this data here.

Look at the 100 most-cited lines here.

You can always return here by selecting "To Quote or Not to Quote" at the top of the page.

I hope this project helps us to see (a) how much Shakespeare we don't cite (or, at least, don't cite very often); and (b) that our unequal citational practices are fractal. By this I mean that, no matter how you slice the text, we cite some parts of it far more frequently than others. This is true at the level of the scene, the act, the play, and the corpus. Every time you see one or two sharply defined items and a passel of blurry ones, you're witnessing the effects of those uneven citational practices. And no matter what level of Shakespeare's work you look at, you will find that same inequality repeated.

So, click around and enjoy reading (or trying to read) the Shakespeare that citational practices bring us.

Fuzzify Play Titles by

  • Citations Per Line
  • Total # Citations
  • % of Lines Cited
  • % of All Citations