Let’s try an experiment. If we start with some large body of text — postwar American novels, say, or twentieth-century British newspapers — and count all the occurrences of all the words in those texts, we can put together a fairly accurate list of the most popular words in English. The word “the” would be at the top, followed by “of” and then “and”. With this list of word counts in hand, you could turn to any other similar body of work — British novels or American newspapers, for example — and have a good idea of how often you’d expect to find each of the words on your list. Simple enough.
Next, imagine that you throw away those word counts. You keep only the list of words themselves, ordered from most to least common. You don’t know whether “the” occurs twice as often as “of” or a hundred times more often. It turns out that you can still predict how likely you are to encounter a given word: knowing only that “the” is the most common word, “of” is second most common, “and” is third, and so on, you can guess its frequency with quite startling accuracy. The mathematical relationship that underpins all this is called Zipf’s Law, named for its discoverer, George Kingsley Zipf, a professor of German at Harvard who described it in the 1930s,[7] and it is very simple indeed. Eric Weisstein’s excellent Mathworld site explains it as follows:
In the English language, the probability of encountering the rth most common word is given roughly by P(r) = 0.1/r for r up to 1000 or so.[8]
To put some numbers on it, you should encounter the word “the” around every ten words, equating to a probability of 0.1/1 = 0.1; “of” should occur every twenty words or so, from 0.1/2 = 0.05; “and” will appear once every thirty words or thereabouts, from 0.1/3; and so on. This is an instance of what is called an inverse power law, and if you plot these numbers on a logarithmic scale you get a shockingly straight line. Here’s an example of the raw numbers for the fifty most common words in the so-called Brown Corpus, a million-word collection of texts compiled between 1964 and 1979:[9]
Word counts (blue) in the Brown Corpus, ordered from most to least common. Also shown are the expected word counts according to Zipf’s Law (green). (Image by the author.)
Not bad, I think. I’ve overlaid the expected word counts (in green) as predicted by Zipf’s Law, and it looks fairly convincing. If we make each axis logarithmic rather than linear, we get this:
Word counts (blue) in the Brown Corpus, ordered from most to least common. Also shown are the expected word counts according to Zipf’s Law (red). Both x- and y-axes are logarithmic rather than linear; values on the horizontal axis correspond to the rank of the words in question. (Image by the author.)
Better! The maths behind this are quite involved, but the effect of viewing the data on logarithmic axes is to reveal the perfectly straight line predicted by Zipf’s Law. Again, our data looks good — not a perfect fit, but our actual word counts conform to the predicted values relatively closely. So far, so good. It looks as though Zipf’s Law is in full effect in our million-word test case.
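The expected counts behind the green line are straightforward to generate. Here is a minimal Python sketch, assuming a million-word corpus and using the same three words discussed above:

```python
# Zipf's Law: the r-th most common word has probability P(r) = 0.1 / r.
def zipf_probability(rank):
    """Approximate probability of the rank-th most common English word."""
    return 0.1 / rank

# Expected counts for a million-word corpus such as the Brown Corpus.
CORPUS_SIZE = 1_000_000
for rank, word in enumerate(["the", "of", "and"], start=1):
    expected = zipf_probability(rank) * CORPUS_SIZE
    print(f"{word!r}: rank {rank}, expected count {expected:,.0f}")
```

Plotting these expected counts against rank on log-log axes gives the straight line above.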
Now the weird thing about Zipf’s Law is that it can be arrived at only by observation. There are no verbs, conjunctions or definite articles out there in nature, waiting for their physical properties to be discovered; our ancestors made them up as they went along, and yet somehow we have constructed a language that adheres uncannily to an abstract mathematical idea. Why should the word “the” occur twice as often as “of”, three times as often as “and”, and so on? No-one really knows.
What is even odder is that inverse power laws crop up again and again in what should, by rights, be entirely random groups of things; Zipf’s Law is to words what Benford’s Law is to digits, and Benford’s Law is absolutely everywhere. The distribution of digits in house numbers, prime numbers, the half-lives of radioactive isotopes, and even the lengths in kilometres of the world’s rivers all follow inverse power laws, with the digit 1 being most prevalent by far and the others falling off behind it. Benford’s Law is so reliable that economists use it to detect fraud: if they don’t see a logarithmic distribution of digits in a given set of accounts, with 1 enthroned at the top, they know that someone has been doctoring the figures.[10]
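Benford’s Law is easy to check for yourself. A minimal Python sketch, using the leading digits of powers of two as a convenient Benford-obeying dataset (an illustrative choice; any of the sets listed above would do):

```python
import math
from collections import Counter

def benford_probability(d):
    """Benford's Law: probability that a number's leading digit equals d (1-9)."""
    return math.log10(1 + 1 / d)

# The leading digits of the first 1000 powers of 2 track Benford's Law,
# with the digit 1 enthroned at the top at roughly 30%.
leading = Counter(int(str(2 ** n)[0]) for n in range(1, 1001))
for d in range(1, 10):
    print(d, leading[d] / 1000, round(benford_probability(d), 3))
```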
My thought, then, was this: does punctuation follow some variant of Zipf’s Law? If we count all the marks of punctuation in some suitably large dataset of English texts, do we see a logarithmic distribution in them? There are many fewer unique punctuation marks than there are words, of course, but then Benford’s Law works quite happily with only ten digits to play with. It’s intriguing to wonder: were the writers and editors who invented the comma, full stop and apostrophe moved by the same inexplicable law that governs baseball statistics, the Dow Jones index and the size of files on your PC? I wrote a computer program to find out.
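The heart of such a program can be sketched in a few lines of Python (a hypothetical reconstruction, not the actual code; the punctuation set and sample text are illustrative):

```python
from collections import Counter

# Marks to tally; an illustrative set, not necessarily the one used here.
PUNCTUATION = set(",.;:!?'\"()-")

def count_punctuation(text):
    """Return punctuation marks in a text, ranked from most to least common."""
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    return counts.most_common()

sample = "It was the best of times, it was the worst of times, it was..."
print(count_punctuation(sample))
```

Run over a suitably large corpus, the ranked counts are exactly what’s needed to test for a Zipf-like distribution.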
I started by looking at the Brown Corpus, but given that it contains a paltry million words or so there aren’t all that many punctuation marks to be found. I turned instead to Project Gutenberg, which makes out-of-copyright books available in a variety of formats, and downloaded twelve of the most popular works. Next, I counted the occurrences of all marks of punctuation and plotted them both as raw counts and as a log-log graph of count against rank. Here’s the equivalent of our first graph, only for marks of punctuation rather than words:
Punctuation mark counts (blue) in a selection of works from Project Gutenberg, ordered from most to least common. Also shown are the projected counts (red). (Image by the author.)
Well then. This looks familiar.
We’ll come to the red line in a moment, but let’s stick with the blue line for now. It represents the number of times that each of the marks of punctuation along the x-axis occurred in my ad hoc Project Gutenberg corpus, with the comma in pole position and the full stop around 50% behind it. There’s a bit of a jump down to the paired quotation mark, but its strong showing is doubtless to be expected given the dialogue-heavy novels that make up the bulk of the works I analysed. The semicolon is in fourth position, likely because my texts are predominantly of the nineteenth century, and the apostrophe follows it in fifth.
Now to the red line. If you remember, Zipf’s Law says that the probability P of encountering a word with ranking r is given by P(r) = 0.1/r. Guessing that there’s a similar distribution for punctuation marks, I played around with a variety of different values for the numerator of the fraction, eventually settling on 0.3 as a reasonable proposition. The red line, then, is my predicted distribution of punctuation marks, as given by the equation P(r) = 0.3/r. Enter Houston’s Law, I guess…? Not great, but not terrible either; a larger corpus and some more sophisticated mathematics would likely produce a better number.
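For what it’s worth, the numerator need not be found by playing around: a least-squares fit of P(r) = k/r against the observed frequencies gives k in closed form. A hypothetical Python sketch, with illustrative, exactly Zipf-like frequencies rather than the actual corpus figures:

```python
def fit_numerator(frequencies):
    """Least-squares fit of k in P(r) = k/r to frequencies ranked 1, 2, ...

    Setting d/dk of sum_r (f_r - k/r)^2 to zero gives
    k = sum(f_r / r) / sum(1 / r^2).
    """
    num = sum(f / r for r, f in enumerate(frequencies, start=1))
    den = sum(1 / r ** 2 for r in range(1, len(frequencies) + 1))
    return num / den

# Illustrative frequencies that follow P(r) = 0.3/r exactly.
freqs = [0.3 / r for r in range(1, 11)]
print(fit_numerator(freqs))  # should recover k = 0.3, up to floating-point error
```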
If we play the same trick as above, making both x- and y-axes logarithmic to smooth out the curve, this is what we see:
Punctuation mark counts (blue) in a selection of works from Project Gutenberg, ordered from most to least common. Also shown are the projected counts (red). Both axes are plotted on a logarithmic scale. (Image by the author.)
The first ten punctuation marks, then, follow a Zipfian distribution in a quite striking way. The unhelpful behaviour of the last few marks (from ‘*’ to ‘%’) may well be because they’re either logograms or non-standard marks of punctuation; why the colon is underrepresented, however, I’m not sure. Even so, this is all rather startling. Punctuation marks are Zipfian to a large degree, just like words; the frequency with which we use them obeys the same eerily ubiquitous inverse power law distribution, and I am none the wiser as to why. If ever there was a time to weigh in, commenters, this is it! What’s going on here, and why?

[1] J. Wilkins, An essay towards a real character, and a philosophical language., Printed for S. Gellibrand [etc.], 1668. <http://www.worldcat.org/oclc/4088592> Bibtex
@book{JW1668,
author = {Wilkins, John},
keywords = {irony,shady{\_}characters},
mendeley-tags = {irony,shady{\_}characters},
publisher = {Printed for S. Gellibrand [etc.]},
title = {{An essay towards a real character, and a philosophical language.}},
type = {Book},
url = {http://www.worldcat.org/oclc/4088592},
year = {1668}
}

[2] P. Wright Henderson, The life and times of John Wilkins, Edinburgh, London: W. Blackwood and Sons, 1910. Bibtex
@book{WrightHenderson1910,
address = {Edinburgh, London},
author = {{Wright Henderson}, P},
publisher = {W. Blackwood and Sons},
title = {{The life and times of John Wilkins}},
year = {1910}
}

[3] “Great Fire of London,” in Encyclopaedia Britannica. Chicago: Encyclopaedia Britannica, 2015. <http://www.britannica.com/event/Great-Fire-of-London> Bibtex
@misc{eb2015greatfire,
address = {Chicago},
booktitle = {Encyclopaedia Britannica},
publisher = {Encyclopaedia Britannica},
title = {{Great Fire of London}},
url = {http://www.britannica.com/event/Great-Fire-of-London},
urldate = {2015-11-18},
year = {2015}
}

[4] R. Lewis, “The publication of John Wilkins’s Essay (1668): some contextual considerations,” Notes and Records of the Royal Society of London, vol. 56, iss. 2, pp. 133-146, 2002. Bibtex
@article{lewis2002publication,
annote = {(private-note)Wilkins sent a copy of his manuscript to John Pell, likely inventor of the division sign.},
author = {Lewis, R},
journal = {Notes and Records of the Royal Society of London},
keywords = {irony,obelisk,shady{\_}characters},
number = {2},
pages = {133-146},
publisher = {The Royal Society},
title = {{The publication of John Wilkins's Essay (1668): some contextual considerations}},
volume = {56},
year = {2002}
}

[5] J. Robertson, A clear and practical system of punctuation : abridged from Robertson’s Essay on punctuation : for the use of schools., Boston: I. Thomas and E.T. Andrews, 1792. Bibtex
@book{Robertson1792,
address = {Boston},
author = {Robertson, J},
publisher = {I. Thomas and E.T. Andrews},
title = {{A clear and practical system of punctuation : abridged from Robertson's Essay on punctuation : for the use of schools.}},
year = {1792}
}

[6] J. Greenman, “A Giant Step Forward for Punctuation¡,” in Slate. Microsoft, 2004. <http://www.slate.com/articles/news_and_politics/low_concept/2004/12/a_giant_step_forward_for_punctuation.html> Bibtex
@misc{JG2004,
author = {Greenman, Josh},
booktitle = {Slate},
keywords = {irony,shady{\_}characters},
month = {dec},
publisher = {Microsoft},
title = {{A Giant Step Forward for Punctuation¡}},
url = {http://www.slate.com/articles/news{\_}and{\_}politics/low{\_}concept/2004/12/a{\_}giant{\_}step{\_}forward{\_}for{\_}punctuation.html},
year = {2004}
}

[7] “Zipf Dies After 3-Month Illness,” in The Harvard Crimson. 1950. <http://www.thecrimson.com/article/1950/9/27/zipf-dies-after-3-month/> Bibtex
@misc{Zipf1950,
booktitle = {The Harvard Crimson},
title = {{Zipf Dies After 3-Month Illness}},
url = {http://www.thecrimson.com/article/1950/9/27/zipf-dies-after-3-month/},
urldate = {2015-10-04},
year = {1950}
}

[8] E. W. Weisstein, “Zipf’s Law.” Wolfram Research, Inc. <http://mathworld.wolfram.com/ZipfsLaw.html> Bibtex
@misc{Weisstein2015,
abstract = {In the English language, the probability of encountering the rth most common word is given roughly by P(r)=0.1/r for r up to 1000 or so. The law breaks down for less frequent words, since the harmonic series diverges. Pierce's (1980, p. 87) statement that sumP(r)>1 for r=8727 is incorrect. Goetz states the law as follows: The frequency of a word is inversely proportional to its statistical rank r such that P(r) approx 1/(rln(1.78R)), where R is the number of different words.},
author = {Weisstein, Eric W.},
keywords = {62,Mathematics:Probability and Statistics:Descriptive},
language = {en},
publisher = {Wolfram Research, Inc.},
title = {{Zipf's Law}},
url = {http://mathworld.wolfram.com/ZipfsLaw.html},
urldate = {2015-10-04}
}

[9] N. Francis and H. Kucera, “Brown Corpus,” in The Internet Archive. 1964. <https://archive.org/details/BrownCorpus> Bibtex
@misc{brown2015,
author = {Francis, Nelson and Kucera, Henry},
booktitle = {The Internet Archive},
title = {{Brown Corpus}},
url = {https://archive.org/details/BrownCorpus},
urldate = {2015-10-04},
year = {1964}
}

[10] E. W. Weisstein, “Benford’s Law.” Wolfram Research, Inc. <http://mathworld.wolfram.com/BenfordsLaw.html> Bibtex
@misc{benford2015,
abstract = {A phenomenological law also called the first digit law, first digit phenomenon, or leading digit phenomenon. Benford's law states that in listings, tables of statistics, etc., the digit 1 tends to occur with probability ∼30{\%}, much greater than the expected 11.1{\%} (i.e., one digit out of 9). Benford's law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881). While Benford's law...},
author = {Weisstein, Eric W.},
keywords = {62,Mathematics:Probability and Statistics:Descriptive,Mathematics:Recreational Mathematics:Mathematics i},
language = {en},
publisher = {Wolfram Research, Inc.},
title = {{Benford's Law}},
url = {http://mathworld.wolfram.com/BenfordsLaw.html}
}