A post from Shady Characters

Miscellany № 69: the hyphen resolved

But first: interrobangs. This is shaping up to be a banner year for Martin K. Speckter’s creation. Having been selected by Pearson, the giant publishing firm, to form the nucleus of its new logo, the interrobang now pops up at Ditchling Museum of Art and Craft as the title of an upcoming exhibition of letterpress printing.

Oddly, the interrobang itself is nowhere to be seen on Ditchling Museum’s website (‽), but I can just about forgive the omission because of the pleasing conjunction that this all represents. The interrobang was the subject of the second chapter of the Shady Characters book, but what was the subject of the first? It was the pilcrow (¶), of course, the preferred shady character of Ditchling’s most famous resident: the late Eric Gill, typographer, sculptor and posthumous scandal-merchant, who lived in the village for more than a decade.1

I’ve contacted Ditchling Museum to ask why they chose the interrobang to represent their exhibition, and whether they were aware of Eric Gill’s fondness for unconventional marks of punctuation (and, let’s face it, his fondness for unconventional everything else, too), but they have yet to reply. You’ll be the first to know.

Elsewhere, Gunther Schmidt of lexikaliker.de wrote to tell me about a recent article on Maria Popova’s reliably thought-provoking website Brainpickings in which Maria discusses a rather striking book of poetry. In the Land of Punctuation (originally Im Reich der Interpunktionen) was written in 1905 by the German poet Christian Morgenstern as a satirical trip through an imagined realm of punctuation, but this new edition has been translated into English by Sirish Rao and illustrated — spectacularly — by Rathna Ramanathan.

Maria elaborates on the book’s origins:

Morgenstern, a sort of German Lewis Carroll who crafted literary nonsense with an aphoristic quality and a touch of wry wisdom, was in his early thirties when he wrote the poem — a jocular parable of how dividing a common lot into warring subgroups produces only devastation and no winners. That he died mere months before the start of WWI only lends the piece an eerie air of prescient poignancy.

An illustration by Rathna Ramanathan, taken from In the Land of Punctuation, published by Tara Books. (Image courtesy of Tara Books.)

Accompanying all this are Ramanathan’s excellent illustrations. In the Land of Punctuation is available now from Tara Books.

Lastly this week, reader Ben Denckla got in touch to point me towards his in-depth account of preparing an e-book by scanning an original printed edition — and the frequent punctuational conundrums that crept in as he did so. The book was Lionel Lord Tennyson’s 1933 autobiography, From Verse to Worse, and you can read about Ben’s travails in bringing it to fruition here.

Ben writes that he often runs into trouble at hyphenated line breaks: should “un-ionised”, broken across a line, be interpreted as “unionised” or “un-ionised”? A human copyeditor could almost certainly pick the right one (God help them if not), but a computer without the appropriate natural language processing abilities is stumped. What Ben wonders, then, is this: is it time for a graphical distinction between line-end and compound-word hyphens? That is, where a single word is broken across a line, should the resulting line-end hyphen be shown as, say, a tilde (∼)* rather than a plain hyphen (-)? Broken across a line, “un∼ionised” would be correctly understood to mean “unionised”, while the compound word “un-ionised”, with a conventional hyphen, remains “un-ionised”.
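To make the computer’s predicament concrete, here is a minimal sketch in Python of what a naive dehyphenation step might do. The function name `resolve_line_break` and the tiny word list are hypothetical, invented for illustration; a real tool would consult a full dictionary. The sketch tries both readings of a word broken at a line-end hyphen and reports every reading the word list recognises. When two come back, the scan alone cannot settle the matter.

```python
def resolve_line_break(head: str, tail: str, known_words: set[str]) -> list[str]:
    """Return every plausible reading of `head-` + line break + `tail`."""
    joined = head + tail          # the hyphen was only a line-break artifact
    compound = head + "-" + tail  # the hyphen belongs to the word itself
    return [w for w in (joined, compound) if w.lower() in known_words]


# Hypothetical, tiny word list for illustration only.
WORDS = {"unionised", "un-ionised", "typography", "well-known"}

print(resolve_line_break("typo", "graphy", WORDS))  # ['typography']
print(resolve_line_break("well", "known", WORDS))   # ['well-known']
print(resolve_line_break("un", "ionised", WORDS))   # ['unionised', 'un-ionised']
```

The last call is the troublesome one: both spellings are legitimate words, so only context (or a distinct line-end mark of the kind Ben proposes) can pick between them.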

This has to be one of the few situations where a proposed new mark of punctuation clarifies a genuinely problematic area of typography, and I must thank Ben for telling me about it. What say you?

Unknown bibtex entry with key [FM1989-115] ↩︎
“Swung Dash,” Merriam-Webster Online, 2016. ↩︎
Ironically, I’ve had to use a “swung dash” (∼)2 in place of a standard tilde (~) to find a glyph that is suitably distinctive. I think Ben’s point stands, however. ↩︎

24 comments on “Miscellany № 69: the hyphen resolved”

  1. Comment posted by Brian on

    Ben’s point is worthy, but I would be much more inclined to support a system where, say, “unionised” is broken across lines with a hyphen (“un- | ionised”), but “un-ionised” is broken across lines with an en dash (“un– | ionised”). This would read more naturally, but still make enough of a distinction for the reader to see, when it matters.

  2. Comment posted by Zeissmann on

    I once ran into this problem when typesetting some project in TeX. The most obvious answer to me, and I still think it’s the best way out without putting all typographic conventions on their head, is to simply repeat the hyphen when dealing with a hyphenated word: un-/ionised = unionised and un-/-ionised = un-ionised. Isn’t it obvious? Why multiply entities beyond necessity?

    1. Comment posted by Keith Houston on

      Hi Zeissmann — I think you, Ben and Brian are all largely in agreement, except as to the visual appearance of the “new” hyphen. Perhaps we need a hyphen-off to choose the best option.

      Thanks for the comment!

    2. Comment posted by Hugh Greene (no relation) on

      I’ve actually seen this in print, I think in a Graham Greene novel in my high school library. That would have been nearly 15 years ago, and I’m sure the book was a good few years old at the time. It struck me then as a really good idea, and a good image of the book still comes to mind: leaf green hardback, without a slip cover.

    3. Comment posted by Keith Houston on

      Hi Hugh — it’s good to know that some compositors, at least, have tackled this before. Thanks for the comment!

  3. Comment posted by Dick Margulis on

    I think readers, being human and having context, are rarely confused about that hyphen. I suppose it happens, but perhaps not often enough to shoot a bazooka at the problem. For pages transmitted digitally, the two hyphens are already distinguished by different code points. The only situation where there is frequent confusion is in the scanning of old texts by automated systems, and as the old texts already exist with their ambiguous hyphens, introducing a new convention isn’t going to fix them. I think this is a situation that cries out for being left alone. Hire a copyeditor and be done with it.

    1. Comment posted by Keith Houston on

      Hi Dick – I take your point, but I think it’s a question of facilitating the reader’s “flow” as much as anything else. Even when it’s obvious which meaning to choose for an ambiguously-hyphenated word, that split second of decision making is still a speed-bump in the reading process. Surely a tilde (~), en dash (–) or double hyphen () would help ease the pain without offending the eye too much?

      Thanks for the comment!

  4. Comment posted by Michel Fioc on

    The same problem may occur at a slash or a dash (if the latter is not surrounded by spaces), at an underscore in a code variable, at a dot in an electronic address or URL, and between digits in a long number. The best solution would probably be to insert the same unusual character at the end of a line every time it is not broken at a space, and to move the hyphen/dash/slash, etc., at the beginning of the following line. The Mathematica software uses three dots in diagonal for this, if I remember well.

    1. Comment posted by Keith Houston on

      Hi Michel – that’s certainly an idea. I wonder, though, if perhaps those other examples (URLs, inline code, long numbers) are already visually distinct enough to carry off being broken across a line without any special treatment? For me the difficulty with the hyphen as it stands is the potential for misunderstanding, not necessarily how it is typeset.

      Thanks for the comment!

  5. Comment posted by Mark Hougaard on

    Another way to address the issue is to not use line break hyphens. In fact, most of my writing education and experience in the past forbade or strongly discouraged the line break hyphen, either as being lazy or in recognition of the inherent ambiguity that Ben notes. Certainly, in the technical papers I wrote, I was dissuaded from its use.

    The newspaper is the only place where I’ve seen its use (and abuse, to near comical effect) be a necessity, to compensate for the limited column width of the medium. And these days, what’s a newspaper? Maybe it’s time for the line break hyphen to pass into the halls of history rather than to introduce a new punctuation paradigm to compensate for a dying medium.

    1. Comment posted by Keith Houston on

      Hi Mark — I’d be entirely with you were it not for the aesthetic need for the hyphen. Yes, it can be ambiguous and yes, we could do without it, but it’s one of the touches that elevates a typeset page (or newspaper story, or website) from good to great. A ragged right margin might be good for readability but too ragged and it starts to get distracting — and the hyphen is the thing that rescues the situation.

      Thanks for the comment!

    2. Comment posted by Mark Hougaard on

      I don’t know about good to great, most newspapers would lead me to think from good to near unreadable, but that’s personal. In the days before WYSIWYG word processors, I needed to use the typesetter language troff to write (I honestly can’t remember how many) technical papers, and found the use of the “fill” function to be quite effective to eliminate the ragged right margin. And, yes, I will admit, even the fill function can be overused. So, maybe a compromise between the two.

      A side note, the biggest problem with troff was the amount of time spent on formatting and not content. But, oh my, the control was glorious!!

    3. Comment posted by Keith Houston on

      For me, writing a paper at university meant LaTeX. It felt almost like letterpress printing: there was far more control to be had than with a semantic markup system such as HTML (alright, mostly semantic), and it was bracingly honest in comparison to the misleading friendliness of Word and other word processors.

      I’ve made my peace with Word since then (I’ve had to, in order to produce manuscripts in the form that my publisher wants them), but as a software engineer by trade I yearn for a decent hybrid of semantic and presentational markup. Markdown plus Pandoc looks promising, if only it can manage footnotes and references sensibly.

      Thanks for the comment!

  6. Comment posted by Ben Denckla on

    Here are some great tweets from Rich Greenhill showing examples of hyphen disambiguation in the wild (both examples are from dictionaries, where, not surprisingly, it is particularly desirable to avoid ambiguity):



    As readers/commenters Brian and Zeissmann suggest above, Merriam-Webster wisely opts to do something special only in the case of a hyphen that is both line-ending and word-joining. In the case of M-W, this “something special” is using a double (stacked) hyphen, a bit like an equals sign with a slight northeast tilt. Or as they put it (a bit awkwardly, in my opinion):

    A double hyphen at the end of a line in this dictionary stands for a hyphen that belongs at that point in a hyphenated word and is retained when the word is written as a unit on one line.

    I don’t have hard evidence to support this, but I suspect this is the right thing to do if you want to minimize doing something special, i.e. leave most text as-is. I.e. I suspect that the case of “overloaded” hyphen (both line-ending and word-joining) is much rarer, in most English texts, than a hyphen that is only line-ending.

    On the other hand, it should be noted that this makes the meaning of plain hyphen, though unambiguous, still position-dependent, i.e. a plain hyphen means something different depending on whether it is line-ending or not. Not a big deal, as both computers and humans can easily deal with position-dependent meaning, but worth noting.
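The position-dependent reading described above is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function name `rejoin_lines` is hypothetical, and the character U+2E17 (DOUBLE OBLIQUE HYPHEN) is used as a stand-in for the printed double hyphen, since the source does not say how that glyph would be encoded. A plain hyphen at the end of a line is dropped, a double hyphen at the end of a line becomes a real hyphen, and hyphens anywhere else are left alone.

```python
DOUBLE_HYPHEN = "\u2e17"  # assumption: stand-in for the printed double hyphen


def rejoin_lines(lines: list[str]) -> str:
    """Rejoin typeset lines into running text under the double-hyphen convention."""
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(DOUBLE_HYPHEN):
            # Line-ending AND word-joining: keep a real hyphen.
            out += line[:-1] + "-"
        elif line.endswith("-"):
            # Line-ending only: the hyphen disappears when the word is rejoined.
            out += line[:-1]
        else:
            # Ordinary break at a space.
            out += line + " "
    return out.rstrip()


page = [
    "the workers were un-",
    "ionised, but the gas was un" + DOUBLE_HYPHEN,
    "ionised",
]
print(rejoin_lines(page))
# the workers were unionised, but the gas was un-ionised
```

Note that the sketch never needs a dictionary: the convention itself carries the information that a scan of conventionally set text has lost.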

    A couple of additional notes:

    Sadly, contrary to what reader/commenter Dick Margulis suggests above, OCR is very much with us even for documents produced today. OCR is the lowest common denominator in a world of non-standards. For instance, to my dismay, many ebooks are produced from quite recent paper books by doing OCR on scans of the paper book.

    On another note, the current CSS support for hyphen disambiguation (hyphenate-character) only allows a special character for hyphens that are only line-ending. I.e. it does not permit a system like that of M-W that uses a special character only for “overloaded” (both line-ending and word-joining) hyphens. In addition to being supported by the version of WebKit used by Apple’s Safari browser, hyphenate-character is now supported by the PrinceXML typesetting software. To my knowledge this is the only print-oriented software that currently has such a feature. InDesign experts please correct me if I’m wrong.

  7. Comment posted by Solo Owl on

    I vote for ragged right margins. It is simpler. No need to remember where to hyphenate. (Of course this does not help in optical scanning of justified text, where human proof reading is essential anyway.)

    1. Comment posted by Solo Owl on

      “does NOT help” — sorry!

    2. Comment posted by Keith Houston on

      I’m definitely on the side of ragged-right settings for the web, if only because the measure can change so much across devices.

      (Also, thanks for the note. It’s fixed now.)

    3. Comment posted by Ben Denckla on

      I agree with Solo Owl that proofreading of OCRed texts is essential. But it is important to note that many influential people disagree, most notably, and shockingly, those in charge of making ebooks for big publishers.

      I have been trying for years, without success, to get people upset about this issue, or at least get them to be aware of it. In my opinion the news world is too concerned with ebook pricing (endless articles about the Apple price-fixing lawsuit, the Amazon–Hachette controversy, etc.) and not concerned enough with ebook quality. Also, I guess I was naive, but I thought discussion of price without reference to quality makes no sense.

      One of the very few honest reports of ebook quality problems from an insider was recently posted here:


      The most notable quote in that post, from the perspective of this issue, is: “you have hundreds or thousands of backlist ebooks … with poor-quality OCR, in outdated formats, never proofread” (emphasis mine).

    4. Comment posted by Ben Denckla on

      On the topic of ragged right vs. justified, I just wanted to note a couple of things.

      (1) It may be obvious, but in case not, remember that ragged right vs. justified and hyphenated vs. not hyphenated are separate decisions. In most cases justified without hyphenation doesn’t work well, but it can be done. A rarely-used option is hyphenated ragged right, which one might call “less ragged right.”

      (2) Justification is often discussed as if its goal were to create rectangular text. While this is not wrong, per se, you can view it as having a different goal: to maximize the space between words, given a certain margin. In this light, the rectangles that result are merely consequences of pursuing maximal spacing, not ends in and of themselves. These two perspectives, the macro (e.g. rectangular text) and the micro (e.g. word spacing), and the trade-offs between them, are where many of the interesting debates in setting text lie. Note that such issues have little to do with fonts, where many of the uninteresting debates in setting text lie.

    5. Comment posted by Keith Houston on

      Hi Ben — I’m a professed “less ragged right” aficionado. I’ll be using that term in future!

      With regards to word spacing, surely the goal for justified text is not to seek a maximal word spacing — that would be easy to accomplish with just two words per line, one hard left and one hard right — but rather to minimise the deviation from some ideal word spacing across all words in a paragraph while still maintaining justification? TeX’s justification mechanism does just that.

    6. Comment posted by Ben Denckla on

      Thanks Keith, good point. I guess implicitly what I had in mind was a naive justification strategy which takes ragged right line breaks as a given and just stretches the lines to create a rectangle / maximize word spacing. I suspect this is what most web browsers and ebook readers do.

      Maybe I should have made my point more simply: in comparing it to ragged right, people tend to emphasize the rectangularity of justified text rather than its much higher average word spacing. In other words they tend to emphasize the macro rather than micro effects of justification. This may be a designer-centric rather than reader-centric emphasis. I.e. perhaps designers spend more time thinking about macro features, whereas readers (consciously or not) spend more time dwelling in the details.

      I’ve ignored a variable exploited by advanced justification strategies: character spacing. I.e. advanced justification strategies subtly adjust both character and word spacing. Though in some cases they may space words and/or characters more tightly than the ragged-right equivalent, in general I think they result in higher average character and word spacing.

    7. Comment posted by Keith Houston on

      Ah, I see what you mean. Agreed! Apparently web browsers still don’t use a decent line-breaking algorithm — they resort to the simple, “greedy” algorithm you describe above.
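The difference between the two strategies discussed in this thread can be sketched in Python. This is a deliberately simplified illustration, not TeX’s actual algorithm: TeX works with stretchable and shrinkable spaces, penalties and hyphenation points, whereas the hypothetical `optimal_breaks` below simply minimises the sum of squared leftover space on every line but the last, measured in monospaced characters. The function names and the sample words are invented for the example.

```python
def greedy_breaks(words: list[str], width: int) -> list[str]:
    """Fill each line with as many words as fit, then move on."""
    lines, current = [], []
    for w in words:
        if current and len(" ".join(current + [w])) > width:
            lines.append(" ".join(current))
            current = [w]
        else:
            current.append(w)
    if current:
        lines.append(" ".join(current))
    return lines


def optimal_breaks(words: list[str], width: int) -> list[str]:
    """Choose breaks that minimise total squared slack over the paragraph."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: least cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: index where the line after i starts
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width and j > i:
                break  # an over-long single word still gets a line of its own
            # The last line is allowed to fall short at no cost.
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))   # ['aaa bb', 'cc', 'ddddd']
print(optimal_breaks(words, 6))  # ['aaa', 'bb cc', 'ddddd']
```

The greedy version packs the first line perfectly and pays for it with a very short second line; the paragraph-wide version accepts a slightly loose first line to even things out, which is the trade-off Keith describes.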

      And yes, adjusting letterspacing to improve justification seems to be pretty common in high-end typesetting tools. Hermann Zapf wrote about it, and I think Donald Knuth may have appropriated it for TeX.

    1. Comment posted by Keith Houston on

      Snakes! Visually distinctive and, if I may say so, practical too.
