Wednesday 12 January 2022

Unicode Trivia U+0256

Codepoint: U+0256 "LATIN SMALL LETTER D WITH TAIL"
Block: U+0250..02AF "IPA Extensions"

The "Ð" (U+00D0 "LATIN CAPITAL LETTER ETH") that we first met in the Icelandic alphabet is easily confused with other Unicode codepoints:

  • "Đ" (U+0110 "LATIN CAPITAL LETTER D WITH STROKE")
  • "Ɖ" (U+0189 "LATIN CAPITAL LETTER AFRICAN D")

These three codepoints are considered distinct according to the Unicode standard but are typically rendered (almost) identically.
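To see that distinction for yourself, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `describe` and `is_unchanged_by_nfkc` are my own for illustration; the names printed come from whatever version of the Unicode database ships with your Python.

```python
import unicodedata

# The three look-alike capital letters discussed above.
LOOKALIKES = ["\u00D0", "\u0110", "\u0189"]


def describe(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def is_unchanged_by_nfkc(ch):
    """True if compatibility normalization leaves the character alone."""
    return unicodedata.normalize("NFKC", ch) == ch


for ch in LOOKALIKES:
    # Lowercasing is a handy way to tell them apart by eye.
    print(describe(ch), "| lowercase:", describe(ch.lower()),
          "| NFKC-stable:", is_unchanged_by_nfkc(ch))
```

None of the three has a decomposition, so normalization will not fold them together; if you need to treat them as equivalent, you have to do it yourself.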

The lowercase mapping of the African D (which sits in the "Latin Extended-B" block) is "ɖ" (U+0256 "LATIN SMALL LETTER D WITH TAIL") which sits in the "IPA Extensions" block. Note that the lowercase is not called "LATIN SMALL LETTER AFRICAN D" as you might expect. However, the uppercase mapping of "LATIN SMALL LETTER D WITH TAIL" is indeed "LATIN CAPITAL LETTER AFRICAN D", so they're obviously a pair.
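The case pairing is easy to check with a few lines of Python. This is only a quick sketch: it relies on `str.lower()` and `str.upper()` following the simple case mappings in the Unicode database bundled with your interpreter, and the helper name `label` is mine.

```python
import unicodedata


def label(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


african_d = "\u0189"               # LATIN CAPITAL LETTER AFRICAN D
d_with_tail = african_d.lower()    # follows the lowercase mapping
round_trip = d_with_tail.upper()   # follows the uppercase mapping back

print(label(african_d))
print(label(d_with_tail))
print("round trip OK:", round_trip == african_d)
```

The names differ, but the mappings point at each other in both directions.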

At first, I thought the African D naming inconsistency was because the two blocks ("Latin Extended-B" and "IPA Extensions") were added to the Unicode standard at different times. But the UCD tells us the codepoints were both in the original version 1.0, though under differently named blocks.

An alternative reason for the codepoint naming inconsistency may lie in the history of the African D itself.

The African D is the sixth letter of the International African Alphabet developed in 1928 by Diedrich Hermann Westermann and others. This alphabet was a precursor to the African Reference Alphabet (1978) and the World Orthography alphabet (1948).

International African Alphabet

The International African Alphabet was itself developed from the International Phonetic Alphabet which has been evolving since 1888. However, the IPA famously does not have uppercase versions of its letters. Fortunately, the International Institute of African Languages and Cultures had already prepared such a mapping in "The Practical Orthography of African Languages" (1928) [transcript]:

[source]

The International Phonetic Alphabet does not have a dedicated block of its own in Unicode as its characters are mainly "borrowed" from other sources (e.g. Latin, Greek and Cyrillic blocks). During the original compilation of Unicode, any phonetic character that wasn't extant elsewhere was added to the new "Standard Phonetic" (later renamed "IPA Extensions") block. The original code charts show U+0189 (p.185 "Extended Latin") and U+0256 (p.189 "Standard Phonetic") with their current names and the expected case mapping between them.

In a Unicode meeting (ISO/IEC JTC 1/SC 2/WG 2 on 1994-06-01 item N 989), a suggestion was made to rename U+0189 to "LATIN CAPITAL LETTER OF LETTER D WITH TAIL", but this was withdrawn for unknown reasons. The alternatives are to rename U+0189 to "LATIN CAPITAL LETTER D WITH TAIL", which is obviously visually incorrect, or to rename U+0256 to "LATIN SMALL LETTER AFRICAN D". The latter is problematic because I'm sure codepoint U+0256 is primarily used in the context of phonetics.

I believe the two codepoints were added to Unicode 1.0 with their current, knowingly inconsistent names as a compromise. And compromises are what standards committees are all about.
