Thursday 13 January 2022

Unicode Trivia U+02DB

Codepoint: U+02DB "OGONEK"
Block: U+02B0..02FF "Spacing Modifier Letters"

The English language doesn't really have diacritics except in loanwords (such as "café", "naïve", "façade" and "piñata") or in poetry (as with "belovèd"). As a consequence, many English-speakers struggle with the whole concept.

The many scripts and languages supported by Unicode make diacritics a thorny issue here too. Trawling through the UCD (the Unicode Character Database) turns up the following major instances:*

  1. ACUTE
    The acute accent:
     
    Ó
    U+00D3 "LATIN CAPITAL LETTER O WITH ACUTE"

  2. DOUBLE ACUTE
    The double acute accent (sometimes called the hungarumlaut):

    Ő
    U+0150 "LATIN CAPITAL LETTER O WITH DOUBLE ACUTE"

  3. GRAVE
    The grave accent:

    Ò
    U+00D2 "LATIN CAPITAL LETTER O WITH GRAVE"

  4. DOUBLE GRAVE
    The double grave accent (mainly used in Serbo-Croatian and Slovenian):

    Ȍ
    U+020C "LATIN CAPITAL LETTER O WITH DOUBLE GRAVE"

  5. CIRCUMFLEX
    The circumflex (easily confused with the inverted breve):

    Ô
    U+00D4 "LATIN CAPITAL LETTER O WITH CIRCUMFLEX"

  6. TILDE
    The tilde (in the Estonian alphabet "õ" is an independent letter):

    Õ
    U+00D5 "LATIN CAPITAL LETTER O WITH TILDE"

  7. DIAERESIS
    The diaeresis or umlaut:
    [In Unicode, the term "DIAERESIS" is preferred over "UMLAUT"]

    Ö
    U+00D6 "LATIN CAPITAL LETTER O WITH DIAERESIS"

  8. STROKE
    The stroke (in some Scandinavian alphabets "ø" is an independent letter):

    Ø
    U+00D8 "LATIN CAPITAL LETTER O WITH STROKE"

  9. MACRON
    The macron or line above:

    Ō
    U+014C "LATIN CAPITAL LETTER O WITH MACRON"

  10. BREVE
    The breve (easily confused with the caron or háček):

    Ŏ
    U+014E "LATIN CAPITAL LETTER O WITH BREVE"

  11. INVERTED BREVE
    The inverted breve or arch (easily confused with the circumflex):

    Ȏ
    U+020E "LATIN CAPITAL LETTER O WITH INVERTED BREVE"

  12. HORN
    The horn (used in Vietnamese):

    Ơ
    U+01A0 "LATIN CAPITAL LETTER O WITH HORN"

  13. CARON
    The caron or háček (easily confused with the breve):
    [Since Unicode 1.1, the term "CARON" is preferred over "HACEK"]

    Ǒ
    U+01D1 "LATIN CAPITAL LETTER O WITH CARON"

  14. DOT ABOVE
    The dot above or overdot:

    Ȯ
    U+022E "LATIN CAPITAL LETTER O WITH DOT ABOVE"

  15. DOT BELOW
    The dot below or underdot:

    Ọ
    U+1ECC "LATIN CAPITAL LETTER O WITH DOT BELOW"

  16. HOOK ABOVE
    The hook above (used in Vietnamese):

    Ỏ
    U+1ECE "LATIN CAPITAL LETTER O WITH HOOK ABOVE"

  17. LONG STROKE OVERLAY
    The long stroke overlay ("ꝋ" was a medieval abbreviation for the Latin obiit "he died"):

    Ꝋ
    U+A74A "LATIN CAPITAL LETTER O WITH LONG STROKE OVERLAY"

  18. LOOP
    The loop ("ꝍ" is used for transliterating medieval Nordic vowels):

    Ꝍ
    U+A74C "LATIN CAPITAL LETTER O WITH LOOP"

  19. BELT
    The belt ("ɬ" is used in IPA for the voiceless alveolar lateral fricative):

    Ɬ
    U+A7AD "LATIN CAPITAL LETTER L WITH BELT"

  20. LINE BELOW
    The line below (or macron below):

    Ḻ
    U+1E3A "LATIN CAPITAL LETTER L WITH LINE BELOW"

  21. STROKE
    The stroke ("ł" is a Polish dark L):

    Ł
    U+0141 "LATIN CAPITAL LETTER L WITH STROKE"

  22. CEDILLA
    The cedilla:

    Ç
    U+00C7 "LATIN CAPITAL LETTER C WITH CEDILLA"

  23. RING ABOVE
    The ring above or overring (used in many Scandinavian languages):

    Å
    U+00C5 "LATIN CAPITAL LETTER A WITH RING ABOVE"

  24. RING BELOW
    The ring below or underring:

    Ḁ
    U+1E00 "LATIN CAPITAL LETTER A WITH RING BELOW"

  25. OGONEK
    The ogonek (usually applied to vowels):

    Ǫ
    U+01EA "LATIN CAPITAL LETTER O WITH OGONEK"
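One way to see how these precomposed letters relate to their diacritics is to ask for their canonical decomposition. The following is a minimal sketch using Python's standard unicodedata module; the helper name split_letter and the choice of sample codepoints are assumptions made for illustration, not part of the UCD. Letters such as "Ø" and "Ł" have no canonical decomposition, so the helper returns them unchanged.

```python
import unicodedata

# A few of the precomposed letters from the list above, by codepoint.
SAMPLES = [0x00D3, 0x00D8, 0x01A0, 0x0141, 0x00C7, 0x1E00, 0x01EA]

def split_letter(ch):
    """Return (base, [names of combining marks]) for a precomposed letter.

    If the letter has no canonical decomposition, return (ch, None).
    """
    decomposed = unicodedata.normalize("NFD", ch)
    if decomposed == ch:
        return (ch, None)
    marks = [unicodedata.name(mark) for mark in decomposed[1:]]
    return (decomposed[0], marks)

for cp in SAMPLES:
    base, marks = split_letter(chr(cp))
    print(f"U+{cp:04X}", base, marks if marks else "(no decomposition)")
```

The stroke, belt, loop and long stroke overlay behave like "Ø" here: Unicode treats them as part of the letter rather than as a separable mark.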

The Polish ogonek (literally "little tail") is applied to the letters "A" and "E":

Ąą Ęę

According to Adam Twardoch, the Polish ogonek isn't simply an accent...

It's much more a character element, just like a stem, a serif or a descent. In a vast majority of cases ogonek should be smoothly connected with the base glyph, it should be a part of the glyph.

If you search the UCD, you'll find 18 references to "ogonek":

  • U+0104 "LATIN CAPITAL LETTER A WITH OGONEK"
  • U+0105 "LATIN SMALL LETTER A WITH OGONEK"
  • U+0118 "LATIN CAPITAL LETTER E WITH OGONEK"
  • U+0119 "LATIN SMALL LETTER E WITH OGONEK"
  • U+012E "LATIN CAPITAL LETTER I WITH OGONEK"
  • U+012F "LATIN SMALL LETTER I WITH OGONEK"
  • U+0172 "LATIN CAPITAL LETTER U WITH OGONEK"
  • U+0173 "LATIN SMALL LETTER U WITH OGONEK"
  • U+01EA "LATIN CAPITAL LETTER O WITH OGONEK"
  • U+01EB "LATIN SMALL LETTER O WITH OGONEK"
  • U+01EC "LATIN CAPITAL LETTER O WITH OGONEK AND MACRON"
  • U+01ED "LATIN SMALL LETTER O WITH OGONEK AND MACRON"
  • U+02DB "OGONEK"
  • U+0328 "COMBINING OGONEK"
  • U+04BE "CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER"
    • Was named "CYRILLIC CAPITAL LETTER IE HOOK OGONEK" in Unicode 1.0
  • U+04BF "CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER"
    • Was named "CYRILLIC SMALL LETTER IE HOOK OGONEK" in Unicode 1.0
  • U+1AB7 "COMBINING OPEN MARK BELOW"
    • Notes include "see also combining ogonek - 0328"
  • U+1DCE "COMBINING OGONEK ABOVE"
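A rough approximation of this search can be done with Python's unicodedata module, sketched below. The function name names_containing is hypothetical. Note the limitation: unicodedata exposes only current character names, so this sketch finds the entries whose name contains "OGONEK" but not the ones that mention it only in a Unicode 1.0 name or in an annotation (U+04BE, U+04BF and U+1AB7 above). The exact set of hits also depends on the Unicode version bundled with your Python.

```python
import sys
import unicodedata

def names_containing(word):
    """Return (codepoint, name) pairs whose current character name contains word."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed or unassigned codepoints fall back to the empty string.
        name = unicodedata.name(chr(cp), "")
        if word in name:
            hits.append((cp, name))
    return hits

ogonek_hits = names_containing("OGONEK")
for cp, name in ogonek_hits:
    print(f'U+{cp:04X} "{name}"')
```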

Codepoint U+02DB is interesting. It's in the "Spacing clones of diacritics" column of the "Spacing Modifier Letters" block. This column contains six codepoints (including U+02DB) which fill in the gaps for "standalone" diacritics; that is, diacritics that take up space in their own right, with no need to apply the combining equivalent to a letter.
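The link between each spacing clone and its combining equivalent is recorded in the character data as a compatibility decomposition (a space followed by the combining mark). Here is a small sketch that lists the six clones, assuming they are the contiguous range U+02D8 to U+02DD; the helper name describe_clone is made up for this example.

```python
import unicodedata

# Assumed range of the six spacing clones: U+02D8 through U+02DD.
SPACING_CLONES = range(0x02D8, 0x02DE)

def describe_clone(cp):
    """Return (codepoint label, clone name, name of its combining equivalent)."""
    ch = chr(cp)
    # The decomposition looks like "<compat> 0020 0328": a tag, then hex codepoints.
    fields = unicodedata.decomposition(ch).split()
    combining = chr(int(fields[-1], 16))
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.name(combining))

clone_rows = [describe_clone(cp) for cp in SPACING_CLONES]
for row in clone_rows:
    print(*row, sep="  ")
```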

So, if you're talking about ogoneks in general and want to include them in text without being attached to another glyph, you can just use U+02DB:

An ogonek looks like “˛”

There will be visible differences between this and a combining ogonek with a standard space (U+0020):

An ogonek looks like “ ̨”

A combining ogonek with a non-breaking space (U+00A0):

An ogonek looks like “ ̨”

And a combining ogonek with a dotted circle (U+25CC):

An ogonek looks like “◌̨”
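Since rendering varies so much, it can help to build the four variants from explicit codepoints rather than pasting them. The sketch below does that and also shows what compatibility normalization (NFKC) does to each one; the dictionary labels and the codepoints helper are my own naming, and this is only one way to inspect the strings.

```python
import unicodedata

OGONEK = "\u02DB"            # spacing ogonek
COMBINING_OGONEK = "\u0328"  # combining ogonek

samples = {
    "spacing ogonek": OGONEK,
    "space + combining": "\u0020" + COMBINING_OGONEK,
    "no-break space + combining": "\u00A0" + COMBINING_OGONEK,
    "dotted circle + combining": "\u25CC" + COMBINING_OGONEK,
}

def codepoints(text):
    """Format a string as a space-separated list of U+XXXX labels."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

for label, text in samples.items():
    nfkc = unicodedata.normalize("NFKC", text)
    print(f"{label}: {codepoints(text)} -> NFKC {codepoints(nfkc)}")
```

Under NFKC the first three variants all collapse to a plain space followed by the combining ogonek, so if the distinction matters, avoid compatibility normalization (NFC leaves U+02DB alone).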

One of the few times you depict ogoneks in isolation is when you're talking about how to depict ogoneks in isolation.

* Good luck with the text rendering in your browser here!
