Block: U+0080..00FF "Latin-1 Supplement"
The Basic Latin Unicode block (U+0000..007F) is fine if you're writing English, but it quickly runs out of steam for other languages. For example, the Icelandic alphabet ("stafrófið") has 32 letters:
Aa Áá Bb Dd Ðð Ee Éé Ff Gg Hh Ii Íí Jj Kk Ll Mm Nn Oo Óó Pp Rr Ss Tt Uu Úú Vv Xx Yy Ýý Þþ Ææ Öö
The fifth letter "ð" is "eth", seen here in a sign in Landmannalaugar:
One of Unicode's founding principles is "universal repertoire" and, indeed, "LATIN SMALL LETTER ETH" has been assigned the unique codepoint U+00F0. Before Unicode, region-specific characters had a tendency to "move around" in codepoint space. For instance, in IBM DOS Code Page 861, small eth was at position 0x8C.
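To make the "moving around" concrete, here is a minimal sketch that encodes small eth under a few encodings. It assumes Python's standard codecs ("cp861", "latin-1", "utf-8") and uses illustrative variable names of my own choosing; only the Code Page 861 position comes from the text above.

```python
# Encode LATIN SMALL LETTER ETH (U+00F0) under several encodings
# to see which byte value(s) each one assigns to it.
ETH = "\u00f0"

cp861 = ETH.encode("cp861")     # IBM DOS Code Page 861 (Icelandic)
latin1 = ETH.encode("latin-1")  # ISO 8859-1: byte value equals the codepoint
utf8 = ETH.encode("utf-8")      # two bytes, since U+00F0 is above U+007F

for label, data in [("cp861", cp861), ("latin-1", latin1), ("utf-8", utf8)]:
    print(f"{label:8} {data.hex(' ').upper()}")
```

The codepoint U+00F0 stays the same throughout; only the byte representation differs per encoding.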
For every codepoint, the Unicode Character Database maintains a plethora of information. For U+00F0, we can view that data using an online utility:
https://util.unicode.org/UnicodeJsps/character.jsp?a=00F0
Amongst other things, we see that:
- The official name of that codepoint is "LATIN SMALL LETTER ETH"
- It belongs to the "Latin-1 Supplement" block (U+0080..00FF)
- It primarily belongs to the "Latin" script
- It was introduced in Unicode 1.1
- Its General Category is "Lowercase Letter"
- Its uppercase mapping is "Ð" U+00D0
- Its titlecase mapping is also "Ð" U+00D0
- etc.
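Some of these properties can also be checked locally. The following is a small sketch using Python's standard `unicodedata` module and built-in string methods; note that this module does not expose the block, script or age properties, so those are not shown here.

```python
import unicodedata

ETH = "\u00f0"

# Collect a few Unicode Character Database properties for U+00F0.
props = {
    "name": unicodedata.name(ETH),          # official character name
    "category": unicodedata.category(ETH),  # "Ll" means Lowercase Letter
    "upper": ETH.upper(),                   # uppercase mapping
    "title": ETH.title(),                   # titlecase mapping
}

for key, value in props.items():
    print(f"{key:9} {value}")
```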
The titlecase mapping is largely moot, as no Icelandic words begin with "eth". This makes children's "A is for Apple"-style alphabet posters somewhat difficult to produce:
It also means that the uppercase letter "Ð" (U+00D0) only occurs in text where the whole word is uppercase, such as … erm … "STAFRÓFIÐ".
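As a quick illustration, here is a sketch using Python's built-in case conversions (the variable names are mine): uppercasing the whole word produces "Ð", whereas titlecasing it does not, because the eth is not in initial position.

```python
word = "stafrófið"

shouted = word.upper()  # every letter uppercased, including the final eth
titled = word.title()   # only the first letter of the word is cased up

print(shouted, "Ð" in shouted)
print(titled, "Ð" in titled)
```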