Tuesday 11 January 2022

Unicode Trivia U+0132

Codepoint: U+0132 "LATIN CAPITAL LIGATURE IJ"
Block: U+0100..017F "Latin Extended-A"

[If you haven't already noticed, I'm trying to come up with a mildly interesting fact about one codepoint in every Unicode block. As of Version 14.0, that's 320 blocks.]

The Latin script is the basis of many alphabets. Of the languages whose Latin-script alphabets can (currently) be expressed as single codepoints, here is a selection [source]:

Danish
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz Ææ Øø Åå

Dutch
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy IJij Zz

English
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz

Estonian
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Šš Zz Žž Tt Uu Vv Ww Õõ Ää Öö Üü Xx Yy

Faroese
Aa Áá Bb Dd Ðð Ee Ff Gg Hh Ii Íí Jj Kk Ll Mm Nn Oo Óó Pp Rr Ss Tt Uu Úú Vv Yy Ýý Ææ Øø 

Finnish
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Xx Yy Zz Ää Öö

French
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz

German
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz Ää Öö Üü ẞß

Icelandic
Aa Áá Bb Dd Ðð Ee Éé Ff Gg Hh Ii Íí Jj Kk Ll Mm Nn Oo Óó Pp Rr Ss Tt Uu Úú Vv Xx Yy Ýý Þþ Ææ Öö

Irish
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz

Italian
Aa Bb Cc Dd Ee Ff Gg Hh Ii Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Zz

Latvian
Aa Āā Bb Cc Čč Dd Ee Ēē Ff Gg Ģģ Hh Ii Īī Jj Kk Ķķ Ll Ļļ Mm Nn Ņņ Oo Pp Rr Ss Šš Tt Uu Ūū Vv Zz Žž

Lithuanian
Aa Ąą Bb Cc Čč Dd Ee Ęę Ėė Ff Gg Hh Ii Įį Yy Jj Kk Ll Mm Nn Oo Pp Rr Ss Šš Tt Uu Ųų Ūū Vv Zz Žž

Norwegian
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz Ææ Øø Åå

Polish
Aa Ąą Bb Cc Ćć Dd Ee Ęę Ff Gg Hh Ii Jj Kk Ll Łł Mm Nn Ńń Oo Óó Pp Rr Ss Śś Tt Uu Ww Yy Zz Źź Żż

Portuguese
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Xx Zz

Romanian
Aa Ăă Ââ Bb Cc Dd Ee Ff Gg Hh Ii Îî Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Şş Tt Ţţ Uu Vv Ww Xx Yy Zz

Sami
Aa Áá Bb Cc Čč Dd Đđ Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Ŋŋ Oo Pp Rr Ss Šš Tt Ŧŧ Uu Vv Zz Žž

Swedish
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz Åå Ää Öö

Turkish
Aa Bb Cc Çç Dd Ee Ff Gg Ğğ Hh Iı İi Jj Kk Ll Mm Nn Oo Öö Pp Rr Ss Şş Tt Uu Üü Vv Yy Zz

Some languages use digraphs (or similar) in their alphabets which are not single codepoints in Unicode:

Albanian
Aa Bb Cc Çç Dd DH/dh Ee Ëë Ff Gg GJ/gj Hh Ii Jj Kk Ll LL/ll Mm Nn NJ/nj Oo Pp Qq Rr RR/rr Ss SH/sh Tt TH/th Uu Vv Xx XH/xh Yy Zz ZH/zh

Croatian
Aa Bb Cc Čč Ćć Dd DŽ/dž Đđ Ee Ff Gg Hh Ii Jj Kk Ll LJ/lj Mm Nn NJ/nj Oo Pp Rr Ss Šš Tt Uu Vv Zz Žž

Czech
Aa Bb Cc Čč Dd Ee Ff Gg Hh CH/ch Ii Jj Kk Ll Mm Nn Oo Pp Rr Řř Ss Šš Tt Uu Vv Ww Xx Yy Zz Žž

Hungarian (standard)
Aa Áá Bb Cc CS/cs Dd DZ/dz DZS/dzs Ee Éé Ff Gg GY/gy Hh Ii Íí Jj Kk Ll LY/ly Mm Nn NY/ny Oo Óó Öö Őő Pp Qq Rr Ss SZ/sz Tt TY/ty Uu Úú Üü Űű Vv Zz ZS/zs

Spanish
Aa Bb Cc CH/ch Dd Ee Ff Gg Hh Ii Jj Kk Ll LL/ll Mm Nn Ññ Oo Pp Qq Rr RR/rr Ss Tt Uu Vv Ww Xx Yy Zz

Welsh
Aa Bb Cc CH/ch Dd DD/dd Ee Ff FF/ff Gg NG/ng Hh Ii Jj Ll LL/ll Mm Nn Oo Pp PH/ph Rr RH/rh Ss Tt TH/th Uu Ww Yy

According to Wikipedia, the largest Latin (and European) true alphabet is Slovak with 46 letters:

Slovak
Aa Áá Ää Bb Cc Čč Dd Ďď DZ/dz DŽ/dž Ee Éé Ff Gg Hh CH/ch Ii Íí Jj Kk Ll Ĺĺ Ľľ Mm Nn Ňň Oo Óó Ôô Pp Qq Rr Ŕŕ Ss Šš Tt Ťť Uu Úú Vv Ww Xx Yy Ýý Zz Žž

Italian has just 21.
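
The two counts above can be checked against the alphabet strings as listed in this post. This is only a minimal sketch: it assumes each whitespace-separated token (an upper/lower pair such as "Aa", or a digraph written "DZ/dz") stands for exactly one letter, and the helper name count_letters is my own.

```python
# Alphabet strings copied from the lists in this post.
# Assumption: one whitespace-separated token == one letter of the alphabet.
SLOVAK = (
    "Aa Áá Ää Bb Cc Čč Dd Ďď DZ/dz DŽ/dž Ee Éé Ff Gg Hh CH/ch Ii Íí Jj Kk "
    "Ll Ĺĺ Ľľ Mm Nn Ňň Oo Óó Ôô Pp Qq Rr Ŕŕ Ss Šš Tt Ťť Uu Úú Vv Ww Xx Yy Ýý Zz Žž"
)
ITALIAN = "Aa Bb Cc Dd Ee Ff Gg Hh Ii Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Zz"


def count_letters(alphabet):
    """Count the letters in an alphabet string (hypothetical helper)."""
    return len(alphabet.split())


print("Slovak:", count_letters(SLOVAK))    # Slovak: 46
print("Italian:", count_letters(ITALIAN))  # Italian: 21
```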

A number of digraphs (and similar) are encoded in Unicode:

  • DZ, Dz, dz (U+01F1, U+01F2, U+01F3)
  • DŽ, Dž, dž (U+01C4, U+01C5, U+01C6)
  • IJ, ij (U+0132, U+0133)
  • LJ, Lj, lj (U+01C7, U+01C8, U+01C9)
  • NJ, Nj, nj (U+01CA, U+01CB, U+01CC)
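
For anyone who wants to poke at these codepoints, here is a rough sketch using Python's standard unicodedata module. It prints each codepoint's official name, its general category, and the ordinary letters it expands to under NFKC normalization. The list DIGRAPH_CODEPOINTS and the helper describe are names I made up for illustration, and the exact output depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Codepoints copied from the list above.
DIGRAPH_CODEPOINTS = [
    0x01F1, 0x01F2, 0x01F3,  # DZ, Dz, dz
    0x01C4, 0x01C5, 0x01C6,  # DŽ, Dž, dž
    0x0132, 0x0133,          # IJ, ij
    0x01C7, 0x01C8, 0x01C9,  # LJ, Lj, lj
    0x01CA, 0x01CB, 0x01CC,  # NJ, Nj, nj
]


def describe(codepoint):
    """Return (name, general category, NFKC expansion) for a codepoint."""
    ch = chr(codepoint)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.normalize("NFKC", ch),
    )


for cp in DIGRAPH_CODEPOINTS:
    name, category, expanded = describe(cp)
    print(f"U+{cp:04X}  {category}  {name}  ->  {expanded}")
```

The mixed-case forms (Dz, Dž, Lj, Nj) should report the category "Lt" (titlecase letter), which is the category that has no entry for IJ.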

Of these, the Dutch "IJ" (U+0132 "LATIN CAPITAL LIGATURE IJ") is unusual in not having a titlecase mapping distinct from its uppercase mapping. Consider the Dutch word "ijsje", meaning ice cream: in uppercase it is "IJSJE", and in titlecase it is "IJsje", not "Ijsje".
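
The difference is easy to see from Python, whose str.upper() and str.title() methods apply the Unicode uppercase and titlecase mappings. The sketch below is illustrative only: the dictionary LOWERCASE_DIGRAPHS and the helper has_distinct_titlecase are my own names, and note that str.title() works per codepoint with no knowledge of Dutch, so a word spelled with the two separate ASCII letters "i" and "j" still comes out wrong.

```python
# Lowercase single-codepoint digraphs from the list above.
LOWERCASE_DIGRAPHS = {
    "dz": "\u01F3",
    "dž": "\u01C6",
    "ij": "\u0133",
    "lj": "\u01C9",
    "nj": "\u01CC",
}


def has_distinct_titlecase(ch):
    """True if the titlecase form of ch differs from its uppercase form."""
    return ch.title() != ch.upper()


for label, ch in LOWERCASE_DIGRAPHS.items():
    print(label, f"U+{ord(ch):04X}", has_distinct_titlecase(ch))

# "ijsje" spelled with the single codepoint U+0133 followed by "sje".
word = "\u0133sje"
print(word.upper())  # U+0132 followed by "SJE"
print(word.title())  # U+0132 followed by "sje"

# Spelled with two separate ASCII letters, only the "i" is capitalised.
print("ijsje".title())  # Ijsje
```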

Many regard this quirk as evidence that "IJ" should be considered a true letter in its own right. That possibly includes this purveyor of alcohol:

"Slijterij" means "Off-Licence"

However, the replacement of the "IJ" tile with "Y" in Dutch Scrabble may be another nail in the coffin for that view.
