Wednesday 19 January 2022

Unicode Trivia U+05D0

Codepoint: U+05D0 "HEBREW LETTER ALEF"
Block: U+0590..05FF "Hebrew"

Hebrew is usually written in an abjad script, right-to-left. Abjads are also known as consonant alphabets because they lack "letters" for vowel sounds. Diacritics indicating vowels are used for poetry, religious texts and teaching Hebrew.

When we type the 22 consonants, "alef" (U+05D0) to "taw" (U+05EA), a text renderer should render them right-to-left:

אבגדהוזחטיכלמנסעפצקרשת

But how does it know that?

Within the UCD fields for U+05D0, we see:

bc = R

This means that the "bidirectional class" (the Bidi_Class property, abbreviated bc) for U+05D0 is R: "any strong right-to-left (non-Arabic-type) character" (UAX #44). This, together with the fiendishly complex bidirectional algorithm (UAX #9), allows text renderers to render arbitrary sequences of mixed-script codepoints correctly.
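The lookup can be reproduced with Python's standard unicodedata module, which exposes Bidi_Class through unicodedata.bidirectional(). This is a minimal sketch: the helper name bidi_classes and the mixed sample string are made up for illustration, and the values come from whichever UCD version ships with your Python.

```python
import unicodedata

alef = "\u05d0"

# Name and Bidi_Class as recorded in the UCD snapshot bundled with Python.
print(unicodedata.name(alef))           # HEBREW LETTER ALEF
print(unicodedata.bidirectional(alef))  # R


def bidi_classes(text):
    """Return a (codepoint label, Bidi_Class) pair for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


# A Latin letter, alef and a digit: three different classes in one string.
print(bidi_classes("a\u05d01"))
```

The classes are only the input to UAX #9. Resolving embedding levels and reordering runs is the renderer's job and is not attempted here.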

The U+05D0 "HEBREW LETTER ALEF" codepoint is marked as Hebrew script:

sc = Hebr

but the Unicode bidirectional algorithm does not rely on script-level properties. That is, Unicode says that "alef" is usually rendered "right-to-left", not that "alef" is part of the "Hebrew" script and the "Hebrew" script is usually rendered "right-to-left".
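One way to see that direction is assigned per character rather than per script is to compare a few characters from the Hebrew block. The sketch below assumes only the standard unicodedata module, which does not expose the Script property, so membership in the Hebrew block is used as a rough stand-in. The sample characters are my own choice.

```python
import unicodedata

# Three characters from the Hebrew block: a letter, a vowel point
# and a punctuation mark.
samples = ["\u05d0", "\u05b0", "\u05be"]


def describe(ch):
    """Return (name, Bidi_Class) for a single character."""
    return unicodedata.name(ch), unicodedata.bidirectional(ch)


for ch in samples:
    name, bidi = describe(ch)
    print(f"U+{ord(ch):04X} {name}: bc = {bidi}")
```

The vowel point reports NSM (nonspacing mark) rather than R, which a single script-wide rule could not express.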

There is no concept of lowercase and uppercase letters in Hebrew; the script is unicameral.
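The lack of case shows up in the character data too. As a small sketch using only Python built-ins: the letters carry the general category Lo ("other letter", neither upper nor lower), and case conversion leaves them unchanged.

```python
import unicodedata

alef = "\u05d0"

# Codepoints from alef (U+05D0) to tav (U+05EA), inclusive.
letters = [chr(cp) for cp in range(0x05D0, 0x05EA + 1)]

print(unicodedata.category(alef))                 # Lo
print(alef.upper() == alef, alef.lower() == alef)  # True True

# Every letter in the range is caseless in the same way.
caseless = all(
    unicodedata.category(ch) == "Lo" and ch.upper() == ch == ch.lower()
    for ch in letters
)
print(caseless)
```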

Finally, and appropriately, five Hebrew letters have "final forms" when they appear at the end of words:

ך (final kaf), ם (final mem), ן (final nun), ף (final pe), ץ (final tsadi)

This is similar to Greek sigma, but without the special-case handling necessary to overcome the lack of an uppercase final sigma.
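Both halves of that comparison can be checked from Python. The sketch below filters the alef-to-tav range by character name to find the final forms (five in the UCD data I am aware of), then shows the sigma special case: Python's str.lower() picks the final sigma from context, which is the kind of handling Hebrew does not need because it has no case mapping at all. The Greek sample word is an arbitrary choice.

```python
import unicodedata

# Codepoints from alef (U+05D0) to tav (U+05EA), inclusive.
letters = [chr(cp) for cp in range(0x05D0, 0x05EA + 1)]

# Final forms are separate codepoints whose names contain "FINAL".
finals = [ch for ch in letters if "FINAL" in unicodedata.name(ch)]
for ch in finals:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The range holds the 22 base letters plus the final forms.
print(len(letters), len(finals))

# Greek: both sigmas uppercase to the same capital, so lowercasing
# has to choose the final form from the surrounding letters.
word = "\u03a3\u039f\u03a6\u039f\u03a3"  # capital sigma, omicron, phi, omicron, sigma
lowered = word.lower()
print(lowered[0] == "\u03c3")   # ordinary small sigma at the start
print(lowered[-1] == "\u03c2")  # small final sigma at the end
print(lowered.upper() == word)  # round-trips to the all-caps form
```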
