Sunday 20 February 2022

Unicode Trivia U+0EA5

Codepoint: U+0EA5 "LAO LETTER LO LOOT"
Block: U+0E80..0EFF "Lao"

The Lao script (Akson Lao) is a sister script of the Thai script; both derive from the Sukhothai script of the thirteenth century CE. As such, they have many similarities. For instance, both Lao and Thai consonants are given individual names. Here are the 27 Lao consonants with their typical names:

  1. ກ = chicken (ໄກ່)
  2. ຂ = egg (ໄຂ່)
  3. ຄ = water buffalo (ຄວາຍ)
  4. ງ = ox (ງົວ)
  5. ຈ = glass (ຈອກ)
  6. ສ = tiger (ເສືອ)
  7. ຊ = elephant (ຊ້າງ)
  8. ຍ = mosquito (ຍຸງ)
  9. ດ = child (ເດັກ)
  10. ຕ = eye (ຕາ)
  11. ຖ = bag (ຖົງ)
  12. ທ = flag (ທຸງ)
  13. ນ = bird (ນົກ)
  14. ບ = goat (ແບ້)
  15. ປ = fish (ປາ)
  16. ຜ = bee (ເຜິ້ງ)
  17. ຝ = rain (ຝົນ)
  18. ພ = mountain (ພູ)
  19. ຟ = fire (ໄຟ)
  20. ມ = cat (ແມວ)
  21. ຢ = medicine (ຢາ)
  22. ຣ = car (ຣົຖ)
  23. ລ = monkey (ລີງ)
  24. ວ = fan (ວີ)
  25. ຫ = goose (ຫ່ານ)
  26. ອ = bowl (ອື່ງ)
  27. ຮ = house (ເຮືອນ)

Each consonant's name begins with that consonant in a similar fashion to English alphabet mnemonics such a "A is for apple, B is for banana, etc.", known as acrophony:

[source]

Alas, the mapping of these consonants to the appropriate "column" of the Unicode Lao block is complicated by two factors:

  1. The Unicode encoding is based loosely on Thai Industrial Standard 620-2533 and has holes where unused characters are omitted.
  2. The names of four of the consonants were incorrect when they were added to Unicode 1.0.

These complications are discussed in Andrew West's N3137 notes:

The Unicode code charts note that the Lao block is "Based on TIS 620-2529". This statement is misleading as TIS 620-2529 is a Thai standard for representing the Thai script in an 8-bit code, and does not define names or code points for the Lao script. The Unicode Lao block is based on a mapping of Lao characters to the equivalent Thai characters in TIS 620, but is not actually based on this standard.

And:

The Unicode names for Lao consonants are based on the syllabic pronunciation of the character (i.e. consonant plus inherent vowel). All consonants belong to one of three tone classes: high, mid and low. Where two letters are only distinguished phonetically by their tone class, the modifiers SUNG "high" and TAM "low" are used to indicate the tone class of the letter (e.g. U+0E82 "LAO LETTER KHO SUNG" and U+0E84 "LAO LETTER KHO TAM"). However, the Unicode names for two of the consonants have the wrong tone class applied to them:

U+0E9D "LAO LETTER FO TAM" is a high tone class letter, and should have been named "LAO LETTER FO SUNG"

U+0E9F "LAO LETTER FO SUNG" is a low tone class letter, and should have been named "LAO LETTER FO TAM"

Whilst the Unicode names for 25 of the 27 consonants use this naming scheme, the names of two of the consonants use mnemonic names (presumably because they share the same vowel and tone class, and so could not otherwise be differentiated). Mnemonic names are how the consonants are normally identified in the Lao language, although there is no official list of standard mnemonic names for consonants, and different sources may use different mnemonic names for some letters.

The two letters whose Unicode names are based on mnemonic names are:

U+0EA3 "LAO LETTER LO LING"

U+0EA5 "LAO LETTER LO LOOT"

The mnemonic names for these two letters are the wrong way round. U+0EA5 is the normal letter [l] and is universally identified by the mnemonic name lo ling "lo as in ling [monkey]". On the other hand, U+0EA3 is a letter that is used to represent [r] in foreign words; however this letter has been officially deprecated by the Lao government since 1975, and is no longer in common use. The name element LO LOOT applied to U+0EA5 would seem to represent the mnemonic ro rot, "rot" meaning automobile, that should be applied to U+0EA3.

So  U+0EA3 should be named "LAO LETTER RO ROT" (car) and U+0EA5 should be named "LAO LETTER LO LING" (monkey).

It is interesting that the Unicode standard has effectively "nailed down" the names of the consonants even though Andrew West says there is no official standard.

It has always troubled me that English does not have a satisfactory mechanism for naming its letters. These are the names typically used in British English:

  1. a
  2. bee
  3. cee
  4. dee
  5. e
  6. eff
  7. gee
  8. aitch
  9. i
  10. jay
  11. kay
  12. el
  13. em
  14. en
  15. o
  16. pee
  17. cue
  18. ar
  19. ess
  20. tee
  21. u
  22. vee
  23. double-u
  24. ex
  25. wye
  26. zed

If we ignore "double-u" (which we've met before), the obvious elephant in the room is "cue" for "Q". Not only is it not acrophonic (only 15 of the 26 truly are), "Q" doesn't appear anywhere in its name.

No comments:

Post a Comment