Monday, 23 March 2026

A Certain Date

I've been working with historical data recently as part of my Whim project. The thorny issue of how to store time series (and time-varying relationships) inevitably came up, but I'll restrict this post to some thoughts on storing (uncertain) dates, times and timespans.

There are many good texts on the broader subject (Richard T. Snodgrass's "Developing Time-oriented Database Applications in SQL" is old but entertaining) and a whole industry has grown up around it. There's also the tangential subject of temporal databases. But for our discussion, we'll talk about instants, timespans and uncertainty in historical data; in particular, the rise and fall of empires, nations and polities.

Instants

An instant is a point in time. One attribute of an instant is its resolution. For historical events spanning millennia, a resolution of a year may be sufficient. So we can say that "Alexander III of Macedon was born in 356 BCE". A higher resolution would be of a day (i.e. a specific date): "Winston Churchill was born on 30 November 1874". Beyond a specific date, we can add higher and higher resolutions of time: "Britain declared war on Germany at 11 p.m. (GMT) on 4 August 1914". Interestingly the prepositions change for these three cases: "in", "on" and "at".

Storing Years

It's tempting to store historical years as signed integers (e.g. -356 for 356 BCE), but beware of year zero: the year following 1 BCE is 1 CE, not 0 BCE/CE. So the number of years "between" 356 BCE and 1874 CE is 2229, not 2230 (1874 minus -356). For human-readable data, unadjusted signed integers are intuitive; for calculations, adjusting negative values (such that -355 is used for 356 BCE) reduces the probability of off-by-one errors in arithmetic.
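
As a minimal sketch of that adjustment (the function names are illustrative, not taken from any library), the following Python converts between historical year labels and a zero-inclusive numbering that is safer for arithmetic:

```python
def to_astronomical(year: int) -> int:
    """Map a historical year (negative = BCE, no year zero) to a numbering
    that includes zero, so that 1 BCE -> 0 and 356 BCE -> -355."""
    if year == 0:
        raise ValueError("there is no year zero in the BCE/CE scheme")
    return year + 1 if year < 0 else year


def from_astronomical(year: int) -> int:
    """Inverse of to_astronomical: 0 -> -1 (1 BCE), -355 -> -356."""
    return year - 1 if year <= 0 else year


def years_between(start: int, end: int) -> int:
    """Elapsed years between two historical year labels."""
    return to_astronomical(end) - to_astronomical(start)


print(years_between(-356, 1874))  # 2229
print(years_between(-1, 1))       # 1
```

Keeping the human-readable label and the arithmetic value as separate concepts means the off-by-one correction lives in exactly one place.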

Storing Dates

Instants with a resolution of one day are often natively supported by data stores. Many people think that this completely circumvents the knotty problems of timezones and leap seconds. Alas, it doesn't!

Consider Howland Island of USA and the Line Islands of Kiribati:

When it is Wednesday 23:00 on Howland Island, it is Friday 01:00 on the Line Islands. Two "days" difference. At the same instant.

For this reason, it may still be prudent to store timezone/location information alongside local dates.
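
The two-day gap can be reproduced with Python's standard library. This sketch assumes fixed offsets of UTC-12 for Howland Island and UTC+14 for the Line Islands, and picks an arbitrary Wednesday as the example date:

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets; neither location observes daylight saving time.
HOWLAND = timezone(timedelta(hours=-12), "Howland")
LINE_ISLANDS = timezone(timedelta(hours=14), "Line Islands")

# Wednesday 25 March 2026, 23:00 local time on Howland Island.
howland_time = datetime(2026, 3, 25, 23, 0, tzinfo=HOWLAND)
line_time = howland_time.astimezone(LINE_ISLANDS)

print(howland_time.date(), howland_time.strftime("%H:%M"))  # 2026-03-25 23:00
print(line_time.date(), line_time.strftime("%H:%M"))        # 2026-03-27 01:00
print(howland_time == line_time)                            # True: same instant
```

The local dates differ by two, yet the comparison confirms that both values describe one and the same instant.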

Another issue is calendars. The "standard" Western Gregorian calendar is difficult enough to convert to and from serial (integer) dates, but consider the calendars where dates change at midday, not midnight.

Storing Times

Instants with a resolution finer than one day are typically stored using seconds, or multiples/divisions thereof. However, beware of daylight saving time (some days can have more or fewer seconds in them) and leap seconds (a minute can have 59, 60 or 61 seconds in it).

Storing a timezone with each date goes some way to alleviate these problems. One could also use UTC or (to handle leap seconds) TAI with an appropriate adjustment flag. Leap seconds are particularly nasty because they can be announced seemingly randomly ("time-varying time"!?).

Storing ISO 8601

ISO 8601 can be useful for storing instants. This can be as simple as storing the instant as a string where the length of the string denotes the resolution of the data point:

  • "yyyy" for years.
  • "yyyy-mm-dd" for days.
  • "yyyy-mm-ddThh:mm:ss" for seconds.
  • "yyyy-mm-ddThh:mm:ss.ffffff" for microseconds.
  • etc.

Timezones and UTC can be indicated by the appropriate suffixes.

Notice that the standard only covers four-digit years. Earlier years must be supported by what is euphemistically called "prior arrangement between parties". This can include a "-YYYYY" extension for BCE. Unfortunately, this negates one very useful property of ISO 8601 strings: lexicographical sorting equates to chronological sorting.
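
A rough sketch of the "length denotes resolution" idea follows. The lookup table and function name are hypothetical, and the sketch only handles four-digit years without timezone suffixes:

```python
# Hypothetical mapping from string length to resolution, following the
# formats listed above (four-digit years only, no timezone suffix).
RESOLUTION_BY_LENGTH = {
    4: "year",          # yyyy
    10: "day",          # yyyy-mm-dd
    19: "second",       # yyyy-mm-ddThh:mm:ss
    26: "microsecond",  # yyyy-mm-ddThh:mm:ss.ffffff
}


def resolution(instant: str) -> str:
    """Return the resolution implied by the length of an ISO 8601 string."""
    try:
        return RESOLUTION_BY_LENGTH[len(instant)]
    except KeyError:
        raise ValueError(f"unsupported instant format: {instant!r}") from None


instants = ["1914-08-04T23:00:00", "1874-11-30", "0356", "1066"]
print(sorted(instants))
# ['0356', '1066', '1874-11-30', '1914-08-04T23:00:00']

# With a sign prefix, string order no longer matches chronological order:
print(sorted(["-0356", "-0044"]))  # ['-0044', '-0356'], but 356 BCE came first
```

The last line shows why signed years spoil the sorting property: the larger magnitude sorts later as a string, although it is earlier in time.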

Instant Operators

Instants belong to a strict total ordering:

  • "a < b" implies that instant "a" is strictly before instant "b" chronologically.
  • If not "a < b", then "b ≤ a" ("b" is the same as or before "a").
  • If neither "a < b" nor "b < a", then "a ≡ b" ("a" is the same instant as "b").
  • If "a < b" and "b < c", then "a < c".
  • etc.

Instant Sentinels

Although there is no notion of "zero" for instants, "+∞" can be used to denote an instant infinitely far in the future. Similarly, "-∞" is infinitely far in the past.

Timespans

If instant "a" is non-strictly before instant "b", i.e. "a ≤ b", then the difference between the two instants, "b - a", is the elapsed time between them, measured in the same units as "a" and "b".

Zero-length timespans imply "a ≡ b". We call these "instantaneous timespans".

Open or Closed

When storing timespans, we must decide whether to include or exclude each endpoint:

  1. "t ∊ [a, b)" implies "a ≤ t < b" (half-open)
  2. "t ∊ (a, b]" implies "a < t ≤ b" (half-open)
  3. "t ∊ (a, b)" implies "a < t < b" (open)
  4. "t ∊ [a, b]" implies "a ≤ t ≤ b" (closed)

Scheme 1 is often used and we'll use "a ~ b" to denote "[a, b)", i.e. the timespan from (and including) instant "a" to (but excluding) instant "b". We always assume "a ≤ b".

One problem with this formulation is that the expression "a ~ a" does not denote an instantaneous timespan; it denotes an empty (or null) timespan.

Three "infinite" timespans can also be defined:

  • "a ~ +∞" denotes "a ≤ t"
  • "-∞ ~ b" denotes "t < b"
  • "-∞ ~ +∞" denotes "t" is unconstrained
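
One possible way to model half-open timespans, including the infinite ones, is sketched below. The class name and the use of plain numbers for instants are assumptions made for illustration:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Timespan:
    """Half-open timespan [start, end), with instants modelled as numbers."""
    start: float  # inclusive; -math.inf means "no lower bound"
    end: float    # exclusive; +math.inf means "no upper bound"

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start must not be after end")

    def __contains__(self, t: float) -> bool:
        return self.start <= t < self.end

    def is_empty(self) -> bool:
        return self.start == self.end


span = Timespan(1939, 1946)
print(1939 in span, 1945 in span, 1946 in span)  # True True False
print(Timespan(1066, 1066).is_empty())           # True
print(2026 in Timespan(1914, math.inf))          # True
print(-3000 in Timespan(-math.inf, 0))           # True
```

Note that a span whose start equals its end contains no instant at all, which is the empty-timespan problem described above.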

Allen Intervals

Timespans can be thought of as Allen intervals. For "X = a ~ b" and "Y = c ~ d", we have:

Sketch          Constraints*  Allen Interval  Allen Meaning         Combined

  a   b
X ├───┤         b < c         X < Y           X precedes Y          Does not combine
Y      ├───┤                  Y > X           Y is preceded by X
       c   d

  a   b
X ├───┤         b ≡ c         X m Y           X meets Y             a ~ d
Y     ├───┤                   Y mi X          Y is met by X
      c   d

  a   b
X ├───┤         c < b         X o Y           X overlaps with Y     a ~ d
Y    ├───┤                    Y oi X          Y is overlapped by X
     c   d

  a   b
X ├───┤         a ≡ c         X s Y           X starts Y            a ~ d
Y ├─────┤       b < d         Y si X          Y is started by X
  c     d

   a   b
X  ├───┤        c < a         X d Y           X during Y            c ~ d
Y ├─────┤       b < d         Y di X          Y contains X
  c     d

    a   b
X   ├───┤       c < a         X f Y           X finishes Y          c ~ d
Y ├─────┤       b ≡ d         Y fi X          Y is finished by X
  c     d

  a   b
X ├───┤         a ≡ c         X = Y           X is equal to Y       a ~ d
Y ├───┤         b ≡ d
  c   d

* In addition to "a ≤ b" and "c ≤ d".

This leads to a whole algebra along with sanity checks that can be performed on related timespans.
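
As an illustration, the relation between two timespans can be classified from their four endpoints. This is a sketch rather than a reference implementation: the function name is made up, the labels follow the abbreviations used above, and it assumes both timespans are non-empty ("a < b" and "c < d"):

```python
def allen_relation(a, b, c, d) -> str:
    """Classify X = a ~ b against Y = c ~ d. Assumes a < b and c < d."""
    if b < c:
        return "X < Y"   # X precedes Y
    if b == c:
        return "X m Y"   # X meets Y
    if d < a:
        return "X > Y"   # X is preceded by Y
    if d == a:
        return "X mi Y"  # X is met by Y
    # From here on, the two timespans share some interior.
    if a == c:
        if b == d:
            return "X = Y"
        return "X s Y" if b < d else "X si Y"
    if b == d:
        return "X f Y" if a > c else "X fi Y"
    if a < c:
        return "X o Y" if b < d else "X di Y"
    return "X d Y" if b < d else "X oi Y"


print(allen_relation(1, 3, 5, 7))  # X < Y
print(allen_relation(1, 4, 3, 6))  # X o Y
print(allen_relation(2, 4, 1, 5))  # X d Y
```

Exactly one of the thirteen relations is returned for any pair of non-empty timespans, which is what makes the sanity checks possible.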

Timespan Resolution

The resolution of a timespan corresponds to the resolution used for both the endpoints. This can sometimes lead to suspicious-looking (but valid) expressions for low resolutions:

  • "World War II started in 1939 and ended in 1945"
  • 1939 ≤ year ≤ 1945
  • 1939 ≤ year < 1946
  • 1939 ~ 1946

Storing Timespans

It is tempting to always store timespans as pairs of instants on the grounds that storing absolute (non-relative) values is somehow better (e.g. storing date of birth instead of age). Thus, "a ~ b" is stored as two values: "a" and "b". However, it is sometimes better to store "a" (start) and "b-a" (duration) instead. This allows the resolutions of the two quantities to be different and solves the problem of representing instantaneous timespans.

    Uncertainty

    Consider the following statement:

    "The Voynich manuscript was written between 1404 and 1438."

    This could be interpreted in at least two ways:

    1. A certain timespan: the writing of the manuscript was started in 1404 and completed in 1438.
    2. An uncertain timespan: the writing of the manuscript was started between 1404 and 1438 and completed within the same timespan.

    The latter interpretation is more likely, in this case. The timespan, "[a, b]", for the writing of the manuscript is given by:

    1404 ≤ a ≤ b ≤ 1438

    Care must be taken to distinguish certain timespans (Interpretation 1) from uncertain instants or uncertain timespans (Interpretation 2). Sometimes, this can only be achieved with extra contextual information. Consider:

    "Germany expanded into Denmark, Norway, Belgium, the Netherlands, Luxembourg and France between April and June 1940."

    Storing Uncertain Instants

    Within historical datasets, we can use chronologically-ordered lists of values to store uncertain instants. Here, we use the notation "<x, y, z>" for such lists.

    • Totally unknown instants are stored as an empty list: "< >".
    • Certain instants are stored as singletons: "<x>" means the event unequivocally took place at instant "x".
    • Uncertain instants with a uniform range are stored as pairs: "<x, y>" means the event took place between instant "x" and instant "y".
    • Uncertain instants with a non-uniform range are stored as triplets: "<x, y, z>" means the event took place between instant "x" and instant "z" with a median value of "y", where x ≤ y ≤ z.
    • More quantiles can be added to the distribution by adding elements to the list. The more elements we add, the more precise the probability distribution becomes.

    A nice property of these lists is that the average (median) value of the uncertain instant is the "central" value:

    • <1066> → 1066
    • <1404, 1438> → 1421
    • <1404, 1415, 1438> → 1415
    • <1404, 1413, 1417, 1438> → 1415 = (1413+1417)÷2
    • etc.
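
The "central value" property can be checked with a few lines of Python. This is a sketch: the function name is invented, and instants are assumed to be plain year numbers:

```python
from statistics import median
from typing import Optional, Sequence


def central_value(quantiles: Sequence[float]) -> Optional[float]:
    """Median of an uncertain instant stored as an ordered list of quantiles.

    An empty list (a totally unknown instant) has no central value.
    """
    if any(x > y for x, y in zip(quantiles, quantiles[1:])):
        raise ValueError("quantiles must be in chronological order")
    return median(quantiles) if quantiles else None


print(central_value([1066]))                    # 1066
print(central_value([1404, 1438]))              # 1421.0
print(central_value([1404, 1415, 1438]))        # 1415
print(central_value([1404, 1413, 1417, 1438]))  # 1415.0
print(central_value([]))                        # None
```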

    Storing Uncertain Timespans

    As with certain timespans, we could store uncertain timespans as two uncertain instants. However, care must be taken to prevent "negative" durations.

    For example, if we assume the Voynich manuscript was actually written in the interval "[a, b]", some time between 1404 and 1438 inclusive, then merely constraining "a ∊ [1404, 1438]" and "b ∊ [1404, 1438]" opens up the possibility of violating "a ≤ b" (consider "a=1420" and "b=1410").

    Representing uncertain timespans as "[a, a+d]" where "d ≥ 0" may be more appropriate and/or natural. This is particularly true for historical references where the duration of the event is more or less certain than its start or end date.
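
A start-plus-duration representation might look like the following sketch. The class and field names are hypothetical, and the figures in the usage example are invented purely for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainTimespan:
    """Timespan [a, a + d] with an uncertain start a and duration d >= 0."""
    earliest_start: int
    latest_start: int
    min_duration: int
    max_duration: int

    def __post_init__(self) -> None:
        if self.earliest_start > self.latest_start:
            raise ValueError("start bounds are out of order")
        if not 0 <= self.min_duration <= self.max_duration:
            raise ValueError("durations must satisfy 0 <= min <= max")

    def earliest_end(self) -> int:
        return self.earliest_start + self.min_duration

    def latest_end(self) -> int:
        return self.latest_start + self.max_duration


# Invented figures: an event starting between 1404 and 1430 that
# lasted between 1 and 8 years.
event = UncertainTimespan(1404, 1430, 1, 8)
print(event.earliest_end(), event.latest_end())  # 1405 1438
```

Because the duration is constrained to be non-negative, no combination of values can produce an end before the start.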

    Gantt Charts

    Once we start treating uncertain timespans as a "floating" start instant and a "flexible" duration, relationships/dependencies between timespans begin to look a lot like Gantt chart analysis. There is a wealth of literature and algorithms that we can leverage from this field.

    Charmingly, the Wikipedia page for Gantt charts starts with an uncertain timespan:

    "It was designed and popularized by Henry Gantt c. 1910–1915."

    Monday, 23 February 2026

    World History in Maps

    I've started to put together some visualisations and maps spanning the last five millennia of nations, empires and dynasties, on a whim really.



    Friday, 22 March 2024

    Unicode Numeral Systems 2

    Ray Toal recently enlightened me on the existence of two interesting number systems: Kaktovik numerals and Cistercian numerals.

    I haven't updated my list of numeral systems since Unicode 14.0.0 (see original blog post), so I thought I'd revisit the whole "Universe" project and update it to Unicode 15.1.0.

    The vigesimal Kaktovik numerals are supported by Unicode 15.0.0 (U+1D2C0 to U+1D2D3), but, at the time of writing, Google Noto font support is still shuffling along the pipe, so they are difficult to display.
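
Even without font support, the encoding itself is easy to work with. The sketch below (the function name is made up) writes an integer in base 20 using the twenty codepoints in that range, and prints the codepoints rather than the glyphs:

```python
KAKTOVIK_ZERO = 0x1D2C0  # digit zero; digit nineteen is U+1D2D3


def to_kaktovik(n: int) -> str:
    """Write a non-negative integer in base 20 using Kaktovik digits."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        n, d = divmod(n, 20)
        digits.append(chr(KAKTOVIK_ZERO + d))
        if n == 0:
            break
    return "".join(reversed(digits))


# 2026 = 5 x 400 + 1 x 20 + 6
print([hex(ord(ch)) for ch in to_kaktovik(2026)])
# ['0x1d2c5', '0x1d2c1', '0x1d2c6']
```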

    Cistercian numerals are interesting as they are (sort of) base-10000, but they have not been allocated a Unicode range, although the proposal dates back to 2020.

    The Cistercian numeral clock tickled my fancy.


    Friday, 15 April 2022

    Unicode Trivia U+10FB

    Codepoint: U+10FB "GEORGIAN PARAGRAPH SEPARATOR"
    Block: U+10A0..10FF "Georgian"

    The Georgian scripts are encoded in four letter forms in three Unicode blocks:

    The four rows are:

    1. Asomtavruli is the oldest form, dating from the fifth century CE
    2. Nuskhuri dates from the ninth century CE
    3. Mkhedruli is the current Georgian script
    4. Mtavruli is the uppercase version of Mkhedruli

    In the original "Georgian" block, the codepoints U+10A0..10C5 encode the uppercase of the old ecclesiastical alphabet, Asomtavruli (row 1). The codepoints U+10D0..10F0 encode the lowercase of the modern secular alphabet, Mkhedruli (row 3). The latter is used for almost all text, including at the beginning of sentences and names.

    However, don't be tempted to mash together uppercase Asomtavruli with lowercase Mkhedruli to get a bicameral script. That problem wasn't "solved" until the addition of the later "Georgian Extended" and "Georgian Supplement" blocks. More on that in later posts. For modern Georgians, this isn't really a problem at all; writing uses only one case.

    In old texts, the "჻" symbol (U+10FB GEORGIAN PARAGRAPH SEPARATOR) was used at the end of the last line of a paragraph. Its use was presumably similar to that of the pilcrow "¶" but at the end of the paragraph, not at the beginning. Alas, the Georgian script didn't get its own full stop; it must share it with the Armenian one, "։" (U+0589 ARMENIAN FULL STOP).

    ISO 10586:1996 encodes 42 characters of the Georgian script in a 7-bit character set, including the paragraph mark at 0x4F.

    There is an interesting annex in the standard, part of which I'll include below:

    Annex A: Development of the Georgian script

    Armenian and Georgian, two of the multitudinous tongues spoken in the Caucasian Region, are vehicles of millennial civilizations. Both languages present peculiar phonetic resemblances in spite of their completely different origins. Georgian, or Grusinian, is a member of the Kartvelian language family. Armenian is a member of the Indo-European language family. Each language has its own alphabet, which resemble one another, since the alphabets developed from the same source.

    According to one tradition, these two alphabets were invented circa A.D. 406 by the Armenian monk, missionary and theologian Mesrop Mast’oc’ (ca. A.D. 360 to A.D. 439), who also invented an alphabet for the now extinct language Albani (or Caucasian Albanian). According to another tradition, the Georgian script was invented circa A.D. 300 by the Georgian king, Parnavaz. Some scholars allege that it was invented many centuries earlier. The origin of, and the relations between, the three forms of the script are also still in dispute.

    More likely, the Georgian script was derived, as was the Armenian script, from a Semitic alphabet, the Pahlavi script, used in Persia in the 4th century. It was developed under a strong Greek influence (by Mast’oc’ or perhaps one of his disciples) into an alphabet enabling the Georgian people to spell their language, with its wealth of sounds in a simple and phonemic way. Owing to phonetic evolution, a few letters became superfluous. In former times, the Georgian alphabet was also used in writing Ossetic and Abkhaz. The oldest inscription in Georgian dates back to the 5th century. The oldest manuscripts date from the 8th century. The period from A.D. 980 to A.D. 1220 is considered the golden age of Georgian literature.

    Wednesday, 13 April 2022

    Unicode Trivia U+1090

    Codepoint: U+1090 "MYANMAR SHAN DIGIT ZERO"
    Block: U+1000..109F "Myanmar"

    The "Myanmar" Unicode block contains glyphs used in various regional writing systems including the Burmese and Shan scripts. In this post, I'm going to play fast and loose with the script names and just call them "Burmese" and "Shan". Unicode Technical Note 11 describes some of the intricacies involved with the various scripts, weighing in at a healthy 67 pages.

    The "Myanmar" block contains two sets of digits: one for Burmese (second row below) and one for Shan (bottom row):

    If you have appropriate fonts installed, these are the Burmese digits (U+1040..1049):

    ၀၁၂၃၄၅၆၇၈၉

    and Shan digits (U+1090..1099):

    ႐႑႒႓႔႕႖႗႘႙
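
Both sets are ordinary decimal digits as far as Unicode is concerned, so Python's built-in "int" should parse them directly. A small sketch, using escape sequences so that it does not depend on installed fonts:

```python
import unicodedata

burmese = "\u1041\u1049\u1044\u1048"  # Burmese digits 1, 9, 4, 8
shan = "\u1091\u1099\u1094\u1098"     # Shan digits 1, 9, 4, 8

print(int(burmese), int(shan))        # 1948 1948
print(unicodedata.name(shan[0]))      # MYANMAR SHAN DIGIT ONE
print(unicodedata.category(shan[0]))  # Nd (decimal digit)
```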

    Burmese digits have the advantage (over Hindu-Arabic and Shan digits) of having ascenders and descenders which help to differentiate them. They are very similar to "Tai Tham Hora" digits (U+1A80..1A89).

    The Shan script supposedly evolved from the Burmese, but their digits are markedly different. To my eyes, they appear to resemble the Hindu-Arabic digits, but the "8" and "9" are inexplicably similar.

    The Burmese language has words for very large numbers: powers of ten up to 10⁷ and then increasing multiplicatively by factors of 10⁷ up to 10¹⁴⁰ ("athinche", fittingly this is a synonym for "countless number"). I could find no reason why the names progress in multiples of 10⁷. Most other languages use 10³ (e.g. English "thousand", "million", "billion", etc.) or sometimes 10² (e.g. Indian "lakh", "crore", "arab", etc.).

    Another curiosity is that the tonal pronunciation of digits changes depending on the denary position of the digit within the number. The tone generally changes from "low" to "creaky" (no, really!) for digits in the 10¹, 10² and 10³ places.

    Monday, 11 April 2022

    Unicode Trivia U+0F33

    Codepoint: U+0F33 "TIBETAN DIGIT HALF ZERO"
    Block: U+0F00..0FFF "Tibetan"

    The Unicode Character Database has a field named "Numeric_Value" (abbreviated to "nv"). For the vast majority of the 144,697 used codepoints in Unicode 14.0.0 (in fact, precisely 142,890) this field holds the value "NaN" meaning that the codepoint does not represent a numeric value.

    Other values for "nv", with the number of codepoints having that value in parentheses, are shown below, in approximate order of frequency.

    First, the denary digits. The distribution is not flat because of the irregularity of CJK ideographs representing small numbers and the lack of a "zero" digit in some writing systems:

    • "1" (141)
    • "2" (140)
    • "3" (141)
    • "4" (132)
    • "5" (130)
    • "6" (114)
    • "7" (113)
    • "8" (109)
    • "9" (113)
    • "0" (84)

    Next, multiples of ten:

    • "10" (62)
    • "20" (36)
    • "30" (19)
    • "40" (18)
    • "50" (29)
    • "60" (13)
    • "70" (13)
    • "80" (12)
    • "90" (12)

    Next, powers of ten. Characters for trillions are used in Japan and Taiwan (U+5146) and in the Pahawh Hmong script (U+16B61):

    • "100" (35)
    • "1000" (22)
    • "10000" (13)
    • "100000" (5)
    • "1000000" (1)
    • "10000000" (1)
    • "100000000" (3)
    • "10000000000" (1)
    • "1000000000000" (2)

    Next, sequential values up to twenty:

    • "11" (8)
    • "12" (8)
    • "13" (6)
    • "14" (6)
    • "15" (6)
    • "16" (7)
    • "17" (7)
    • "18" (7)
    • "19" (7)

    Next, blocks of circled numbers:

    • "21" (1)
    • "22" (1)
    • "23" (1)
    • "24" (1)
    • "25" (1)
    • "26" (1)
    • "27" (1)
    • "28" (1)
    • "29" (1)

    • "31" (1)
    • "32" (1)
    • "33" (1)
    • "34" (1)
    • "35" (1)
    • "36" (1)
    • "37" (1)
    • "38" (1)
    • "39" (1)

    • "41" (1)
    • "42" (1)
    • "43" (1)
    • "44" (1)
    • "45" (1)
    • "46" (1)
    • "47" (1)
    • "48" (1)
    • "49" (1)

    Next, multiples of 100. We can see the importance of 500 in ancient counting systems (e.g. "D" in Roman numerals):

    • "200" (6)
    • "300" (7)
    • "400" (7)
    • "500" (16)
    • "600" (7)
    • "700" (6)
    • "800" (6)
    • "900" (7)

    Next, multiples of 1000:

    • "2000" (5)
    • "3000" (4)
    • "4000" (4)
    • "5000" (8)
    • "6000" (4)
    • "7000" (4)
    • "8000" (4)
    • "9000" (4)

    Next, multiples of 10,000:

    • "20000" (4)
    • "30000" (4)
    • "40000" (4)
    • "50000" (7)
    • "60000" (4)
    • "70000" (4)
    • "80000" (4)
    • "90000" (4)

    Next, multiples of 100,000 (e.g. "lakh"):

    • "200000" (2)
    • "300000" (1)
    • "400000" (1)
    • "500000" (1)
    • "600000" (1)
    • "700000" (1)
    • "800000" (1)
    • "900000" (1)

    Next, multiples of 10,000,000 (e.g. "crore"):

    • "20000000" (1)

    Next are two large numbers from cuneiform (base 60):

    • "216000" (1)
    • "432000" (1)

    Next, we start the rational fractions (e.g. "half"):

    • "1/2" (18)

    Next, the quarters:

    • "1/4" (13)
    • "3/4" (8)

    Next, the eighths:

    • "1/8" (7)
    • "3/8" (1)
    • "5/8" (1)
    • "7/8" (1)

    Next, the sixteenths:

    • "1/16" (6)
    • "3/16" (5)

    Next, the thirty-seconds:

    • "1/32" (1)

    Next, the sixty-fourths:

    • "1/64" (1)
    • "3/64" (1)

    Next, the thirds (strangely, there's an Ancient Greek "⅔" U+10177, but not for "⅓"):

    • "1/3" (5)
    • "2/3" (6)

    Next, the fifths:

    • "1/5" (3)
    • "2/5" (1)
    • "3/5" (1)
    • "4/5" (1)

    Next, the sixths:

    • "1/6" (3)
    • "5/6" (2)

    Next, a seventh:

    • "1/7" (1)

    Next, a ninth:

    • "1/9" (1)

    Next, the twelfths (Meroitic cursive fractions, not reduced):

    • "1/12" (1)
    • "2/12" (1)
    • "3/12" (1)
    • "4/12" (1)
    • "5/12" (1)
    • "6/12" (1)
    • "7/12" (1)
    • "8/12" (1)
    • "9/12" (1)
    • "10/12" (1)
    • "11/12" (1)

    Next, a collection of (mostly Tamil and Malayalam) fractions we've seen already:

    • "1/320" (2)
    • "1/160" (2)
    • "1/80" (1)
    • "1/40" (2)
    • "3/80" (2)
    • "1/20" (2)
    • "1/10" (3)
    • "3/20" (2)

    Finally, a collection of what can only be described as "strange halves":

    • "3/2" (1)
    • "5/2" (1)
    • "7/2" (1)
    • "9/2" (1)
    • "11/2" (1)
    • "13/2" (1)
    • "15/2" (1)
    • "17/2" (1)
    • "-1/2" (1)

    These last nine all belong to the "Tibetan / Digits minus Half" group of codepoints (U+0F2A to U+0F33), including the wonderfully perplexing U+0F33 "TIBETAN DIGIT HALF ZERO".

    This character supposedly has a numeric value of "-1/2" or "-0.5", and is the only codepoint (so far) with a negative "nv".
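
The "nv" field is exposed in Python through "unicodedata.numeric", so a tally like the one above can be approximated with the sketch below. The counts depend on the Unicode version bundled with the Python build, so they will not necessarily match the Unicode 14.0.0 figures quoted here:

```python
import sys
import unicodedata
from collections import Counter
from fractions import Fraction

# U+0F33 TIBETAN DIGIT HALF ZERO
print(unicodedata.numeric("\u0f33"))  # -0.5

# Tally the numeric values of every codepoint known to this Python build.
tally = Counter()
for cp in range(sys.maxunicode + 1):
    value = unicodedata.numeric(chr(cp), None)  # None when there is no value
    if value is not None:
        # Recover small fractions such as 1/3 from their float form.
        tally[Fraction(value).limit_denominator(1000)] += 1

# List the negative values and how many codepoints carry each one.
print([(str(v), n) for v, n in tally.items() if v < 0])
```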

    As Andrew West points out, there is much confusion (and little evidence) surrounding the numeric values of these codepoints. The glyphs seem to appear on postage stamps, but if the Royal Mail were in the habit of issuing stamps with a denomination of minus ½p, they would quickly go out of business. If you went into a Post Office and asked for one million "-½p" stamps, the teller would be obliged to give you a huge tome of stamps and £5,000.

    Sunday, 20 February 2022

    Unicode Trivia U+0EA5

    Codepoint: U+0EA5 "LAO LETTER LO LOOT"
    Block: U+0E80..0EFF "Lao"

    The Lao script (Akson Lao) is a sister script of the Thai script; both derive from the Sukhothai script of the thirteenth century CE. As such, they have many similarities. For instance, both Lao and Thai consonants are given individual names. Here are the 27 Lao consonants with their typical names:

    1. ກ = chicken (ໄກ່)
    2. ຂ = egg (ໄຂ່)
    3. ຄ = water buffalo (ຄວາຍ)
    4. ງ = ox (ງົວ)
    5. ຈ = glass (ຈອກ)
    6. ສ = tiger (ເສືອ)
    7. ຊ = elephant (ຊ້າງ)
    8. ຍ = mosquito (ຍຸງ)
    9. ດ = child (ເດັກ)
    10. ຕ = eye (ຕາ)
    11. ຖ = bag (ຖົງ)
    12. ທ = flag (ທຸງ)
    13. ນ = bird (ນົກ)
    14. ບ = goat (ແບ້)
    15. ປ = fish (ປາ)
    16. ຜ = bee (ເຜິ້ງ)
    17. ຝ = rain (ຝົນ)
    18. ພ = mountain (ພູ)
    19. ຟ = fire (ໄຟ)
    20. ມ = cat (ແມວ)
    21. ຢ = medicine (ຢາ)
    22. ຣ = car (ຣົຖ)
    23. ລ = monkey (ລີງ)
    24. ວ = fan (ວີ)
    25. ຫ = goose (ຫ່ານ)
    26. ອ = bowl (ອື່ງ)
    27. ຮ = house (ເຮືອນ)

    Each consonant's name begins with that consonant, in a similar fashion to English alphabet mnemonics such as "A is for apple, B is for banana, etc."; this is known as acrophony.

    Alas, the mapping of these consonants to the appropriate "column" of the Unicode Lao block is complicated by two factors:

    1. The Unicode encoding is based loosely on Thai Industrial Standard 620-2533 and has holes where unused characters are omitted.
    2. The names of four of the consonants were incorrect when they were added to Unicode 1.0.

    These complications are discussed in Andrew West's N3137 notes:

    The Unicode code charts note that the Lao block is "Based on TIS 620-2529". This statement is misleading as TIS 620-2529 is a Thai standard for representing the Thai script in an 8-bit code, and does not define names or code points for the Lao script. The Unicode Lao block is based on a mapping of Lao characters to the equivalent Thai characters in TIS 620, but is not actually based on this standard.

    And:

    The Unicode names for Lao consonants are based on the syllabic pronunciation of the character (i.e. consonant plus inherent vowel). All consonants belong to one of three tone classes: high, mid and low. Where two letters are only distinguished phonetically by their tone class, the modifiers SUNG "high" and TAM "low" are used to indicate the tone class of the letter (e.g. U+0E82 "LAO LETTER KHO SUNG" and U+0E84 "LAO LETTER KHO TAM"). However, the Unicode names for two of the consonants have the wrong tone class applied to them:

    U+0E9D "LAO LETTER FO TAM" is a high tone class letter, and should have been named "LAO LETTER FO SUNG"

    U+0E9F "LAO LETTER FO SUNG" is a low tone class letter, and should have been named "LAO LETTER FO TAM"

    Whilst the Unicode names for 25 of the 27 consonants use this naming scheme, the names of two of the consonants use mnemonic names (presumably because they share the same vowel and tone class, and so could not otherwise be differentiated). Mnemonic names are how the consonants are normally identified in the Lao language, although there is no official list of standard mnemonic names for consonants, and different sources may use different mnemonic names for some letters.

    The two letters whose Unicode names are based on mnemonic names are:

    U+0EA3 "LAO LETTER LO LING"

    U+0EA5 "LAO LETTER LO LOOT"

    The mnemonic names for these two letters are the wrong way round. U+0EA5 is the normal letter [l] and is universally identified by the mnemonic name lo ling "lo as in ling [monkey]". On the other hand, U+0EA3 is a letter that is used to represent [r] in foreign words; however this letter has been officially deprecated by the Lao government since 1975, and is no longer in common use. The name element LO LOOT applied to U+0EA5 would seem to represent the mnemonic ro rot, "rot" meaning automobile, that should be applied to U+0EA3.

    So U+0EA3 should be named "LAO LETTER RO ROT" (car) and U+0EA5 should be named "LAO LETTER LO LING" (monkey).

    It is interesting that the Unicode standard has effectively "nailed down" the names of the consonants even though Andrew West says there is no official standard.

    It has always troubled me that English does not have a satisfactory mechanism for naming its letters. These are the names typically used in British English:

    1. a
    2. bee
    3. cee
    4. dee
    5. e
    6. eff
    7. gee
    8. aitch
    9. i
    10. jay
    11. kay
    12. el
    13. em
    14. en
    15. o
    16. pee
    17. cue
    18. ar
    19. ess
    20. tee
    21. u
    22. vee
    23. double-u
    24. ex
    25. wye
    26. zed

    If we ignore "double-u" (which we've met before), the obvious elephant in the room is "cue" for "Q". Not only is it not acrophonic (only 15 of the 26 truly are), "Q" doesn't appear anywhere in its name.