Friday, 17 September 2021

Unicode Numeral Systems

As part of my investigations into Machin stamps, I started looking at numeral systems for labelling stamp denominations and also clock faces.

Unicode 13.0.0 contains a few General Categories used to classify code points according to numeric value ("Nd", "Nl" & "No") plus Derived Numeric Types ("Decimal", "Digit" & "Numeric"). Trawling through these and the Wikipedia pages on numeral systems, I came up with well over one hundred systems that can be used to represent numbers in Unicode using single glyphs:

Unicode Numerals web page

As it turns out, this experiment became more of an exercise in font management. I initially used Unifont for rendering, but this is quite ugly. The Code2000 fonts are no longer maintained. I ended up using Googles Noto fonts. However, there doesn't seem to be a definitive list of which glyphs exist in which font, so I had to create a tool to interrogate all the Noto web fonts I could find. Even an online list of those web fonts seems lacking, so here's one:

  • Noto Kufi Arabic
  • Noto Naskh Arabic
  • Noto Naskh Arabic UI
  • Noto Nastaliq Urdu
  • Noto Sans
  • Noto Sans JP
  • Noto Sans KR
  • Noto Sans SC
  • Noto Sans TC
  • Noto Sans Adlam
  • Noto Sans Adlam Unjoined
  • Noto Sans Anatolian Hieroglyphs
  • Noto Sans Arabic
  • Noto Sans Arabic UI
  • Noto Sans Armenian
  • Noto Sans Avestan
  • Noto Sans Balinese
  • Noto Sans Bamum
  • Noto Sans Batak
  • Noto Sans Bengali
  • Noto Sans Bengali UI
  • Noto Sans Bhaiksuki
  • Noto Sans Brahmi
  • Noto Sans Buginese
  • Noto Sans Buhid
  • Noto Sans Canadian Aboriginal
  • Noto Sans Carian
  • Noto Sans Chakma
  • Noto Sans Cham
  • Noto Sans Cherokee
  • Noto Sans Coptic
  • Noto Sans Cuneiform
  • Noto Sans Cypriot
  • Noto Sans Deseret
  • Noto Sans Devanagari
  • Noto Sans Devanagari UI
  • Noto Sans Display
  • Noto Sans Egyptian Hieroglyphs
  • Noto Sans Ethiopic
  • Noto Sans Georgian
  • Noto Sans Glagolitic
  • Noto Sans Gothic
  • Noto Sans Gujarati
  • Noto Sans Gujarati UI
  • Noto Sans Gurmukhi
  • Noto Sans Gurmukhi UI
  • Noto Sans Gunjala Gondi
  • Noto Sans Hanifi Rohingya
  • Noto Sans Hanunoo
  • Noto Sans Hebrew
  • Noto Sans Imperial Aramaic
  • Noto Sans Indic Siyaq Numbers
  • Noto Sans Inscriptional Pahlavi
  • Noto Sans Inscriptional Parthian
  • Noto Sans Javanese
  • Noto Sans Kaithi
  • Noto Sans Kannada
  • Noto Sans Kannada UI
  • Noto Sans Kayah Li
  • Noto Sans Kharoshthi
  • Noto Sans Khmer
  • Noto Sans Khmer UI
  • Noto Sans Khudawadi
  • Noto Sans Lao
  • Noto Sans Lao UI
  • Noto Sans Lepcha
  • Noto Sans Limbu
  • Noto Sans Linear B
  • Noto Sans Lisu
  • Noto Sans Lycian
  • Noto Sans Lydian
  • Noto Sans Malayalam
  • Noto Sans Malayalam UI
  • Noto Sans Mandaic
  • Noto Sans Masaram Gondi
  • Noto Sans Mayan Numerals
  • Noto Sans Medefaidrin
  • Noto Sans MeeteiMayek
  • Noto Sans Mende Kikakui
  • Noto Sans Meroitic
  • Noto Sans Modi
  • Noto Sans Mongolian
  • Noto Sans Mono
  • Noto Sans Mro
  • Noto Sans Myanmar
  • Noto Sans Myanmar UI
  • Noto Sans New Tai Lue
  • Noto Sans Newa
  • Noto Sans Ogham
  • Noto Sans Ol Chiki
  • Noto Sans Old Italic
  • Noto Sans Old Persian
  • Noto Sans Old South Arabian
  • Noto Sans Old Turkic
  • Noto Sans Oriya
  • Noto Sans Oriya UI
  • Noto Sans Osage
  • Noto Sans Osmanya
  • Noto Sans Pahawh Hmong
  • Noto Sans Phags Pa
  • Noto Sans Phoenician
  • Noto Sans Rejang
  • Noto Sans Runic
  • Noto Sans Samaritan
  • Noto Sans Saurashtra
  • Noto Sans Sharada
  • Noto Sans Shavian
  • Noto Sans Sinhala
  • Noto Sans Sinhala UI
  • Noto Sans Sundanese
  • Noto Sans Syloti Nagri
  • Noto Sans Symbols
  • Noto Sans Symbols 2
  • Noto Sans Syriac
  • Noto Sans Tagalog
  • Noto Sans Tagbanwa
  • Noto Sans Tai Le
  • Noto Sans Tai Tham
  • Noto Sans Tai Viet
  • Noto Sans Takri
  • Noto Sans Tamil
  • Noto Sans Tamil UI
  • Noto Sans Telugu
  • Noto Sans Telugu UI
  • Noto Sans Thaana
  • Noto Sans Thai
  • Noto Sans Thai UI
  • Noto Sans Tifinagh
  • Noto Sans Tirhuta
  • Noto Sans Ugaritic
  • Noto Sans Vai
  • Noto Sans Wancho
  • Noto Sans Warang Citi
  • Noto Sans Yi
  • Noto Serif
  • Noto Serif JP
  • Noto Serif KR
  • Noto Serif SC
  • Noto Serif TC
  • Noto Serif Ahom
  • Noto Serif Armenian
  • Noto Serif Bengali
  • Noto Serif Devanagari
  • Noto Serif Display
  • Noto Serif Ethiopic
  • Noto Serif Georgian
  • Noto Serif Gujarati
  • Noto Serif Hebrew
  • Noto Serif Kannada
  • Noto Serif Khmer
  • Noto Serif Lao
  • Noto Serif Malayalam
  • Noto Serif Myanmar
  • Noto Serif Nyiakeng Puachue Hmong
  • Noto Serif Sinhala
  • Noto Serif Tamil
  • Noto Serif Telugu
  • Noto Serif Thai
  • Noto Serif Tibetan

The Font Check tool allows you to type in a hexadecimal Unicode code point and see the fonts that can render it. It uses the trick of measuring the glyph using an offscreen canvas with a fallback font of Adobe Blank. If the glyph is zero pixels wide, it means it was rendered using the fallback (or is naturally zero-width, so isn't of interest to us anyway).

Using the tool, I worked out the minimal set of Noto fonts needed to render the numerals table. However, there were a few issues:

  1. Some fonts only come in serif variants, not sans serif (and vice versa).

  2. I couldn't find any Noto web fonts that could render the following (though the asterisked scripts were rendered correctly using a fallback font by my browser):

    • Dives Akuru
    • Khmer Divination*
    • Myanmar Tai Laing*
    • Nko*
    • Ottoman Siyaq
    • Sinhala*
    • Sora Sompeng*
    • Tag

  3. Arabic Persian and Arabic Urdu use exactly the same codepoints but use different languages in the HTML mark-up to select different sets of glyphs.

Further notes:

  • I couldn't find any information on how Ogham represented numbers, so I made something up based on tally marks.
  • The Tibetan script has a parallel set of numerals for half values; they're used in stamps, which is a nice coincidence.
  • The Runic Golden Numbers are used in Younger Futhark calendars.
  • Other scripts do have numerals or counting systems, but I excluded many because they are "low radix" (e.g. native Korean)

No comments:

Post a Comment