Tuesday, 1 February 2022

Unicode Trivia U+0A70

Codepoint: U+0A70 "GURMUKHI TIPPI"
Block: U+0A00..0A7F "Gurmukhi"

Sometimes the information in the Unicode Character Database (UCD) is either insufficient for some purpose or requires clarification. This is the role of Unicode Technical Reports (UTR) and Unicode Technical Notes (UTN).

Like other Brahmic scripts, the Gurmukhi script was imported into Unicode 1.0 from ISCII, where it was known as Punjabi.

[Image: U+0A02 and U+0A70 highlighted in yellow]

However, Gurmukhi differs in having two diacritics for nasalisation: Bindi (U+0A02) and Tippi (U+0A70).
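To see how the UCD itself describes the two signs, here is a minimal sketch using Python's standard `unicodedata` module, which bundles a copy of the UCD (the exact Unicode version depends on the Python build):

```python
import unicodedata

# Query the Unicode Character Database bundled with Python for the two
# Gurmukhi nasalisation signs: print code point, character name and
# general category.
for cp in (0x0A02, 0x0A70):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))
```

This should report GURMUKHI SIGN BINDI and GURMUKHI TIPPI, both with general category Mn (nonspacing mark). Only U+0A02 carries the word SIGN in its name.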


The ISCII codepage for Punjabi uses character 0xA2 for both diacritics with the expectation that the correct combining glyph will be rendered according to context. This logic is clarified in UTN #30 from Sukhjinder Sidhu (2006):

Bindi and Tippi are encoded using a single code point in ISCII (0xA2) and the underlying rendering engine selects the correct glyph. However, in Unicode they are given two separate code points.

Thus, 0xA2 should be converted to U+0A70 (Tippi) when:

  • The preceding letter is a consonant (ignoring any Nuktas)
  • The preceding letter is Vowel Sign I (U+0A3F), Vowel Sign U (U+0A41), Vowel Sign UU (U+0A42)
  • The preceding letter is Letter A (U+0A05), Letter I (U+0A07)

In all other cases, the sign should remain a Bindi (U+0A02).

When converting from Unicode to ISCII, both Bindi and Tippi should be converted to Bindi (0xA2).
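The rules quoted above are simple enough to sketch in a few lines. The version below is a hypothetical illustration, not code from UTN #30 or from any real library. It works on the Unicode side, so no ISCII byte table is needed. It assumes the input was produced by a naive decoder that mapped every 0xA2 byte to U+0A02 (Bindi) and then decides which of those should have been U+0A70 (Tippi). The helper names `bindi_to_tippi`, `tippi_to_bindi` and `is_consonant` are made up for this sketch, and the consonant ranges are my own reading of the Gurmukhi block.

```python
# Hypothetical sketch of the UTN #30 Bindi/Tippi rules as a post-processing pass.
# Assumption: the input came from a naive ISCII decoder that mapped every 0xA2
# byte to U+0A02 (Bindi).

BINDI = "\u0A02"
TIPPI = "\u0A70"
NUKTA = "\u0A3C"

# Vowel signs and independent vowels after which Tippi is selected.
TIPPI_AFTER = {
    "\u0A3F",  # Vowel Sign I
    "\u0A41",  # Vowel Sign U
    "\u0A42",  # Vowel Sign UU
    "\u0A05",  # Letter A
    "\u0A07",  # Letter I
}


def is_consonant(ch: str) -> bool:
    """True for Gurmukhi consonants KA..HA and the precomposed nukta forms.

    Assumed ranges: U+0A15..U+0A39 and U+0A59..U+0A5E. They include a few
    unassigned code points, which never occur in well-formed text.
    """
    cp = ord(ch)
    return 0x0A15 <= cp <= 0x0A39 or 0x0A59 <= cp <= 0x0A5E


def bindi_to_tippi(text: str) -> str:
    """Replace each Bindi with Tippi where the UTN #30 context rules apply."""
    out = []
    for ch in text:
        if ch == BINDI:
            # Find the preceding character, skipping any Nuktas. In
            # well-formed text a Nukta only follows a consonant.
            j = len(out) - 1
            while j >= 0 and out[j] == NUKTA:
                j -= 1
            prev = out[j] if j >= 0 else ""
            if prev and (is_consonant(prev) or prev in TIPPI_AFTER):
                ch = TIPPI
        out.append(ch)
    return "".join(out)


def tippi_to_bindi(text: str) -> str:
    """Unicode-to-ISCII direction: both signs collapse to Bindi (byte 0xA2)."""
    return text.replace(TIPPI, BINDI)
```

For example, a naively decoded "Panjab", `bindi_to_tippi("\u0A2A\u0A02\u0A1C\u0A3E\u0A2C")`, becomes `"\u0A2A\u0A70\u0A1C\u0A3E\u0A2C"` because the sign follows the consonant PA. A Bindi after Vowel Sign AA (U+0A3E), as in `"\u0A17\u0A3E\u0A02"`, is left unchanged.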

This special-case logic isn't part of the core Unicode Standard; it is advisory only. But Sukhjinder Sidhu points out:

If the advice in this document is not heeded, any resulting conversion will not be legible to readers of the Gurmukhi script...

One would hope that software vendors would take heed, but a casual read of Microsoft's .NET Core library source reveals no implementation (or even mention) of UTN #30 in ISCIIEncoding.cs. The code maps 0xA2 to and from U+0A02 (Bindi) but provides no transformations for Tippi. At the top of the C# source file is a comment:

Ported from windows c_iscii. If you find bugs here, there're likely similar bugs in the windows version

I decided not to look any further.
