Wednesday 2 February 2022

Unicode Trivia U+0AF1

Codepoint: U+0AF1 "GUJARATI RUPEE SIGN"
Block: U+0A80..U+0AFF "Gujarati"

Sometimes a codepoint loses its lustre. Take U+0AF1 "GUJARATI RUPEE SIGN" as an example.

  • October 1991 — The "Gujarati" block is imported from ISCII into Unicode 1.0 without a specific rupee symbol

The rise...

  • July 2001 — The Indian Ministry of Information Technology suggests the addition of a Gujarati rupee symbol
  • November 2001 — The Unicode Technical Committee agrees to "add this rupee sign for Gujarati to the list of proposed additions, since the symbol is not made from pieces that are already encoded Gujarati characters. The form of this character is very Gujarati-like, and it will be proposed for encoding at this location, rather than in the Currency Symbols block."
  • April 2003 — U+0AF1 "GUJARATI RUPEE SIGN" is formally added to Unicode 4.0

U+0AF1

And fall...

  • October 2009 — Anshuman Pandey proposes the addition of a Gujarati abbreviation sign
  • October 2009 — Anshuman Pandey also proposes deprecating U+0AF1 because, once the abbreviation sign is added, the Gujarati rupee can be rendered using the codepoint sequence:

    • U+0AB0 "GUJARATI LETTER RA"
    • U+0AC2 "GUJARATI VOWEL SIGN UU"
    • U+0AF0 "GUJARATI ABBREVIATION SIGN"

  • January 2012 — U+0AF0 "GUJARATI ABBREVIATION SIGN" is formally added to Unicode 6.1
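For the curious, here is a minimal Python sketch of that three-codepoint spelling, written entirely with escape sequences. The names `PREFERRED_RUPEE` and `describe` are my own, purely for illustration, and the name lookup for U+0AF0 assumes your Python ships Unicode data from version 6.1 or later.

```python
import unicodedata

# The preferred spelling from the list above, written as escapes:
# RA + VOWEL SIGN UU + ABBREVIATION SIGN
PREFERRED_RUPEE = "\u0ab0\u0ac2\u0af0"


def describe(text):
    """Return a (U+XXXX, character name) pair for each codepoint in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for code, name in describe(PREFERRED_RUPEE):
    print(code, name)
```

Whether the sequence actually renders as a rupee sign depends on the font, which is outside what this sketch can check.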

Of course, you cannot just remove an existing codepoint from the Unicode standard: its stability policy guarantees that an encoded character is never removed. And what would you do with all the documents that had already embedded U+0AF1 as the rupee symbol? Instead, an annotation was added to U+0AF1 saying:

preferred spelling is 0AB0 0AC2 0AF0

Job done? Not quite...

  • September 2018 — Charlotte Buff points out an inconsistency. She "identified the following 18 characters [including U+0AF1] that are strongly implied to be deprecated in the code charts, but actually aren’t in the UCD". She also raises the point that "U+0AF1 does not decompose into its preferred representation"
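The decomposition point is easy to check for yourself. The sketch below uses Python's standard `unicodedata` module to show that U+0AF1 has no decomposition mapping and passes through all four normalization forms unchanged. The `prefer_sequence` function is a hypothetical helper of my own, not anything the Unicode Standard defines: it does the substitution by hand, which is what you are left with if you want the preferred spelling. Note that `unicodedata` does not expose the Deprecated property, so that half of the inconsistency is not tested here.

```python
import unicodedata

LEGACY_RUPEE = "\u0af1"                  # GUJARATI RUPEE SIGN, as an escape
PREFERRED_RUPEE = "\u0ab0\u0ac2\u0af0"   # RA + VOWEL SIGN UU + ABBREVIATION SIGN


def has_decomposition(ch):
    """True if the character database lists a decomposition mapping for ch."""
    return unicodedata.decomposition(ch) != ""


def survives_normalization(ch):
    """True if ch is left unchanged by all four normalization forms."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, ch) == ch for form in forms)


def prefer_sequence(text):
    """Hypothetical helper: swap U+0AF1 for its annotated preferred spelling."""
    return text.replace(LEGACY_RUPEE, PREFERRED_RUPEE)


print(has_decomposition(LEGACY_RUPEE))
print(survives_normalization(LEGACY_RUPEE))
print(prefer_sequence(LEGACY_RUPEE) == PREFERRED_RUPEE)
```

A plain string replacement like this is a one-way trip: once applied, you can no longer tell which rupees were originally typed as the single codepoint.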

Should U+0AF1 be formally deprecated? Or should its usage be "discouraged"? Should codepoints in general be decomposed into their preferred spellings?

Personally, I think this case is getting beyond the purview of the core Unicode Standard. Let's face it, U+0AF1 is already out there. It's difficult to know how prevalent it is, but even one occurrence makes it irrevocable.

And how exactly do you discourage the use of a codepoint, let alone deprecate it? Do you raid people's homes in the middle of the night and confiscate all the Gujarati Rupee codepoints?

The keen-eyed reader will have noticed I haven't actually used codepoint U+0AF1 in this post. I don't want to be woken up at 2am, thank you very much!
