Saturday 8 January 2022

Unicode Trivia U+0000

[This blog post is part of the Universe series of investigations]

Codepoint: U+0000 "CONTROL CODE <NUL>"
Block: U+0000..007F "Basic Latin"

Unicode is built on top of many earlier standards. The history of its first codepoint, U+0000 (the control code <NUL>), is an object lesson in how standards committees work.

  • The Unicode 1.0 (1991) standard, along with ISO/IEC 10646, bases its first 128 codepoints on ISO/IEC 646.
  • ISO/IEC 646 (1972) was itself based on ASCII.
  • The first major release of the ASCII standard was ASA X3.4-1963.

In the late 1950s, the American Telephone and Telegraph Company (AT&T) stated a functional requirement to the predecessors of the ASA X3 committee: any new standard had to include an all-zeroes character named NULL (or IDLE). See Chapter 13 of Coded Character Sets, History and Development by Charles E. Mackenzie, 1980.

This "all-zeroes" requirement was almost certainly due to the practice of leaving gaps in punched tape or cards that could be "overwritten" with other characters later on without having to reissue the whole tape or card deck. Similarly, AT&T also requested an all-ones bit pattern to delete a character by "punching out" all the holes in the row. As ASA X3.4-1963 was a 7-bit character set, this led to the <DEL> control code eventually ending up at U+007F.
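The arithmetic behind those two bit patterns can be sketched in a few lines of Python. This is only an illustration: it assumes that punching a row of tape can be modelled as a bitwise OR (holes can be added but never removed), and the names BITS, NUL, DEL and overpunch are made up for this example.

```python
BITS = 7  # ASA X3.4-1963 is a 7-bit code

NUL = 0                 # no holes punched: 0b0000000
DEL = (1 << BITS) - 1   # every hole punched: 0b1111111

def overpunch(existing: int, new: int) -> int:
    """Punch `new` over `existing` (assumed model: holes only accumulate)."""
    return existing | new

# A blank (NUL) row can later be turned into any character...
assert overpunch(NUL, ord("A")) == ord("A")
# ...and any character can be "deleted" by punching out the whole row.
assert overpunch(ord("A"), DEL) == DEL

print(f"NUL = U+{NUL:04X}, DEL = U+{DEL:04X}")  # NUL = U+0000, DEL = U+007F
```

Under this model, NUL is the one pattern that any other character can overwrite, and DEL is the one pattern that can overwrite any other character, which is why they sit at opposite ends of the 7-bit range.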

Further back in time, the "all-zeroes" control code was part of the International Telegraph Alphabet No. 2 (ITA2) code of 1924. This, in turn, was a development of the Baudot code (ITA1), devised in the 1870s and patented in the United States in 1888:

The final three rows of the table above are:

  1. Figure switch
  2. Letter switch
  3. Instrument at rest

where "Instrument at rest" is the idle state for teleprinters, a.k.a. NUL.
