Wednesday 25 August 2021

Machin Postage Stamps 1

There is a web page to accompany these blog posts.

I was sorting through some old family papers when I came across my father's stamp collection. He had a number of unsorted UK stamps that I (being me) decided to organise.

Arnold Machin

The Machin series of UK postage stamps are named after the sculptor (Arnold Machin, 1911-1999) who designed them. His profile of Queen Elizabeth II has been used on many coins and stamps:

Pre-Decimal Machin Stamps

The original, pre-decimal stamps to use the Machin profile (from 1967 onwards) came in denominations of:

½d, 1d, 2d, 3d, 4d, 5d, 6d, 7d, 8d, 9d, 10d, 1/-, …

Each denomination had a different colour, although I'm fairly certain there was no systematic colour scheme.

I found all of the denominations, up to and including the shilling, in my father's collection, so I thought about how to mount them in a display. These stamps aren't very valuable, so permanently sticking them to a bit of card isn't so terrible.

My first thought was just a grid:

Or perhaps a circular layout:


At this point, I was reminded of a clock face. Alas, there was never a "11d" Machin stamp, but one could swap the half penny and shilling stamps and use "½d" for eleven o'clock and "1/-" for twelve o'clock:

Here's a mock-up of a clock built around this layout:

I also built a JavaScript demo loosely based on a beautiful CSS-only clock by Nils Rasmusson.

Decimal Machin Stamps

Next I moved on to the Machin stamps used after decimalisation in 1971. These had all the half-penny increments up to and including 13½p, so two rings could be constructed:

Or, with axis-aligned stamps:

These templates are available on the web page as SVGs with absolute measurements: each stamp is 21mm by 24mm. You can print out the desired page at 100% scale and use the templates when mounting the stamps. Unfortunately the "double ring" layouts don't quite fit on a single sheet of A4 so you'll need to crop and rely on symmetry to physically flip the template.

Mounting

I decided to mount 23 stamps (my father never acquired a "11½p" Machin) using the last template within a 240mm-by-300mm frame:

  1. Print out the template at 100% scale.
  2. Carefully cut along three sides of each stamp "window" with a scalpel.
  3. Position the template over the mounting card.
  4. Secure the template to the mounting card with masking tape. Try to avoid sticking the tape directly to the front of the card. (Figure 1)
  5. Stick the appropriate stamps on to the card using double-sided tape through the windows. (Figure 2)
  6. Carefully remove the template and insert the mounting card into the frame. (Figure 3)
Figure 1

Figure 2

Figure 3

There's a conspicuous gap where the missing stamp should go. I could fill it by splurging a couple of quid on ebay, but the gap itself has a story.

Another project would be to affix a cheap battery quartz movement to a similar clock face. I had hoped to use an old CD for the circular face, but I don't think twelve stamps quite fit.

Tuesday 24 August 2021

Gratuitous Aphorism #8

Experts are just people who have already made all the silly mistakes.

Tuesday 10 August 2021

Hexworld 3: Compression

Last time, we encoded our 311,040 world map hexel indices as a 622,080-character hexadecimal string as a way of embedding the data in an ASCII JavaScript file. We could use the following script:

hexworld(buffer => {
  const DATA = "00000000...ffffffff";
  buffer.set(DATA.match(/../g).map(x => parseInt(x, 16)));
});

This calls a function hexworld(), defined elsewhere, that takes as its only parameter a decompression function that fills in a pre-allocated Uint8Array buffer with the hexel indices. The "decompression" above consists of splitting the long hexadecimal string into 2-character segments and converting these to numeric values.

We can achieve better compression by using the built-in atob() function:

hexworld(buffer => {
  const DATA = "AAAAAAAA...////////";
  buffer.set(Array.from(atob(DATA), x => x.charCodeAt()));
});

The DATA string is reduced in size from 622,080 ASCII characters to 414,720.

If we look at the map itself, we see that it's ripe for run-length encoding or the like.

One possibility is the LZW compression algorithm. I tried this and came up with data plus decompressor what weigh in at under 32KB of ASCII:

function lzw(src) {
  var dic = Array.from({ length: 256 }, (_, i) => [i]);
  var key, dst = [];
  for (var val of src) {
    var nxt = dic[val] || [...key, key[0]];
    if (key) {
      dic.push([...key, nxt[0]]);
    }
    key = nxt;
    dst.push(...key);
  }
  return dst;
}
hexworld(buffer => {
  const DATA = "00007407...cb8cc80f";
  buffer.set(lzw(DATA.match(/.../g).map(x => parseInt(x, 36))));
});

The lzw() function above decompresses an array of integers into an array of bytes. For completeness, here's the corresponding compression function:

function compress(src) {
  const chr = String.fromCharCode;
  var dic = {};
  var idx = 0;
  do {
    dic[chr(idx)] = idx;
  } while (++idx < 256);
  var dst = [];
  var pre = "";
  for (var val of src) {
    var key = pre + chr(val);
    if (key in dic) {
      pre = key;
    } else {
      dst.push(dic[pre]);
      dic[key] = idx++;
      pre = chr(val);
    }
  }
  dst.push(dic[pre]);
  return dst;
}

In our Hexworld case, the LZW dictionary constructed during compression has about 10,000 entries, so those indices can be stored as three base-36 digits. This is a general-purpose compression scheme, so could a more tailored algorithm produce better results? Remember, we're aiming for something less than 16KB.

One solution I looked at was encoding the data as a sequence of 4-bit nybbles:

hexworld(buffer => {
  const DATA = "8bxQEhER...ER8NEw==";
  var input = Array.from(atob(DATA),
    x => x.charCodeAt().toString(16).padStart(2, "0")).join("");
  var i = 0;
  function read(n) {
    i += n;
    return parseInt(input.slice(i - n, i), 16);
  }
  var land = 0;
  var next = 0;
  var o = 0;
  while (o < buffer.length) {
    var n = read(1);
    switch (n) {
      case 0:
        land = next = read(2);
        continue;
      case 12:
        n = read(1) + 12;
        break;
      case 13:
        n = read(2) + 28;
        break;
      case 14:
        n = read(3) + 284;
        break;
      case 15:
        n = read(4) + 4380;
        break;
    }
    buffer.fill(next, o, o + n);
    o += n;
    next = next ? 0 : land;
  }
});

The algorithm is as follows:

  1. Read the next nybble in the stream
  2. If the nybble is 0, the next two nybbles contain the next "land" hexel index and go back to Step 1
  3. If the nybble is between 1 and 11 inclusive, it is used as the count below
  4. If the nybble is 12, the count to use is the value in the next nybble plus 12
  5. If the nybble is 13, the count to use is the value in the next two nybbles plus 28
  6. If the nybble is 14, the count to use is the value in the next three nybbles plus 284
  7. If the nybble is 15, the count to use is the value in the next four nybbles plus 4380
  8. If the last set of values output were "land", write out "count" indices of "water" (0)
  9. Otherwise, write out "count" indices of the current "land"
  10. Go back to Step 1
This algorithm assumes there are never more than about 70,000 value repetitions, which is more than enough for our map. It weighs in at under 19KB of ASCII text, which is getting very close to our goal. One thing to notice is that the nybbles are encoded into ASCII via atob/btoa which use base-64 and are therefore relatively inefficient.

My final attempt (the one that achieved my goal of data plus decompressor within 16KB of ASCII) uses base-95:

hexworld(buffer => {
  const DATA = ')  )gD}l...$)wA#).n';
  for (var d = Array.from(DATA, x => 126 - x.charCodeAt()),
    i = 0, o = 0, a = 0, b = 0, c = 0; DATA[i]; c = c ? 0 : a) {
    var v = d[i++];
    if (v >= 90) {
      v -= 88;
      while (--v) {
        d[--i] = 0;
      }
    }
    var n = ~~(v / 5);
    if (v %= 5) {
      c = v---4 ? v * 95 + d[i++] : b;
      b = a;
      a = c; 
    }
    switch (n++) {
      case 17:
        n = d[i++] * 95 + 112;
      case 16:
        n += d[i++] - 1;
        break;
    }
    buffer.fill(c, o, o + n);
    o += n;
  }
});

This is slightly golfed, so I'll clarify it below.

[In a perverse way, I quite like the expression "v---4" which is shorthand for "(v--) !== 4"]

The data is encoded in base-95: the number of printable ASCII characters. This means that backslashes and quotes need to be escaped within the string. The choices of (a) using single quotes (as opposed to double- or backquotes) and (b) storing values in reverse order ("126 - x" versus "x - 32") minimize the number of escape backslashes needed for our particular map data.

Lead base-95 values are split into two values: low (0 to 4 inclusive) and high (0 to 18 inclusive). The high value holds the repetition count:

  • High values from 0 to 15 inclusive imply repetition counts from 1 to 16,
  • Sixteen takes the next base-95 value to produce a repetition counts from 17 to 111,
  • Seventeen takes the next two base-95 values to produce a repetition counts from 112 to 9136,
  • Eighteen is a special "repeated zero" value that simulates a lead zero repeated 2 to 6 times based on the low value.
The low value (except for when the high value is eighteen) encodes what should be repeated:
  • Zero alternates between the last land index and the water index (0),
  • One sets the land index to the next base-95 value,
  • Two sets the land index to the next base-95 value plus 95,
  • Three sets the land index to the next base-95 value plus 190,
  • Four alternates between the last land index and the previous land index specified.

This rather ad hoc algorithm is based on the following observations:

  1. Water (hexel index zero) is common throughout the whole map,
  2. If the last land hexel was a particular index, it is likely that the next land will be the same index,
  3. Due to the staggered hexel layout, land indices quite often alternate with either water or the previous land index

I'm sure there's plenty more mileage using bespoke compression, but after a certain point it becomes unhealthy. For example, one option I didn't pursue is re-ordering the country-to-hexel index allocation in order to improve compression. This smells suspiciously like super-optimisation.

One last avenue to explore for compressing the hexel map is the recent JavaScript Compression Stream API, but that's for another time.

        Friday 6 August 2021

        Hexworld 2: Encoding

        As mentioned in the first part, the map of Hexworld consists of a 864-by-360 staggered grid of hexagons. Each hexagon (hexel) holds an 8-bit index, so the whole thing fits into about 300KB of binary data. It can be stored as an 8-bit PNG weighing in at about 13½KB:

        Greyscale optimisations and PNG Crush will reduce this to about 11½KB. However, reading the pixel index values back using JavaScript so that we can reconstruct the hexel data is problematic. One would think you could simply do the following:

        • Encode the greyscale PNG as a data URL,
        • Load it into an offscreen img element,
        • Draw it into an offscreen canvas element, 
        • Capture the pixel data via getImageData() element, then
        • Reconstruct the indices.

        The two main stumbling blocks are:

        1. Loading images (even data URLs) is an asynchronous activity.
        2. Colour space management typically applied a "gamma" correction which means the greyscale RGB values are not one-to-one with the original indices.
        The first problem can be solved with some truly horrific fudging:

            var done = false;
            img.onload = () => {
              process(img);
              done = true;
            };
            img.src = dataURL;
            while (!done) {
              await new Promise(r => setTimeout(r, 100));
            }

          The second issue cannot be solved without using a third-party JavaScript library (like pngjs) to decode the raw PNG data and extracting the indices directly.

          Another option is to encode the raw pixel data (as opposed to the PNG) into the JavaScript script itself. For example:

            function decode_hex(buffer) {
              const HEXWORLD = /* hexadecimal string */;
              buffer.set(HEXWORLD.match(/../g).map(x => parseInt(x, 16)));
            }

          This function takes a 311,040-element Uint8Array (that's 864 times 360) as an argument and fills it with the indices. Unfortunately, the hexadecimal string is over 600,000 characters long!

          If we limit ourselves to ASCII JavaScript, can we do better?