Last time, we encoded our 311,040 world map hexel indices as a 622,080-character hexadecimal string as a way of embedding the data in an ASCII JavaScript file. We could use the following script:
hexworld(buffer => {
  const DATA = "00000000...ffffffff";
  buffer.set(DATA.match(/../g).map(x => parseInt(x, 16)));
});
This calls a function hexworld(), defined elsewhere, that takes as its only parameter a decompression function that fills in a pre-allocated Uint8Array buffer with the hexel indices. The "decompression" above consists of splitting the long hexadecimal string into 2-character segments and converting these to numeric values.
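The hexworld() function itself isn't listed in this post, but a minimal sketch of the kind of harness it implies might look like the following (the buffer length comes from the previous article; drawMap() is just a placeholder for the real rendering code):

// Hypothetical harness: allocate the hexel index buffer, let the supplied
// decompression function fill it in, then hand it to the map renderer.
function hexworld(decompress) {
  const buffer = new Uint8Array(311040); // one byte per hexel index
  decompress(buffer);
  drawMap(buffer);                       // placeholder, not part of this post
}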
We can achieve better compression by using the built-in atob() function:
hexworld(buffer => {
  const DATA = "AAAAAAAA...////////";
  buffer.set(Array.from(atob(DATA), x => x.charCodeAt()));
});
The DATA string is reduced in size from 622,080 ASCII characters to 414,720.
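The encoding side isn't shown in the post; assuming the raw hexel indices live in a Uint8Array called indices, the base-64 string could be produced with the complementary btoa() call along these lines:

// Illustrative only: build the base-64 DATA string from the raw index bytes.
const DATA = btoa(Array.from(indices, b => String.fromCharCode(b)).join(""));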
If we look at the map itself, we see that it's ripe for run-length encoding or the like.
One possibility is the LZW compression algorithm. I tried this and came up with data plus decompressor that together weigh in at under 32KB of ASCII:
function lzw(src) {
  // Dictionary starts with one entry per possible byte value
  var dic = Array.from({ length: 256 }, (_, i) => [i]);
  var key, dst = [];
  for (var val of src) {
    // A not-yet-defined code must be the previous entry plus its own first byte
    var nxt = dic[val] || [...key, key[0]];
    if (key) {
      dic.push([...key, nxt[0]]);
    }
    key = nxt;
    dst.push(...key);
  }
  return dst;
}
hexworld(buffer => {
  const DATA = "00007407...cb8cc80f";
  // Each three-character base-36 group is one LZW code
  buffer.set(lzw(DATA.match(/.../g).map(x => parseInt(x, 36))));
});
The lzw() function above decompresses an array of integers into an array of bytes. For completeness, here's the corresponding compression function:
function compress(src) {
  const chr = String.fromCharCode;
  var dic = {};
  var idx = 0;
  do {
    dic[chr(idx)] = idx;   // seed the dictionary with all single-byte strings
  } while (++idx < 256);
  var dst = [];
  var pre = "";
  for (var val of src) {
    var key = pre + chr(val);
    if (key in dic) {
      pre = key;           // keep extending the current match
    } else {
      dst.push(dic[pre]);  // emit the code for the longest match found
      dic[key] = idx++;    // and add the extended string as a new entry
      pre = chr(val);
    }
  }
  dst.push(dic[pre]);
  return dst;
}
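As a quick sanity check (not part of the original code), the two functions should round-trip an arbitrary byte array:

// compress() turns bytes into LZW codes; lzw() turns codes back into bytes.
const bytes = [1, 1, 1, 2, 2, 3, 0, 0, 0, 0];
console.log(lzw(compress(bytes)).join() === bytes.join()); // true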
In our Hexworld case, the LZW dictionary constructed during compression has about 10,000 entries, so those indices can be stored as three base-36 digits. This is a general-purpose compression scheme, so could a more tailored algorithm produce better results? Remember, we're aiming for something less than 16KB.
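Before moving on, note that the packing of the LZW codes into the DATA string isn't shown above; assuming three base-36 digits per code, an illustrative snippet along these lines would produce it:

// Each LZW code becomes a fixed-width, three-digit base-36 number
// (enough for dictionary indices up to 46,655).
const DATA = compress(buffer).map(v => v.toString(36).padStart(3, "0")).join("");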
One solution I looked at was encoding the data as a sequence of 4-bit nybbles:
hexworld(buffer => {
  const DATA = "8bxQEhER...ER8NEw==";
  // Expand the base-64 data into a string of hexadecimal nybbles
  var input = Array.from(atob(DATA), x => x.charCodeAt().toString(16).padStart(2, "0")).join("");
  var i = 0;
  function read(n) {             // read the next n nybbles as one number
    i += n;
    return parseInt(input.slice(i - n, i), 16);
  }
  var land = 0;
  var next = 0;
  var o = 0;
  while (o < buffer.length) {
    var n = read(1);
    switch (n) {
      case 0:                    // new land index in the next two nybbles
        land = next = read(2);
        continue;
      case 12:
        n = read(1) + 12;
        break;
      case 13:
        n = read(2) + 28;
        break;
      case 14:
        n = read(3) + 284;
        break;
      case 15:
        n = read(4) + 4380;
        break;
    }
    buffer.fill(next, o, o + n); // write a run of n hexels
    o += n;
    next = next ? 0 : land;      // alternate between land and water runs
  }
});
The algorithm is as follows (a sketch of the count encoding appears after the list):
1. Read the next nybble in the stream
2. If the nybble is 0, the next two nybbles contain the next "land" hexel index; go back to Step 1
3. If the nybble is between 1 and 11 inclusive, it is used as the count below
4. If the nybble is 12, the count to use is the value in the next nybble plus 12
5. If the nybble is 13, the count to use is the value in the next two nybbles plus 28
6. If the nybble is 14, the count to use is the value in the next three nybbles plus 284
7. If the nybble is 15, the count to use is the value in the next four nybbles plus 4380
8. If the last set of values output was "land", write out "count" indices of "water" (0)
9. Otherwise, write out "count" indices of the current "land"
10. Go back to Step 1
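To make the count tiers concrete, here's a hypothetical encoder-side helper (it isn't part of the original code) that writes a run length in exactly the format the decompressor above reads:

// Express a run length (1 to 69,915) as the nybble string read by read() above.
function countNybbles(n) {
  if (n <= 11)   return n.toString(16);                                // step 3: counts 1-11
  if (n <= 27)   return "c" + (n - 12).toString(16);                   // step 4
  if (n <= 283)  return "d" + (n - 28).toString(16).padStart(2, "0");  // step 5
  if (n <= 4379) return "e" + (n - 284).toString(16).padStart(3, "0"); // step 6
  return "f" + (n - 4380).toString(16).padStart(4, "0");               // step 7
}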
My final attempt (the one that achieved my goal of data plus decompressor within 16KB of ASCII) uses base-95:
hexworld(buffer => {
  const DATA = ') )gD}l...$)wA#).n';
  for (var d = Array.from(DATA, x => 126 - x.charCodeAt()),
           i = 0, o = 0, a = 0, b = 0, c = 0;
       DATA[i];
       c = c ? 0 : a) {
    var v = d[i++];
    if (v >= 90) {
      v -= 88;
      while (--v) {
        d[--i] = 0;
      }
    }
    var n = ~~(v / 5);
    if (v %= 5) {
      c = v---4 ? v * 95 + d[i++] : b;
      b = a;
      a = c;
    }
    switch (n++) {
      case 17:
        n = d[i++] * 95 + 112;
      case 16:
        n += d[i++] - 1;
        break;
    }
    buffer.fill(c, o, o + n);
    o += n;
  }
});
This is slightly golfed, so I'll clarify it below.
[In a perverse way, I quite like the expression "v---4" which is shorthand for "(v--) !== 4"]
The data is encoded in base-95: the number of printable ASCII characters. This means that backslashes and quotes need to be escaped within the string. The choices of (a) using single quotes (as opposed to double- or backquotes) and (b) storing values in reverse order ("126 - x" versus "x - 32") minimize the number of escape backslashes needed for our particular map data.
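To illustrate the escaping (this helper isn't in the original code), a single base-95 digit could be mapped to its character in the reversed scheme like so:

// Encode one base-95 digit (0..94) as a printable character, escaping
// backslash and single quote so it can sit inside the quoted DATA literal.
function char95(v) {
  const ch = String.fromCharCode(126 - v); // 0 -> "~", 94 -> " " (space)
  return ch === "\\" || ch === "'" ? "\\" + ch : ch;
}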
Lead base-95 values are split into two values: low (0 to 4 inclusive) and high (0 to 18 inclusive), the lead value being five times the high value plus the low value. The high value holds the repetition count:
- High values from 0 to 15 inclusive imply repetition counts from 1 to 16,
- Sixteen takes the next base-95 value to produce repetition counts from 17 to 111,
- Seventeen takes the next two base-95 values to produce repetition counts from 112 to 9136,
- Eighteen is a special "repeated zero" value that simulates a lead zero repeated 2 to 6 times based on the low value.
The low value determines which index is repeated:
- Zero alternates between the last land index and the water index (0),
- One sets the land index to the next base-95 value,
- Two sets the land index to the next base-95 value plus 95,
- Three sets the land index to the next base-95 value plus 190,
- Four alternates between the last land index and the previous land index specified.
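For example, the two characters '}R' decode to the values 1 and 44 (126 − 125 and 126 − 82). The lead value 1 splits into high 0 and low 1, so one hexel is written (count 1) using a new land index of 44; a following lead value with low 0 would then write a run of water, and the one after that would write index 44 again.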
This rather ad hoc algorithm is based on the following observations:
- Water (hexel index zero) is common throughout the whole map,
- If the last land hexel was a particular index, it is likely that the next land will be the same index,
- Due to the staggered hexel layout, land indices quite often alternate with either water or the previous land index.
I'm sure there's plenty more mileage to be had from bespoke compression, but after a certain point it becomes unhealthy. For example, one option I didn't pursue is re-ordering the country-to-hexel index allocation in order to improve compression. This smells suspiciously like super-optimisation.
One last avenue to explore for compressing the hexel map is the recent JavaScript Compression Streams API, but that's for another time.