Thursday 4 June 2015

Gregorian Date Calculation Golf 2a

My previous attempt at a Gregorian Date Calculation Golf routine to calculate the number of days in a month can be found here. However, this thread has an interesting discussion about optimizing the calculation in JavaScript. It turns out that using a shift-value to encode the twelve-element table is not new at all; though the following treats all instances of February as special cases and can therefore pack all the remaining months into just one bit each:

int DaysInMonth(int year, int month) {
    if (month == 2) {
        return IsLeapYear(year) ? 29 : 28;
    }
    return 30 + ((5546 >> month) & 1);
}

A potentially superior (in JavaScript) alternative is also given:

int DaysInMonth(int year, int month) {
    if (month == 2) {
        return IsLeapYear(year) ? 29 : 28;
    }
    return 30 + ((month + (month >> 3)) & 1);
}

Although both these algorithms are slower than a simple table lookup in C/C++ (on my machine), the latter is marginally quicker than my previous 32-bit and 64-bit Golf attempts.

However, even with these elegant solutions, there's always room for "improvement":

int DaysInMonth(int year, int month) {
    if (month == 2) {
        return IsLeapYear(year) ? 29 : 28;
    }
    return (month + (month >> 3)) | 30;
}

Wednesday 3 June 2015

Gregorian Date Calculation Golf 4

Calculating a Gregorian year/month/day triplet from a Rata Die integer involves finding our temporal position within a number of cycles:

  • 146097 days for the 400-year Gregorian leap year rule cycle,
  • 1461 days for the 4-year Julian leap year rule cycle,
  • 12 months in a year, and
  • 30.6 days in the days-in-a-month approximation (see here)
This generally means a lot of divisions and modular arithmetic, which causes no end of problems for Gregorian Date Calculation Golf, particularly Rule 8.

Let's start with the canonical Richards' algorithm:


void FromRataDie(int rd, int& year, int& month, int& day) {
    int jdn = rd + 1721425;
    assert(jdn >= 0);
    int f = jdn + 1401 + (((4 * jdn + 274277) / 146097) * 3) / 4 - 38;
    int e = 4 * f + 3;
    int g = (e % 1461) / 4;
    int h = 5 * g + 2;
    day = (h % 153) / 5 + 1;
    month = (h / 153 + 2) % 12 + 1;
    year = (e / 1461) - 4716 + (14 - month) / 12;
    assert(year >= -4713);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
}

We've already seen how we can get rid of most of these divisions using bitwise shifts or by using different divisors entirely (instead of 153 and 12, for instance):

void FromRataDie(int rd, int& year, int& month, int& day) {
    assert(rd >= 1);
    int a = rd << 2;
    int b = (a + 147321) / 146097;
    int c = ((a + b * 3) | 3) + 1220;
    int d = c / 1461;
    int e = c - d * 1461;
    int f = (e >> 2) * 2141 + 197688;
    int g = f >> 16;
    int h = (g + 3) >> 4;
    day = ((f & 0xFFFF) / 2141) + 1;
    month = g - h * 12;
    year = d + h;
    assert(year >= 1);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
}

The rather curious bitwise-or with three in the calculation of "c" is due to the observation:

    (x | 3) == (((x / 4) * 4) + 3)


The new constant 2141 is due to the following approximation:

    153 ÷ 5 ~= 65536 ÷ 2141

Alas, our "Division by Integer Constants" trick must resort to 64-bit quantities to obtain a true Golf solution:

void FromRataDie(int rd, int& year, int& month, int& day) {
    assert((rd >= 1) && (rd <= 23936166));
    uint64_t a = uint64_t(rd << 2);
    uint64_t b = (((a + 147321) * 0xE5AC1AF3) >> 49);
    uint64_t c = ((a + b * 3) | 3) + 1220;
    int d = int((c * 0xB36D8398) >> 42);
    int e = int(c - d * 1461);
    int f = (e >> 2) * 2141 + 197688;
    int g = f >> 16;
    int h = (g + 3) >> 4;
    day = (((f & 0xFFFF) * 0x7A71) >> 26) + 1;
    month = g - h * 12;
    year = d + h;
    assert((year >= 1) && (year <= 65535));
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
}

One interesting thing about this algorithm is that none of the constants (with the possible exceptions of 12 and 1461 = 365.25 * 4) look like they've got anything to do with date calculations. Indeed, it looks more like a pseudo-random number generator.

Computing the year and the day-of-year (1 to 366) from the Rata Die integer is left as an exercise for the reader.

Gregorian Date Calculation Golf 3

So let's convert a Gregorian year/month/day triplet to a serial day number. To make things simple, I'm going to expression serial day numbers as Rata Die integers; i.e. the number of calendar days since the day before January 1, 1 C.E.

This means that January 1, 1 C.E. is RD=1 and June 1, 2015 is RD=735750.

Here's the classic Julian Day Number (JDN) calculation offset for the Rata Die scheme:

int ToRataDie(int year, int month, int day) {
    assert(year >= -4713);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
    int a = (14 - month) / 12;
    int y = year + 4800 - a;
    int m = month + 12 * a - 3;
    int jdn = day + ((153 * m + 2) / 5) + 365 * y +
              (y / 4) - (y / 100) + (y / 400) - 32045;
    return jdn - 1721425;
}

Of course, for Gregorian Date Calculation Golf we want to eradicate those divisions. Fortunately, we're only interested in years between 1 and 65535 inclusive.

The calculation of "a" calls for a division by 12, but careful inspection of the logic uncovers:

    a == 1 for January and February
    a == 0 for all other months

So we can replace that division with another formula that achieves the same result, say:

    a = (18 - month) >> 4

Another culprit is the division by 5 in the calculation of the days until the start of the month:

    (153 * m + 2) / 5

By referring to the excellent Calendrical Calculations, we discover that the constants 153, 2 and 5 in the above equation are themselves somewhat arbitrary (see pp.53-54, but also Zeller's Congruence) and choices that better suit computer hardware can be used instead.

All this, along with a considerable amount of fettling, leads to the following implementation:

int ToRataDie(int year, int month, int day) {
    assert(year >= 1);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
    int a = (18 - month) >> 4;
    int b = month + 12 * a + 1;
    int c = year - a;
    int d = c / 100;
    int e = (b * 1959) >> 6;
    int f = (c * 1461) >> 2;
    return day - d + (d >> 2) + e + f - 428;
}

Note that non-positive (non-Common Era) years are no longer supported; though this restriction can be relaxed relatively easily by offsetting the year and the result correspondingly.

We've already seen how to remove the division by 100 in our discussion of "IsLeapYear()", so our Golf solution is:

int ToRataDie(int year, int month, int day) {
    assert(year >= 1);
    assert((month >= 1) && (month <= 12));
    assert((day >= 1) && (day <= 31));
    int a = (18 - month) >> 4;
    int b = month + 12 * a + 1;
    int c = year - a;
    int d = (((c * 0x47AF) >> 16) + c) >> 7;
    int e = (b * 1959) >> 6;
    int f = (c * 1461) >> 2;
    return day - d + (d >> 2) + e + f - 428;
}

If you supply the day-of-the-year (1 to 366) instead of the month and day, the conversion is pleasingly simple:

int ToRataDie(int year, int dayOfYear) {
    assert(year >= 1);
    assert((dayOfYear >= 1) && (dayOfYear <= 366));
    int c = year - 1;
    int d = (((c * 0x47AF) >> 16) + c) >> 7;
    int f = (c * 1461) >> 2;
    return dayOfYear - d + (d >> 2) + f;
}

Gregorian Date Calculation Golf 2

If you browse the source code of a well-known, much-used library, you'll come across a bit of code which I'll paraphrase below:

static const int daysToMonth365[13] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365
};
static const int daysToMonth366[13] = {
    0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366
};
int DaysInMonth(int year, int month) {
    const int* days = IsLeapYear(year) ? daysToMonth366
                                       : daysToMonth365;
    return days[month] - days[month - 1];
}

So, what's wrong with this code. Well, frankly, nothing at all ... unless you're playing Gregorian Date Calculation Golf.

The main problem is that, as any schoolchild can tell you, you only need to know if the year is a leap year when asked for the number of days in February. The code above calls "IsLeapYear(year)" for all months. Determining if a year is actually a leap year is (relatively) non-trivial, so things seem to be less than optimal. Another point is that the constant arrays "daysToMonth365" and "daysToMonth366" are used in other algorithms in their library, but the subtraction due to re-use seems to be slightly heavy-handed. Optimizing compilers understandably have a tough time improving matters.

Here's another stab at the function:

static const int daysInMonth365[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
int DaysInMonth(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    return daysInMonth365[month - 1];
}

On my machine, this is actually 40% faster than the original implementation without loss of readability. Please remember: this is an academic exercise; I'm not suggesting that this code is necessarily more appropriate.

In the true spirit of Gregorian Date Calculation Golf, a pedant might point out that the table memory references may have an adverse effect on CPU pipelining and caching (but only if they are blind to the enormous "if" statement preceding it!). If that's truly the case, the following implementation packs the table into a 64-bit constant that gets embedded in the machine code:

int DaysInMonth64(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    const uint64_t magic = 0x0FFBFEFFFDFF7F9F;
    return int(magic >> ((month - 1) * 5)) & 31;
}

Again, on my machine compiled for x64, this is a smidgen faster than using the array-lookup. Not surprisingly, it performs terribly on 32-bit instruction sets because of the 64-bit shift involved.

A non-table version can be written for 32-bit instruction sets by storing each table entry as two bits:

int DaysInMonth32(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    const int magic = 0x03BBEECC;
    return ((magic >> (month << 1)) | -4) + 32;
}

But, although beautiful in its obfuscation, it proves rarely to be faster than using a table lookup.

A quick google for those magic numbers scores no hits, which suggests that these algorithms may be quite novel. I wonder why...