Wednesday, 3 June 2015

Gregorian Date Calculation Golf 2

If you browse the source code of a well-known, much-used library, you'll come across a bit of code which I'll paraphrase below:

static const int daysToMonth365[13] = {
    0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365
};
static const int daysToMonth366[13] = {
    0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366
};
int DaysInMonth(int year, int month) {
    const int* days = IsLeapYear(year) ? daysToMonth366
                                       : daysToMonth365;
    return days[month] - days[month - 1];
}

So, what's wrong with this code. Well, frankly, nothing at all ... unless you're playing Gregorian Date Calculation Golf.

The main problem is that, as any schoolchild can tell you, you only need to know if the year is a leap year when asked for the number of days in February. The code above calls "IsLeapYear(year)" for all months. Determining if a year is actually a leap year is (relatively) non-trivial, so things seem to be less than optimal. Another point is that the constant arrays "daysToMonth365" and "daysToMonth366" are used in other algorithms in their library, but the subtraction due to re-use seems to be slightly heavy-handed. Optimizing compilers understandably have a tough time improving matters.

Here's another stab at the function:

static const int daysInMonth365[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};
int DaysInMonth(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    return daysInMonth365[month - 1];
}

On my machine, this is actually 40% faster than the original implementation without loss of readability. Please remember: this is an academic exercise; I'm not suggesting that this code is necessarily more appropriate.

In the true spirit of Gregorian Date Calculation Golf, a pedant might point out that the table memory references may have an adverse effect on CPU pipelining and caching (but only if they are blind to the enormous "if" statement preceding it!). If that's truly the case, the following implementation packs the table into a 64-bit constant that gets embedded in the machine code:

int DaysInMonth64(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    const uint64_t magic = 0x0FFBFEFFFDFF7F9F;
    return int(magic >> ((month - 1) * 5)) & 31;
}

Again, on my machine compiled for x64, this is a smidgen faster than using the array-lookup. Not surprisingly, it performs terribly on 32-bit instruction sets because of the 64-bit shift involved.

A non-table version can be written for 32-bit instruction sets by storing each table entry as two bits:

int DaysInMonth32(int year, int month) {
    if ((month == February) && IsLeapYear(year)) {
        return 29;
    }
    const int magic = 0x03BBEECC;
    return ((magic >> (month << 1)) | -4) + 32;
}

But, although beautiful in its obfuscation, it proves rarely to be faster than using a table lookup.

A quick google for those magic numbers scores no hits, which suggests that these algorithms may be quite novel. I wonder why...

2 comments :

  1. I spoke too soon. There are indeed algorithms out there that use just this kind of magic number. See http://chilliant.blogspot.co.uk/2015/06/gregorian-date-calculation-golf-2a.html

    ReplyDelete