Saturday, 30 May 2015

Gregorian Date Calculation Golf 1

It's been a while since my last post to this blog, so I thought I'd return to a guilty pleasure of mine: Needless optimizations of date calculations. See Computus 3.

Date calculation optimizations are often overlooked because of the scope for subtle errors. If you are writing production code that is going to be used in anger, I strongly suggest you check and re-check everything you see hereafter. At the very least, run it through a thorough test suite.

So, what are the rules of "Gregorian Date Calculation Golf"? Well...

  1. Pseudo-code must be presented in C/C++/C# flavour.
  2. The integer type "int" is expected to be at least 32 bits long.
  3. The integer type "long" is expected to be at least 64 bits long.
  4. A (proleptic) Gregorian year is assumed to be between 1 (meaning 1 CE) and 65535 inclusive.
  5. A Gregorian month is assumed to be between 1 (January) and 12 (December) inclusive.
  6. Leap days must be handled correctly.
  7. Time and timezones are ignored.
  8. Integer, arithmetic divisions (and their cousins) are not permitted.
Rule 8 is where the fun starts!

To warm up, let's assume we want to decide whether a Gregorian year is a leap year.

According to Wikipedia, the logic is:
if (year is not exactly divisible by 4) then (it is a common year)
else
if (year is not exactly divisible by 100) then (it is a leap year)
else
if (year is not exactly divisible by 400) then (it is a common year)
else (it is a leap year)
 So here's an obvious implementation:

  bool IsLeapYear(int year) {
    if ((year % 4) != 0) {
      return false;
    }
    if ((year % 100) != 0) {
      return true;
    }
    if ((year % 400) != 0) {
      return false;
    }
    return true;
  }

However, the "%" modulo operator involves an implicit division, violating our Rule 8. We can remove two of the divisions by using the bitwise-and operator "&":

  bool IsLeapYear(int year) {
    if ((year & 3) != 0) {
      return false;
    }
    int century = year / 100;
    if ((century * 100) != year) {
      return true;
    }
    if ((century & 3) != 0) {
      return false;
    }
    return true;
  }

However, there's still a division by a constant (100). Fortunately, we can use the fact that:
x / c = x * (1 / c)
This technique is known as "Integer Division by Constants". Indeed, any optimizing compiler worth its salt will convert the division in the code above to a multiplication. But, in the spirit of Gregorian Date Calculation Golf, we'll do it explicitly.

  bool IsLeapYear(int year) {
    if ((year & 3) != 0) {
      return false;
    }
    int century = (((year * 0x47AF) >> 16) + year) >> 7;
    if ((century * 100) != year) {
      return true;
    }
    if ((century & 3) != 0) {
      return false;
    }
    return true;
  }

This can be needlessly re-factored to:

  bool IsLeapYear(int year) {
    if ((year & 3) == 0)
    {
      int century = (((year * 0x47AF) >> 16) + year) >> 7;
      return ((century * 100) != year) || ((century & 3) == 0);
    }
    return false;
  }

So, can we convert Gregorian year/month/day triplets to and from serials (i.e. the number of days since an epochal date)? Of course we can! Next time.