Tuesday, 12 June 2018

egg Null and Void

When I first came across the "void" keyword in the C programming language in the late eighties, I was very confused. What was the point of a type that couldn't hold any values? Judging by the questions on stackoverflow, it's still a problem for many learners of many languages. A related issue is the confusion over "null" and "undefined" in JavaScript. This particular thorn has been perpetuated in supposedly "cleaner" languages like TypeScript.

I've been struggling with this issue in the design of egg. But first some background...

Background

The introduction of the concept of "null" into higher-level computer languages by Tony Hoare has been called a "billion-dollar mistake". The idea is that values of the Null type (e.g. "nullptr_t" in C++) can only ever hold a single null value ("nullptr" in C++). In some languages, such at Java, object types (including "String") are implicitly sub-classes of the Null type. That is, you can always store a null value in a Java string variable. In C++, you can always assign "nullptr" to a pointer (e.g. "int*"), though never to references (e.g. "int&").

In contrast, if we ignore "compromised" type systems such as TypeScript, the "void" type can never hold any value. Thus, in C/C++, de-referencing of void pointers is not allowed.

This all fits nicely into an ontology of types:

  • Entities of type "void" can hold no values.
  • Entities of type "null" can hold only one value: "null"
  • Entities of type "bool" can hold only two values: "false" and "true"
  • etc, etc.

Plan A

The problem I'm struggling with in egg is when you use "void" and when to use "null". My first thought was to allow functions to return void (i.e. nothing) as an option as well as being able to return null or some other type:

  float|void root(float x) {
    if (x >= 0) {
      return Math.sqrt(x);
    }
    return; // Returns a "void" sentinel
  }
  var a = root(-2); // Run-time failure: cannot assign from void

But this can be simulated with exceptions:

  float root(float x) {
    if (x >= 0) {
      return Math.sqrt(x);
    }
    throw Exception("negative root");
  }
  var a = root(-2); // Run-time exception

or nulls:

  float? root(float x) {
    if (x >= 0) {
      return Math.sqrt(x);
    }
    return null;
  }
  var a = root(-2); // a is set to null

But the real motivation for using void returns is in generators:

  int|void ...counter() {
    yield 1;
    yield 2;
    yield 3;
    // implicit "return;"
  }
  for (var i : counter()) {
    print(i); // Print 1, 2, 3
  }

When control from the "counter" co-routine falls "off the end" of the function, it implicitly returns a void sentinel which terminates the for-each loop iterator. The same mechanism could be used in "guarded assignment" conditions:

  Person|void getSpouse(Person p) {
    Person spouse;
    if (object.tryGetProperty(p, "spouse", &spouse)) {
      return spouse;
    }
  }
  if (var wife = einstein.getSpouse()) { // Guarded assignment
    print("Einstein's wife is ", wife.name);
  } else {
    print("Einstein has no wife"); // 'wife' is out of scope
  }

Given that one of the aims of egg is to reduce extraneous cognitive load, do we really need two concepts, "null" and "void", instead of just one? Is "computer science" making "learning programming" more complicated than necessary? This is the problem I've been struggling with.

Plan B

I don't think we can get rid of the concept of "void" for functions that do not return a value (a.k.a. methods). Fortunately [at the time of writing!], tuples are not supported in egg. Therefore, we could enforce that user-defined functions are declared to return either zero or one value. Generators terminate with a (hidden) sentinel and functions designed to work with guarded assignment conditions should return null.

This would make things look simpler in the examples above:

  int... counter() { // Removed "|void"
    yield 1;
    yield 2;
    yield 3;
    // implicit "return;"
  }
  for (var i : counter()) {
    print(i); // Print 1, 2, 3
  }

and

  Person? getSpouse(Person p) { // Replaced "|void" with "?"
    Person spouse;
    if (object.tryGetProperty(p, "spouse", &spouse)) {
      return spouse;
    }
    return null; // Now needed
  }
  if (var wife = einstein.getSpouse()) { // Guarded assignment
    print("Einstein's wife is ", wife.name);
  } else {
    print("Einstein has no wife"); // 'wife' is out of scope
  }

This mechanism seems to clear up another (minor) issue I was worrying about: what is the difference between

  if (var wife = einstein.getSpouse()) ...

and

  if (var? wife = einstein.getSpouse()) ...

With the second plan, it's obvious that the latter form is not useful and I suspect that my gut feeling that "var?" is a useless construct anywhere is true.

Side Note

I came up with the notation "type1|type2" independently, but have since discovered that (at least) the Pike programming language uses the same syntax. Pike also only allows "void" to be used for function return types.
"One thing [the language designer] should not do is to include untried ideas of his own. His task is consolidation, not innovation." 
C. A. R. Hoare, "Hints on Programming Language Design" 1973

No comments:

Post a Comment