Sunday 29 November 2020

egg Pointer Ambiguity

I've been beavering away on the egg programming language for the last few months. One of the major stumbling blocks has been the type system. As Hisham Muhammad points out, type systems are almost always more complex than they first appear to be. I've had to re-implement the compiler and virtual machine several times because of fundamental mistakes I've made with the egg type system, even though it's supposedly "simple."

Here's an example of one problem I'm still struggling with right now.

I'd like to have "safe pointers" in the egg language to handle concepts such as pass-by-reference:

bool safeDivide(float num, float den, float* out) {
  if (den == 0) {
    return false;
  }
  *out = num / den;
  return true;
}

This all looks hunky-dory, but now consider this:

any v = 123; // line 1
int* p = &v; // line 2
v = "hello"; // line 3

In line 1, we define a variable 'v' that can store most types of value and initialize it with an integer value. In line 2, we define a pointer variable 'p' and point it at 'v'. In line 3, we modify 'v' to be a string. The question is: "What is the value of '*p' after line 3?"

The type declaration of 'p' suggests that '*p' should (always) be an integer, but it's now pointing to a string. There's obviously something "wrong" here, but what exactly is it?

Option A

Line 2 should have produced a compile-time error along the lines of
Cannot initialize a pointer to a value of type 'int' with the address of a value of type 'any'

That is, we only allow pointers to point to values of exactly the appropriate type. This requires that the type of any operand of the address-of '&' operator is known precisely at compile-time.

Option B

Line 3 should have produced a compile-time error because the assignment invalidates the type constraint of 'p'. This requires us to perform some very sophisticated static analysis; I'm not even sure it's possible beyond trivial examples.

Option C

Line 3 produces a runtime error because the assignment invalidates the type constraint of 'p'. This requires us to keep track of all pointers pointing to a value and to check for invalidation on every assignment. This "observer" scheme sounds very expensive to me.
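To make the cost of the "observer" scheme concrete, here is a minimal sketch in Python standing in for the egg runtime. The names 'Slot', 'Pointer', 'assign' and 'deref' are hypothetical and not part of egg; the only point is that every assignment has to walk the list of pointers aimed at the variable.

```python
class Slot:
    """Hypothetical storage for a variable declared as 'any'."""

    def __init__(self, value):
        self.value = value
        self.observers = []  # every pointer currently aimed at this slot

    def assign(self, value):
        # The expensive part: each assignment consults every observing pointer.
        for pointer in self.observers:
            if not isinstance(value, pointer.pointee_type):
                raise TypeError(
                    f"assignment of {type(value).__name__} invalidates "
                    f"pointer to {pointer.pointee_type.__name__}"
                )
        self.value = value


class Pointer:
    """Hypothetical typed pointer that registers itself with its pointee."""

    def __init__(self, slot, pointee_type):
        if not isinstance(slot.value, pointee_type):
            raise TypeError("pointee does not currently hold the declared type")
        self.slot = slot
        self.pointee_type = pointee_type
        slot.observers.append(self)  # register as an observer

    def deref(self):
        return self.slot.value


v = Slot(123)          # any v = 123;
p = Pointer(v, int)    # int* p = &v;
v.assign(456)          # fine: the value is still an integer
try:
    v.assign("hello")  # v = "hello";
    outcome = "assigned"
except TypeError as error:
    outcome = f"rejected: {error}"
print(outcome)
```

In this sketch the failing assignment is rejected at the point where it happens, so '*p' is still 456 afterwards. A real implementation would also need to unregister pointers when they are reassigned or go out of scope, which the sketch leaves out.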

Option D

Produce a runtime error if or when 'p' is subsequently dereferenced and it is discovered that it no longer points to an integer. This would mean that the error is raised "at a distance" from the assignment that caused the issue, thereby making debugging more difficult.
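A checked dereference could look roughly like the following Python sketch, again standing in for the egg runtime. 'Slot' and 'CheckedPointer' are hypothetical names used only for illustration.

```python
class Slot:
    """Hypothetical storage for a variable declared as 'any'."""

    def __init__(self, value):
        self.value = value


class CheckedPointer:
    """Hypothetical typed pointer that validates its pointee on dereference."""

    def __init__(self, slot, pointee_type):
        self.slot = slot
        self.pointee_type = pointee_type

    def deref(self):
        value = self.slot.value
        # The type check happens here, not at the assignment that broke it.
        if not isinstance(value, self.pointee_type):
            raise TypeError(
                f"expected {self.pointee_type.__name__}, "
                f"found {type(value).__name__}"
            )
        return value


v = Slot(123)                # any v = 123;
p = CheckedPointer(v, int)   # int* p = &v;
first = p.deref()            # succeeds while 'v' holds an integer
v.value = "hello"            # v = "hello"; succeeds silently
try:
    p.deref()
    failed_at = None
except TypeError:
    failed_at = "dereference"
print(first, failed_at)
```

Assignments stay cheap because the variable knows nothing about its pointers, but the error surfaces only at the later dereference, which is exactly the "at a distance" problem.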

Option E

We make egg pointers "typeless" or, to put it another way, all pointers must be of type 'any?*'. This means that if the pointee changes type, we don't really care.

Option F

Nothing is wrong! Just live with the fact that '*p' isn't necessarily an integer, even though it's defined like that.

It all comes down to how the runtime type of the pointee and the compile-time declaration of the pointer interact. I cannot imagine I'm the first to have come across this issue, but a quick search of the literature hasn't turned up anything. Then again, I don't know what the problem is called, so I'm stumbling in the dark somewhat.

My current "least-hated solution" is a hybrid of Options A and D: try to detect inconsistencies at compile-time but fall back to checking at runtime whenever the pointer is dereferenced.
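One possible reading of that hybrid, sketched in Python: when the compiler sees '&v', it compares the pointer's declared pointee type with the set of types the operand's declaration allows, and picks one of three outcomes. This is an assumption about how such a rule might be structured, not a description of the egg compiler; 'Check', 'classify_address_of' and the representation of 'any' as a set of type names are all hypothetical.

```python
from enum import Enum


class Check(Enum):
    SAFE = "no runtime check needed"
    RUNTIME = "check on every dereference"
    ERROR = "compile-time error"


# Hypothetical stand-in for the types an 'any' variable may hold.
ANY = frozenset({"bool", "int", "float", "string"})


def classify_address_of(pointee_type, operand_types):
    """Decide how to treat '&operand' when initializing a 'pointee_type*'."""
    if operand_types == {pointee_type}:
        # The operand can only ever hold the pointee type (Option A passes).
        return Check.SAFE
    if pointee_type in operand_types:
        # The operand might hold the pointee type: defer to Option D.
        return Check.RUNTIME
    # The operand can never hold the pointee type: reject at compile-time.
    return Check.ERROR


print(classify_address_of("int", frozenset({"int"})))     # int i; int* p = &i;
print(classify_address_of("int", ANY))                    # any v; int* p = &v;
print(classify_address_of("int", frozenset({"string"})))  # string s; int* p = &s;
```

Under this reading, the 'any v' example from earlier would compile, with the mismatch reported only if 'p' is dereferenced after line 3.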