Friday 27 July 2018

egg Sequences 2

Last time we saw how various curly-brace programming languages deal with sequences, if at all. Now we'll concentrate on sequence generation.

Generators

In these examples, I'll use JavaScript because the syntax is more concise, but similar effects have be achieved with C#.

Consider this code:

  // JavaScript
  function* countdown() {
    yield 10;
    yield 9;
    yield 8;
    yield 7;
    yield 6;
    yield 5;
    yield 4;
    yield 3;
    yield 2;
    yield 1;
  }
  for (var i of countdown()) {
    console.log(i);
  }

This obviously counts down from 10 to 1. (I say "obviously" but that wasn't strictly true for me. I used "in" instead of "of" in the for loop and got nothing out. Good luck, learners!)

The first improvement we'd probably want to make is to use a loop:

  // JavaScript
  function* countdown() {
    for (var n = 10; n > 0; --n) {
      yield n;
    }
  }
  for (var i of countdown()) {
    console.log(i);
  }

Next, we could parameterize the countdown:

  // JavaScript
  function* countdown(x) {
    for (var n = x; n > 0; --n) {
      yield n;
    }
  }
  for (var i of countdown(10)) {
    console.log(i);
  }

Nice. But what exactly is "countdown" and what does it return? If you bring up a Chrome console and type in the last snippet, you can get useful insights:


So, "countdown" is a function; no surprise there! But a call to "countdown(10)" produces a suspended "Generator" object with a couple of scopes. This gives an insight into what is happening under the hood.

A JavaScript generator function returns a object that implements the iterator protocol. This consists of the "next()" member function, which, in turn, allows the use of "for ... of" loops and spread syntax:

  > console.log(...countdown(10))
  10 9 8 7 6 5 4 3 2 1

Yield

The JavaScript yield operator differs from the return statement in a couple of fundamental ways.

Firstly, "yield" is resumable. If a function terminates with a "return" statement (including an implicit return at the end), that's it: game over; the function has finished. With a yield, the function may resume at a later date. This implies that the state of the function (e.g. local variables, exception try blocks, etc.) must be preserved when a "yield" is executed so that the generator can continue exactly where it left off. This is a coroutine.

Secondly, a JavaScript "yield" is an operator that produces a value. It is an expression. This allows the caller to pass information back into the iterator function. A "return" is a statement.

Coroutines

If generator functions produce coroutines, what sort of coroutines do they produce?

The various flavours of coroutines are discussed at length in the following papers:

Symmetric versus Asymmetric

Symmetric coroutines can yield to any arbitrary coroutine; asymmetric coroutines can only yield to their "caller". It has been shown that you can build symmetric coroutines from asymmetric ones, and vice versa. Asymmetric coroutines are generally considered to be easier for humans to understand; for one thing, they maintain the notion of caller and callee. Both JavaScript and C# implement asymmetric coroutines.

Stackful versus Non-stackful

Stackful coroutines can yield anywhere, including from functions called by coroutine. Non-stackful coroutines can only can only suspend their execution when the control stack is at the same level that it was at creation time. Non-stackful coroutines have the advantage that yields are unambiguous in cases where one co-routine has explicitly called another. Both JavaScript and C# implement non-stackful coroutines.

Internal-only versus Externalised

This only applies to coroutines that are generators. Internal-only yields can only be used inside foreach-like statements; for example, CLU iterators are internal-only. Externalised yields provide an interface for calling the coroutine at arbitrary call-sites. Both JavaScript and C# implement externalised yields (via the "iterator protocol" and "IEnumerator interface" respectively).

Unidirectional versus Bidirectional

This is my own nomenclature. Unidirectional coroutines use yield statements; i.e. no value can be passed from the caller back to the callee upon resumption. Bidirectional coroutines use yield expressions. In reality, bidirectional coroutines can be approximated using unidirectional ones:

  // JavaScript: bidirectional
  function* generator() {
    // blah blah
    var y = yield x; // caller returns 'y'
    // blah blah
  }

  // JavaScript: unidirectional
  function* generator() {
    // blah blah
    var xy = { x: x };
    yield xy; // caller adds 'xy.y'
    var y = xy.y;
    // blah blah
  }

JavaScript implements bidirectional coroutines; C# implements unidirectional coroutines.

Implementation

Under the hood, calls to JavaScript generator functions create objects that encapsulate the logic of the body of the function and maintain the state. We could do this by hand:

  // JavaScript
  function countdown(x) {
    var n = x;
    return {
       next: function() {
         return (n > 0) ? {value: n--, done: false}
                        : {done: true};
       }
    };
  }

But this is tedious and error-prone, not to mention ugly. Another option is to convert the generator body to resumable code automatically. This usually entails producing a finite state machine that replicates the functionality. There are at least two tools that do this:
  1. Google's Traceur, and
  2. Facebook's Regenerator.
Both produce genuinely scary code.

Perhaps the best solution is to support resumable function bodies natively. On code designed to run on a virtual machine, this is not so difficult. However, for efficiency, generators will probably be run via a different (stackless) code path from standard function calls. This is what the current egg implementation does, and I suspect this is also true for Chrome's generators too. C# on the other hand rewrites the function.

No comments:

Post a Comment