chilliant: egg Sequences 1

Sequences of items are a recurring theme in programming, but different computer languages have differing levels of support for them. For the purposes of these posts, I'm going to define a sequence as:

1. Zero or more homogeneous elements
2. Elements may be repeated
3. Elements have a position (or positions, in the case of repeated elements) within the sequence
4. The only operation allowed for a consumer is to fetch the next element; this may fail if we have reached the end of the sequence
5. Sequences may be endless

This concept goes by many names in different languages (lists, enumerations, streams, iterations, etc.) but I'll stick with "sequence" to avoid ambiguity.

Let's break down the definition. Point 1 sounds quite restrictive: what if we want a sequence of integers and strings? This isn't really a problem in a language with composable types; we just define the type of the elements in our sequence to be the union of all the expected, individual types. In egg, we could use the hold-all "any" (or "any?" if we want nulls too).

Point 4 hints that, from a programming point of view, the consumer of a sequence doesn't see very much. But what does it need to see?

Support in Existing Languages

Rust has a very good interface for what it calls iterators. It has a single member function that returns the next element or nothing:

// Rust

fn next(&mut self) -> Option<Self::Item>;

JavaScript has a similar protocol. The single member returns an object with two fields ("value", which may be undefined, and "done", which is a Boolean):

// JavaScript

function next();
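As a small illustration of that protocol, here is a sketch of a consumer driving an array's built-in iterator by hand instead of using "for-of". The helper name "drain" is invented for this example.

```javascript
// Collect every element of an iterable by calling next() directly.
// "drain" is a hypothetical helper, not part of any library.
function drain(iterable) {
  const iterator = iterable[Symbol.iterator]();
  const out = [];
  for (;;) {
    const step = iterator.next(); // an object of the form { value, done }
    if (step.done) {
      return out;
    }
    out.push(step.value);
  }
}

const drained = drain([10, 20, 30]);
console.log(drained); // [ 10, 20, 30 ]
```

Note that a single call to "next" both advances the iterator and reports whether an element was available.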

Other languages are, in my opinion, a bit inferior in this respect. C# defines an enumerator interface with three members:

// C#

public T Current { get; }
public bool MoveNext();
public void Reset();

My two concerns with C#'s "IEnumerator<T>" are:

Getting the next element is not an atomic operation: you need to move to the next element and then read it.
The "Reset" method suggests that sequences are always "replayable". This is rarely the case (see the comment in the source).

In Java 8, sequences are named "iterators". The interface has four members:

// Java

default void forEachRemaining(Consumer<? super T> action);
boolean hasNext();
T next();
default void remove();

The first is a helper that can be overridden to optimise sequence operations. The next two exhibit the same atomicity problem that C#'s interface has. The final "remove" method is just plain strange. Needless to say, the default implementation throws a "not supported" exception.

C++ iterators are heavily templated and do not, in general, work via interfaces (though see Boost for some work towards this). Iterators in C++ are often bi-directional.

Atomicity

As I mentioned above, C# and Java have potential atomicity problems. In the vast majority of cases, these never appear or can be avoided by using additional locks. However, sequences can be a useful building block in concurrent systems (e.g. CSP), so native atomicity is a desirable feature.

For example, imagine a sequence of tasks being used as a queue of work for two or more consumers. If the consumers cannot atomically fetch the next work item from the sequence, there is the potential for work items to be accidentally skipped or even performed twice.
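The hazard can be simulated deterministically, even in single-threaded JavaScript. In the sketch below, each worker is a generator and every "yield" stands in for a point where a real scheduler might switch threads. All of the names ("makeTwoStepSource", "makeOneStepSource", "runInterleaved" and so on) are invented for this illustration, and the round-robin scheduler is only one possible interleaving, chosen because it exposes the problem.

```javascript
// Two-step source in the style of C#'s MoveNext()/Current.
function makeTwoStepSource(items) {
  let index = -1;
  return {
    moveNext: () => ++index < items.length,
    current: () => items[index],
  };
}

// One-step source: a single call both advances and fetches.
function makeOneStepSource(items) {
  let index = 0;
  return {
    next: () =>
      index < items.length ? { done: false, value: items[index++] } : { done: true },
  };
}

// Each "yield" marks a point where the other worker may run.
function* twoStepWorker(source, log) {
  while (source.moveNext()) {
    yield; // pre-empted between the move and the read
    log.push(source.current());
    yield;
  }
}

function* oneStepWorker(source, log) {
  for (;;) {
    const step = source.next(); // the move and the read cannot be separated
    if (step.done) return;
    yield;
    log.push(step.value);
    yield;
  }
}

// Round-robin scheduler: run each live worker up to its next yield, in turn.
function runInterleaved(workers) {
  let live = workers;
  while (live.length > 0) {
    live = live.filter((worker) => !worker.next().done);
  }
}

const racy = [];
const twoStep = makeTwoStepSource(["a", "b", "c", "d"]);
runInterleaved([twoStepWorker(twoStep, racy), twoStepWorker(twoStep, racy)]);

const safe = [];
const oneStep = makeOneStepSource(["a", "b", "c", "d"]);
runInterleaved([oneStepWorker(oneStep, safe), oneStepWorker(oneStep, safe)]);

console.log(racy); // [ 'b', 'b', 'd', 'd' ]
console.log(safe); // [ 'a', 'b', 'c', 'd' ]
```

With the two-step source, items "a" and "c" are skipped while "b" and "d" are each performed twice. With the one-step source, every item is handed to exactly one worker.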

For that reason, if written in C++17, one interface for sequences could be:

// C++

#include <optional>

template<typename T> class Sequence {
public:
  virtual ~Sequence() {}
  virtual std::optional<T> next() = 0;
};

This is effectively the Rust interface above.

Sequence Functions in egg

In egg, we can simplify this interface (for integers) to a single function signature:

// egg

void|int next()

We could have used the following:

// egg

int? next()

But that would mean that sequences could not contain null values.
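To make the difference concrete, here is a rough JavaScript analogy (not egg code). A unique symbol plays the role of "void", and the same elements are consumed twice: once treating null as the end marker, once using the unique marker. The names "END", "fromArray" and "collect" are made up for this sketch.

```javascript
// A unique end-of-sequence marker, playing the role of "void" in egg.
const END = Symbol("end");

// Build a sequence function over an array; once the elements run out,
// every further call returns the given end marker.
function fromArray(items, endMarker) {
  let index = 0;
  return () => (index < items.length ? items[index++] : endMarker);
}

// Pull elements until the end marker appears.
function collect(next, endMarker) {
  const out = [];
  for (let v = next(); v !== endMarker; v = next()) {
    out.push(v);
  }
  return out;
}

const elements = [1, null, 3];
const withNullMarker = collect(fromArray(elements, null), null);
const withUniqueMarker = collect(fromArray(elements, END), END);

console.log(withNullMarker);   // [ 1 ]
console.log(withUniqueMarker); // [ 1, null, 3 ]
```

When null doubles as the end marker, the consumer stops at the null element and never sees the 3.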

The type of a sequence function for integers is therefore "(void|int)()" which is somewhat ugly. I originally thought that abbreviating this to "int..." would be quite intuitive, but quickly ran into syntax problems (see later). At present, I am toying with the idea of using "int!" as a shorthand, based on some fuzzy notion that other languages use "!" when dealing with channels.

Thus, if a function took two integer sequences and merged them in some way, the signature could be:

// egg

int! merge(int! left, int! right)

This implies that sequences are first-class entities: what R. P. James and A. Sabry call "externalised yields" that can be used outside of the "for-each" control statement.
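As one possible reading of "merged in some way", the following JavaScript sketch (again an analogy, not egg) merges two ascending sequences into a single ascending sequence. It assumes both inputs are already sorted. The "END" marker and the "fromArray" helper are invented for the example, with "END" standing in for "void".

```javascript
// Unique end-of-sequence marker, standing in for "void".
const END = Symbol("end");

// Hypothetical helper: a sequence function over an array.
function fromArray(items) {
  let index = 0;
  return () => (index < items.length ? items[index++] : END);
}

// Merge two ascending sequences into one ascending sequence.
// Consumers can only fetch the next element, so one element of
// lookahead is buffered per input; note that this fetches the first
// element of each input as soon as merge() is called.
function merge(left, right) {
  let l = left();
  let r = right();
  return () => {
    if (l === END && r === END) {
      return END;
    }
    if (r === END || (l !== END && l <= r)) {
      const value = l;
      l = left();
      return value;
    }
    const value = r;
    r = right();
    return value;
  };
}

const next = merge(fromArray([1, 4, 9]), fromArray([2, 3, 10]));
const merged = [];
for (let v = next(); v !== END; v = next()) {
  merged.push(v);
}
console.log(merged); // [ 1, 2, 3, 4, 9, 10 ]
```

The result of "merge" is itself a sequence function, so it can be passed on to another consumer or merged again.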

However, this only deals with consumers of sequences. How does one produce a sequence in the first place?

Sequence Generators

One way to create a sequence is via a container that permits iteration. Most languages that support the "for-each" control statement allow you to iterate around elements in an array, for example.

// JavaScript

var a = [10, 20, 30];
for (var v of a) {
  process(v);
}

However, one of the strengths of sequences is that the elements can be generated on-demand or lazily. Indeed, some sequences are infinite, so populating a container with all the elements of a sequence may be impractical or impossible.
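A minimal sketch of on-demand generation, using nothing more than a closure: the sequence below is endless, but each element is only computed when a consumer asks for it. The names "squares" and "take" are invented for this example.

```javascript
// An endless sequence function: each call computes the next square.
function squares() {
  let n = 0;
  return () => {
    const value = n * n;
    n += 1;
    return value;
  };
}

// Fetch a fixed number of elements from a sequence function.
// This assumes the sequence has at least "count" elements.
function take(next, count) {
  const out = [];
  for (let i = 0; i < count; i++) {
    out.push(next());
  }
  return out;
}

const firstFive = take(squares(), 5);
console.log(firstFive); // [ 0, 1, 4, 9, 16 ]
```

No container ever holds the whole sequence; only the elements actually requested are produced.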

Both C# and JavaScript have the "yield" keyword. C, C++ and Java have non-standard mechanisms for achieving similar effects, but these are not part of the languages themselves.

Here's a generator function to produce the infinite, zero-based, Fibonacci sequence in JavaScript:

function* fibonacci() {
  var a = 0;
  var b = 1;
  for (;;) {
    yield a;
    var c = a + b;
    a = b;
    b = c;
  }
}

And in C#:

public IEnumerable<int> fibonacci() {
    int a = 0;
    int b = 1;
    for (;;) {
        yield return a;
        int c = a + b;
        a = b;
        b = c;
    }
}

The syntax is slightly different, but the fundamentals appear similar. Surely that means that generator functions are trivial to implement? Alas, no.

Next time, I'll discuss why generators are so complicated under the surface. In the meantime, might I again suggest this paper as some bed-time reading?

chilliant

Tuesday, 24 July 2018
