Saturday 28 July 2018

egg Sequences 3

Sequences (as introduced earlier) are a good way to abstract lists of homogeneous data. We are only allowed to see one piece of data at a time and we don't know (until we get there) where the end of the data is. Indeed, sequences can be infinite.

This promotes the use of streaming algorithms which allow us to process very large data sets (i.e. ones that cannot all fit into memory at one time).

However, there is a hidden danger here:
Sequences may promote sequential processing of data
The hint's in the name!

Sequential processing on today's multi-core computers is rightly frowned upon. We should be writing code that is either explicitly parallel or gives the compiler and/or run-time the opportunity to parallelize. But sequential list processing, where you primarily only process the head of the list before moving on to the tail elements, goes all the way back to Lisp; over sixty years ago! It's no wonder these concepts are so deeply ingrained.

Sequences are often reduced via accumulation. As Guy Steel hints at in "Four Solutions to a Trivial Problem", accumulation can be difficult to automatically parallelize. The abandoned Fortress project was an attempt to mitigate this.

The egg language (not in the same league!) is designed to be easy to learn, but I would like some aspects of concurrency and parallelization to be accessible, even if only as a taste of more complex paradigms. Microsoft's PLINQ demonstrates that it is possible to build parallel execution on top of sequences, but I need to do a lot more research in this area. In particular, do I need to worry about this as part of the language specification as opposed to a library, like PLINQ?

No comments:

Post a Comment