Tuesday 2 October 2018

egg Syntax 1

I've been working intensively on the syntax of the egg programming language. In particular, I've been looking at methods for teaching the syntax to learners not familiar with programming languages. But first, as ever, some background...

Backus-Naur Form

Backus-Naur Form (BNF) is a formal specification typically used to describe the syntax of computer languages. In its simplest form, it is a series of rules where each rule offers a choice or sequence of two or more further rules. Ironically, the syntax of BNF varies greatly, but I'll use the following:

<integer> ::= <zero> | <positive>
<positive::= <one-to-nine<opt-digits>
<opt-digits::= <digitsε
<digits::= <digit<opt-digits>
<digit::= <zero<one-to-nine>
<zero::= "0"
<one-to-nine> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

The epsilon "ε" represents a non-existent element.

The rules above define the syntax for (non-negative) integers. Informally,
• An integer is either zero or a positive integer.
• A positive integer is a digit "1" to "9" followed by zero or more digits "0" to "9".
These rules explicitly disallow sequences such as "007" being interpreted as "integers".

Formal BNF is great for computers to parse, but verbose and opaque for humans to read. The usual compromise is Extended Backus-Naur Form.

Extended Backus-Naur Form

Extended Backus-Naur Form (EBNF) adds a few more constructs to make rules more concise and (allegedly) easier to read:
• Suffix operator "?" means "zero or one" of the preceding element or group;
• Suffix operator "*" means "zero or more" of the preceding element or group;
• Suffix operator "+" means "one or more" of the preceding element or group;
• The need for the epsilon "ε" symbol can be removed; and
• Parentheses are used to group elements
Our example syntax above could be re-written in EBNF as:

<integer::= <zero| <positive>
<positive::= <one-to-nine<digit>*
<digit::= <zero<one-to-nine>
<zero::= "0"
<one-to-nine> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

EBNF syntax rules are used a great deal in computer science, but, as can be seen in the "official" EBNF of EBNF (Section 8.1), it's still quite impenetrable for non-trivial cases.

Railroad diagrams are graphic representations of syntax rules. I first came across them when I learned Pascal and I believe they are one of the factors in making JSON so successful. As with their textual counterparts, railroad diagrams come in a number of flavours. One of my favourites is Gunther Rademacher's. Paste the following into the "Edit Grammar" tab of http://www.bottlecaps.de/rr/ui for an example:

integer ::= zero | positive
positive ::= one-to-nine digit*
digit ::= zero | one-to-nine
zero ::= "0"
one-to-nine ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

However, with existing railroad syntax diagrams, there's generally a one-to-one correspondence between rules and images. I wondered if there was a way to break this link.

Egg BNF Diagrams

I wrote a simple railroad diagram generator in JavaScript with the following features:

Rules

Rules are enclosed in pale blue boxes:
Terminal tokens are in purple rectangles. References to rules are in brown ovals. Tracks are green lines terminated by green circles.

Choices

Choices are stacked vertically:
Optional elements branch below the main line:

Loops

Loops appear above the main line. There are three main forms: zero or more, one or more and lists with separators:

Embedding

Rule definitions may be embedded as one of the occurrences within another rule:
Using these features, you can express our example syntax using individual Egg BNF Diagrams:
Or you can embed the rules into a single diagram:
Personally, I find the last diagram gives me a fair indication of the overall structure of the syntax when compared to the stack of diagrams for individual rules.

This gave me the idea of a single poster diagram for the entire egg programming language syntax...