Thursday 8 March 2018

Vexatious Parses in C++

As part of my work on the egg computer language specification, I've been looking into parsing curly-brace-type languages. There are a number of cul-de-sacs in these language specifications. Here's one from C++ I've been struggling with today:
    int a = 1;
    int b = 2;
    int c = a-b;
What's the value of "c"? Obviously, it's minus one. But what about this:
    c = a--b;
My Microsoft compiler tells me that this is a malformed expression:
    syntax error: missing ';' before identifier 'b'
But the following is fine:
    c = a---b;
This sets "c" to minus one and decrements "a". Honest.
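For anyone who wants to check this, here is a minimal sketch that wraps the expression in a function. The name `triple_minus` is made up for illustration; it returns both the computed value and the value of "a" afterwards.

```cpp
#include <utility>

// Hypothetical helper: evaluates "a---b" and returns {result, a afterwards}.
// The compiler reads the expression as "(a--) - b".
std::pair<int, int> triple_minus(int a, int b) {
    int c = a---b;  // same as: int c = (a--) - b;
    return {c, a};
}
```

Calling `triple_minus(1, 2)` gives the pair (-1, 0): the subtraction uses the old value of "a", and the decrement happens as a side-effect.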

Here's a list of parses:
    a-b      // Parsed as "a - b"
    a--b     // Fails to compile: missing ';' before identifier 'b'
    a---b    // Parsed as "a-- - b"
    a----b   // Fails to compile: '--' needs l-value
    a-----b  // Fails to compile: '--' needs l-value

    a- -b    // Parsed as "a - -b"
    a- --b   // Parsed as "a - --b"
    a-- -b   // Parsed as "a-- -b"
    a- - -b  // Parsed as "a - - -b"
The compiler is obviously "greedy" when splitting the source into operators: it always takes the longest operator it can. So, in the absence of white-space, it never considers an alternative interpretation:
    a--b     // COULD be parsed as "a - -b"
    a----b   // COULD be parsed as "a-- - -b"
    a-----b  // COULD be parsed as "a-- - --b"
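The greedy behaviour can be sketched as a toy tokenizer. This is only an illustration, not how any real compiler is written: the `tokenize` function is hypothetical, it understands only '-', '--', spaces and single-character identifiers, and it stops at tokens without trying to parse them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy tokenizer (illustrative only): splits text into "-", "--" and
// single-character identifiers, always taking the longest match.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        const char ch = text[i];
        if (ch == ' ') {
            ++i;  // white-space only separates tokens
        } else if (ch == '-') {
            // Greedy step: prefer "--" over "-" when two minuses are adjacent.
            const bool two = (i + 1 < text.size()) && (text[i + 1] == '-');
            tokens.push_back(std::string(two ? "--" : "-"));
            i += two ? 2 : 1;
        } else {
            // Anything else is treated as a one-character identifier.
            tokens.push_back(std::string(1, ch));
            ++i;
        }
    }
    return tokens;
}
```

With this sketch, `tokenize("a--b")` yields "a", "--", "b" and `tokenize("a----b")` yields "a", "--", "--", "b", matching the failures in the list above, whereas `tokenize("a- -b")` yields "a", "-", "-", "b".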
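The compiler-writers have their hands tied by the formal language specification: C++ requires each token to be the longest sequence of characters that could form one, even if that makes the rest of the parse fail. But, for a new language like egg, I don't have any such restrictions.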

I decided that prefix and postfix increments/decrements as expressions are bad things. This is mainly due to problems associated with side-effects and evaluation ordering. Consider:
    int a = p[++i] + p[i++]; // Not allowed
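In C++ that line is worse than confusing: the two modifications of "i" are unsequenced, so the behaviour is undefined. As a rough sketch of why the ordering matters, here are the two "reasonable" readings written out with every step as its own statement. The function names `left_first` and `right_first` are invented for this example.

```cpp
#include <cstddef>
#include <vector>

// Reading 1 (hypothetical): evaluate p[++i] first, then p[i++].
int left_first(const std::vector<int>& p, std::size_t i) {
    ++i;                   // the prefix increment happens first
    const int lhs = p[i];
    const int rhs = p[i];  // the postfix form reads the same element...
    ++i;                   // ...and only then increments (local copy only)
    return lhs + rhs;
}

// Reading 2 (hypothetical): evaluate p[i++] first, then p[++i].
int right_first(const std::vector<int>& p, std::size_t i) {
    const int rhs = p[i];  // the postfix form reads the original element
    ++i;                   // postfix increment
    ++i;                   // prefix increment
    const int lhs = p[i];
    return lhs + rhs;
}
```

With `p` holding 1, 2, 4 and `i` starting at zero, `left_first` returns 4 and `right_first` returns 5, so the one-line version has no single obvious meaning.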
However, I think I will retain the prefix increment/decrement statements:
    ++i; // Allowed
    --i; // Allowed
    i++; // Not allowed
    i--; // Not allowed
This permits the idiomatic counter-based loop:
    for (i = 0; i < count; ++i) {
The reasons for only allowing the prefix versions are two-fold:
  1. It makes the language specification much less ambiguous; and
  2. People still harp on about prefix increments/decrements being slightly faster than their postfix variants, which is why they are "preferred" for looping.
Whilst I was at it, I also decided I can probably do without the unary '+' operator. That gets rid of the truly vexatious:
    c = a+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+b;
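A shorter cousin of that line shows what C++ makes of such a chain. This is just a sketch with a made-up function name, `sign_chain`: the first '+' is the binary operator and every sign after it is a separate unary operator applied right-to-left.

```cpp
// Hypothetical helper: "a+-+-+b" is read as "a + (-(+(-(+b))))".
int sign_chain(int a, int b) {
    return a+-+-+b;  // two unary minuses cancel, so this is a + b
}
```

So `sign_chain(1, 2)` returns 3; whether a longer chain adds or subtracts "b" depends only on how many unary minuses it contains.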

1 comment :

  1. In Swift 3 they removed all the prefix and postfix increments and decrements,
    Now you have to explicitly write i += 1 instead.