This Doesn’t Make Sense

I believe I found an inconsistency problem with the definition of the syntactic grammar in C#.
This problem remains with the interim version of the c# 2.0 spec. and worries me a great deal.

<!– BEGIN TECHNICAL MUMBO JUMBO –>

Chapter 5 of the specification starts with the following phrase:

“… The lexical grammar defines how characters can be combined to form tokens (§9.4), the minimal lexical elements of the language. The syntactic grammar defines how tokens can be combined to make valid C# programs.”

The lexical element token is defined:

token::
identifier
keyword
integer-literal
real-literal
character-literal
string-literal
operator-or-punctuator

This means that a token may be one of the following: an identifier, a keyword, an integer literal, a real literal, a character literal, a string literal, an operator or a punctuator.
Please note the fact that the lexical element is literal is not one of those options.

The grammatical element primary-no-array-creation-expression is defined:

primary-no-array-creation-expression:
literal
simple-name
parenthesized-expression
member-access
invocation-expression
element-access
this-access
base-access
post-increment-expression
post-decrement-expression
object-creation-expression
delegate-creation-expression
typeof-expression
checked-expression
unchecked-expression
pointer-member-access
pointer-element-access
sizeof-expression

This means that primary-no-array-creation-expression may be a literal, in addition to the other options specified.

Now, giving that all grammatical elements must be made out of lexical tokens, it surprises me that the primary-no-array-creation-expression might be a literal, which is a lexical element which is not a token nor any descendant of a token!

<!– END TECHNICAL MUMBO JUMBO –>

What all this means is that the specification is in err.

However, there are two ways to correct this:

  1. Replace the list of literals token may be with just the literal lexical element.
    This may have some implications that I have not yet foreseen, but it seems like the most logical of choises.
  2. Replace literal with the list of valid literals (those that can be tokens).
    The downside of that is that you could not, taking it literally (no pun intended), do this:
    bool b1 = true;
    bool b2 = false;
    object o = null;

    since null, true and false will not be valid values as assignees.

Oh, the burden of decision. Glad it’s not mine (ok, not really; I actually want to be a part of the decision making).
I submitted the problem to ECMA TC39-TG2‘s Convenor, Microsoft’s Jon Jagger, who has confirmed that this is indeed a problem.

My only remaining question is: How has no one found out about this up until now? There have been too many compilers written to ignore this mistake.
My guess is that everyone just cut corners and were done with it. No idea why, though.

Advertisements

2 thoughts on “This Doesn’t Make Sense

  1. I think you’re a little stuck in the Aho-Ullman web ;).

    Grammars like C#, VB.NET and other 3rd generation languages are ambiguistic. This means that you can’t write an LR(n) parser for them, you need a custom LL(n) parser, with n probably bigger than 1. At least, if you don’t care about shift-reduce/reduce-reduce or shift-shift conflicts.

    Besides that, often in EBNF ‘literal’ is not expressed further. This means that this terminal is not further defined via a non-terminal. I have the feeling you expect ‘literal’ to be a non-terminal here, but it is in fact a terminal. Thus it makes sense it is not listed with ‘tokens’.

    I haven’t read the grammar further, so it might also be that ‘tokens’ is used a lot in other nonterminal definitions, which do not allow a ‘literal’. This then asks for 2 token non-terminals which means more ambiguity. :)

  2. The grammar is straight forward and tight enough, I believe it can be made into an object oriented representation.
    ‘literal’ is not a loose sense of a literal, but rather the exact definition of a lexical item named ‘literal’.
    This is indeed a mistake and, as you can read, has been confirmed to be a mistake.
    Like I said in a previous comment, I lack any formal training in parsers theory, and I’m only doing this for… well… kicks. :)
    That and to see if I can do it. ;)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s