Problem

When creating a new project, the syntax definition in Common.sdf3 looks like:

  STRING         = "\"" StringChar* "\"" 
  StringChar     = ~[\"\n] 
  StringChar     = "\\\"" 
  StringChar     = BackSlashChar 
  BackSlashChar  = "\\"

As a new Spoofax user, I’m very eager to use these definitions for strings, so in my Lang.sdf3 I write:

Start.StringLit = STRING

Looks like nothing is wrong. Indeed, many of our languages use this default definition for strings. However, the following string is now ambiguous:

"\"//a"

because it can be parsed either as the string \"//a or as the string \ followed by the comment //a". Note that not every language is affected, because the program must still be valid under the second interpretation (the comment). Since the comment runs until the end of the line, it may swallow tokens that the rest of the program needs, which invalidates that parse and leaves the first interpretation as the only valid one.
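The two readings can be reproduced with a small Python sketch. It is not the Spoofax parser: it only hand-models the StringChar rules quoted above and assumes that the whole program is a single STRING, optionally followed by a // line comment treated as layout. It counts distinct places where the string can end, not derivation trees.

```python
def string_ends(text, start=0):
    """Return every index just past a possible closing quote of a STRING."""
    if start >= len(text) or text[start] != '"':
        return set()
    ends, seen, frontier = set(), set(), {start + 1}
    while frontier:
        i = frontier.pop()
        if i in seen or i >= len(text):
            continue
        seen.add(i)
        ch = text[i]
        if ch == '"':
            ends.add(i + 1)          # closing quote of STRING
        elif ch != "\n":
            frontier.add(i + 1)      # StringChar = ~[\"\n], includes a lone backslash
        if text.startswith('\\"', i):
            frontier.add(i + 2)      # StringChar = "\\\"", the escaped quote
    return ends


def parses(line):
    """List (string_literal, trailing_comment) pairs that cover the whole line."""
    results = []
    for end in sorted(string_ends(line)):
        rest = line[end:].strip()
        # Assumption: only a // line comment (or nothing) may follow the string.
        if rest == "" or rest.startswith("//"):
            results.append((line[:end], rest))
    return results


line = r'"\"//a"'
print(len(parses(line)))  # 2
```

The first result ends the string right after the backslash and leaves //a" as a comment; the second treats the whole line as one string literal.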

Suggested Solution

Even though this is an edge case (you need a string ending with a backslash, and the context of the string should allow an ambiguous parse when the rest of the line is a comment), we should try to avoid ambiguous grammars. To prevent new Spoofax users from accidentally creating ambiguous grammars, and to prevent ourselves from creating even more of them, I suggest changing the above to:

  STRING         = "\"" StringChar* "\""
  StringChar     = ~[\"\\\n]
  StringChar     = "\\\""
  StringChar     = "\\\\"
  StringChar     = BackSlashChar
  BackSlashChar  = "\\"

This slightly changes the language, because "\" is no longer a valid string, but I think this is for the better: most programming languages also reject a string that ends in an unescaped backslash.

In light of the other definitions in Common.sdf3, I think this has always been the intention. Someone went through the trouble of defining the rule BackSlashChar = "\\" and a follow restriction for BackSlashChar, but since StringChar = ~[\"\n] already allows a backslash, the definition of BackSlashChar and its follow restriction are useless right now.
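As a rough check of the proposal, here is a Python sketch of the changed rules. It rests on two assumptions that are not spelled out above: the plain character class excludes the backslash, and the follow restriction forbids a double quote directly after BackSlashChar. As before, the program is taken to be a single STRING optionally followed by a // line comment, and only the two inputs discussed in this issue are checked.

```python
def string_ends_fixed(text, start=0):
    """Return every index just past a possible closing quote of a STRING."""
    if start >= len(text) or text[start] != '"':
        return set()
    ends, seen, frontier = set(), set(), {start + 1}
    while frontier:
        i = frontier.pop()
        if i in seen or i >= len(text):
            continue
        seen.add(i)
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == '"':
            ends.add(i + 1)              # closing quote of STRING
        elif ch == "\\":
            if nxt in ('"', "\\"):
                frontier.add(i + 2)      # two-character escapes: \" and \\
            if nxt != '"':
                frontier.add(i + 1)      # BackSlashChar, assumed not allowed before '"'
        elif ch != "\n":
            frontier.add(i + 1)          # plain character (assumed to exclude backslash)
    return ends


def parses_fixed(line):
    """List (string_literal, trailing_comment) pairs that cover the whole line."""
    results = []
    for end in sorted(string_ends_fixed(line)):
        rest = line[end:].strip()
        # Assumption: only a // line comment (or nothing) may follow the string.
        if rest == "" or rest.startswith("//"):
            results.append((line[:end], rest))
    return results


print(len(parses_fixed(r'"\"//a"')))  # 1
print(len(parses_fixed(r'"\"')))      # 0
```

Under these assumptions the example from the problem description has a single reading, and "\" is rejected as described.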

Submitted by Martijn on 28 November 2017 at 17:55

On 29 November 2017 at 11:01 Jeff Smits commented:

Yeah, sounds like this was just a bug. I’d be all for changing it, not only in the template but also in all the languages we have that use this definition.


