Add support for unicode characters in lexicals and comments (1)
Submitted by Lennart Kats on 9 February 2011 at 11:16

The parse table format doesn't quite support it, but we can map Unicode letters, Unicode numbers, and other non-ASCII Unicode characters to the \255, \254, and \253 characters, respectively. That way, identifiers with Unicode letters and numbers can be specified as:

    [a-zA-Z\255][a-zA-Z0-9\_\255\254]* -> ID

while line comments and string literals work as before but now also support non-ASCII characters:

    "\"" StringChar* "\"" -> STRING
    ~[\"\n] -> StringChar
    "//" ~[\n\r]* ([\n\r] | EOF) -> LAYOUT
Issue Log
To be included with 0.6.1.
Where is it written in the documentation?
Has this been implemented? How does it work?
Ideally, to support parsing a language like Scala, you would need support for Unicode character classes, I think. I am not sure whether this workaround would help with that.
On re-reading the description I now think that it works like this:
\255 matches any single unicode letter
\254 matches any single unicode number
\253 matches any other single unicode non-ASCII character
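If that reading is right, the pre-processing step could look roughly like the sketch below. This is only an illustration in Python, not the actual parser code: the function names `to_parser_symbol` and `is_id` are made up here, and it assumes that "Unicode letter" and "Unicode number" mean the Unicode general categories L* and N*, and that every code point of 128 or higher is remapped. `is_id` mimics the ID production from the description on the mapped symbols.

```python
import string
import unicodedata

# Placeholder symbols from the issue description (decimal values, as in \255).
UNICODE_LETTER = 255
UNICODE_NUMBER = 254
UNICODE_OTHER = 253


def to_parser_symbol(ch: str) -> int:
    # Map one character to the 0..255 symbol the parse table would see.
    code = ord(ch)
    if code < 128:
        return code  # ASCII passes through unchanged
    category = unicodedata.category(ch)
    if category.startswith("L"):  # assumption: letter = general category L*
        return UNICODE_LETTER
    if category.startswith("N"):  # assumption: number = general category N*
        return UNICODE_NUMBER
    return UNICODE_OTHER


# Symbol sets for the ID production: a letter first, then letters,
# digits, underscores, or the Unicode letter/number placeholders.
ID_START = {ord(c) for c in string.ascii_letters} | {UNICODE_LETTER}
ID_REST = ID_START | {ord(c) for c in string.digits} | {ord("_"), UNICODE_NUMBER}


def is_id(text: str) -> bool:
    # Hypothetical helper: checks the mapped symbols against the ID production.
    symbols = [to_parser_symbol(ch) for ch in text]
    if not symbols:
        return False
    return symbols[0] in ID_START and all(s in ID_REST for s in symbols[1:])
```

Note that the mapping is lossy: after it, any two Unicode letters look the same to the parser, so the original text would presumably have to be kept alongside the mapped symbols to recover the actual token contents.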