Improve indentation-sensitive language support (1)
I was hoping to make my new language indentation-sensitive, like many new languages are these days (CoffeeScript, Python). Spoofax is extremely cool, but it doesn’t quite support that in a friendly manner.
Here’s a ticket to think about what’s the shortest path to support this use case.
The current workaround is to make the line starts into tokens and do post-processing to merge adjacent statements with the same level of indentation into a block. This is OK but it looks like a lot of work and isn’t as flexible.
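To make the post-processing step concrete, here is a minimal sketch in Python (not Spoofax or Stratego code). The names `indent_of` and `build_blocks` are made up for illustration, and it assumes indentation uses spaces only and that top-level statements start in column 0.

```python
def indent_of(line):
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def build_blocks(lines):
    """Turn a flat list of lines into a nested list: each run of
    deeper-indented lines becomes a sub-list right after its parent line."""
    root = []
    stack = [(0, root)]  # (indentation width, list collecting that block)
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no indentation information
        depth = indent_of(line)
        # Leave every open block that is indented deeper than this line.
        while depth < stack[-1][0]:
            stack.pop()
        # A deeper line starts a new block nested in the current one.
        # (An inconsistent dedent also ends up here; a real tool would
        # report it as an error instead.)
        if depth > stack[-1][0]:
            block = []
            stack[-1][1].append(block)
            stack.append((depth, block))
        stack[-1][1].append(line.strip())
    return root
```

Every statement then has to be re-attached to its parent construct in a later pass, which is part of why this route feels like a lot of work.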
There are a couple alternative approaches that come to mind.
One is to allow a pre-processing step of some sort that modifies the source code, replacing some characters with others. For example you could do something like:
    if x:
       if y:
       if z: bla
    else:
       bug
becomes
    if x:
    {  if y:
          print ‘hello’
    }  if z: bla
    else:
    {  bug
       bla
    } boo
(the closing brace goes BEFORE the final statement because there might not be enough whitespace afterwards)
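A rough Python sketch of such a brace-inserting pre-processor, under some simplifying assumptions: indentation uses spaces only, the markers are simply prepended to the line (so columns shift by the number of markers), and blocks still open at the end of the input are closed on one extra final line. `add_braces` is a made-up name, not an existing Spoofax hook.

```python
def add_braces(source):
    """Prefix lines with '{' and '}' wherever the indentation changes,
    keeping one output line per input line (plus an optional closer line)."""
    out = []
    levels = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            out.append(line)  # blank line: ignore its indentation
            continue
        depth = len(line) - len(stripped)
        prefix = ""
        # Closing braces go in front of the first statement after the block.
        while depth < levels[-1]:
            levels.pop()
            prefix += "}"
        # An opening brace goes in front of the first statement in the block.
        if depth > levels[-1]:
            levels.append(depth)
            prefix += "{"
        out.append(prefix + line)
    # Close whatever is still open when the input ends.
    if len(levels) > 1:
        out.append("}" * (len(levels) - 1))
    return "\n".join(out)
```

Because lines are never split or merged inside the program, line numbers reported by the parser still match the original file; only column positions would need adjusting.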
Another preprocessing step might be to change which characters count as a newline, and then rewrite the source so that each line ends with a different newline character depending on whether the next line has a change in indentation. This keeps the file layout similar but requires changing the line number calculations.
Another approach is to modify the parser so that you can alter or disable the LAYOUT processing between tokens, thus making whitespace significant where you want it to be, and still automatically insignificant elsewhere. This might allow some kind of bracket matching … but I haven’t thought this through, so I can’t tell immediately if this would actually solve the problem.
The ideal scenario would be to be able to insert some sort of “virtual” tokens into the stream by overriding a tokenizer somewhere.
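For comparison, this is how Python’s own tokenizer deals with indentation: it emits INDENT and DEDENT tokens that act as block brackets for the grammar. The snippet below only uses the standard `tokenize` module to show what such a token stream looks like; `layout_tokens` is a made-up helper name, and none of this is a Spoofax or JSGLR API.

```python
import io
import tokenize


def layout_tokens(source):
    """Return (kind, text) pairs from Python's tokenizer, including the
    INDENT and DEDENT tokens it generates from leading whitespace."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    for kind, text in layout_tokens("if x:\n    y\nz\n"):
        print(kind, repr(text))
```

In that stream the INDENT token appears just before `y` and the DEDENT token just before `z`, so the grammar itself never has to look at whitespace.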
Any thoughts, hints, or suggestions?
Submitted by Dobes Vandermeer on 6 October 2012 at 06:16
Issue Log
This problem is solved, in theory. At the recent SLE 2012 conference, Sebastian Erdweg presented an extension of SDF/JSGLR with layout-sensitive parsing:
Sebastian Erdweg, Tillmann Rendel, Christian Kästner and Klaus Ostermann. Layout-sensitive Generalized Parsing. In Conference on Software Language Engineering (SLE), 2012. To appear.
However, the extension is implemented in a branch that needs to be merged back into the trunk before we can deploy it in Spoofax. Also, it is not yet clear what the interaction between layout-sensitive parsing and error recovery will be.
Great news!
I’ll see if I can figure out how to make it work.
It looks like the merge is a bit of a task …
Is someone tasked with this merge already or should I embark upon it? If I do the merge, is there someone willing to review the results?
I started to merge the layout-sensitive parser implementation back into the JSGLR trunk. I hope to get back to it this week. I’ll comment here once the merge is done.
I reintegrated the changes I made for layout-sensitive parsing into the SGLR svn repo. The merge can be found in branch jsglr-layout-merge.
The merge is currently under review by the Spoofax team. I hope the changes will be integrated into the Spoofax trunk soon, so you can start using layout-sensitive syntax in your Spoofax-based DSLs.
Great, thanks for the effort Sebastian.
The layout-sensitive parser is now merged back into the trunk.