The way “greedy” rules are currently defined is trickier and more error-prone than it has to be, I think. So here’s a thread to capture the status of making it easier.

Probably the most obvious improvement would be to bring the restrictions needed for greedy matching closer to the rules they apply to. Consider:

lexical syntax
    [a-zA-Z\_][a-zA-Z0-9\_]* -> ID

lexical restrictions
    ID            -/- [a-zA-Z0-9\_]

Could these be combined into one line/rule? Like:

lexical syntax
    [a-zA-Z\_][a-zA-Z0-9\_]*                  -> ID{greedy} %% Special keyword
    [a-zA-Z\_][a-zA-Z0-9\_]*(?![a-zA-Z0-9\_]) -> ID         %% Negative lookahead

The keyword “greedy” here would imply a restriction: the token fails to match if it could be extended by one more character and still match. The negative lookahead variant instead folds the lexical restriction into the regular expression itself.
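To make the intended semantics concrete, here is a minimal Python sketch rather than SDF. It assumes a scannerless setting in which every prefix that matches the ID pattern is a candidate token, and it models the follow restriction as a filter on those candidates. The function names `candidate_ends` and `restricted_ends` are made up for illustration and are not part of any SDF tooling.

```python
import re

# Same character classes as the ID rule above.
ID = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
ID_FOLLOW = re.compile(r"[a-zA-Z0-9_]")


def candidate_ends(text, start=0):
    # Every end offset at which an ID starting at `start` could stop
    # when no restriction is applied.
    return [end for end in range(start + 1, len(text) + 1)
            if ID.fullmatch(text, start, end)]


def restricted_ends(text, start=0):
    # Models the follow restriction on ID: drop every candidate that is
    # directly followed by another identifier character.
    return [end for end in candidate_ends(text, start)
            if not ID_FOLLOW.match(text, end)]


print(candidate_ends("foo1 bar"))   # several possible ends: the ambiguity
print(restricted_ends("foo1 bar"))  # only the longest candidate survives
```

Under these assumptions, the restriction, the proposed “greedy” keyword, and the negative lookahead all pick the same single candidate for an identifier.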

The other cases in my grammar that require lexical restrictions to ensure greedy matches are keywords and operators. The problem there is that you can’t infer from a string like “if” or “*” what class of characters it was drawn from, so the “greedy” keyword wouldn’t cut it. Negative lookahead would work, but it would make every use of a keyword pretty wordy. The current approach of putting them all in one big restriction rule is pretty good, except that it’s a pain to keep that list up to date.
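As a rough illustration of why lookahead works for literals where “greedy” does not, here is a small Python sketch. The follow class has to be spelled out by hand for each literal because it cannot be derived from the literal itself. The helper `literal_not_followed_by` and the operator follow set `[*=]` are assumptions chosen for the example, not taken from my grammar.

```python
import re


def literal_not_followed_by(literal, follow_class):
    # Build a pattern that matches `literal` only when the next character
    # is NOT in `follow_class` (negative lookahead).
    return re.compile(re.escape(literal) + "(?!" + follow_class + ")")


# Keyword: must not run straight into more identifier characters.
IF = literal_not_followed_by("if", "[a-zA-Z0-9_]")

# Operator: the follow set here is a guess for illustration only.
STAR = literal_not_followed_by("*", "[*=]")

print(bool(IF.match("if (x)")))   # keyword followed by a space
print(bool(IF.match("iffy")))     # prefix of a longer identifier
print(bool(STAR.match("* 2")))    # single star
print(bool(STAR.match("** 2")))   # prefix of a longer operator
```

The wordiness is visible here too: each literal carries its own follow class, which is the cost of keeping the restriction next to the rule.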

Anyway, I hope I’m capturing something useful or intelligent here …

Submitted by Dobes Vandermeer on 7 December 2012 at 01:34

On 8 January 2013 at 13:14 Eelco Visser tagged sdf

On 8 January 2013 at 13:15 Eelco Visser commented:

Yes, this is a good idea. Will be on the agenda of the SDF project.

On 10 March 2013 at 09:12 Guido Wachsmuth commented:

I do not understand the part about negative lookahead. As for the greedy part, the current nightly supports longest-match, which comes from Sebastian Erdweg’s work on layout-sensitive parsing.

On 10 March 2013 at 09:12 Guido Wachsmuth closed this issue.
