Normal whitespace creates a parse tree like this:


appl(
prod([lex(iter(layout()))], cf(layout()), no-attrs())
, [ appl(
list(lex(iter(layout())))
, [ appl(
prod([lex(sort("Ws"))], lex(layout()), no-attrs())
, [ appl(
prod(
[char-class([range(9, 10), 13, 32])]
, lex(sort("Ws"))
, no-attrs()
)
, [32]
)
]
)
, appl(
prod([lex(sort("Ws"))], lex(layout()), no-attrs())
, [ appl(
prod(
[char-class([range(9, 10), 13, 32])]
, lex(sort("Ws"))
, no-attrs()
)
, [32]
)
]
)
, appl(
prod([lex(sort("Ws"))], lex(layout()), no-attrs())
, [ appl(
prod(
[char-class([range(9, 10), 13, 32])]
, lex(sort("Ws"))
, no-attrs()
)
, [32]
)
]
)

while a recovered region creates:


appl(
prod([cf(layout())], cf(opt(layout())), no-attrs())
, [ appl(
prod(
[cf(layout()), cf(layout())]
, cf(layout())
, attrs([assoc(left())])
)
, [ appl(
prod(
[cf(layout()), cf(layout())]
, cf(layout())
, attrs([assoc(left())])
)
, [ appl(
prod(
[cf(layout()), cf(layout())]
, cf(layout())
, attrs([assoc(left())])
)
, [ appl(
prod(
[cf(layout()), cf(layout())]
, cf(layout())
, attrs([assoc(left())])
)
, [ appl(
prod(
[cf(layout()), cf(layout())]
, cf(layout())
, attrs([assoc(left())])
)
, [ appl(
prod(
[cf(layout()), cf(layout())]
, cf(layout())
, attrs([assoc(left())])
)
, [ appl(
prod(
[cf(layout()), cf(layout())]
, cf(layout())
, attrs([assoc(left())])
)
, [ appl(
prod([lex(layout())], cf(layout()), no-attrs())
, [ appl(
prod([lex(sort("Ws"))], lex(layout()), no-attrs())
, [ appl(
prod(
[char-class([range(9, 10), 13, 32])]
, lex(sort("Ws"))
, no-attrs()
)
, [32]
)
]
)
]
)
, appl(
prod([lex(layout())], cf(layout()), no-attrs())
, [ appl(
prod([lex(sort("Ws"))], lex(layout()), no-attrs())
, [ appl(
prod(
[char-class([range(9, 10), 13, 32])]
, lex(sort("Ws"))
, no-attrs()
)
, [32]
)
]
)
]
)
]
)
, appl(
prod([lex(layout())], cf(layout()), no-attrs())
, [ appl(
prod([lex(sort("Ws"))], lex(layout()), no-attrs())
, [ appl(
prod(
[char-class([range(9, 10), 13, 32])]
, lex(sort("Ws"))
, no-attrs()
)
, [32]
)
]
)
]
)
]
)
, appl(
prod([lex(layout())], cf(layout()), no-attrs())
, [ appl(
prod([lex(sort("Ws"))], lex(layout()), no-attrs())
, [ appl(
prod(
[char-class([range(9, 10), 13, 32])]
, lex(sort("Ws"))
, no-attrs()
)
, [32]
)
]
)
]
)
]
)

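Trees like this take extra time to construct and traverse. Because the left-associative layout productions nest one level deeper for each additional piece of layout, they can also cause stack overflows.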
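To make the depth difference concrete, here is a toy model of the two shapes. It is only a sketch: tuples stand in for ATerm `appl` nodes, and the names `flat_layout`, `left_nested_layout`, `depth` and `recursive_size` are made up for illustration and are not part of JSGLR.

```python
# Toy model of the two tree shapes; tuples stand in for ATerm appl nodes.
# All names here are hypothetical and not taken from JSGLR.

def flat_layout(n):
    """One list node holding n whitespace leaves (the normal shape)."""
    return ("list", [("ws", []) for _ in range(n)])

def left_nested_layout(n):
    """n leaves joined by binary, left-associative layout nodes (the recovered shape)."""
    tree = ("ws", [])
    for _ in range(n - 1):
        tree = ("layout", [tree, ("ws", [])])
    return tree

def depth(tree):
    """Iterative depth, so that measuring cannot itself overflow the stack."""
    deepest = 0
    stack = [(tree, 1)]
    while stack:
        (_, children), level = stack.pop()
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in children)
    return deepest

def recursive_size(tree):
    """Naive recursive node count, the way a simple tree visitor would work."""
    _, children = tree
    return 1 + sum(recursive_size(child) for child in children)
```

In this model the flat shape keeps a constant depth of 2, while the nested shape is as deep as the number of layout characters, so a recursive visitor such as `recursive_size` runs out of stack on a long enough recovered region.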

To reproduce, compare the following inputs:


ffffffffffff module f


module f

My guess is that the problem might be related to follow restrictions.

(See also Spoofax/124.)

Submitted by Lennart Kats on 8 June 2010 at 12:52

On 18 September 2010 at 14:38 Lennart Kats commented:

Is it possible that this is related to Spoofax/244?

That is, we currently use this to define WATER:


[A-Za-z0-9] -> WATERTOKENSTART {recover}
WATERTOKENSTART [A-Za-z0-9_]* -> WATERTOKEN

Parse tree sizes might improve if we use this instead:


[A-Za-z0-9]+ -> WATER {recover}
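As a rough check of what the two definitions accept, they can be approximated with regular expressions. This is only an analogy: SDF lexicals are not greedy regexes, so the sketch compares character sets and says nothing about tree size. The names `CURRENT_WATER`, `PROPOSED_WATER` and `water_tokens` are invented here.

```python
import re

# Regex analogues of the two SDF definitions (an approximation only).
CURRENT_WATER = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*")  # WATERTOKENSTART [A-Za-z0-9_]*
PROPOSED_WATER = re.compile(r"[A-Za-z0-9]+")             # [A-Za-z0-9]+

def water_tokens(pattern, text):
    """Return every maximal run of text that the given pattern matches."""
    return pattern.findall(text)
```

On the first input from the report both patterns give the same three tokens. They differ on underscores: the current definition accepts `_` after the first character, whereas the proposed one as written does not, so `foo_bar` would be split in two.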


On 18 February 2011 at 16:18 Maartje commented:

This problem is probably caused by the lookahead. Proposed solution: use a ParseCharacterReader (or something like it) filled with whitespace instead of replacing characters one by one.
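A minimal sketch of what such a reader could look like, assuming the recovered regions are known as half-open `(start, end)` offsets. The class `WhitespaceMaskedReader` and its methods are hypothetical and do not correspond to any existing JSGLR class. The point is that the mask is applied at read time, so lookahead sees whitespace for the whole region at once.

```python
class WhitespaceMaskedReader:
    """Hypothetical reader: yields the original characters, except that every
    position inside a recovered region reads as a space."""

    def __init__(self, text, regions):
        self.text = text
        self.regions = sorted(regions)  # half-open (start, end) offsets
        self.pos = 0

    def _masked(self, index):
        return any(start <= index < end for start, end in self.regions)

    def peek(self, offset=0):
        """Look ahead without consuming; returns "" past the end of input."""
        index = self.pos + offset
        if index >= len(self.text):
            return ""
        return " " if self._masked(index) else self.text[index]

    def read(self):
        """Consume and return the next (possibly masked) character."""
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch
```

For the first input above, masking offsets 0 to 12 would make the parser see only spaces in front of `module f`, without the original string being rewritten character by character.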
