In SDF3, you can influence the tokenization of templates with an option:

templates Statement.Print = <System.out.println(<Exp>);>

template options tokenize : ".()"

This gives the following SDF2 rule:

"System" "." "out" "." "println" "(" Exp ")" ";" -> Statement {cons("Print")}

However, when you split the option into multiple options, only one of them is considered.

template options 
  tokenize : "."
  tokenize : "("
  tokenize : ")"

Also, using a string there is confusing. I first tried `tokenize : "System.out.println"`. Maybe a character class or single characters would be more appropriate:

template options tokenize : [.\(\)]

template options 
  tokenize : '.'
  tokenize : '('
  tokenize : ')'

Submitted by Guido Wachsmuth on 21 September 2013 at 09:52

On 21 September 2013 at 10:02 Guido Wachsmuth tagged @guwac

On 21 September 2013 at 10:02 Guido Wachsmuth tagged improvement

On 21 September 2013 at 10:02 Guido Wachsmuth tagged sdf

On 28 September 2013 at 15:41 Jeff Smits commented:

Correct me if I’m wrong here: I think this tokenize option exists to generate grammar rules that allow layout in places where the template didn’t have any. This gives the ability to write templates that don’t contain layout where you wouldn’t recommend it, but do want to allow it. Is that the (only) intended use for tokenize?
If this is the case then I’d like to point out that this cannot always be done with the current tokenize functionality. Take the example from http://metaborg.org/wiki/sdf:

template options
  tokenize : "()"
templates
  Exp.Call = <<ID>();>

This will generate a rule:

ID "()" ";" -> Exp {cons("Call")}

If you want to allow layout between the brackets, that will require whitespace in the template (Exp.Call = <<ID>( );>).

Perhaps the most flexible option would be to allow multiple strings. This would certainly solve the above problem. If you want "()" in the generated grammar rule, you specify tokenize: "()", but if you want to allow layout between the brackets, you specify tokenize: "(" ")".
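
To make the multi-string proposal concrete, here is a minimal Python sketch of the splitting it would imply. This is not how SDF3 currently works: the function name `split_on_strings` and the longest-match-first rule are assumptions made purely for illustration.

```python
import re

def split_on_strings(literal, separators):
    """Split `literal` so that every occurrence of a separator string
    becomes its own token (hypothetical semantics of the proposal)."""
    # Try longer separators first so "()" wins over "(" when both are listed.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(s) for s in ordered) + ")"
    # The capturing group keeps the separators; drop empty pieces.
    return [piece for piece in re.split(pattern, literal) if piece]

print(split_on_strings("();", ["()"]))      # ['()', ';']
print(split_on_strings("();", ["(", ")"]))  # ['(', ')', ';']
```

Under this reading, listing "()" keeps the brackets together, while listing "(" and ")" separately yields two literals with layout allowed in between.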


On 28 September 2013 at 15:59 Guido Wachsmuth commented:

If you want to allow layout between the parentheses, you can use tokenize: "(;". tokenize: ")(" should also work. But the fact that the string is not interpreted as a set of characters which separate tokens is confusing here.


On 28 September 2013 at 17:00 Jeff Smits commented:

tokenize: "(;" works, but tokenize: ")(" does not. Any string that contains one of the brackets but not the other gives the effect I was looking for. Indeed rather confusing…
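
The behaviour reported in this thread is consistent with one simple model: the literal text of a template is cut into maximal runs of tokenize characters and maximal runs of all other characters. The Python sketch below only models that reading. It is inferred from the examples above, not taken from the SDF3 implementation, and `split_runs` is a hypothetical name.

```python
from itertools import groupby

def split_runs(literal, tokenize_chars):
    """Cut `literal` into maximal runs of characters that are all in
    `tokenize_chars` or all outside it (an assumed model, not SDF3 code)."""
    return ["".join(run)
            for _, run in groupby(literal, key=lambda c: c in tokenize_chars)]

print(split_runs("();", "()"))  # ['()', ';']      brackets stay together
print(split_runs("();", ")("))  # ['()', ';']      order of characters is irrelevant
print(split_runs("();", "(;"))  # ['(', ')', ';']  layout allowed between brackets
print(split_runs("();", "("))   # ['(', ');']      also separates the brackets
```

If this model is right, it explains why a tokenize string containing exactly one of the two brackets separates them: adjacent tokenize characters are never split from each other, only from their non-tokenize neighbours.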
