Inconsistent (un)escaping behavior of generic term (de)construction
With respect to escaping / un-escaping of special characters, generic term deconstruction and construction:
- behave differently in the compiler and the interpreter
- aren’t symmetric in the compiler
In short, the issue is that:
- the interpreter doesn’t bother with any escaping or un-escaping, while the compiler does.
- the compiler un-escapes \n\t\b\f\r\\\'\", while it escapes only backslash, double-quote, \n and \r.
Whether generic term construction should be exactly reversible using generic term deconstruction is debatable, but at the very least the different behavior between compiler and interpreter is a bug.
Here are my observations of the current behavior:
Generic term deconstruction (as used in explode-aterm)
Compiler
- compiler uses SSL_get_constructor and SSL_get_arguments
- SSL_get_constructor uses env.setCurrent(factory.makeString(current.toString()));
- current.toString invokes StrategoString.writeAsString
- StrategoString.writeAsString double-quotes and escapes backslash, double-quote, \n and \r
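The compiler-side escaping described above can be sketched in Java as follows. This is only a model of the observed behavior, not the actual StrategoString.writeAsString source; the class and method names (CompilerEscapeSketch, quoteAndEscape) are hypothetical.

```java
class CompilerEscapeSketch {
    // Hypothetical model of the observed behavior, NOT the real implementation:
    // wrap in double-quotes and escape only backslash, double-quote, \n and \r.
    static String quoteAndEscape(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                default:   out.append(c);      // tab, \b, \f etc. pass through unchanged
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // The newline becomes backslash + 'n'; the tab stays a literal tab.
        System.out.println(quoteAndEscape(".\t.\n."));
    }
}
```

Note that a tab survives as a literal character here, which is the part that the un-escaping side does not mirror.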
Interpreter
- interpreter uses Match.getTermConstructor and Match.getTermArguments
- Match.getTermConstructor returns env.getFactory().makeString("\"" + ((IStrategoString)t).stringValue() + "\"");
- No escaping!
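For comparison, the interpreter-side result amounts to plain concatenation. The sketch below models only that one expression; the class and method names (InterpreterQuoteSketch, quoteOnly) are hypothetical. It shows that embedded double-quotes and backslashes end up unescaped in the constructor string.

```java
class InterpreterQuoteSketch {
    // Hypothetical model of the string case of Match.getTermConstructor as quoted above:
    // the value is wrapped in double-quotes and nothing is escaped.
    static String quoteOnly(String value) {
        return "\"" + value + "\"";
    }

    public static void main(String[] args) {
        // Input is the five characters  a " b \ n  (a backslash followed by 'n', not a newline).
        String cons = quoteOnly("a\"b\\n");
        // The inner quote and the backslash appear verbatim between the outer quotes.
        System.out.println(cons);
    }
}
```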
Generic term construction (as used in implode-aterm)
Compiler
- compiler uses SSL_mkterm
- SSL_mkterm fails if string does not start with a double-quote
- SSL_mkterm uses env.setCurrent(env.getFactory().parseFromString(value + "\""));
- parseFromString invokes(?) TAFTermReader.parseString
- TAFTermReader.parseString un-escapes \n\t\b\f\r\\\'\" and throws on \0\1\2\3\4\5\6\7\8\9
- TAFTermReader.parseString throws if string does not end with a double-quote (which is probably why the double-quote is added to the end of the string before parseFromString is called)
Interpreter
- interpreter uses Build.doBuildExplode
- Build.doBuildExplode un-double-quotes and makes a string iff the passed cons starts with a double-quote
- Build.doBuildExplode makes an appl iff the passed cons does not start with a double-quote
- No un-escaping!
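The difference between the two construction paths can be sketched in Java. This is a model of the observations above, not the actual SSL_mkterm, TAFTermReader or Build code; all names (ConstructionSketch, compilerBuild, interpreterBuild) are hypothetical. Two points are assumptions of the sketch rather than observed facts: escapes outside the listed set are rejected, and the interpreter path strips exactly one leading and one trailing double-quote.

```java
class ConstructionSketch {
    // Hypothetical model of the compiler path: strip the quotes, then un-escape.
    static String compilerBuild(String cons) {
        requireQuoted(cons);
        StringBuilder out = new StringBuilder();
        int end = cons.length() - 1; // index of the closing quote
        for (int i = 1; i < end; i++) {
            char c = cons.charAt(i);
            if (c != '\\') { out.append(c); continue; }
            if (++i >= end) throw new IllegalArgumentException("dangling backslash");
            switch (cons.charAt(i)) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'r':  out.append('\r'); break;
                case '\\': out.append('\\'); break;
                case '\'': out.append('\''); break;
                case '"':  out.append('"');  break;
                // Digits are reported to throw; rejecting every other escape is an assumption.
                default: throw new IllegalArgumentException("unsupported escape");
            }
        }
        return out.toString();
    }

    // Hypothetical model of the interpreter path: strip the quotes, no un-escaping.
    static String interpreterBuild(String cons) {
        requireQuoted(cons);
        return cons.substring(1, cons.length() - 1);
    }

    private static void requireQuoted(String cons) {
        if (cons.length() < 2 || !cons.startsWith("\"") || !cons.endsWith("\""))
            throw new IllegalArgumentException("not a quoted string (the appl case is not modeled)");
    }

    public static void main(String[] args) {
        String cons = "\".\\t.\""; // the six characters  " . \ t . "
        System.out.println(compilerBuild(cons).length());    // prints 3: dot, tab, dot
        System.out.println(interpreterBuild(cons).length()); // prints 4: dot, backslash, 't', dot
    }
}
```

In this model the same constructor string yields two different string terms, which is the compiler/interpreter discrepancy reported here.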
Example
And here is a little bit of test code:
// interpreter | compiler
<debug(!"Test generic deconstruction 1: ")> <?#()> ".\t.\n."; // adds quotes | adds quotes + escapes
<debug(!"Test generic deconstruction 2: ")> <?#()> "\".\t.\n.\""; // adds quotes | adds quotes + escapes
<debug(!"Test generic construction 1: ")> <!#([])> ".\t.\n."; // nothing | replaces by ()
<debug(!"Test generic construction 2: ")> <!#([])> "\".\t.\n.\""; // removes quotes | removes quotes + unescapes
A place where this issue manifests itself (at least, I think this issue is the root cause) is in the testing language.
Note the following inconsistency:
// this test succeeds
test << >> // tab char!
parse to Template([Layout(" ")]) // tab char!
// this test fails
test << >> // tab char!
parse to Template([Layout("\t")])
// but this test fails(!)
test [[ <<]]
parse to Template([Newline("
")])
// and this test succeeds(!)
test [[ <<]]
parse to Template([Newline("\n")])
Submitted by Tobi Vollebregt on 28 September 2011 at 15:07