Inconsistent (un)escaping behavior of generic term (de)construction
Generic term deconstruction and construction handle escaping and un-escaping of special characters inconsistently:
- they behave differently in the compiler and in the interpreter
- they are not symmetric in the compiler
In short, the issue is that:
- the interpreter doesn't bother with any escaping or un-escaping, while the compiler does.
- the compiler un-escapes \n \t \b \f \r \\ \' \", while it escapes only backslash, double-quote, \n and \r.

Whether generic term construction should be exactly reversible using generic term deconstruction is debatable, but at the very least the different behavior between compiler and interpreter is a bug.
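The asymmetry described above can be illustrated with a minimal, standalone Java sketch. The class and method names (EscapeAsymmetry, escape, unescape) are hypothetical and are not the Stratego runtime code. They only mirror the two character sets listed above, and the handling of unknown escape sequences is an assumption.

```java
public class EscapeAsymmetry {
    // Escapes only backslash, double-quote, \n and \r,
    // the set this report lists for the compiler's deconstruction path.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Un-escapes \n \t \b \f \r \\ \' \",
    // the set this report lists for the compiler's construction path.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'r':  out.append('\r'); break;
                case '\\': out.append('\\'); break;
                case '\'': out.append('\''); break;
                case '"':  out.append('"');  break;
                // Assumption: unknown escapes are kept verbatim in this sketch.
                default:   out.append('\\').append(next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = ".\t.\n.";   // real tab and real newline
        System.out.println(unescape(escape(raw)).equals(raw));         // prints true

        String escaped = ".\\t."; // a backslash followed by the letter t
        System.out.println(escape(unescape(escaped)).equals(escaped)); // prints false
    }
}
```

The second check fails because un-escaping turns the two-character sequence into a real tab, and escaping never turns a tab back into that sequence.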
Here are my observations of the current behavior:
Generic term deconstruction (as used in explode-aterm)
Compiler
- compiler uses SSL_get_constructor and SSL_get_arguments
- SSL_get_constructor uses env.setCurrent(factory.makeString(current.toString()));
- current.toString invokes StrategoString.writeAsString
- StrategoString.writeAsString double-quotes and escapes backslash, double-quote, \n and \r
Interpreter
- interpreter uses Match.getTermConstructor and Match.getTermArguments
- Match.getTermConstructor returns env.getFactory().makeString("\"" + ((IStrategoString)t).stringValue() + "\"");
- No escaping!
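To make the difference concrete, here is a small Java sketch of the two deconstruction behaviors for a string term. It does not call the real runtime: compilerStyle and interpreterStyle are hypothetical stand-ins that only reproduce the quoting and escaping described in the lists above.

```java
public class DeconstructionSketch {
    // Compiler-like: wrap in double quotes and escape backslash,
    // double-quote, \n and \r. The backslash must be replaced first.
    static String compilerStyle(String value) {
        String escaped = value
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
        return "\"" + escaped + "\"";
    }

    // Interpreter-like: wrap in double quotes, no escaping at all.
    static String interpreterStyle(String value) {
        return "\"" + value + "\"";
    }

    public static void main(String[] args) {
        String value = ".\t.\n."; // real tab and real newline
        // The compiler-like result has one extra character: the newline
        // became the two characters backslash and n, the tab stayed as is.
        System.out.println(compilerStyle(value).length());    // prints 8
        System.out.println(interpreterStyle(value).length()); // prints 7
    }
}
```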
Generic term construction (as used in implode-aterm)
Compiler
- compiler uses SSL_mkterm
- SSL_mkterm fails if string does not start with a double-quote
- SSL_mkterm uses env.setCurrent(env.getFactory().parseFromString(value + "\""));
- parseFromString invokes (?) TAFTermReader.parseString
- TAFTermReader.parseString un-escapes \n \t \b \f \r \\ \' \" and throws on \0 \1 \2 \3 \4 \5 \6 \7 \8 \9
- TAFTermReader.parseString throws if string does not end with a double-quote (which is probably why the double-quote is added to the end of the string before parseFromString is called)
Interpreter
- interpreter uses Build.doBuildExplode
- Build.doBuildExplode un-double-quotes and makes a string iff the passed cons starts with a double-quote
- Build.doBuildExplode makes an appl iff the passed cons does not start with a double-quote
- No un-escaping!
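The construction side can be sketched in the same way. The Java below is a simplified model, not the runtime code: it covers only the quoted-string case, the names (ConstructionSketch, stripQuotes, compilerStyle, interpreterStyle) are hypothetical, and unknown escapes are kept verbatim as an assumption, whereas the list above says the real parser throws on digit escapes.

```java
public class ConstructionSketch {
    // Escape letters and the characters they stand for, position by position.
    private static final String CODES = "ntbfr\\'\"";
    private static final String CHARS = "\n\t\b\f\r\\'\"";

    // Un-escapes \n \t \b \f \r \\ \' \" and leaves anything else untouched.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int k = (c == '\\' && i + 1 < s.length())
                    ? CODES.indexOf(s.charAt(i + 1))
                    : -1;
            if (k >= 0) {
                out.append(CHARS.charAt(k));
                i++; // skip the escape letter
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Removes the surrounding double quotes of a string constructor.
    static String stripQuotes(String cons) {
        if (cons.length() < 2 || !cons.startsWith("\"") || !cons.endsWith("\"")) {
            throw new IllegalArgumentException("not a quoted string constructor");
        }
        return cons.substring(1, cons.length() - 1);
    }

    // Compiler-like: remove quotes, then un-escape.
    static String compilerStyle(String cons) {
        return unescape(stripQuotes(cons));
    }

    // Interpreter-like: remove quotes only.
    static String interpreterStyle(String cons) {
        return stripQuotes(cons);
    }

    public static void main(String[] args) {
        // Quoted constructor containing the literal sequences backslash-t and backslash-n.
        String cons = "\".\\t.\\n.\"";
        System.out.println(compilerStyle(cons).length());    // prints 5
        System.out.println(interpreterStyle(cons).length()); // prints 7
    }
}
```

The compiler-like path produces a real tab and newline, while the interpreter-like path keeps the two-character sequences, so the same input yields two different string terms.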
Example
And here is a little bit of test code:
//                                                                  interpreter    | compiler
<debug(!"Test generic deconstruction 1: ")> <?#()> ".\t.\n.";     // adds quotes    | adds quotes + escapes
<debug(!"Test generic deconstruction 2: ")> <?#()> "\".\t.\n.\""; // adds quotes    | adds quotes + escapes
<debug(!"Test generic construction 1: ")> <!#([])> ".\t.\n.";     // nothing        | replaces by ()
<debug(!"Test generic construction 2: ")> <!#([])> "\".\t.\n.\""; // removes quotes | removes quotes + unescapes

A place where this issue manifests itself (at least, I think this issue is the root cause) is in the testing language.
Note the following inconsistency:
// this test succeeds
test << >> // tab char!
parse to Template([Layout(" ")]) // tab char!

// this test fails
test << >> // tab char!
parse to Template([Layout("\t")])

// but this test fails(!)
test [[ <<]]
parse to Template([Newline("
")])

// and this test succeeds(!)
test [[ <<]]
parse to Template([Newline("\n")])

Submitted by Tobi Vollebregt on 28 September 2011 at 15:07