#434 javadoc comments vs normal comments generate ambiguities and parse errors (project SpoofaxLegacy on YellowGrass.org)

Hi, I'm trying to add Javadoc-style comments to my language, but I get some weird behavior.
I've narrowed it down to a simple grammar that shows the glitch:
context-free syntax
	Module*					-> Start {cons("ModuleList")}
	"start" Sentence* "end"	-> Module {cons("Module")}
	LATEX					-> Sentence {cons("Latex")}

lexical syntax
	[\ \t\n\r] -> LAYOUT
	[\*]								-> CommentChar
	-> EOF
	"/*"  (~[\*] | CommentChar)* "*/"	-> LAYOUT
	"/**" (~[\*] | CommentChar)* "*/"	-> LATEX
	"//"  ~[\n\r]* ([\n\r] | EOF)		-> LAYOUT
	"//*" ~[\n\r]* ([\n\r] | EOF)		-> LATEX

lexical restrictions
	CommentChar   -/- [\/]
	EOF  -/- ~[]
	"/*" -/- [\*]
	"//" -/- [\*]

context-free restrictions
	LAYOUT? -/- [\ \t\n\r]
	LAYOUT? -/- [\/].[\/].~[\*]
	LAYOUT? -/- [\/].[\*].~[\*]

This is a sample text:
start
  /* 1 */
end
start
  /** 2 */
end
start
  // 3
end
start
  //* 4
end

And the AST for this:
ModuleList(
  [ amb([Module([]), Module([])])
  , Module([Latex("/** 2 */")])
  , Module([])
  , Module([])
  ]
)

The basic effect is that a comment of type 1 generates an ambiguity (one more for each use),
although it shouldn't, because I have follow restrictions for LAYOUT.
Comments of type 2 and 3 are parsed correctly.
Comments of type 4 are even more surprising: they generate a parse error.
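
For reference, the intended classification of the four comment forms can be restated outside SDF. The following is only a minimal Python sketch, not Spoofax code: the function name `classify_comment` is hypothetical, and it assumes the goal stated by the grammar above, namely that `/*` and `//` openers not followed by `*` are layout, while `/**` and `//*` openers produce LATEX.

```python
def classify_comment(text: str) -> str:
    """Map a comment to the sort the grammar above is meant to give it.

    Prefix check only: it does not verify that the comment is terminated.
    """
    # Test the longer openers first, mirroring the follow restrictions
    # '"/*" -/- [\*]' and '"//" -/- [\*]' from the grammar.
    if text.startswith(("/**", "//*")):
        return "LATEX"
    if text.startswith(("/*", "//")):
        return "LAYOUT"
    raise ValueError(f"not a comment: {text!r}")


# The four comment forms from the sample text (hypothetical driver code).
samples = ["/* 1 */", "/** 2 */", "// 3", "//* 4"]
for s in samples:
    print(s, "->", classify_comment(s))
```

Under these assumptions, types 1 and 3 come out as LAYOUT and types 2 and 4 as LATEX, which is the result the parser is expected to produce without any ambiguity.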

I hope I'm doing this correctly. There may be a special way of dealing with this kind of comment.

Submitted by Radu Mereuta on 4 October 2011 at 22:15

error, needsinfo

On 5 October 2011 at 00:55 Vlad Vergu commented:

Hi Radu,

Would this work?

context-free syntax
Module* -> Start {cons("ModuleList")}
%% "start" Sentence* "end" -> Module {cons("Module")}
LATEX -> Sentence {cons("Latex")}

context-free priorities
{
"start" "end" -> Module {cons("Module")}
}>
{
"start" Sentence+ "end" -> Module {cons("Module")}
}
lexical syntax
    [\ \t\n\r] -> LAYOUT
    [\*]                                -> CommentChar
    %% "*" -> CommentChar
    
    -> EOF
    "/*"  (~[\*] | CommentChar)* "*/"   -> LAY
    "/**" (~[\*] | CommentChar)* "*/"   -> LATEX
    "//"  ~[\n\r]* ([\n\r] | EOF)       -> LAY
    "//*" ~[\n\r]*       		-> LATEX

    LAY -> LAYOUT

lexical restrictions
    CommentChar   -/- [\/]
    EOF  -/- ~[]
    "/*" -/- [\*]
    "//" -/- [\*]
    LAY -/- [\/].[\/].~[\*]
    LAY -/- [\/].[\*].~[\*]
    
context-free restrictions
    LAYOUT? -/- [\ \t\n\r]
    %% LAYOUT? -/- [\/].[\/].~[\*]
    %% LAYOUT? -/- [\/].[\*].~[\*]

On 5 October 2011 at 11:47 Lennart Kats tagged needsinfo

On 6 October 2011 at 09:48 Maartje commented:

Another solution could be to use the built-in Stratego primitives to extract the comments and then
inspect (or parse) the returned strings to extract the Javadoc info.

//Returns succeeding comments that attach to this node (heuristically determined)
origin-comments-after = origin-support-sublist(prim("SSL_EXT_origin_comments_after", ))

//Returns preceding comments that attach to this node (heuristically determined)
origin-comments-before = origin-support-sublist(prim("SSL_EXT_origin_comments_before", ))

//Extracts all block comments (see regex in MyLang-Syntax.esv)
//between the previous sibling and the current node,
//and all line comments between the current node and the next sibling (see lib/editor-common.generated)
origin-surrounding-comments = prim("SSL_EXT_origin_surrounding_comments", "My-Lang", )

On 6 October 2011 at 13:42 Radu Mereuta commented:

@Vlad. Thank you for that, it's an interesting idea, but my modules also contain other sentences. I tried to disambiguate with Stratego. That worked, but the performance became unacceptable (for an 800-line file it jumped from half a second to almost 10 seconds). The number of ambiguities generated is just too big.

@Maartje. That looks interesting and simple. How can I use those functions? Where can I find them? What should I import? And is this feature available only in Eclipse? I ask because I want to create a runnable jar.

On 6 October 2011 at 14:37 Maartje commented:

How can I use those functions?

origin-comments-before = origin-support-sublist(prim("SSL_EXT_origin_comments_before", )) //adds primitive strategy to project

node-in-ast //returns preceding comment string associated to the node-in-ast

Where can I find them?

origin-comments-before / origin-comments-after:
https://svn.strategoxt.org/repos/StrategoXT/spoofax/trunk/spoofax/org.spoofax.interpreter.library.jsglr/src/org/spoofax/interpreter/library/jsglr/origi/OriginCommentsBeforePrimitive.java

origin-surrounding-comments:
https://svn.strategoxt.org/repos/StrategoXT/spoofax-imp/trunk/org.strategoxt.imp.runtime/src/org/strategoxt/imp/runtime/stratego/OriginSurroundingCommentsPrimitive.java

What should I import? Nothing special.

Is this a feature only for Eclipse?
origin-comments-before / origin-comments-after do not use anything Eclipse-specific;
origin-surrounding-comments currently uses the Syntax.esv file to find the expressions for block comments and line comments.

On 6 October 2011 at 14:42 Radu Mereuta commented:

Ah, sorry Maartje for the previous message. I found the missing functions in the svn repository, copy-pasted them into my code, and they work, but not quite the way I expected.
Here is another simple grammar:
context-free syntax
    Module*                 -> Start {cons("ModuleList")}
    "start" Sentence* "end" -> Module {cons("Module")}
    "rule"					-> Sentence {cons("Rule")}

lexical syntax
    [\ \t\n\r]	-> LAYOUT
    [\*]                            -> CommentChar
    -> EOF
    "/*"  (~[\*] | CommentChar)* "*/"   -> LAYOUT
    "//"  ~[\n\r]* ([\n\r] | EOF)       -> LAYOUT

lexical restrictions
    CommentChar   -/- [\/]
    EOF  -/- ~[]

context-free restrictions
    LAYOUT? -/- [\ \t\n\r]
    LAYOUT? -/- [\/].[\/]
    LAYOUT? -/- [\/].[\*]

And a sample input:

//0
start
  /* 1 */
  rule
  //2
  rule
  //3
end
//4
I tried a topdown traversal that prints the comments before and after every term. With origin-comments-before I could only catch #1 and #2.
I also tried origin-surrounding-comments, which misses #0.

On 6 October 2011 at 15:32 Maartje commented:

The comment-after and comment-before functions were implemented to support layout/comment preservation for textual transformations. Some heuristics are used to determine whether a comment is associated with the preceding or succeeding node, with a sublist of nodes, or with no node at all (for example, a commented-out function or statement). The heuristics are described in http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2011-027.pdf.

For Javadoc it may make sense to implement an all-comments-before function and then pick out the Javadoc comments afterwards using Stratego.
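
The collect-then-filter idea can be illustrated independently of Spoofax. What follows is a rough Python sketch under stated assumptions, not the actual primitive: it works on the raw source text rather than on origin information attached to AST nodes, the names `all_comments_before`, `is_javadoc` and `javadoc_before` are hypothetical, and `offset` is assumed to be the start position of the node of interest (so it never falls inside a comment). It also assumes the language has no string literals that could contain comment openers.

```python
import re

# Block comments (non-greedy, may span lines) or line comments.
COMMENT = re.compile(r"/\*.*?\*/|//[^\r\n]*", re.DOTALL)


def all_comments_before(source: str, offset: int) -> list[str]:
    """Return every comment that ends before `offset`, with no heuristics."""
    return [m.group(0) for m in COMMENT.finditer(source, 0, offset)]


def is_javadoc(comment: str) -> bool:
    """Doc-style comments open with '/**' or '//*' (as in the grammar above)."""
    return comment.startswith(("/**", "//*"))


def javadoc_before(source: str, offset: int) -> list[str]:
    """Collect all preceding comments first, then keep the doc-style ones."""
    return [c for c in all_comments_before(source, offset) if is_javadoc(c)]


# Hypothetical input mixing plain and doc-style comments.
source = "//0\nstart\n  /** doc */\n  rule\n  //* line doc\n  rule\nend\n"
offset = source.rindex("end")
print(all_comments_before(source, offset))
print(javadoc_before(source, offset))
```

Because nothing is attached to a particular node heuristically, a leading comment such as `//0` is collected too; the filter step then decides which comments count as documentation.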
