| <!doctype html public "-//w3c//dtd html 4.0 transitional//en"> |
| <html> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
| <meta name="GENERATOR" content="Mozilla/4.75 [en] (Windows NT 5.0; U) [Netscape]"> |
| </head> |
| <body> |
| Hi, |
| <p>Here are my thoughts regarding positions and AST. The major goal was |
| to come up with a consistent view, even though this causes some more work |
| for refactoring and for the implementation of the new AST. But I think |
| it might be worth if we can come up with a consistent story for positions.</p> |
| <h3> |
| Some general statements</h3> |
| |
| <ol> |
| <li> |
| sourceStart and sourceEnd should always cover the whole node. This is different |
| to the current implementation where for some nodes declarationSourceStart |
| and declararionSourcEnd covers the whole node and sourceStart and sourceEnd |
| only covers the name (examples are: LocalVariableDeclaration, TypeDeclararion, |
| ...)</li> |
| |
| <li> |
| sourceStart and sourceEnd should also cover all subnodes</li> |
| |
| <li> |
| whenever possible we should follow the grammar as defined in The Java Language |
| Specification book. So if the grammar says that a production includes the |
| semicolon then the AST node should include it too. For example the grammar |
| defines a return statement like</li> |
| |
| <br> return (expression) ; |
| <br>So the corresponding AST node should include the ;</ol> |
| |
| <h3> |
| Some statements from earlier discussions (mainly between Jim, Philippe, |
| and me)</h3> |
| |
| <ul> |
| <li> |
| There will be an ExpressionStatement node for expressions used as statements. |
| For example "if (isChecked()) {}" versus "isCheck();". We agreed that the |
| expression will not include the semicolon and the ExpressionStatement will. |
| For the isCheck() example this will look like [[isChecked()];]. This is |
| consistent with the grammer defined in (3). This together with the general |
| statement (2) leads to the conclusion that statements that have child statements |
| will include the semicolon if the child statement has one. For the example</li> |
| |
| <p><br>for (int i= 0; i < 10; i++) |
| <br> foo();</p> |
| <p>sourceEnd of the for statement will include the semicolon of the expression |
| statement.</p></ul> |
| |
| <h3> |
| Open issues</h3> |
| |
| <h4> |
| Multiple local declarations</h4> |
| <p>Currently multiple local declarations appear in the AST as n separate local |
| declarations without any relationship to each others. This raises various |
| questions:</p> |
| <ul> |
| <li> |
| what are the positions of those local declarations</li> |
| |
| <li> |
| how is a visitor of that AST able to figure out that he deals with multiple |
| local declaration.</li> |
| </ul> |
| <p>Since the new AST isn't a 1:1 mapping of the compiler's AST anyway (we |
| have the ExpressionStatement node) I opt to introduce new nodes as defined |
| in the grammar. Since the semicolon doesn't belong to the variable declaration, |
| it should be managed by the parent node that ties together multiple declarations. |
| Here is an example:</p> |
| <p>int x= 10, x[]= null, i;</p> |
| <p>LocalVariableDeclaration node manages: |
| <br> the type (e.g. int) |
| <br> the positions of the commas (if needed) |
| <br> the actual variable declarators |
| <br> sourceStart= start of the type |
| <br> sourceEnd= ;</p> |
| <p>VariableDeclarator node manages: |
| <br> the variable name and its positions |
| <br> the initialization |
| <br> sourceStart= start of variable name |
| <br> sourceEnd= end of initialization. Doesn't include |
| the comma.</p> |
| <p>If we want to do some optimization we could also have a node SingleLocalVariableDeclaration |
| for declaration like int x; or int y= 10; The node would have the following |
| fields: |
| <br> the type |
| <br> the variable name and its positions |
| <br> the initialization |
| <br> sourceStart= start of type |
| <br> sourceEnd= ;</p> |
| <h4> |
| Updates in for statements</h4> |
| <p>Analogous to the local variable declaration, the comma to separate the |
| update expressions can not be part of the expression (expressions don't |
| contain a semicolon so they can't contain a comma either). To know the |
| positions of the commas the for statement should manage them in a separate |
| array.</p> |
| <p><i>The general rule is, that whenever language elements are separate |
| using a comma (for example an interface list in the implements statement, |
| arguments of a method declaration, ...) the node containing the separated |
| nodes should manage the positions of the comma, if they are of any interest. |
| In a first implementation we could leave these positions out and use the |
| scanner to find them if they are of interest.</i></p> |
| <h4> |
| Treatment of semicolon</h4> |
| <p>From our experiences with refactoring it is helpful in some cases to know |
| where the position of the semicolon is. For example if the user extract |
| a for statement and he doesn't select the action's semicolon we allow the |
| extraction. So what can we do in these cases:</p> |
| <ul> |
| <li> |
| simple don't allow the case. To support better selection we can offer some |
| actions to extend the text selection to spawn valid AST nodes. We have |
| a running prototype for this.</li> |
| |
| <li> |
| do some parsing of the source code to find the position of the semicolon. |
| We could use the scanner for this.</li> |
| </ul> |
| |
| <h4> |
| Answers to explicit questions from Olivier</h4> |
| |
| <ul> |
| <li> |
| for (;;); : in this case the for statement should cover the semicolon. |
| The best way to achieve this is to have an empty statement as defined in |
| the grammar.</li> |
| |
| <li> |
| declaration source start of an argument: yes, Adam uses argument.type.sourceStart |
| as the start not declarationSourceStart.</li> |
| |
| <li> |
| test: the test we have are the refactoring test. We don't have special |
| test to check if the AST positions are correct.</li> |
| </ul> |
| |
| <br> |
| <br> |
| </body> |
| </html> |