notes/r2.0/jdom ast/ast.html - gerrit/e4/org.eclipse.jdt.core - Git at Google

 <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
 <html>
 <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    <meta name="GENERATOR" content="Mozilla/4.75 [en] (Windows NT 5.0; U) [Netscape]">
    <title>JDT - Abstract Syntax Trees</title>
 </head>
 <body>

 <h2>
 Abstract Syntax Trees</h2>
 Last revised 12:05 Friday September 14, 2001
 <p>Original work item: "Exposing the AST API."
 <p>Related work item: "Improved Java code manipulation support, address
 JDOM limitations; JDOM doesn't preserve markers and isn't document aware;
 JDOM finer grained update support (e.g. change method name, type name);
 buffer contents is duplicated in Java UI document and needs to be manually
 synchronized."
 <h4>
 Background</h4>

 <ul>
 <li>
 refactoring is a key customer (Dirk B.)</li>

 <li>
 for 2.0: have good support in place so that refactoring can be made open
 API; would need corresponding core apis for core refactoring</li>

 <li>
 current refactoring appoach uses AST visitor which runs post-resolve on
 designated compilation unit</li>

 <li>
 visitor collects info (parent stack) including source positions</li>

 <li>
 uses name environment to validate/locate</li>

 <li>
 all changes are formulated as reified undoable edits in a source buffer</li>

 <li>
 a batch of changes are made at a time</li>

 <li>
 refactoring does not reason hypothetically (changes are undoable)</li>

 <li>
 JDOM is not used by refactoring, but potentially could be (JDOM is used
 inside Java Model)</li>
 </ul>
 Dirk's wish list with regard to the AST support:
 <ul>
 <li>
 consistent positions in AST (e.g. sourceStart and sourceEnd should always
 cover the whole node).</li>

 <li>
 comment handling in AST. Some node contain preceding comments other don't</li>

 <li>
 Parent pointer in AST node (?)</li>

 <li>
 Data pointer in AST node (?).</li>

 <li>
 Expression should provide getType method (cached value of resolveType(Scope)).</li>
 </ul>

 <h4>
 Summary from Sept. 10-11 meetings</h4>
 Dirk travelled to SNZ to discuss refactoring requirements and possible
 solutions with Philippe M. and Jeem.
 <p>Some of the forms of solutions discussed, but ultimately abandoned:
 <ul>
 <li>
 A vanilla DOM.</li>

 <ul>
 <li>
 too limiting: difficult to provide for pre-computing bindings.</li>

 <li>
 clumsy for clients to use without AST node types represented by different
 Java types</li>
 </ul>

 <li>
 AST plus resolves info in form of Java model elements.</li>

 <ul>
 <li>
 Java model methods are unsuitable canonical representation for resolved
 methods because parameter types are as per source.</li>

 <li>
 Would need additional Java model elements to represent variable declarations.</li>

 <li>
 Would need special Java element implementations for local types, methods,
 and fields.</li>
 </ul>
 </ul>
 In the end, we agreed on AST plus bindings:
 <ul>
 <li>
 AST is simple tree of typed nodes as determined by parser.</li>

 <li>
 A different Java type for each AST node type.</li>

 <li>
 TBD: are node types API classes, API interfaces, or both?</li>

 <li>
 Simple case: no bindings.</li>

 <li>
 Basic AST Lifecycle: client requests AST; compilation unit is parsed and
 nodes are created; root is handed back to client; AST is garbage collected
 after client lets go of all AST nodes. AST is not affected if underlying
 compilation unit is changed (i.e., eager parse, with no residual dependence
 on state of file in workspace).</li>

 <li>
 Any syntax errors detected need to be reported to client. Parser should
 try to localize errors, e.g., to a single method declaration or statement.</li>

 <li>
 Predicatable trees with simple correspondence to source code (parser should
 not optimize ASTs).</li>

 <li>
 Reliable source position information for all nodes.</li>

 <li>
 Scratch "data" field on each AST node for client to record an Object.</li>

 <li>
 Navigate AST upwards as well as downwards. Parent link.</li>

 <li>
 AST walker for convenient traversal.</li>

 <li>
 ASTs can be read-only or read-write.</li>

 <li>
 Read-write ASTs can be modified in terms of ASTs. AST node factories (no
 Java parser required). Cut/copy/paste.</li>

 <li>
 Cut/copy/paste between separate ASTs requires fragment cloning to preserve
 independence of storage.</li>

 <li>
 Client may be interested in cut/copy/pasting comments too.</li>

 <li>
 New nodes would not carry source positions.</li>

 <li>
 Read-write AST can be serialized back to compilation unit char[].</li>

 <ul>
 <li>
 Preserve existing comments, whitespace, and use of \u.</li>

 <li>
 Control whitespace and use of \u for insertions.</li>

 <li>
 Provide edit map for updating markers.</li>
 </ul>

 <li>
 Resolved ASTs: basic AST annotated with bindings (non-syntactic information).</li>

 <li>
 Binding information is derived from AST plus the Java model (relative to
 some project).</li>

 <li>
 Availability/validity of bindings ceases when Java model changes.</li>

 <li>
 Client must request up front that bindings be created.</li>

 <li>
 Certain AST nodes get decorated with bindings.</li>

 <li>
 Client should have ways to communicate up front what kind of bindings are
 required and where.</li>

 <li>
 Availability/validity of bindings for a read-write AST ceases when it is
 first modified.</li>

 <li>
 Bindings (non-syntactic information) includes things such as the following:</li>

 <ul>
 <li>
 Resolved names - which type, field, method, or local variable does a AST
 name refernce resolve to.</li>

 <li>
 Resolved types - what is the resolved type of an AST expression or type
 reference.</li>

 <li>
 Resolved supertypes - what are the resolved supertypes of a resolved type.</li>

 <li>
 Resolved declarations - what resolved type, field, method, or local variable
 does an AST declaration map to.</li>

 <li>
 Thrown exceptions - what are the resolved types of the exceptions thrown
 by a given expression or statement.</li>

 <li>
 Resolved members - what are the resolved members of a resolved type.</li>
 </ul>

 <li>
 Problems also should be reported with resolving. Problems are opaque: line
 position plus human-readable message.</li>

 <li>
 Space for bindings storage is significant; increases monotonically as more
 bindings are accessed.</li>

 <li>
 Space for bindings is for lifetime of the AST. Retaining bindinfs</li>

 <li>
 Advanced AST Lifecycle: client requests AST with bindings; compilation
 unit is parsed, nodes created, and perhaps some bindings created and annotated
 on nodes; root is handed back to client; AST is garbage collected after
 client lets go of all AST nodes and bindings. AST itself is not affected
 if underlying compilation unit is changed (i.e., eager parse, with no residual
 dependence on state of file in workspace). Bindings may become stale or
 invalid if workspace changes (i.e., possibly lazy and incremental construction
 of bindings using Java model).</li>

 <li>
 Bindings from two ASTs are not generally comparable.</li>
 </ul>
 AST will either extend or replace JDOM. In the latter case, JDOM would
 be deprecated.
 <p>AST will exist along side Java model.
 <h4>
 AST Node Types - Classes, interface, or both</h4>
 There are on the order of 87 node types for Java ASTs. Bindings will add
 some more. There are a couple of way this can be mapped to Java types.
 <p>(1) Use org.w3c.DOM interfaces as API. Provide a private concrete implementation
 of the DOM.
 <p>Pro: Very small, and standard API for read-write documents.
 <br>Con: API is not convenient for clients.
 <br>Con: API is not amenable to exposing bindings and other non-structural
 information.
 <p>(2) Concrete API class per AST node type.
 <p>Pro: API as small as possible.
 <br>Pro: Client can create nodes directly.
 <p>Con: Cannot easily hide implementation details; e.g. representation
 and mechanism for reassembling compilation unit text after editing; lazy
 binding creation.
 <p>Clients who create node from scratch only ever need basic constructors
 (the nodes they create do not have source positions, bindings, or other
 decorations). On the other hand, the parser needs to remember more info
 including fine-grained source positions.
 <p>(3) API interface per AST node type, along with node factory methods.
 Provide a private concrete implementation. Allow clients to reimplement
 node types (from scratch) and supply a factory.
 <p>Like JDOM (except JDOM does not permit client to reimplement) and org.w3c.DOM.
 <p>Pro: API as small as possible.
 <br>Pro: Easy to tailor different kinds of representation: read-write vs.
 read-only ASTs; raw ASTs vs. AST+bindings.
 <p>Con:&nbsp; Hidden concrete implementation classes takes more space.
 <br>Con: Using factory methods is a bit less direct than using constructors.
 <p>We should go for this last option and provide API interfaces with hidden
 implementation classes.
 <br>&nbsp;
 </body>
 </html>
	<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
	<html>
	<head>
	<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
	<meta name="GENERATOR" content="Mozilla/4.75 [en] (Windows NT 5.0; U) [Netscape]">
	<title>JDT - Abstract Syntax Trees</title>
	</head>
	<body>

	<h2>
	Abstract Syntax Trees</h2>
	Last revised 12:05 Friday September 14, 2001
	<p>Original work item: "Exposing the AST API."
	<p>Related work item: "Improved Java code manipulation support, address
	JDOM limitations; JDOM doesn't preserve markers and isn't document aware;
	JDOM finer grained update support (e.g. change method name, type name);
	buffer contents is duplicated in Java UI document and needs to be manually
	synchronized."
	<h4>
	Background</h4>

	<ul>
	<li>
	refactoring is a key customer (Dirk B.)</li>

	<li>
	for 2.0: have good support in place so that refactoring can be made open
	API; would need corresponding core apis for core refactoring</li>

	<li>
	current refactoring appoach uses AST visitor which runs post-resolve on
	designated compilation unit</li>

	<li>
	visitor collects info (parent stack) including source positions</li>

	<li>
	uses name environment to validate/locate</li>

	<li>
	all changes are formulated as reified undoable edits in a source buffer</li>

	<li>
	a batch of changes are made at a time</li>

	<li>
	refactoring does not reason hypothetically (changes are undoable)</li>

	<li>
	JDOM is not used by refactoring, but potentially could be (JDOM is used
	inside Java Model)</li>
	</ul>
	Dirk's wish list with regard to the AST support:
	<ul>
	<li>
	consistent positions in AST (e.g. sourceStart and sourceEnd should always
	cover the whole node).</li>

	<li>
	comment handling in AST. Some node contain preceding comments other don't</li>

	<li>
	Parent pointer in AST node (?)</li>

	<li>
	Data pointer in AST node (?).</li>

	<li>
	Expression should provide getType method (cached value of resolveType(Scope)).</li>
	</ul>

	<h4>
	Summary from Sept. 10-11 meetings</h4>
	Dirk travelled to SNZ to discuss refactoring requirements and possible
	solutions with Philippe M. and Jeem.
	<p>Some of the forms of solutions discussed, but ultimately abandoned:
	<ul>
	<li>
	A vanilla DOM.</li>

	<ul>
	<li>
	too limiting: difficult to provide for pre-computing bindings.</li>

	<li>
	clumsy for clients to use without AST node types represented by different
	Java types</li>
	</ul>

	<li>
	AST plus resolves info in form of Java model elements.</li>

	<ul>
	<li>
	Java model methods are unsuitable canonical representation for resolved
	methods because parameter types are as per source.</li>

	<li>
	Would need additional Java model elements to represent variable declarations.</li>

	<li>
	Would need special Java element implementations for local types, methods,
	and fields.</li>
	</ul>
	</ul>
	In the end, we agreed on AST plus bindings:
	<ul>
	<li>
	AST is simple tree of typed nodes as determined by parser.</li>

	<li>
	A different Java type for each AST node type.</li>

	<li>
	TBD: are node types API classes, API interfaces, or both?</li>

	<li>
	Simple case: no bindings.</li>

	<li>
	Basic AST Lifecycle: client requests AST; compilation unit is parsed and
	nodes are created; root is handed back to client; AST is garbage collected
	after client lets go of all AST nodes. AST is not affected if underlying
	compilation unit is changed (i.e., eager parse, with no residual dependence
	on state of file in workspace).</li>

	<li>
	Any syntax errors detected need to be reported to client. Parser should
	try to localize errors, e.g., to a single method declaration or statement.</li>

	<li>
	Predicatable trees with simple correspondence to source code (parser should
	not optimize ASTs).</li>

	<li>
	Reliable source position information for all nodes.</li>

	<li>
	Scratch "data" field on each AST node for client to record an Object.</li>

	<li>
	Navigate AST upwards as well as downwards. Parent link.</li>

	<li>
	AST walker for convenient traversal.</li>

	<li>
	ASTs can be read-only or read-write.</li>

	<li>
	Read-write ASTs can be modified in terms of ASTs. AST node factories (no
	Java parser required). Cut/copy/paste.</li>

	<li>
	Cut/copy/paste between separate ASTs requires fragment cloning to preserve
	independence of storage.</li>

	<li>
	Client may be interested in cut/copy/pasting comments too.</li>

	<li>
	New nodes would not carry source positions.</li>

	<li>
	Read-write AST can be serialized back to compilation unit char[].</li>

	<ul>
	<li>
	Preserve existing comments, whitespace, and use of \u.</li>

	<li>
	Control whitespace and use of \u for insertions.</li>

	<li>
	Provide edit map for updating markers.</li>
	</ul>

	<li>
	Resolved ASTs: basic AST annotated with bindings (non-syntactic information).</li>

	<li>
	Binding information is derived from AST plus the Java model (relative to
	some project).</li>

	<li>
	Availability/validity of bindings ceases when Java model changes.</li>

	<li>
	Client must request up front that bindings be created.</li>

	<li>
	Certain AST nodes get decorated with bindings.</li>

	<li>
	Client should have ways to communicate up front what kind of bindings are
	required and where.</li>

	<li>
	Availability/validity of bindings for a read-write AST ceases when it is
	first modified.</li>

	<li>
	Bindings (non-syntactic information) includes things such as the following:</li>

	<ul>
	<li>
	Resolved names - which type, field, method, or local variable does a AST
	name refernce resolve to.</li>

	<li>
	Resolved types - what is the resolved type of an AST expression or type
	reference.</li>

	<li>
	Resolved supertypes - what are the resolved supertypes of a resolved type.</li>

	<li>
	Resolved declarations - what resolved type, field, method, or local variable
	does an AST declaration map to.</li>

	<li>
	Thrown exceptions - what are the resolved types of the exceptions thrown
	by a given expression or statement.</li>

	<li>
	Resolved members - what are the resolved members of a resolved type.</li>
	</ul>

	<li>
	Problems also should be reported with resolving. Problems are opaque: line
	position plus human-readable message.</li>

	<li>
	Space for bindings storage is significant; increases monotonically as more
	bindings are accessed.</li>

	<li>
	Space for bindings is for lifetime of the AST. Retaining bindinfs</li>

	<li>
	Advanced AST Lifecycle: client requests AST with bindings; compilation
	unit is parsed, nodes created, and perhaps some bindings created and annotated
	on nodes; root is handed back to client; AST is garbage collected after
	client lets go of all AST nodes and bindings. AST itself is not affected
	if underlying compilation unit is changed (i.e., eager parse, with no residual
	dependence on state of file in workspace). Bindings may become stale or
	invalid if workspace changes (i.e., possibly lazy and incremental construction
	of bindings using Java model).</li>

	<li>
	Bindings from two ASTs are not generally comparable.</li>
	</ul>
	AST will either extend or replace JDOM. In the latter case, JDOM would
	be deprecated.
	<p>AST will exist along side Java model.
	<h4>
	AST Node Types - Classes, interface, or both</h4>
	There are on the order of 87 node types for Java ASTs. Bindings will add
	some more. There are a couple of way this can be mapped to Java types.
	<p>(1) Use org.w3c.DOM interfaces as API. Provide a private concrete implementation
	of the DOM.
	<p>Pro: Very small, and standard API for read-write documents.
	<br>Con: API is not convenient for clients.
	<br>Con: API is not amenable to exposing bindings and other non-structural
	information.
	<p>(2) Concrete API class per AST node type.
	<p>Pro: API as small as possible.
	<br>Pro: Client can create nodes directly.
	<p>Con: Cannot easily hide implementation details; e.g. representation
	and mechanism for reassembling compilation unit text after editing; lazy
	binding creation.
	<p>Clients who create node from scratch only ever need basic constructors
	(the nodes they create do not have source positions, bindings, or other
	decorations). On the other hand, the parser needs to remember more info
	including fine-grained source positions.
	<p>(3) API interface per AST node type, along with node factory methods.
	Provide a private concrete implementation. Allow clients to reimplement
	node types (from scratch) and supply a factory.
	<p>Like JDOM (except JDOM does not permit client to reimplement) and org.w3c.DOM.
	<p>Pro: API as small as possible.
	<br>Pro: Easy to tailor different kinds of representation: read-write vs.
	read-only ASTs; raw ASTs vs. AST+bindings.
	<p>Con:  Hidden concrete implementation classes takes more space.
	<br>Con: Using factory methods is a bit less direct than using constructors.
	<p>We should go for this last option and provide API interfaces with hidden
	implementation classes.
	<br>
	</body>
	</html>