blob: 114ab6862dad15617a2d184766e73caf8cc061a6 [file] [log] [blame]
% Parse Tree Analysis
\textit{NOTE. This section assumes that you are familiar with the Visitor
pattern\footnote{See Gamma, Helm, Johnson, and Vlissides, \textit{Design
Patterns}} and are familiar with symbol tables\footnote{See Aho, Sethi, and
Ullman, \textit{Compilers: Principles, Techniques, and Tools}}.}
\section{Visitors}
When you want to retrieve structural information about a program, typically
it will be done using a Visitor on the parse tree.
There are two types of Visitors available in Photran.
\begin{itemize}
\item A \texttt{ParseTreeVisitor} has a ``callback'' method for each nonterminal
in the grammar. The tree is traversed in preorder, and as each node is visited,
a call is made to the Visitor method corresponding to the nonterminal for that node.
This is usually the preferred Visitor to use.
\item A \texttt{GenericParseTreeVisitor} only distinguishes between
\texttt{ParseTreeNode}s and \texttt{Token}s. This Visitor should only be used
when all internal nodes are treated similarly. If you need to distinguish between
the types of internal nodes, use a \texttt{ParseTreeVisitor} instead.
\end{itemize}
(\texttt{ParseTreeVisitor} is generated by the parser generator and should not
be modified.)
In addition to the visit methods, both types of Visitors have a
\texttt{preparingToVisitChildrenOf} method, which is called after a node has
been visited but before its children have, and a \texttt{doneVisitingChildrenOf}
method, which is called immediately after its children have been visited but
before any sibling nodes have.
\subsection{Creating and Using a Visitor}
Create a visitor by subclassing \texttt{ParseTreeVisitor} or
\texttt{GenericParseTreeVisitor}. By default, all of the visit methods do
nothing, so you only have to override the methods that are useful to you.
Given the root of the parse tree (i.e., the \texttt{ParseTreeNode} returned
by \texttt{FortranProcessor\#parse}), use the \texttt{visitUsing} method
to traverse the parse tree; pass your visitor as the only argument.
\subsection{Sample Traversals}
As an example, consider again the ``do nothing'' program from Figure \ref{parsetree}.
When it is visited using a \texttt{GenericParseTreeVisitor}, the Visitor
methods are called in this order (methods called between \texttt{preparingToVisitChildrenOf}
and \texttt{doneVisitingChildrenOf} are indented to illustrate that child nodes
are being visited):
\begin{verbatim}
visitParseTreeNode(<xExecutableProgram>)
preparingToVisitChildrenOf(<xExecutableProgram>)
visitParseTreeNode(<xProgramUnit>)
preparingToVisitChildrenOf(<xProgramUnit>)
visitParseTreeNode(<xMainProgram>)
preparingToVisitChildrenOf(<xMainProgram>)
visitParseTreeNode(<xProgramStmt>)
preparingToVisitChildrenOf(<xProgramStmt>)
visitToken(T_PROGRAM: "program")
visitParseTreeNode(<xProgramName>)
preparingToVisitChildrenOf(<xProgramName>)
visitToken(T_IDENT: "Hello")
doneVisitingChildrenOf(<xProgramName>)
visitToken(T_EOS: (end of line))
doneVisitingChildrenOf(<xProgramStmt>)
visitParseTreeNode(<xMainRange>)
preparingToVisitChildrenOf(<xMainRange>)
visitParseTreeNode(<xEndProgramStmt>)
preparingToVisitChildrenOf(<xEndProgramStmt>)
visitToken(T_END: "end")
visitToken(T_PROGRAM: "program")
visitToken(T_EOS: (end of line))
doneVisitingChildrenOf(<xEndProgramStmt>)
doneVisitingChildrenOf(<xMainRange>)
doneVisitingChildrenOf(<xMainProgram>)
doneVisitingChildrenOf(<xProgramUnit>)
doneVisitingChildrenOf(<xExecutableProgram>)
\end{verbatim}
When it is visited using a \texttt{ParseTreeVisitor}, the Visitor
methods are called in this order:
\begin{verbatim}
visitXexecutableprogram(<xExecutableProgram>)
preparingToVisitChildrenOf(<xExecutableProgram>)
visitXprogramunit(<xProgramUnit>)
preparingToVisitChildrenOf(<xProgramUnit>)
visitXmainprogram(<xMainProgram>)
preparingToVisitChildrenOf(<xMainProgram>)
visitXprogramstmt(<xProgramStmt>)
preparingToVisitChildrenOf(<xProgramStmt>)
visitXprogramname(<xProgramName>)
preparingToVisitChildrenOf(<xProgramName>)
doneVisitingChildrenOf(<xProgramName>)
doneVisitingChildrenOf(<xProgramStmt>)
visitXmainrange(<xMainRange>)
preparingToVisitChildrenOf(<xMainRange>)
visitXendprogramstmt(<xEndProgramStmt>)
preparingToVisitChildrenOf(<xEndProgramStmt>)
doneVisitingChildrenOf(<xEndProgramStmt>)
doneVisitingChildrenOf(<xMainRange>)
doneVisitingChildrenOf(<xMainProgram>)
doneVisitingChildrenOf(<xProgramUnit>)
doneVisitingChildrenOf(<xExecutableProgram>)
\end{verbatim}
(The Visitors used to generate the output above are stored in
a JUnit test class called \texttt{ExampleVisitor}.)
\section{The \texttt{ParseTreeSearcher}}
Inside Visitor methods, typically you will want to look for particular tokens
or syntactic structures.
\subsection{An Example}
Consider the following (randomly selected) method from \texttt{DeclarationCollector},
one of the classes used for building symbol tables.
This class scans a program's parse tree, looking for declarations of programs, functions,
block data, namelists, etc., and inserts declarations for them into a symbol table.
\begin{verbatim}
public void visitXmainprogram(ParseTreeNode node)
{
// # R1101 desn"t ensure ordering as the standard requires;
// <xMainProgram> ::=
// <xMainRange> |
// <xProgramStmt> <xMainRange> ;
// # R1102
// <xProgramStmt> ::=
// <xLblDef> T_PROGRAM <xProgramName> T_EOS ;
ParseTreeNode programStmt = ParseTreeSearcher.findFirstNodeIn(node, Nonterminal.XPROGRAMSTMT);
Token name;
if (programStmt != null)
name = ParseTreeSearcher.findFirstIdentifierIn(node);
else
{
name = new Token();
name.setText("(Anonymous Main Program)");
}
...
\end{verbatim}
(The comment lines are copied from the parser generation grammar, \texttt{Fortran95.bnf}.)
In Fortran, a main program can either start with a program statement, or it
can not. So both of these are valid programs:
\begin{verbatim}
program SayHi
print *, 'Hello!' print *, 'Hello!'
end program SayHi end program SayHi
\end{verbatim}
In this Visitor method, \texttt{node} is guaranteed to be a $<$xMainProgram$>$,
due to the name of the method. As shown, by looking at the grammar, we can
determine that it will either have one child (a \texttt{ParseTreeNode} corresponding
to an $<$xMainRange$>$) or two children (both \texttt{ParseTreeNode}s, the
first corresponding to an $<$xProgramStmt$>$ and the second corresponding to
an $<$xMainRange$>$.
If it \textit{does} have an $<$xProgramStmt$>$, then, a quick look at the rules for
$<$xLblDef$>$ (an optional integer label that can appear at the beginning of any
Fortran statement) and $<$xProgramName$>$ will make it evident that the first
identifier token (T\_IDENT) under the $<$xProgramStmt$>$ is the name of the program.
So now we can understand what this method does.
\begin{itemize}
\item It checks to see if the main program has a $<$xProgramStmt$>$. This is done
by calling \texttt{ParseTreeSearcher\#findFirstNodeIn}, and telling it we want to
find the first $<$xProgramStmt$>$ that is a child of \texttt{node}. It will either
return a \texttt{ParseTreeNode} corresponding to an $<$xProgramStmt$>$, or if it
can't find one, it will return \texttt{null}.
\item If there is an $<$xProgramStmt$>$, it grabs the first T\_IDENT token, which
tells the name the user gave to the program.
\item If there was no $<$xProgramStmt$>$, the program does not have a name, so we
``fake'' a token and name the program ``(Anonymous Main Program).''
\end{itemize}
\subsection{For More Information}
So, essentially, anytime you have a parse tree (or a subtree of the parse tree)
and you want to find a particular node or token, use one of the methods in
\texttt{ParseTreeSearcher}. If there isn't one, you may have to write one yourself.
Be sure to look at the JavaDoc for the methods in that class; unless you are
doing something bizarre, one of the existing methods should work.