blob: 3403ea0d08e52bec571f56fca17f448cce3f7599 [file] [log] [blame]
\documentclass[10pt]{article}
\usepackage{xltxtra}
\usepackage{fontspec}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{color}
\definecolor{lstbg}{rgb}{0.98,0.98,0.98}
\usepackage{listings}
\lstset{
basicstyle=\footnotesize\sffamily,
%stringstyle=\ttfamily,
showstringspaces=false,
%% Print real keywords with a bold font
keywordstyle=\bfseries,
%% Print functions (2nd class keywords) italic
keywordstyle=[2]\itshape,
numbers=left,
numberstyle=\tiny,
backgroundcolor=\color{lstbg},
captionpos=b,
frame=single,
stringstyle= % Maybe some custom font for GReQL-strings?
}
\usepackage[normalem]{ulem}
%% Normal font size in \lstinline{} listings.
\makeatletter
\lst@AddToHook{TextStyle}{\let\lst@basicstyle\normalsize\sffamily}
\makeatother
% macro for MetaModelElements
\newcommand{\smme}[1]{\uline{\texttt{#1}}}
\newcommand{\tmme}[1]{\texttt{#1}}
\newcommand{\email}[1]{\href{mailto:#1}{\texttt{#1}}}
\newcommand{\myemail}[0]{\email{horn@uni-koblenz.de}}
%% Schusterjungen und Hurenkinder verhindern. Das sind einzelne Zeilen von
%% Absätzen am Anfang oder Ende einer Seite.
\clubpenalty = 10000
\widowpenalty = 10000
\displaywidowpenalty = 10000
\setmainfont{TeX Gyre Pagella}
\title{Model Transformations for Program Understanding:\\
A Reengineering Challenge}
\author{Dipl.-Inform. Tassilo Horn\\
\myemail\\
\vspace{0.3cm}\\
University Koblenz-Landau, Campus Koblenz\\
Institute for Software Technology\\
Universitätsstraße 1, 56070 Koblenz, Germany}
\begin{document}
\maketitle
\begin{abstract}
In Software Reengineering, one of the central artifacts is the source code of
the legacy system in question. In fact, in most cases it is the only
definitive artifact, because over the time, the code has diverged from the
original architecture and design documents. The first task of any
reengineering project is to gather an understanding on what the system does,
and how it does it. Therefore, a common approach is to use parsers to
translate the source code of the legacy system into a model conforming to the
abstract syntax of the programming language the system is implemented in,
which can then be subject to querying. Despite querying, transformations can
be used to regenerate more abstract models of the system's architecture.
This transformation case deals with the creation of a state machine model out
of a Java syntax graph model. It is derived from a task that originates from
a real reengineering project.
\end{abstract}
\section{Objective of the Case}
\label{sec:objective}
The objective of the reengineering case presented here is to transform an
abstract syntax graph into a simple state machine.
There are two major challenges involved in this transformation task. The first
challenge is an issue of \emph{performance} and \emph{scalability}. The input
model of the transformation is an abstract syntax graph, which is naturally
large for any reasonable program. In this concrete case, the smallest input
model contains more than 6,500 elements, although it is an abstract syntax graph
of only a very small program. Larger models with up to a million elements
might be used during the evaluation to stress-test the used transformation
language implementations.
The second and more important challenge is that the transformation task
involves \emph{complex, non-local matching} of elements. For example, the core
task, which is described in Section~\ref{sec:task-descr-proj}, demands that the
transformation should create one state for each Java class that derives
directly or indirectly from an abstract Java class named ``State.'' There are
no restrictions on the depth of the inheritance hierarchy, so the ``State''
class and its subclasses may be located arbitrarily far in the input model.
However, the structure of the path from subclass to superclass is clearly
specified by the input metamodel and must be utilized by transformations, e.g.,
by recursive pattern matching or similarily expressive matching constructs, an
example being subpattern matching as provided by GrGen.NET \cite{grgen2010}.
\section{Context of the Case}
\label{sec:context}
The SOAMIG\footnote{\url{http://www.soamig.de}} project deals with the
migration of legacy systems to Service-Oriented Architectures by means of
model-driven techniques. One of the two legacy systems on which the approach
is being evaluated is a monolithic Java system, which is operated with a Swing
graphical user interface.
This user interface consists of around 30 different masks, which often relate
to conceptually self-contained functionalities that might be implemented as
services in the reengineered target system. Furthermore, the order in which
masks are activated and which successor masks can be activated from a given
mask gives good hints about the orchestration of the target system services.
The masks are implemented as plain Java classes using the Swing toolkit. Many
masks are very complex with many user interface elements and even more input
validation code, which complicates tracking down the relationships between the
individual masks. However, the graphical user interface is based on a state
machine concept, and uses some strict coding conventions in the implementation.
As such, any masks can be seen as states, and when another mask is activated,
it can be seen as a transition. The trigger of this transition is usually a
click on some button, and possibly additional actions are performed just before
the transition, like validating and storing the user input.
In the project, a GReTL transformation has been developed, which creates a
simple state machine model consisting of states and transitions with triggers
and actions out of the syntax graph of the legacy system. This syntax graph
consists of more than 2.5 million nodes and edges. The transformation heavily
exploits the coding conventions taken as a basis for the implementation of the
graphical user interface. The resulting state machine model contains all
information about the possible sequences in which masks can be activated, what
triggers are responsible for a transition, and what additional actions are
performed when transitioning. However, it consists of less than 100 nodes and
edges, and it can be visualized and printed. Therefore, it is of great value
for the understanding of the legacy system.
The transformation case proposed here is directly derived form this
reengineering project's task. Instead of using the syntax graph of the
proprietary system, a toy example implementing the well-known TCP protocol
state machine using very similar coding conventions is used.
The next section describes how to get the up-to-date versions of this
description and the relevant metamodels and models.
Section~\ref{sec:task-descr} then provides a detailed task description with one
core task and two extensions. The evaluation criteria that are used to judge
the solutions are discussed in Section~\ref{sec:evaluation-criteria}.
\section{The Task Description Project}
\label{sec:task-descr-proj}
The project that includes this description, the input Ecore metamodel, three
input EMF syntax graph models, and the Java source code that was parsed to
generate two of the input models, as well as the output Ecore metamodel can be
cloned from the case submitter's
Mercurial\footnote{\url{http://mercurial.selenic.com/}} repository.
To do so, use the following command on the command line, or an equivalent
instruction in the Mercurial GUI of your choice\footnote{e.g.,
\emph{TortoiseHG}, available at \url{http://tortoisehg.bitbucket.org/}}.
Everyone has read access using the username \emph{anonymous} and the password
\emph{secret}.
\begin{lstlisting}[numbers=none]
% hg clone \
https://anonymous:secret@hg.uni-koblenz.de/horn/tcp-state-extract
\end{lstlisting}
This will create a new folder \texttt{tcp-state-extract} in the current working
directory containing the project contents. The subdirectory \texttt{case}
contains the task description, the subdirectory \texttt{src} contains the Java
source code that is parsed to generated the syntax graph model
\texttt{jamopp/1\_small-model.xmi} conforming to the Ecore metamodel
\texttt{jamopp/java.ecore}. \sloppy{The subdirectory \texttt{src2} contains a
slightly more complex variant of the code in \texttt{src}, and from that the
second input model \texttt{jamopp/2\_medium-model.xmi} is generated. The target
metamodel is \texttt{statemachine/StateMachine.ecore}.}
The reference solution implemented using the GReTL transformation language
\cite{report/HornEbertGReTL2010} is also included as
\texttt{solution/ExtractStateMachines.gretl}.
To update the task description project, use:
\begin{lstlisting}[numbers=none]
% hg pull -u
\end{lstlisting}
Alternatively, an archive file containing the complete repository contents can
be downloaded from
\url{http://www.uni-koblenz.de/~horn/tcp-state-extract.tar.gz}. This archive
also contains the mercurial metadata, so if you decide to install mercurial
later on, you can update to the most recent version with \lstinline{hg pull -u}
as above.
The case should be discussed at the TTC 2011 discussion groups\footnote{Follow
the \emph{Groups} link at \url{http://planet-mde.org/ttc2011}.}, and any
errors or obscurities in either the description itself or in the models will be
fixed in time.
\section{Detailed Task Description}
\label{sec:task-descr}
In this section, the transformation task is explained in details.
The overall goal of this task is to create a very simple \emph{state machine
model} for a \emph{Java syntax graph model} encoding a state machine with a
set of coding conventions a transformation has to exploit.
The task is divided into one mandatory \emph{core task}
(Section~\ref{sec:core-task}) and two optional \emph{extension tasks} with
slightly increased complexity (Section~\ref{sec:ext1-triggers} and
Section~\ref{sec:ext2-actions}).
The conventions used to implement the state machine in Java that should be
exploited by transformations are explained in terms of concrete Java syntax,
i.e., by using source code examples. However, the most relevant metamodel
elements are named, too. In the following, source metamodel elements are
written underlined (e.g., \smme{MethodCall}), and target metamodel elements are
typeset with a typewriter font (e.g., \tmme{Transition}).
\paragraph{Source Metamodel and Models.}
The primary source model of the transformation is the Java abstract syntax graph
\texttt{jamopp/1\_small-model.xmi} conforming to the Java metamodel provided as
Ecore \cite{EMF09} file \texttt{jamopp/java.ecore}. The model is generated by
parsing the source code in the \texttt{src} directory of the task description
project using the \emph{JaMoPP} (\emph{Java Model Parser and Printer},
\cite{jamopp09}) tool developed at the Technical University of Dresden.
The source code in the \lstinline{src} directory whose abstract syntax graph is
the input model of the transformation task is a toy-implementation of the
TCP-protocol state
machine\footnote{\url{http://en.wikipedia.org/wiki/Transmission_Control_Protocol}}.
The coding conventions that transformations have to utilize are discussed in
the descriptions of the core and extension tasks below.
The input model contains any information present in the source code. It
consists of about 6,500 elements.
The second input model \texttt{jamopp/2\_medium-model.xmi} is generated by
parsing the source code contained in the \texttt{src2} directory. It is
slightly larger (~6,800 elements), and it is more complex, e.g., the
specialization hierarchy is deeper, and there is a deeper nesting of statements
in method bodies.
A third input model \texttt{jamopp/3\_big-model.xmi} is also provided. It was
generated by parsing the source code of a Java project containing about 900
classes and 220.000 lines of code. Into this project, the classes implementing
the TCP state machine were integrated. This model has about the size and
complexity of the model in the original reengineering scenario.
All provided input models implement the TCP state machine according to the
conventions specified in the task description below, so the target state
machine model is always the same (without respect to the order of elements).
Although the input models are provided as EMF models, the transformation tools
are not restricted to that technological space \cite{technologicalspaces02}.
To make sure that all solutions use equivalent models, both in terms of the
metamodel as well as the model sizes, it may be imported into the native format
of the transformation framework, but it is not allowed to create new input
models using custom parsers.
\paragraph{Target Metamodel.}
As target metamodel, the very basic state machine metamodel shown in
Figure~\ref{fig:state-machine-mm} is used.
\begin{figure}[h!]
\centering
\includegraphics[width=0.5\linewidth]{StateMachine}
\caption{The target Ecore metamodel}
\label{fig:state-machine-mm}
\end{figure}
A \tmme{StateMachine} consists of an arbitrary number of \tmme{States} and
\tmme{Transitions}.
Any \tmme{Transition} starts at exactly one \tmme{src}-\tmme{State}, and it
ends at exactly one \tmme{dst}-\tmme{State}. Every \tmme{State} may have an
arbitrary number of outgoing \tmme{Transitions} (\tmme{out}), and an arbitrary
number of incoming \tmme{Transitions} (\tmme{in}).
Every \tmme{State} has a \tmme{name}, and any \tmme{Transition} may be caused
by a \tmme{trigger}, and as a result of its activation, an \tmme{action} might
be performed.
If possible, transformations should produce a valid EMF XMI
\cite{spec/omg/XMI07} instance of the
\lstinline{statemachine/StateMachine.ecore} metamodel. However, if the used
transformation language is not built upon EMF and doesn't have an EMF export
facility, then it is acceptable to provide the output model in some
understandable output format. For example, a visualization of the state
machine could be appropriate (cf. Figure~\ref{fig:ext2-task-result}), or a
plain-text representation.
\subsection{The Core Task}
\label{sec:core-task}
The core task should create a state machine model that contains all entities,
i.e., all \tmme{States} and all \tmme{Transitions} with the appropriate
references set. Additionally, the \tmme{name} attribute defined for the
\tmme{State} class must be set. The initialization of the \tmme{trigger} and
\tmme{action} attributes are left for the extension tasks.
In the following, the coding conventions used to implement the TCP state
machine in plain-java are discussed in terms of concrete Java syntax and by
using the \lstinline{SynSent} class contained in the \texttt{src} directory,
which is shown in Listing~\ref{lst:syn-sent}.
\begin{lstlisting}[language=Java, caption={The \lstinline{SynSent} class}, label={lst:syn-sent}, float={h!t}]
public class SynSent extends ListeningState {
private static State instance;
public static State Instance() {
if (instance == null) {
instance = new SynSent();
}
return instance;
}
public void close() {
Closed.Instance().activate();
}
@Override
protected void run() {
switch (getReceivedFlag()) {
case SYN:
send(Flag.SYN_ACK);
SynReceived.Instance().activate();
return;
case SYN_ACK:
send(Flag.ACK);
Established.Instance().activate();
return;
default:
break;
}
}
}
\end{lstlisting}
The coding conventions relevant for the core task are as follows:
\begin{enumerate}
\item A \lstinline{State} is a non-abstract Java class
(\smme{classifiers.Class}) that extends the abstract class named ``State''
directly or indirectly. All concrete state classes are implemented as
singletons \cite{DesignPatterns95}.
In the example from Listing~\ref{lst:syn-sent}, \lstinline{SynSent} extends
the abstract \lstinline{ListeningState} state class, and that in turn extends
the abstract \lstinline{State} class.
\item A \lstinline{Transition} is encoded by a method call
(\smme{references.MethodCall}), which invokes the next state's
\lstinline{Instance()} method (\smme{members.Method}) returning the singleton
instance of that state on which the \lstinline{activate()} method is called
in turn. This activation may be contained in any of the classes' methods
with an arbitrary deep nesting.
Transformation may assume that the activation of the next state always has
the form \lstinline{NewState.Instance().activate()}. The case that the
singleton instance is stored in a variable or returned by some other method
except for \lstinline{Instance()} can be neglected.
In the example, there are three transitions.
In the \lstinline{close()} method, there is one transition from the current
state (\lstinline{SynSent}) into the \lstinline{Closed} state (line 12).
In the \lstinline{run()} method, there are another two transitions. In line
20, there is a transition into the \lstinline{SynReceived} state, and in line
24, there is a transition into the \lstinline{Established} state.
\end{enumerate}
The target model state names should be set according to the Java classes they
were created for.
The outcome of the core task is a state machine with 11 states and 21
transitions between the states. A visualization of that state machine produced
by the reference solution is shown in Figure~\ref{fig:core-task-result}.
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{state_machine1}
\caption{The state machine produced according to the core task}
\label{fig:core-task-result}
\end{figure}
\subsection{Extension 1: Triggers}
\label{sec:ext1-triggers}
This extension task deals with the \lstinline{trigger} attribute of
transitions. There are three different coding conventions that a
transformation has to exploit to set the correct trigger value. These three
conventions and one fallback rule are specified as follows.
\begin{enumerate}
\item If activation of the next state occurs in any method except
\lstinline{run()}, then that method's name (\smme{members.Method.name}) shall
be used as the trigger.
For example, the activation of \lstinline{Closed} in line 12 of
Listing~\ref{lst:syn-sent} occurs in the \lstinline{close()} method, so the
trigger is \lstinline{close}.
\item If the activation of the next state occurs inside a non-default
\lstinline{case} block (\smme{statements.NormalSwitchCase}) of a
\lstinline{switch} statement (\smme{statements.Switch}) in the
\lstinline{run()} method, then the enumeration constant
(\smme{members.EnumConstant}) used as condition of the corresponding
\lstinline{case} is the trigger.
For example, when activating \lstinline{SynReceived} in line 20 of
Listing~\ref{lst:syn-sent}, the trigger is \lstinline{SYN}. When activating
\lstinline{Established} in line 24, the trigger is \lstinline{SYN_ACK}.
\item If the activation of the new state occurs inside a \lstinline{catch}
block (\smme{statements.CatchBlock}) inside the \lstinline{run()} method,
then the trigger is the name of the caught exception's class.
For example, when activating the \lstinline{Closed} state in line 9 of
Listing~\ref{lst:time-wait}, the trigger is \lstinline{TimeoutException}.
\item If none of the three cases above can be matched for the activation of the
next state, i.e., the activation call is inside the \lstinline{run()} method
but without a surrounding \lstinline{switch} or \lstinline{catch}, the
corresponding transition is triggered unconditionally. In that case, the
trigger attribute shall be set to \lstinline{--}.
\end{enumerate}
\begin{lstlisting}[language=Java,label={lst:time-wait}, caption={Parts of the \lstinline{TimeWait} class}, float={h!}]
public class TimeWait extends State {
// some code elided...
@Override
protected void run() {
try {
timeWait();
} catch (TimeoutException e) {
Closed.Instance().activate();
}
}
}
\end{lstlisting}
Transformations can assume that the four different cases can be matched without
ambiguity, e.g., there is no \lstinline{catch} block activating some state
inside a surrounding \lstinline{switch}, or vice versa.
The target state machine that is produced by the reference transformation
implementing the core and the first extension task is shown in
Figure~\ref{fig:ext1-task-result}.
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{state_machine2}
\caption{The state machine with the \lstinline{trigger} attribute set
according to the first extension task}
\label{fig:ext1-task-result}
\end{figure}
\subsection{Extension 2: Actions}
\label{sec:ext2-actions}
The task of the second extension is to set the \lstinline{action} attributes of
transitions. The action of a transition is specified as follows.
\begin{enumerate}
\item If the block (\smme{statements.StatementListContainer}) containing the
activation call of the next state additionally contains a method call to the
\lstinline{send()} method, then that call's enumeration constant parameter's
name is the action.
For example, the \lstinline{action} attribute of the transition corresponding
to the state activation call in line 20 of Listing~\ref{lst:syn-sent} has to
be set to \lstinline{SYN_ACK}. The \lstinline{action} attribute of the
transition corresponding to the activation call in line 24 has to be set to
\lstinline{ACK}.
\item If there is no call to \lstinline{send()} in the activation call's block,
the \lstinline{action} of the corresponding transition shall be set to
\lstinline{--}.
For example, the activation call in line 12 of Listing~\ref{lst:syn-sent},
and the activation call in line 9 of Listing~\ref{lst:time-wait} perform no
action, and thus the attribute has to be set to \lstinline{--}.
\end{enumerate}
A visualization of the complete target state machine produced by the reference
transformation is shown in Figure~\ref{fig:ext2-task-result}.
\begin{figure}[h!]
\centering
\includegraphics[width=\linewidth]{state_machine3}
\caption{The final state machine after performing the core and both extension
tasks}
\label{fig:ext2-task-result}
\end{figure}
\section{Evaluation Criteria}
\label{sec:evaluation-criteria}
The case project contains an OpenDocument spreadsheet
\lstinline{evaluation/evaluation_sheet.ods}, which is used to judge the
solutions.
As motivated in Section~\ref{sec:context}, the goal of this transformation case
is supporting program understanding. By facilitating a set of coding
conventions, model transformations can be used to accomplish the task of
extracting the implicitly encoded state machine, in contrast to modeling it
manually by thoroughly inspecting all relevant classes of the system in
question.
In order to have a feasible solution, the time needed for writing and executing
the transformation must be comparable to the time that would be needed for a
code inspection and manually modelling the state machine. However, if we
assume that the set of coding conventions derived from the initial brief
inspection is correct, we can also assume that the transformation produces a
correct output without human mistakes.
Since the speed of writing a solution cannot be judged directly, we assume that
\textbf{understandability} and \textbf{conciseness} of the solution are the
objectively ascertainable properties that directly relate to the time needed
for implementing that solution. Ideally, each coding convention described in
Section~\ref{sec:task-descr} results in a transformation rule, and the
statement of the convention is clearly visible.
The \textbf{correctness} of the solution is another important point. If the
model created by the transformation is the foundation of weighty decisions in a
reengineering project, it would be fatal if the transformation created a model
that doesn't reflect the reality.
In that respect, the \textbf{completeness} of a solution is closely entangled
to correctness, because an incomplete model may also lead to false decisions.
The last property that will be judged is the \textbf{performance}. It is much
less important than the other criteria, but of course such a transformation
should be applicable on common hardware in acceptable time.
\bibliography{case}
\bibliographystyle{alpha}
\end{document}
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: t
%%% fill-column: 79
%%% TeX-engine: xetex
%%% End: