| \documentclass[10pt]{article} |
| \usepackage{xltxtra} |
| \usepackage{fontspec} |
| \usepackage{graphicx} |
| \usepackage{hyperref} |
| |
| \usepackage{color} |
| \definecolor{lstbg}{rgb}{0.98,0.98,0.98} |
| |
| \usepackage{listings} |
| \lstset{ |
| basicstyle=\footnotesize\sffamily, |
| %stringstyle=\ttfamily, |
| showstringspaces=false, |
| %% Print real keywords with a bold font |
| keywordstyle=\bfseries, |
| %% Print functions (2nd class keywords) italic |
| keywordstyle=[2]\itshape, |
| numbers=left, |
| numberstyle=\tiny, |
| backgroundcolor=\color{lstbg}, |
| captionpos=b, |
| frame=single, |
| stringstyle= % Maybe some custom font for GReQL-strings? |
| } |
| |
| \usepackage[normalem]{ulem} |
| |
| %% Normal font size in \lstinline{} listings. |
| \makeatletter |
| \lst@AddToHook{TextStyle}{\let\lst@basicstyle\normalsize\sffamily} |
| \makeatother |
| |
| % macro for MetaModelElements |
| \newcommand{\smme}[1]{\uline{\texttt{#1}}} |
| \newcommand{\tmme}[1]{\texttt{#1}} |
| |
| \newcommand{\email}[1]{\href{mailto:#1}{\texttt{#1}}} |
| \newcommand{\myemail}[0]{\email{horn@uni-koblenz.de}} |
| |
| %% Schusterjungen und Hurenkinder verhindern. Das sind einzelne Zeilen von |
| %% Absätzen am Anfang oder Ende einer Seite. |
| \clubpenalty = 10000 |
| \widowpenalty = 10000 |
| \displaywidowpenalty = 10000 |
| |
| \setmainfont{TeX Gyre Pagella} |
| |
| |
| \title{Model Transformations for Program Understanding:\\ |
| A Reengineering Challenge} |
| \author{Dipl.-Inform. Tassilo Horn\\ |
| \myemail\\ |
| \vspace{0.3cm}\\ |
| University Koblenz-Landau, Campus Koblenz\\ |
| Institute for Software Technology\\ |
| Universitätsstraße 1, 56070 Koblenz, Germany} |
| |
| \begin{document} |
| |
| \maketitle |
| |
| \begin{abstract} |
| In Software Reengineering, one of the central artifacts is the source code of |
| the legacy system in question. In fact, in most cases it is the only |
| definitive artifact, because over the time, the code has diverged from the |
| original architecture and design documents. The first task of any |
| reengineering project is to gather an understanding on what the system does, |
| and how it does it. Therefore, a common approach is to use parsers to |
| translate the source code of the legacy system into a model conforming to the |
| abstract syntax of the programming language the system is implemented in, |
| which can then be subject to querying. Despite querying, transformations can |
| be used to regenerate more abstract models of the system's architecture. |
| |
| This transformation case deals with the creation of a state machine model out |
| of a Java syntax graph model. It is derived from a task that originates from |
| a real reengineering project. |
| \end{abstract} |
| |
| |
| \section{Objective of the Case} |
| \label{sec:objective} |
| |
| |
| The objective of the reengineering case presented here is to transform an |
| abstract syntax graph into a simple state machine. |
| |
| There are two major challenges involved in this transformation task. The first |
| challenge is an issue of \emph{performance} and \emph{scalability}. The input |
| model of the transformation is an abstract syntax graph, which is naturally |
| large for any reasonable program. In this concrete case, the smallest input |
| model contains more than 6,500 elements, although it is an abstract syntax graph |
| of only a very small program. Larger models with up to a million elements |
| might be used during the evaluation to stress-test the used transformation |
| language implementations. |
| |
| The second and more important challenge is that the transformation task |
| involves \emph{complex, non-local matching} of elements. For example, the core |
| task, which is described in Section~\ref{sec:task-descr-proj}, demands that the |
| transformation should create one state for each Java class that derives |
| directly or indirectly from an abstract Java class named ``State.'' There are |
| no restrictions on the depth of the inheritance hierarchy, so the ``State'' |
| class and its subclasses may be located arbitrarily far in the input model. |
| However, the structure of the path from subclass to superclass is clearly |
| specified by the input metamodel and must be utilized by transformations, e.g., |
| by recursive pattern matching or similarily expressive matching constructs, an |
| example being subpattern matching as provided by GrGen.NET \cite{grgen2010}. |
| |
| |
| \section{Context of the Case} |
| \label{sec:context} |
| |
| The SOAMIG\footnote{\url{http://www.soamig.de}} project deals with the |
| migration of legacy systems to Service-Oriented Architectures by means of |
| model-driven techniques. One of the two legacy systems on which the approach |
| is being evaluated is a monolithic Java system, which is operated with a Swing |
| graphical user interface. |
| |
| This user interface consists of around 30 different masks, which often relate |
| to conceptually self-contained functionalities that might be implemented as |
| services in the reengineered target system. Furthermore, the order in which |
| masks are activated and which successor masks can be activated from a given |
| mask gives good hints about the orchestration of the target system services. |
| |
| The masks are implemented as plain Java classes using the Swing toolkit. Many |
| masks are very complex with many user interface elements and even more input |
| validation code, which complicates tracking down the relationships between the |
| individual masks. However, the graphical user interface is based on a state |
| machine concept, and uses some strict coding conventions in the implementation. |
| As such, any masks can be seen as states, and when another mask is activated, |
| it can be seen as a transition. The trigger of this transition is usually a |
| click on some button, and possibly additional actions are performed just before |
| the transition, like validating and storing the user input. |
| |
| In the project, a GReTL transformation has been developed, which creates a |
| simple state machine model consisting of states and transitions with triggers |
| and actions out of the syntax graph of the legacy system. This syntax graph |
| consists of more than 2.5 million nodes and edges. The transformation heavily |
| exploits the coding conventions taken as a basis for the implementation of the |
| graphical user interface. The resulting state machine model contains all |
| information about the possible sequences in which masks can be activated, what |
| triggers are responsible for a transition, and what additional actions are |
| performed when transitioning. However, it consists of less than 100 nodes and |
| edges, and it can be visualized and printed. Therefore, it is of great value |
| for the understanding of the legacy system. |
| |
| The transformation case proposed here is directly derived form this |
| reengineering project's task. Instead of using the syntax graph of the |
| proprietary system, a toy example implementing the well-known TCP protocol |
| state machine using very similar coding conventions is used. |
| |
| The next section describes how to get the up-to-date versions of this |
| description and the relevant metamodels and models. |
| Section~\ref{sec:task-descr} then provides a detailed task description with one |
| core task and two extensions. The evaluation criteria that are used to judge |
| the solutions are discussed in Section~\ref{sec:evaluation-criteria}. |
| |
| |
| \section{The Task Description Project} |
| \label{sec:task-descr-proj} |
| |
| The project that includes this description, the input Ecore metamodel, three |
| input EMF syntax graph models, and the Java source code that was parsed to |
| generate two of the input models, as well as the output Ecore metamodel can be |
| cloned from the case submitter's |
| Mercurial\footnote{\url{http://mercurial.selenic.com/}} repository. |
| |
| To do so, use the following command on the command line, or an equivalent |
| instruction in the Mercurial GUI of your choice\footnote{e.g., |
| \emph{TortoiseHG}, available at \url{http://tortoisehg.bitbucket.org/}}. |
| Everyone has read access using the username \emph{anonymous} and the password |
| \emph{secret}. |
| |
| \begin{lstlisting}[numbers=none] |
| % hg clone \ |
| https://anonymous:secret@hg.uni-koblenz.de/horn/tcp-state-extract |
| \end{lstlisting} |
| |
| This will create a new folder \texttt{tcp-state-extract} in the current working |
| directory containing the project contents. The subdirectory \texttt{case} |
| contains the task description, the subdirectory \texttt{src} contains the Java |
| source code that is parsed to generated the syntax graph model |
| \texttt{jamopp/1\_small-model.xmi} conforming to the Ecore metamodel |
| \texttt{jamopp/java.ecore}. \sloppy{The subdirectory \texttt{src2} contains a |
| slightly more complex variant of the code in \texttt{src}, and from that the |
| second input model \texttt{jamopp/2\_medium-model.xmi} is generated. The target |
| metamodel is \texttt{statemachine/StateMachine.ecore}.} |
| |
| The reference solution implemented using the GReTL transformation language |
| \cite{report/HornEbertGReTL2010} is also included as |
| \texttt{solution/ExtractStateMachines.gretl}. |
| |
| To update the task description project, use: |
| |
| \begin{lstlisting}[numbers=none] |
| % hg pull -u |
| \end{lstlisting} |
| |
| Alternatively, an archive file containing the complete repository contents can |
| be downloaded from |
| \url{http://www.uni-koblenz.de/~horn/tcp-state-extract.tar.gz}. This archive |
| also contains the mercurial metadata, so if you decide to install mercurial |
| later on, you can update to the most recent version with \lstinline{hg pull -u} |
| as above. |
| |
| The case should be discussed at the TTC 2011 discussion groups\footnote{Follow |
| the \emph{Groups} link at \url{http://planet-mde.org/ttc2011}.}, and any |
| errors or obscurities in either the description itself or in the models will be |
| fixed in time. |
| |
| |
| \section{Detailed Task Description} |
| \label{sec:task-descr} |
| |
| In this section, the transformation task is explained in details. |
| |
| The overall goal of this task is to create a very simple \emph{state machine |
| model} for a \emph{Java syntax graph model} encoding a state machine with a |
| set of coding conventions a transformation has to exploit. |
| |
| The task is divided into one mandatory \emph{core task} |
| (Section~\ref{sec:core-task}) and two optional \emph{extension tasks} with |
| slightly increased complexity (Section~\ref{sec:ext1-triggers} and |
| Section~\ref{sec:ext2-actions}). |
| |
| The conventions used to implement the state machine in Java that should be |
| exploited by transformations are explained in terms of concrete Java syntax, |
| i.e., by using source code examples. However, the most relevant metamodel |
| elements are named, too. In the following, source metamodel elements are |
| written underlined (e.g., \smme{MethodCall}), and target metamodel elements are |
| typeset with a typewriter font (e.g., \tmme{Transition}). |
| |
| |
| \paragraph{Source Metamodel and Models.} |
| |
| The primary source model of the transformation is the Java abstract syntax graph |
| \texttt{jamopp/1\_small-model.xmi} conforming to the Java metamodel provided as |
| Ecore \cite{EMF09} file \texttt{jamopp/java.ecore}. The model is generated by |
| parsing the source code in the \texttt{src} directory of the task description |
| project using the \emph{JaMoPP} (\emph{Java Model Parser and Printer}, |
| \cite{jamopp09}) tool developed at the Technical University of Dresden. |
| |
| The source code in the \lstinline{src} directory whose abstract syntax graph is |
| the input model of the transformation task is a toy-implementation of the |
| TCP-protocol state |
| machine\footnote{\url{http://en.wikipedia.org/wiki/Transmission_Control_Protocol}}. |
| The coding conventions that transformations have to utilize are discussed in |
| the descriptions of the core and extension tasks below. |
| |
| The input model contains any information present in the source code. It |
| consists of about 6,500 elements. |
| |
| The second input model \texttt{jamopp/2\_medium-model.xmi} is generated by |
| parsing the source code contained in the \texttt{src2} directory. It is |
| slightly larger (~6,800 elements), and it is more complex, e.g., the |
| specialization hierarchy is deeper, and there is a deeper nesting of statements |
| in method bodies. |
| |
| A third input model \texttt{jamopp/3\_big-model.xmi} is also provided. It was |
| generated by parsing the source code of a Java project containing about 900 |
| classes and 220.000 lines of code. Into this project, the classes implementing |
| the TCP state machine were integrated. This model has about the size and |
| complexity of the model in the original reengineering scenario. |
| |
| All provided input models implement the TCP state machine according to the |
| conventions specified in the task description below, so the target state |
| machine model is always the same (without respect to the order of elements). |
| |
| Although the input models are provided as EMF models, the transformation tools |
| are not restricted to that technological space \cite{technologicalspaces02}. |
| To make sure that all solutions use equivalent models, both in terms of the |
| metamodel as well as the model sizes, it may be imported into the native format |
| of the transformation framework, but it is not allowed to create new input |
| models using custom parsers. |
| |
| |
| \paragraph{Target Metamodel.} |
| |
| As target metamodel, the very basic state machine metamodel shown in |
| Figure~\ref{fig:state-machine-mm} is used. |
| |
| \begin{figure}[h!] |
| \centering |
| \includegraphics[width=0.5\linewidth]{StateMachine} |
| \caption{The target Ecore metamodel} |
| \label{fig:state-machine-mm} |
| \end{figure} |
| |
| A \tmme{StateMachine} consists of an arbitrary number of \tmme{States} and |
| \tmme{Transitions}. |
| |
| Any \tmme{Transition} starts at exactly one \tmme{src}-\tmme{State}, and it |
| ends at exactly one \tmme{dst}-\tmme{State}. Every \tmme{State} may have an |
| arbitrary number of outgoing \tmme{Transitions} (\tmme{out}), and an arbitrary |
| number of incoming \tmme{Transitions} (\tmme{in}). |
| |
| Every \tmme{State} has a \tmme{name}, and any \tmme{Transition} may be caused |
| by a \tmme{trigger}, and as a result of its activation, an \tmme{action} might |
| be performed. |
| |
| If possible, transformations should produce a valid EMF XMI |
| \cite{spec/omg/XMI07} instance of the |
| \lstinline{statemachine/StateMachine.ecore} metamodel. However, if the used |
| transformation language is not built upon EMF and doesn't have an EMF export |
| facility, then it is acceptable to provide the output model in some |
| understandable output format. For example, a visualization of the state |
| machine could be appropriate (cf. Figure~\ref{fig:ext2-task-result}), or a |
| plain-text representation. |
| |
| |
| \subsection{The Core Task} |
| \label{sec:core-task} |
| |
| The core task should create a state machine model that contains all entities, |
| i.e., all \tmme{States} and all \tmme{Transitions} with the appropriate |
| references set. Additionally, the \tmme{name} attribute defined for the |
| \tmme{State} class must be set. The initialization of the \tmme{trigger} and |
| \tmme{action} attributes are left for the extension tasks. |
| |
| In the following, the coding conventions used to implement the TCP state |
| machine in plain-java are discussed in terms of concrete Java syntax and by |
| using the \lstinline{SynSent} class contained in the \texttt{src} directory, |
| which is shown in Listing~\ref{lst:syn-sent}. |
| |
| \begin{lstlisting}[language=Java, caption={The \lstinline{SynSent} class}, label={lst:syn-sent}, float={h!t}] |
| public class SynSent extends ListeningState { |
| private static State instance; |
| |
| public static State Instance() { |
| if (instance == null) { |
| instance = new SynSent(); |
| } |
| return instance; |
| } |
| |
| public void close() { |
| Closed.Instance().activate(); |
| } |
| |
| @Override |
| protected void run() { |
| switch (getReceivedFlag()) { |
| case SYN: |
| send(Flag.SYN_ACK); |
| SynReceived.Instance().activate(); |
| return; |
| case SYN_ACK: |
| send(Flag.ACK); |
| Established.Instance().activate(); |
| return; |
| default: |
| break; |
| } |
| } |
| } |
| \end{lstlisting} |
| |
| The coding conventions relevant for the core task are as follows: |
| |
| \begin{enumerate} |
| \item A \lstinline{State} is a non-abstract Java class |
| (\smme{classifiers.Class}) that extends the abstract class named ``State'' |
| directly or indirectly. All concrete state classes are implemented as |
| singletons \cite{DesignPatterns95}. |
| |
| In the example from Listing~\ref{lst:syn-sent}, \lstinline{SynSent} extends |
| the abstract \lstinline{ListeningState} state class, and that in turn extends |
| the abstract \lstinline{State} class. |
| \item A \lstinline{Transition} is encoded by a method call |
| (\smme{references.MethodCall}), which invokes the next state's |
| \lstinline{Instance()} method (\smme{members.Method}) returning the singleton |
| instance of that state on which the \lstinline{activate()} method is called |
| in turn. This activation may be contained in any of the classes' methods |
| with an arbitrary deep nesting. |
| |
| Transformation may assume that the activation of the next state always has |
| the form \lstinline{NewState.Instance().activate()}. The case that the |
| singleton instance is stored in a variable or returned by some other method |
| except for \lstinline{Instance()} can be neglected. |
| |
| In the example, there are three transitions. |
| |
| In the \lstinline{close()} method, there is one transition from the current |
| state (\lstinline{SynSent}) into the \lstinline{Closed} state (line 12). |
| |
| In the \lstinline{run()} method, there are another two transitions. In line |
| 20, there is a transition into the \lstinline{SynReceived} state, and in line |
| 24, there is a transition into the \lstinline{Established} state. |
| \end{enumerate} |
| |
| The target model state names should be set according to the Java classes they |
| were created for. |
| |
| The outcome of the core task is a state machine with 11 states and 21 |
| transitions between the states. A visualization of that state machine produced |
| by the reference solution is shown in Figure~\ref{fig:core-task-result}. |
| |
| \begin{figure}[h!] |
| \centering |
| \includegraphics[width=\linewidth]{state_machine1} |
| \caption{The state machine produced according to the core task} |
| \label{fig:core-task-result} |
| \end{figure} |
| |
| |
| \subsection{Extension 1: Triggers} |
| \label{sec:ext1-triggers} |
| |
| This extension task deals with the \lstinline{trigger} attribute of |
| transitions. There are three different coding conventions that a |
| transformation has to exploit to set the correct trigger value. These three |
| conventions and one fallback rule are specified as follows. |
| |
| \begin{enumerate} |
| \item If activation of the next state occurs in any method except |
| \lstinline{run()}, then that method's name (\smme{members.Method.name}) shall |
| be used as the trigger. |
| |
| For example, the activation of \lstinline{Closed} in line 12 of |
| Listing~\ref{lst:syn-sent} occurs in the \lstinline{close()} method, so the |
| trigger is \lstinline{close}. |
| \item If the activation of the next state occurs inside a non-default |
| \lstinline{case} block (\smme{statements.NormalSwitchCase}) of a |
| \lstinline{switch} statement (\smme{statements.Switch}) in the |
| \lstinline{run()} method, then the enumeration constant |
| (\smme{members.EnumConstant}) used as condition of the corresponding |
| \lstinline{case} is the trigger. |
| |
| For example, when activating \lstinline{SynReceived} in line 20 of |
| Listing~\ref{lst:syn-sent}, the trigger is \lstinline{SYN}. When activating |
| \lstinline{Established} in line 24, the trigger is \lstinline{SYN_ACK}. |
| \item If the activation of the new state occurs inside a \lstinline{catch} |
| block (\smme{statements.CatchBlock}) inside the \lstinline{run()} method, |
| then the trigger is the name of the caught exception's class. |
| |
| For example, when activating the \lstinline{Closed} state in line 9 of |
| Listing~\ref{lst:time-wait}, the trigger is \lstinline{TimeoutException}. |
| \item If none of the three cases above can be matched for the activation of the |
| next state, i.e., the activation call is inside the \lstinline{run()} method |
| but without a surrounding \lstinline{switch} or \lstinline{catch}, the |
| corresponding transition is triggered unconditionally. In that case, the |
| trigger attribute shall be set to \lstinline{--}. |
| \end{enumerate} |
| |
| \begin{lstlisting}[language=Java,label={lst:time-wait}, caption={Parts of the \lstinline{TimeWait} class}, float={h!}] |
| public class TimeWait extends State { |
| // some code elided... |
| |
| @Override |
| protected void run() { |
| try { |
| timeWait(); |
| } catch (TimeoutException e) { |
| Closed.Instance().activate(); |
| } |
| } |
| } |
| \end{lstlisting} |
| |
| Transformations can assume that the four different cases can be matched without |
| ambiguity, e.g., there is no \lstinline{catch} block activating some state |
| inside a surrounding \lstinline{switch}, or vice versa. |
| |
| The target state machine that is produced by the reference transformation |
| implementing the core and the first extension task is shown in |
| Figure~\ref{fig:ext1-task-result}. |
| |
| \begin{figure}[h!] |
| \centering |
| \includegraphics[width=\linewidth]{state_machine2} |
| \caption{The state machine with the \lstinline{trigger} attribute set |
| according to the first extension task} |
| \label{fig:ext1-task-result} |
| \end{figure} |
| |
| |
| \subsection{Extension 2: Actions} |
| \label{sec:ext2-actions} |
| |
| The task of the second extension is to set the \lstinline{action} attributes of |
| transitions. The action of a transition is specified as follows. |
| |
| \begin{enumerate} |
| \item If the block (\smme{statements.StatementListContainer}) containing the |
| activation call of the next state additionally contains a method call to the |
| \lstinline{send()} method, then that call's enumeration constant parameter's |
| name is the action. |
| |
| For example, the \lstinline{action} attribute of the transition corresponding |
| to the state activation call in line 20 of Listing~\ref{lst:syn-sent} has to |
| be set to \lstinline{SYN_ACK}. The \lstinline{action} attribute of the |
| transition corresponding to the activation call in line 24 has to be set to |
| \lstinline{ACK}. |
| \item If there is no call to \lstinline{send()} in the activation call's block, |
| the \lstinline{action} of the corresponding transition shall be set to |
| \lstinline{--}. |
| |
| For example, the activation call in line 12 of Listing~\ref{lst:syn-sent}, |
| and the activation call in line 9 of Listing~\ref{lst:time-wait} perform no |
| action, and thus the attribute has to be set to \lstinline{--}. |
| \end{enumerate} |
| |
| A visualization of the complete target state machine produced by the reference |
| transformation is shown in Figure~\ref{fig:ext2-task-result}. |
| |
| \begin{figure}[h!] |
| \centering |
| \includegraphics[width=\linewidth]{state_machine3} |
| \caption{The final state machine after performing the core and both extension |
| tasks} |
| \label{fig:ext2-task-result} |
| \end{figure} |
| |
| |
| \section{Evaluation Criteria} |
| \label{sec:evaluation-criteria} |
| |
| The case project contains an OpenDocument spreadsheet |
| \lstinline{evaluation/evaluation_sheet.ods}, which is used to judge the |
| solutions. |
| |
| As motivated in Section~\ref{sec:context}, the goal of this transformation case |
| is supporting program understanding. By facilitating a set of coding |
| conventions, model transformations can be used to accomplish the task of |
| extracting the implicitly encoded state machine, in contrast to modeling it |
| manually by thoroughly inspecting all relevant classes of the system in |
| question. |
| |
| In order to have a feasible solution, the time needed for writing and executing |
| the transformation must be comparable to the time that would be needed for a |
| code inspection and manually modelling the state machine. However, if we |
| assume that the set of coding conventions derived from the initial brief |
| inspection is correct, we can also assume that the transformation produces a |
| correct output without human mistakes. |
| |
| Since the speed of writing a solution cannot be judged directly, we assume that |
| \textbf{understandability} and \textbf{conciseness} of the solution are the |
| objectively ascertainable properties that directly relate to the time needed |
| for implementing that solution. Ideally, each coding convention described in |
| Section~\ref{sec:task-descr} results in a transformation rule, and the |
| statement of the convention is clearly visible. |
| |
| The \textbf{correctness} of the solution is another important point. If the |
| model created by the transformation is the foundation of weighty decisions in a |
| reengineering project, it would be fatal if the transformation created a model |
| that doesn't reflect the reality. |
| |
| In that respect, the \textbf{completeness} of a solution is closely entangled |
| to correctness, because an incomplete model may also lead to false decisions. |
| |
| The last property that will be judged is the \textbf{performance}. It is much |
| less important than the other criteria, but of course such a transformation |
| should be applicable on common hardware in acceptable time. |
| |
| |
| |
| \bibliography{case} |
| \bibliographystyle{alpha} |
| |
| \end{document} |
| |
| %%% Local Variables: |
| %%% mode: LaTeX |
| %%% TeX-master: t |
| %%% fill-column: 79 |
| %%% TeX-engine: xetex |
| %%% End: |