blob: 50e3847ee1241fbd5ada8df0fff8d86c49e5e9ff [file] [log] [blame]
\chapter{The Epsilon Comparison Language (ECL)}
\label{sec:ECL}
Model comparison is the task of identifying \emph{matching} elements between models. In general, \emph{matching} elements are elements that are involved in a relationship of interest. For example, before merging homogeneous models, it is essential to identify overlapping (common) elements so that they do not appear in duplicate in the merged model. Similarly, in heterogeneous model merging, it is a prerequisite to identify the elements on which the two models will be merged. Finally, in transformation testing, matching elements are pairs consisting of elements in the input model and their generated counterparts in the output model.
The aim of the Epsilon Comparison Language (ECL) is to enable users to specify comparison algorithms in a rule-based manner to identify pairs of matching elements between two models of potentially different metamodels and modelling technologies. In this section, the abstract and concrete syntax, as well as the execution semantics of the language, are discussed in detail.
%A comparison algorithm separates the model elements of the involved models into two categories: those that have matching elements in the opposite model and those that do not. Moreover, as the algorithm does not necessarily attempt to find matching elements for all the model elements of the involved models, the classification can be further refined to the following:
%
%\begin{enumerate}
% \item Elements for which matching elements exist in the opposite model
% \item Elements for which matching elements do not exist in the opposite model
% \begin{enumerate}
% \item Elements for which matching has been attempted but no matching elements has been found
% \item Elements for which no matching has been attempted
% \end{enumerate}
%\end{enumerate}
\section{Abstract Syntax}
In ECL, comparison specifications are organized in modules (\emph{EcLModule}). As illustrated in Figure \ref{fig:ECL}, EclModule extends EOLLibraryModule which means that it can contain user-defined operations and import other library modules and ECL modules. Apart from operations, an ECL module contains a set of match-rules (\emph{MatchRule}) and a set of \emph{pre} and \emph{post} blocks than run before and after all comparisons, respectively.
\emph{MatchRules} enable users to perform comparison of model elements at a high level of abstraction. Each match-rule declares a name, and two parameters (\emph{leftParameter} and \emph{rightParameter}) that specify the types of elements it can compare. It also optionally defines a number of rules it inherits (\emph{extends}) and if it is \emph{abstract}, \emph{lazy} and/or \emph{greedy}. The semantics of the latter are discussed shortly.
%\begin{landscape}
\begin{sidewaysfigure}
\centering
\includegraphics{images/EclAbstractSyntax.png}
\caption{ECL Abstract Syntax}
\label{fig:ECL}
\end{sidewaysfigure}
%\end{landscape}
A match rule has three parts. The \emph{guard} part is an EOL expression or statement block that further limits the applicability of the rule to an even narrower range of elements than that specified by the \emph{left} and \emph{right} parameters. The \emph{compare} part is an EOL expression or statement block that is responsible for comparing a pair of elements and deciding if they match or not. Finally, the \emph{do} part is an EOL expression or block that is executed if the \emph{compare} part returns true to perform any additional actions required.
\emph{Pre} and \emph{post} blocks are named blocks of EOL statements which as discussed in the sequel are executed before and after the match-rules have been executed respectively.
\section{Concrete Syntax}
The concrete syntax of a match-rule is displayed in Listing \ref{lst:MatchRuleConcreteSyntax}.
\begin{lstlisting}[caption=Concrete Syntax of a MatchRule, label=lst:MatchRuleConcreteSyntax, language=ECL, escapechar=!]
(@lazy)?
(@greedy)?
(@abstract)?
rule !\textit{<name>}!
match !\textit{<leftParameterName>}\textbf{:}\textit{<leftParameterType>}!
with !\textit{<rightParameterName>}\textbf{:}\textit{<rightParameterType>}!
(extends !\textit{<ruleName>}!(!\textbf{,} \textit{<ruleName>}!)*)? {
(guard (!\textbf{:}\textit{expression}!)|(!\textbf{\{}\textit{statementBlock}\textbf{\}}!))?
compare (!\textbf{:}\textit{expression}!)|(!\textbf{\{}\textit{statementBlock}\textbf{\}}!)
(do !\textbf{\{}\textit{statementBlock}\textbf{\}}!)?
}
\end{lstlisting}
\emph{Pre} and \emph{post} blocks have a simple syntax that, as presented in Listing \ref{lst:EclPrePostConcreteSyntax}, consists of the identifier (\emph{pre} or \emph{post}), an optional name and the set of statements to be executed enclosed in curly braces.
\begin{lstlisting}[caption=Concrete Syntax of Pre and Post blocks, label=lst:EclPrePostConcreteSyntax, language=ECL]
(pre|post) <name> {
statement+
}
\end{lstlisting}
\section{Execution Semantics}
\subsection{Rule and Block Overriding}
An ECL module can import a number of other ECL modules. In such a case, the importing ECL module inherits all the rules and pre/post blocks specified in the modules it imports (recursively). If the module specifies a rule or a pre/post block with the same name, the local rule/block overrides the imported one respectively.
\subsection{Comparison Outcome}
As illustrated in Figure \ref{fig:ECLRuntime}, the result of comparing two models with ECL is a trace (\emph{MatchTrace}) that consists of a number of matches (\emph{Match}). Each match holds a reference to the objects from the two models that have been compared (\emph{left} and \emph{right}), a boolean value that indicates if they have been found to be \emph{matching} or not, a reference to the \emph{rule} that has made the decision, and a Map (\emph{info}) that is used to hold any additional information required by the user (accessible at runtime through the \emph{matchInfo} implicit variable). During the matching process, a second, temporary, match trace is also used to detect and resolve cyclic invocation of match-rules as discussed in the sequel.
\begin{figure}
\centering
\includegraphics{images/ECLRuntime.png}
\caption{ECL Match Trace}
\label{fig:ECLRuntime}
\end{figure}
\subsection{Rule Execution Scheduling}
Non-abstract, non-lazy match-rules are evaluated automatically by the execution engine in a top-down fashion - with respect to their order of appearance - in two passes. In the first pass, each rule is evaluated for all the pairs of instances in the two models that have a type-of relationship with the types specified by the \emph{leftParameter} and \emph{rightParameter} of the rule. In the second pass, each rule that is marked as \emph{greedy} is executed for all pairs that have not been compared in the first pass, and which have a kind-of relationship with the types specified by the rule. In both passes, to evaluate the compare part of the rule, the guard must be satisfied.
Before the compare part of a rule is executed, the compare parts of all of the rules it extends (super-rules) must be executed (recursively). Before executing the compare part of a super-rule, the engine verifies that the super-rule is actually applicable to the elements under comparison by checking for type conformance and evaluating the guard part of the super-rule.
If the compare part of a rule evaluates to true, the optional do part is executed. In the do part the user can specify any actions that need to be performed for the identified matching elements, such as to populate the \emph{info} map of the established \emph{match} with additional information. Finally, a new match is added to the match trace that has its \emph{matching} property set to the logical conjunction of the results of the evaluation of the compare parts of the rule and its super-rules.
\subsection{The \emph{matches()} built-in operation}
To refrain from performing duplicate comparisons and to de-couple match-rules from each other, ECL provides the built-in \emph{matches(opposite : Any)} operation for model elements and collections. When the \emph{matches()} operation is invoked on a pair of objects, it queries the main and temporary match-traces to discover if the two elements have already been matched and if so it returns the cached result of the comparison. Otherwise, it attempts to find an appropriate match rule to compare the two elements and if such a rule is found, it returns the result of the comparison, otherwise it returns false. Unlike the top-level execution scheme, the \emph{matches()} operation invokes both \emph{lazy} and \emph{non-lazy} rules.
In addition to objects, the \emph{matches} operations can also be invoked to match pairs of collections of the same type (e.g. a Sequence against a Sequence). When invoked on ordered collections (i.e. \emph{Sequence} and \emph{OrderedSet}), it examines if the collections have the same size and each item of the source collection matches with the item of the same index in the target collection. Finally, when invoked on unordered collections (i.e. \emph{Bag} and \emph{Set}), it examines if for each item in the source collection, there is a matching item in the target collection irrespective of its index. Users can also override the built-in \emph{matches} operation using user-defined operations with the same name, as discussed in Section \ref{sec:Design.EOL.FeatureNavigation}, that loosen or strengthen the built-in semantics.
\subsection{Cyclic invocation of \emph{matches()}}
Providing the built-in \emph{matches} operation significantly simplifies comparison specifications. It also enhances decoupling between match-rules from each other as when a rule needs to compare two elements that are outside its scope, it does not need to know/specify which other rule can compare those elements explicitly.
On the other hand, it is possible - and quite common indeed - for two rules to implicitly invoke each other. For example consider the match rule of Listing \ref{lst:Tree2Tree} that attempts to match nodes of the simple Tree metamodel displayed in Figure \ref{fig:Tree}.
\begin{figure}
\centering
\includegraphics{images/metamodels/Tree.png}
\caption{The Tree Metamodel}
\label{fig:Tree}
\end{figure}
\begin{lstlisting}[caption=The Tree2Tree rule, label=lst:Tree2Tree, language=ECL]
rule Tree2Tree
match l : T1!Tree
with r : T2!Tree {
compare : l.label = r.label and
l.parent.matches(r.parent) and
l.children.matches(r.children)
}
\end{lstlisting}
The rule specifies that for two Tree nodes (\emph{l} and \emph{r}) to match, they should have the same label, belong to matching parents and have matching children. In the absence of a dedicated mechanism for cycle detection and resolution, the rule would end up in an infinite loop. To address this problem, ECL provides a temporary match-trace which is used to detect and resolve cyclic invocations of the \emph{match()} built-in operation.
As discussed above, a match is added to the primary match-trace as soon as the compare part of the rule has been executed to completion. By contrast, a temporary match (with its \emph{matching} property set to \emph{true}) is added to the temporary trace before the compare part is executed. In this way, any subsequent attempts to match the two elements from invoked rules will not re-invoke the rule. Finally, when a top-level rule returns, the temporary match trace is reset.
\section{Fuzzy and Dictionary-based String Matching}
\label{sec:FuzzyComparison}
In the example of Listing \ref{lst:Tree2Tree}, the rule specifies that to match, two trees must - among other criteria - have the same label. However, there are cases when a less-strict approach to matching string properties of model elements is desired. For instance, when comparing two UML models originating from different organizations, it is common to encounter ontologically equivalent classes which however have different names (e.g. Client and Customer). In this case, to achieve a more sound matching, the use of a dictionary or a lexical database (e.g. WordNet \cite{Wordnet}) is necessary. Alternatively, fuzzy string matching algorithms such as those presented in \cite{FuzzyStringMatching} can be used.
As several such tools and algorithms have been implemented in various programming languages, it is a sensible approach to reuse them instead of re-implementing them. For example, in Listing \ref{lst:FuzzyTree2Tree} a wrapper for the Simmetrics \cite{Simmetrics} fuzzy string comparison tool is used to compare the labels of the trees using the Levenshtein \cite{Levenshtein} algorithm. To achieve this, line 11 invokes the \emph{fuzzyMatch()} operation defined in lines 16-18 which uses the simmterics native tool (instantiated in lines 2-4) to match the two labels using their Levenshtein distance with a threshold of 0.5.
\begin{lstlisting}[caption=The FuzzyTree2Tree rule, label=lst:FuzzyTree2Tree, language=ECL]
pre {
var simmetrics =
new Native("org.epsilon.ecl.tools.
textcomparison.simmetrics.SimMetricsTool");
}
rule FuzzyTree2Tree
match l : T1!Tree
with r : T2!Tree {
compare : l.label.fuzzyMatch(r.label) and
l.parent.matches(r.parent) and
l.children.matches(r.children)
}
operation String fuzzyMatch(other : String) : Boolean {
return simmetrics.similarity(self,other,"Levenshtein") > 0.5;
}\end{lstlisting}
\section{Interactive Matching}
\label{sec:InteractiveModelComparison}
Using the user interaction features discussed in Section \ref{sec:Design.EOL.UserInput} the comparison can become interactive by replacing the \emph{fuzzyMatch} operation of listing \ref{lst:FuzzyTree2Tree} with the one specified in Listing \ref{lst:InteractiveTree2TreeComparison}. The fuzzyMatch operation of Listing \ref{lst:InteractiveTree2TreeComparison}, performs the fuzzy string comparison and -- as the previous version -- if the result is greater than 0.5 it returns true. However, in this updated version if the result is lower than 0.5 but greater than 0.3, it prompts the user to confirm if the two strings match, and if it is lower than 0.3 it returns false.
\begin{lstlisting}[caption=An interactive version of the fuzzyMatch operation of Listing \ref{lst:FuzzyTree2Tree}, label=lst:InteractiveTree2TreeComparison, language=ECL]
operation String fuzzyMatch(other : String) : Boolean {
var similarity : Real;
similarity = simmetrics.similarity(self,other,"Levenshtein");
if (similarity > 0.5) {
return true;
}
else if (similarity > 0.3) {
return UserInput.confirm(self + " matches " + other + "?");
}
else {
return false;
}
}\end{lstlisting}
\section{Exploiting the Comparison Outcome}
Users can query and modify the match trace calculated during the comparison process in the post sections of the module or export it into another application or Epsilon program. For example, in a post section, the trace can be printed to the default output stream or serialized into a model of an arbitrary metamodel. In another use case, the trace may be exported to be used in the context of a validation module that will use the identified matches to evaluate inter-model constraints, or in a merging module that will use the matches to identify the elements on which the two models will be merged. The topic of interoperability - that includes importing and exporting objects - between modules expressed in different Epsilon languages is discussed in Chapter \ref{chp:Workflow}.