\usepackage{caption, subcaption}
\title{An OCL-based bridge from concrete to abstract syntax}
\author{Adolfo S\'{a}nchez-Barbudo Herrera\inst{1}, Edward Willink\inst{2},
Richard F. Paige\inst{1}}
Department of Computer Science, University of York, UK.\\
\email{\{asbh500, richard.paige\}\_at\}
Willink Transformations Ltd.
The problem of converting human-readable programming languages into executable machine representations is an old one. EBNF and attribute grammars provide solutions, but unfortunately they have failed to contribute effectively to model-based Object Management Group (OMG) specifications. Consequently, the OCL and QVT specifications contain significant errors and omissions. We describe an OCL-based internal domain specific language (DSL) with which we can reformulate the problematic parts of the specifications as complete, checkable, reusable models.
%One of the challenges that tool builders must address when dealing with Object Management Group (OMG) specifications for textual languages is bridging the language's concrete syntax (CS) and abstract syntax (AS). Though there has been work aiming to facilitate the generation of tooling (e.g. parsers, model transformations, etc.) for bridging this gap, and some OMG standards (e.g., OCL) attempt to describe bridges (e.g., via attribute grammars), there as yet does not exist an established, vendor-independent language that: helps OMG specification designers clearly define the CS2AS bridge; provides machine checking; and helps vendors build specification-compliant implementations. In this paper, we propose an OCL-based internal domain specific language (DSL) with the aim of effectively defining CS2AS bridges of a language. The proposed DSL provides a basis for OMG specification designers to describe ways to implement parts of their specifications, and at the same time, it provides certain challenges to tool vendors to produce more compliant implementations. %, and a prototype tool in charge of consuming instances of that language to produce a working implementation
% These set of languages resolve the syntactical vagueness of OCL's own specification.
The Object Management Group (OMG) is a consortium whose members produce open technology standards. Some of these target the Model-Driven Engineering (MDE) community. OMG provides the specifications for languages such as UML \cite{omg2012uml}, MOF \cite{omg2013mof}, OCL \cite{omg2013ocl} and QVT \cite{omg2014qvt}.
The specifications for textual languages such as OCL and QVT define a textual language and an information model using:
\begin{itemize}
\item an EBNF grammar to define the textual language;
\item a UML metamodel to define the abstract syntax (AS) of the language.
\end{itemize}
The textual language is suitable for users and for source interchange between compliant tools. The information model facilitates model interchange between producing tools such as editors or compilers and consuming tools such as program checkers or evaluators.
On the one hand, textual language designers aim to create compact grammars without limiting the capabilities and conciseness of the language for end users. On the other hand, information model designers aim to create well-designed abstract syntaxes that ease the adoption of the information model by producing and consuming tools. These goals are not normally aligned: unless the interests of one of these stakeholders are sacrificed, a large gap opens between the textual language grammar and the information model, and additional conversions between the data structures involved are required.
Therefore, the conversion between these two representations must also be specified, and it may make use of an additional intermediate concrete syntax (CS) metamodel whose elements correspond to the productions and terminals of the textual language grammar\footnote{Modern language workbenches, such as Xtext, can automatically generate the CS metamodel from their input grammars.}. OMG specifications tend to provide concise textual language grammars and well-designed AS metamodels, without compromising one in favour of the other. Consequently, some OMG specifications define CS to AS conversions; however, as we will see throughout this paper, there is room for improvement.
%Specification of the conversion currently uses semi-formal approaches with limited tool checking. As a result there is significant variation in the completeness and accuracy of specifications and of course further variation as tool implementors use their intuition to resolve the specification limitations.
\subsection{The OMG specification problem}
The OCL \cite{omg2013ocl} and QVT \cite{omg2014qvt} specifications define four languages: OCL, QVTc (Core), QVTo (Operational Mappings), and QVTr (Relations)%\footnote{QVTr language has a graphical CS as well.}
. The specifications all provide fairly detailed grammars and metamodels of their respective abstract syntaxes.
Unfortunately the grammar to AS conversion is poorly specified.
In OCL, a CS is provided and the grammar is partitioned into ambiguous productions for each CS element. Semi-formal rules define the grammar to CS correspondence, the CS to AS correspondence, name resolution and disambiguation.
QVTr has a single coherent grammar to accompany its CS and similar semi-formal rules.
QVTc has a single grammar but no CS and no semi-formal rules.
QVTo similarly has a single grammar, but no CS and no semi-formal rules. Instead, notation sections suggest a correspondence between source text and AS elements by way of examples.
Since none of the conversions is modeled, tools cannot be used to check the many details in the specifications. As a result, the major omissions identified above are compounded by more subtle oversights and inaccuracies. The specifications fail to provide the complete, consistent and accurate details that would help tool vendors provide compliant implementations of the text to AS conversions.
%Modern ; it would be beneficial for all such specifications to provide consistent and systematic support for CS2AS bridges. To accomplish this, we will further analyze the OCL specification to better understand its support: the OCL specification does a more systematic job at the CS2AS bridge, by providing a specific clause (Clause 9.3 from \cite{omg2013ocl}) to describe mappings.
\subsection{Our solution}
The intermediate CS metamodel is close to the grammar, and it can be automatically generated by modern annotated-EBNF tooling such as Xtext. It is in the CS to AS conversion that the greater challenges arise.
In this paper, we take inspiration from the substantial semi-formal exposition of the OCL conversions (Clause 9.3 of \cite{omg2013ocl}) and introduce a fully modeled CS2AS bridge. The models can be used to check, and even to auto-generate, a consistent specification, as well as to auto-generate compliant tooling. In addition to conventional CS and AS metamodels, we introduce new CS2AS mapping models, name resolution models and CS disambiguation models. We demonstrate how OCL itself can be used to provide a suitable internal DSL for these new models.
%Whilst OMG specifications usually include an exhaustive\footnote{We do not mean correct and free of errors} definition of language (or languages) abstract syntax (AS), there is substantial variation in how specifications present concrete syntax (CS): whereas we can find a fairly detailed CS for OCL, the same level of detail can't be found across the languages defined in the QVT specification.
%One of the problems that tool implementors need to face when creating OMG specification-compliant tools is how to bridge the gap between the CS and the AS. In most cases, the specification provides few or no hints as to how to define a bridge; as well, the specification may itself be inconsistent [?]. These inconsistencies might lead to different decisions taken by implementors, which is not ideal for the end user, who will then have to decide between different and incompatible tools.
% If specification designers had the means to previously define the aforementioned bridges -- including the tools to verify that those bridges are feasible to implement -- both specification and any implementing tools would be more likely to be consistent. If the means to define CS2AS bridges were given in the form of well established domain specific languages, specification designers and tools implementors could also benefit from MDE techniques to speed up the production of the corresponding deliverables \cite{kosar2010dslVsgpl} ([?] another one?).
%In this paper we propose means to bridge the textual CS and the corresponding AS of a language, exposing the problem from two specific OMG specifications -- OCL and QVT -- while showing a solution for a running example. The main technical contribution of this paper is thus an OCL-based internal Domain Specific Language (DSL) \cite{fowler2010dsl} to declaratively express those CS2AS bridges which tackle cross-cutting concerns, such as name resolution and concrete syntax disambiguation.
The paper is structured as follows. Section~\ref{sec:example} presents an example to introduce the grammar and metamodels. Section~\ref{sec:semi-formal-solution} demonstrates the semi-formal solution adopted by the OCL specification. Section~\ref{sec:solution} explains the proposed solution, i.e. an OCL-based internal DSL. Section~\ref{sec:relatedWork} describes related work and Section~\ref{sec:limitations} discusses the current shortcomings of the approach. Section~\ref{sec:futureWork} outlines some future work, including how tool implementors can benefit from the internal DSL. Finally, Section~\ref{sec:conclusions} concludes.
%\section{Challenges with OMG specifications}
%it is a different story for the descriptions of CS, as summarized in Table~\ref{tab:OCLQVTcsDetails}.
%\begin{tabular}{ l | c | c | c | c }
% & Examples & Notation & Grammar & CS2AS bridge \\
% \hline
%OCL & Yes & No & Yes & Yes \\
%QVTr & Yes & No & Yes & Yes \\
%QVTc & Yes & No & Yes & No \\
%QVTo & Yes & Yes & Yes & No \\
%\caption{CS details for OCL and QVT languages}
%\item In each language specification we can find examples to explain how textual constructs can be used to create instances of those languages.
%\item For QVTo only, we can find a dedicated notation section in the AS specification explaining the different ways we can textually realize AS concepts.
%\item All language specifications provide an EBNF \cite{wirth1996ebnf} grammar.
%\item The OCL and QVT-Relations specifications provide some explanations as to how the CS can be mapped to the language AS; in other words, a CS2AS bridge.
Our first example is a collection literal expression, which provides a simple illustration of the grammars and models in use. In Section~\ref{sec:semi-formal-solution} we show the semi-formal usage of these concepts by the OCL specification. In Section~\ref{sec:solution} we contrast it with our fully modeled internal DSL solution. This example is too simple to demonstrate more than the CS2AS mapping characteristics of our solution; we therefore introduce a richer example later.
%In this section we introduce a running example, in which different excerpts of the OCL CS and AS are depicted. The rationale of choosing these specific excerpts are two-fold: they are rich enough to be used for explaining the main three concerns that are covered by the CS2AS internal DSL (Section~\ref{sec:solution}): CS2AS mappings, name resolution and disambiguation; secondly, they are small enough to be understood by the reader given space restrictions.
%The AS of our running example is given in terms of a metamodel, so that we can depict the example-relevant metaclasses, properties, and relationships as they are defined in the AS by the OMG specification. Figure~\ref{fig:exampleAS} depicts the OCL abstract syntax relevant to the running example.
% \centering
% \includegraphics[scale=0.75]{images/RunningaExampleAS.png}
% \caption{OCL Abstract Syntax partial Metamodel}
% \label{fig:exampleAS}
%The CS is exposed in terms of an EBNF grammar and a corresponding CS metamodel. Each grammar rule relates to a CS metaclass. In the following subsections, we individually introduce the different expressions of the running example, as well as rationale as to why they are considered here.
%\subsection{Collection literal part}
%The parts of a \textit{collection literal expression} in OCL are provided by \textit{collection literal part}s.
The listing in Figure~\ref{fig:CollectionLiteralPartAS} is an example of a \textit{collection literal expression} comprising three comma-separated collection literal parts. The adjacent diagram shows the corresponding AS metamodel elements. A \emph{CollectionLiteralExp} contains any number of \emph{CollectionLiteralPart}s, an abstract class specialized by \emph{CollectionItem} and \emph{CollectionRange} to support the two cases of a single value or a two-ended integer range. The example text must be converted to instances of these AS metamodel elements.
\begin{lstlisting}[label=lst:CollectionLiteralExpExample, language=OCL]
Sequence{1, 1+1, 3..9+1}
-- equivalent to:
-- Sequence{1,2,3,4,5,
-- 6,7,8,9,10}
\end{lstlisting}
\caption{CollectionLiteralPart Example and partial AS Metamodel}
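As a rough illustration of the target of the conversion, the Python sketch below mirrors the partial AS metamodel with plain classes, builds by hand the AS instance that the listing should yield, and expands it to confirm the equivalence stated in the listing's comment. The field names follow the metamodel, but the \verb$Plus$ node (standing in for the operation call that OCL would use for \verb$1+1$), the \verb$expand$ evaluator and the string-valued \verb$kind$ are hypothetical simplifications.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from typing import List, Union

# Stand-ins for OclExpression. OCL models '1+1' as an operation call;
# the Plus node is a hypothetical simplification.
@dataclass
class IntegerLiteralExp:
    value: int

@dataclass
class Plus:
    left: "Expr"
    right: "Expr"

Expr = Union[IntegerLiteralExp, Plus]

@dataclass
class CollectionItem:          # single-value part
    item: Expr

@dataclass
class CollectionRange:         # two-ended integer range
    first: Expr
    last: Expr

CollectionLiteralPart = Union[CollectionItem, CollectionRange]

@dataclass
class CollectionLiteralExp:
    kind: str                  # e.g. "Sequence" (simplified CollectionKind)
    parts: List[CollectionLiteralPart]

def evaluate(e: Expr) -> int:
    if isinstance(e, IntegerLiteralExp):
        return e.value
    return evaluate(e.left) + evaluate(e.right)

def expand(exp: CollectionLiteralExp) -> List[int]:
    """Hypothetical evaluator: flatten the parts into a list of values."""
    result: List[int] = []
    for part in exp.parts:
        if isinstance(part, CollectionItem):
            result.append(evaluate(part.item))
        else:  # CollectionRange: both ends are inclusive
            result.extend(range(evaluate(part.first), evaluate(part.last) + 1))
    return result

# AS instance for Sequence{1, 1+1, 3..9+1}
lit = IntegerLiteralExp
example = CollectionLiteralExp("Sequence", [
    CollectionItem(lit(1)),
    CollectionItem(Plus(lit(1), lit(1))),
    CollectionRange(lit(3), Plus(lit(9), lit(1))),
])
print(expand(example))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
\end{lstlisting}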
The listing in Figure~\ref{fig:CollectionLiteralPartCS} shows the EBNF grammar that parses a \textit{collection literal part} either as a \emph{CollectionLiteralPartCS} comprising one direct \emph{OclExpressionCS}, or as a \emph{CollectionRangeCS} comprising two \emph{OclExpressionCS}s. The adjacent diagram shows the intermediate CS model, which is similar to the AS but omits a `redundant' \emph{CollectionItemCS}; instead, the single (or first) expression is held by the non-abstract \emph{CollectionLiteralPartCS} and shared with its \emph{CollectionRangeCS} specialization.
%To be more specific, a collection literal part could either be a simple element (i.e comprising one expression) of a collection, or a collection range which represents an integer range between two expressions. This syntax element in OCL is interesting because it priovides a very simple example of another concern for our CS2AS bridge; CS disambiguation. In this case, a \emph{CollectionLiteralPartCS} can be disambiguated either to a \emph{CollectionItem} or to a \emph{CollectionRange}, depending on how many expressions were used in the collection literal part: one expression disambiguates to a \emph{CollectionItem}; two expressions separated by \emph{..} diamiguate to a \emph{CollectionRange}. Figure~\ref{fig:CollectionLiteralPartCS} shows the CS definition related to a collection literal part.
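\begin{lstlisting}[language=Xtext]
CollectionLiteralPartCS:
	OclExpressionCS | CollectionRangeCS
;
CollectionRangeCS:
	OclExpressionCS '..' OclExpressionCS
;
\end{lstlisting}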
% \caption{CollectionLiteralPartCS grammar }
% \label{fig:CollectionLiteralPartCS:a}
% \caption{CollectionLiteralPartCS metamodel }
% \label{fig:CollectionLiteralPartCS:b}
\caption{CollectionLiteralPartCS Grammar and partial CS Metamodel}
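The two grammar alternatives can be illustrated with a deliberately naive Python sketch that builds the intermediate CS model for the parts of a collection literal. Expressions are kept as raw strings in place of \emph{OclExpressionCS} instances, and splitting on commas and on \verb$..$ is an assumption that only holds for flat inputs such as the example above; a real parser would be generated from the grammar. The point is the shape of the CS: a range is a \emph{CollectionLiteralPartCS} that additionally carries a \emph{last} expression, and there is no \emph{CollectionItemCS}.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from typing import List

@dataclass
class CollectionLiteralPartCS:
    first: str            # stand-in for an OclExpressionCS

@dataclass
class CollectionRangeCS(CollectionLiteralPartCS):
    last: str             # second OclExpressionCS, after '..'

def parse_parts(text: str) -> List[CollectionLiteralPartCS]:
    """Naively parse the comma-separated parts inside Sequence{...}."""
    parts: List[CollectionLiteralPartCS] = []
    for raw in text.split(","):
        raw = raw.strip()
        if ".." in raw:                     # alternative: CollectionRangeCS
            first, last = (s.strip() for s in raw.split("..", 1))
            parts.append(CollectionRangeCS(first, last))
        else:                               # alternative: OclExpressionCS
            parts.append(CollectionLiteralPartCS(raw))
    return parts

for p in parse_parts("1, 1+1, 3..9+1"):
    print(type(p).__name__, p)
\end{lstlisting}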
\section{Semi-formal solution: OCL Clause 9.3}
\label{sec:semi-formal-solution}
The OCL specification provides a full attribute grammar in which inherited and synthesized attributes are used to describe how the AS is computed from the CS. Figures~\ref{fig:CollectionLiteralPartOMG} and \ref{fig:CollectionRangeOMG} show our first example. The specification uses OCL expressions to define how the different attributes are computed. %Attribute grammar are a suitable mechanism to describe a CS2AS bridge; however, we will add some critics about it.
\caption{OCL specification for CollectionLiteralPartCS to CollectionLiteralPart}
The first section defines the EBNF production(s). The example merges two alternative productions, so many of the rules carry an \verb$[A]$ or \verb$[B]$ prefix to distinguish the alternatives.
The AS mapping declares the type of the resulting AS element as the type of a special property of the CS element: \emph{ast}.
The synthesized attributes populate the AS element using an assignment for \verb$[A]$. The more complex \verb$[B]$ works around OCL 2.4's inability to construct a \emph{CollectionItem} by imposing constraints on a hypothetical \emph{CollectionItem}.
The inherited attributes contribute to name resolution by flowing an \emph{Environment} hierarchy of all available name-element pairs down from parent to child nodes, using another special CS property: \emph{env}. In this case all names visible in the parent are passed unchanged to the children.
The disambiguating rules provide guidance on the resolution of ambiguities. In this simple example, there is no ambiguity.
\caption{OCL specification for CollectionRangeCS to CollectionRange}
The rules for collection range follow a similar pattern. There is now just one grammar production, whose two \emph{OclExpressionCS}s are distinguished by \verb$[1]$ and \verb$[2]$ suffixes. The synthesized attributes have two properties to populate.
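The division of labour between the two kinds of attribute can be sketched in Python as a single recursive pass: the inherited \emph{env} is handed down unchanged, as in the rules above, and the synthesized \emph{ast} is assembled bottom-up from the children's results. The node classes, the dictionary used as an \emph{Environment}, the tuple-shaped AS results and the \verb$NameCS$ leaf (included only so that \emph{env} has something to do) are hypothetical illustrations, not part of the OCL specification.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class IntLitCS:
    value: int

@dataclass
class NameCS:              # hypothetical leaf that reads the environment
    name: str

@dataclass
class CollectionRangeCS:
    first: "ExprCS"        # OclExpressionCS[1]
    last: "ExprCS"         # OclExpressionCS[2]

ExprCS = Union[IntLitCS, NameCS]
Environment = Dict[str, object]

def ast(node, env: Environment):
    """Return the synthesized 'ast' attribute of node, given its inherited 'env'."""
    if isinstance(node, IntLitCS):
        return ("IntegerLiteralExp", node.value)
    if isinstance(node, NameCS):
        return ("VariableExp", env[node.name])   # name resolution via env
    if isinstance(node, CollectionRangeCS):
        # Inherited attributes: both children receive the parent's env unchanged.
        first = ast(node.first, env)
        last = ast(node.last, env)
        # Synthesized attributes: populate the two properties of the range.
        return ("CollectionRange", first, last)
    raise TypeError(f"no rule for {type(node).__name__}")

env = {"n": "Variable n"}
print(ast(CollectionRangeCS(IntLitCS(1), NameCS("n")), env))
# ('CollectionRange', ('IntegerLiteralExp', 1), ('VariableExp', 'Variable n'))
\end{lstlisting}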
The presentation comes quite close to specifying what is needed, but uses an intuitive mix of five sub-languages without any tool assistance. In Figure~\ref{fig:CollectionLiteralPartOMG}, the typo whereby \emph{CollectionItem::OclExpression} rather than \emph{CollectionItem::item} is used in the final line of the synthesized attributes has gone unreported for over 10 years.
The lack of tooling also obscures the modeling challenge posed by the inheritance relationships between \emph{CollectionLiteralPartCS}, \emph{CollectionRangeCS} and \emph{OclExpressionCS}. If \emph{CollectionLiteralPartCS} is to be the polymorphic type of any collection literal part in the CS, the \verb$[B]$ grammar production in Figure~\ref{fig:CollectionLiteralPartOMG} requires \emph{OclExpressionCS} to inherit from \emph{CollectionLiteralPartCS}.
The lack of any underlying models also makes it impossible for tool vendors to reuse the rules; they must transcribe them by hand, at the risk of introducing further errors.
%While attribute grammars are well-suited to describing a CS2AS bridge, they do not lend themselves to standards-compliant tool support. In particular, the declaration of Clause 9.3 from the OCL specification is imprecise, in the sense that it does not conform to any existent language, and there is no tool that vendors can rely on to help to build CS2AS bridges from it, nor to validate the attribute grammars. It would be beneficial to instead be able to use a standard language -- with tool support -- to define CS2AS bridges. Ideally, for OMG languages, it would be beneficial to be able to use a language like OCL to define CS2AS bridges.
%\textbf{Unchecked CS2AS bridges:} Due to the fact that OCL specification designers came up with their own notation, and there is no tool to check the proposed attribute grammar, we can find errors in Clause 9.3 of the OCL specification. For instance, in Figure~\ref{fig:CollectionLiteralPartOMG} we can see how the second expression that computes the synthesized attributes is incorrect, because, from the OCL AS definition, the property name comprising the OCL expression of a \emph{CollectionItem} is \emph{item}, rather than \emph{OclExpression}
%With these criticisms in mind, in this paper we present a pure OCL-based internal DSL so that existing OCL tools can be used to declare CS2AS bridges so that:
%\item OMG specification designers can express CS2AS bridges in an OMG language. These bridge definitions will be of higher quality since they can be created and checked using existing OCL tools.
%\item Implementers can benefit from those bridges to build tools supporting OMG specifications comprising textual languages, e.g. by applying MDE techniques (transformations, code generation, etc.).
\section{Modeled Solution: CS2AS internal DSL}
\label{sec:solution}
The critique of the semi-formal exposition highlights the lack of checkable or reusable models. In this section we formalize the semi-formal approach using a DSL to declare the bridge between the CS and the AS of a language. The DSL is internal \cite{fowler2010dsl} and uses only facilities proposed for OCL 2.5. The DSL constrains the use of the general-purpose OCL language to a set of idioms that express CS2AS bridges.
Our rationale for choosing OCL as the host language is as follows:
\begin{itemize}
\item OMG specifications have a problem bridging the CS to AS gap, so we would like an OMG-based solution.
\item OCL contains a rich expression language that can provide enough flexibility to express non-trivial CS2AS bridges in a completely declarative way.
\item Other OMG-related languages could be considered (such as one of the QVT languages); however, OCL is a well-known OMG language and is the basis of many others. A QVT practitioner inherently knows OCL, but not vice versa.
\end{itemize}
Instances of the internal DSL take the form of Complete OCL documents and can be maintained using Complete OCL tools \cite{eclipseOclOnline}. Multiple documents can be used to partition the specification into modules %for topics such as Messages or States and
to separate the distinct mapping, name-resolution, and disambiguation concerns of the CS2AS bridge.
\subsection{Shadow Object Construction}
The internal DSL uses the proposed\footnote{Shadow object construction was called type construction in the Aachen report \cite{brucker2013aachenReport}.} side-effect-free solution to the problem of constructing types in OCL. This avoids the need for the hypothetical objects used by the semi-formal approach. The proposed syntax reuses the existing syntax for constructing a Tuple, with the \emph{Tuple} keyword replaced by the name of the type to be constructed. A \verb$Complex$ number with \verb$x$ and \verb$y$ parts might therefore be constructed as \verb$Complex{x=1.0,y=2.0}$. %The subtleties of ensuring that construction is side-effect-free are not relevant to this paper.
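Python has no direct counterpart of shadow object construction, but, as a loose analogy only, a frozen dataclass captures two of its properties: the object is produced by a single expression that names its type and initializes its properties, and it cannot be modified afterwards. The \verb$Complex$ type below is the hypothetical example from the text, not an OCL or Python library type.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Complex:            # hypothetical value type from the text
    x: float
    y: float

# Analogue of the OCL shadow construction Complex{x=1.0,y=2.0}
c = Complex(x=1.0, y=2.0)

# Constructing the "same" value again yields an equal object.
assert c == Complex(x=1.0, y=2.0)

try:
    c.x = 3.0             # frozen: initialization is the only assignment
except FrozenInstanceError:
    print("Complex values are immutable once constructed")
\end{lstlisting}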
\subsection{CS2AS mappings}
In this subsection we explain the core language for describing CS2AS mappings. We start with an instance of the language, to give the reader an indication of the DSL used to describe the bridge. Listing~\ref{lst:exampleCS2ASdesc} corresponds to the CS2AS description of the OCL constructs introduced in Section~\ref{sec:example}, and should be contrasted with the semi-formal equivalent in Figures~\ref{fig:CollectionLiteralPartOMG} and \ref{fig:CollectionRangeOMG}.
\begin{lstlisting}[caption=CS2AS bridge for CollectionLiteralPart and CollectionRange, label=lst:exampleCS2ASdesc, language=OCL]
context CollectionLiteralPartCS
def : ast() : ocl::CollectionLiteralPart =
  ocl::CollectionItem {
    item = first.ast(),
    type = first.ast().type
  }

context CollectionRangeCS
def : ast() : ocl::CollectionRange =
  ocl::CollectionRange {
    first = first.ast(),
    last = last.ast(),
    type = first.ast().type.commonType(last.ast().type)
  }
\end{lstlisting}
The mapping is described by defining the \emph{ast()} operation on a CS element.
%ASBH a missleading comment about execution is introduced here
%Polymorphic dispatch ensures that the CS element-specific operation is used unlike the \emph{ast} property of the semi-formal approach.
The `abstract syntax mapping' and `synthesized attributes' of the semi-formal approach are modeled by the shadow construction of the appropriate AS type and initialization of its properties. (The initialization includes the \emph{type} property omitted by the OCL specification.)
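For readers more familiar with general-purpose languages, the following Python sketch renders Listing~\ref{lst:exampleCS2ASdesc} with one class per CS element and an overridden \verb$ast()$ method, so that the definition on \emph{CollectionRangeCS} takes precedence over the one inherited from \emph{CollectionLiteralPartCS}. The AS classes, the hypothetical \verb$LiteralCS$ leaf whose AS type is fixed in advance, types represented as strings, and the toy \verb$common_type$ helper are assumptions made to keep the sketch self-contained. Python also evaluates eagerly, whereas the OCL listing only states correspondences; the sketch shows the structure of the mapping, not how a tool would schedule it.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from typing import Any

# Minimal AS classes (ocl::...), reduced to the properties used in the listing.
@dataclass
class OclExpression:
    type: str

@dataclass
class CollectionItem:
    item: OclExpression
    type: str

@dataclass
class CollectionRange:
    first: OclExpression
    last: OclExpression
    type: str

def common_type(a: str, b: str) -> str:
    # Toy stand-in for commonType: Integer conforms to Real, all to OclAny.
    if a == b:
        return a
    if {a, b} == {"Integer", "Real"}:
        return "Real"
    return "OclAny"

@dataclass
class LiteralCS:                      # hypothetical OclExpressionCS leaf
    type: str
    def ast(self) -> OclExpression:
        return OclExpression(type=self.type)

@dataclass
class CollectionLiteralPartCS:
    first: LiteralCS
    def ast(self) -> Any:
        return CollectionItem(item=self.first.ast(),
                              type=self.first.ast().type)

@dataclass
class CollectionRangeCS(CollectionLiteralPartCS):
    last: LiteralCS
    def ast(self) -> Any:             # overrides the inherited mapping
        return CollectionRange(first=self.first.ast(),
                               last=self.last.ast(),
                               type=common_type(self.first.ast().type,
                                                self.last.ast().type))

part = CollectionRangeCS(LiteralCS("Integer"), LiteralCS("Integer"))
print(part.ast())
\end{lstlisting}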
%In this Complete OCL document we have declared the CS2AS bridge to define how AS elements are obtained from CS elements. The bridge is expressed by the means of a correspondence between CS types and AS types. This correspondence is given in terms of operation definitions, so that the operation context type -- an element type in the CS -- is the source of the correspondence, and the operation returned type -- an element type in the AS -- is the target of the correspondence. The actual characteristic function of the correspondence will be described by the body of those operations. For instance, taking as reference Listing~\ref{lst:exampleCS2ASdesc}, the \emph{ast} operation at line 2 defines the correspondence between \emph{LetExp} elements and \emph{LetExpCS} elements. The body of that operation, at lines 3-7, describes how \emph{LetExp} elements (AS) are actually computed from \emph{LetExpCS} elements (CS).
%A more detailed explanation of the whole DSL, as well as the rationale about its design decisions, follows. All the listing and line references below correspond to Listing~\ref{lst:exampleCS2ASdesc}:
\textbf{Declarativeness:} An important characteristic of the DSL is that it comprises declarative OCL constraints. The constraints only specify correspondences between AS and CS that hold after a valid conversion. When executing the proposed CS2AS descriptions, an implementing tool must discover a suitable order in which to perform the CS to AS conversions by analyzing the OCL constraints and exploiting their inter-dependencies. (This was also the unstated policy of the semi-formal approach.) An automated analysis is desirable, since these inter-dependencies are almost too complicated to formulate accurately by hand as a multi-pass conversion.
\textbf{Operations:} The CS2AS bridge is described using operation definitions. The underlying rationale is that operations defined on a potentially complex CS class hierarchy can be overridden. This overriding mechanism provides some flexibility to cope with language extensions such as QVT. The operation name is not significant, but we propose the name \emph{ast} since it is aligned with the name used in the attribute grammar of the OCL specification.
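One way an implementing tool might discover such an order is to record, for each CS element, which other CS elements' \emph{ast()} results its constraints read, and then to sort topologically. The sketch below does this with Python's standard \verb$graphlib$ module for the let expression \verb$let var : String = 'something' in var$ used later in this section. The dependency table is written by hand here, whereas a real tool would have to extract it from the OCL expressions, and \verb$StringLiteralExpCS$ is a hypothetical name for the CS element of the initializer.

\begin{lstlisting}[language=Python]
from graphlib import TopologicalSorter

# Hypothetical, hand-written dependency table for
#   let var : String = 'something' in var
# Each CS element maps to the CS elements whose ast() its constraints read.
depends_on = {
    "LetExpCS": {"VariableDeclarationCS", "VariableExpCS"},
    "VariableDeclarationCS": {"StringLiteralExpCS"},   # initializer
    "VariableExpCS": {"VariableDeclarationCS"},        # resolved by name
    "StringLiteralExpCS": set(),
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)
# ['StringLiteralExpCS', 'VariableDeclarationCS', 'VariableExpCS', 'LetExpCS']
\end{lstlisting}

Note that \emph{VariableExpCS} depends on \emph{VariableDeclarationCS}, which is not one of its descendants, so a purely bottom-up traversal of the CS tree would not suffice. A cyclic dependency would be reported by \verb$graphlib$ as a \verb$CycleError$.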
\textbf{Shadow object construction:} Shadow object constructions express how AS elements are constructed and how their properties are initialized. %This kind of expression allows the specification of which concrete type is used to bridge towards to (it might differ - be a subtype - from the returned type of the \emph{ast} operation), but also to express how all the corresponding properties of the AS element will be computed. Lines 4-6 shows how a \emph{LetExp} is created from a \emph{LetExpCS}, as well as how the \emph{LetExp::variable}, \emph{LetExp::in} and \emph{LetExp::type} properties are computed.
\textbf{Operation Calls:} To compute the properties of an AS element, we need access to the AS elements that correspond to other CS elements, i.e., we need to navigate the CS to AS correspondence.
%Given that we are using \emph{ast} operations to set a correspondence between CS and AS elements, an \emph{ast} operation call expression (OCE) is used to denote that we want to obtain the AS element corresponding to the CS element source of that OCE.
Since \emph{ast()} is a side-effect-free query, we may call \emph{ast()} as many times as necessary to obtain the appropriate AS element.
For example, at line 4 of Listing~\ref{lst:exampleCS2ASdesc}, in order to initialize the \emph{CollectionItem::item} property, we call \emph{ast()} to obtain the \emph{OclExpression} corresponding to the \emph{first} \emph{OclExpressionCS} of the context \emph{CollectionLiteralPartCS}.
\textbf{Self-contained:} Since the proposed internal DSL is intended for rewriting parts of the OMG specifications, the declaration of the CS2AS bridge for a particular CS element is complete and self-contained. The computations for all non-default-valued properties of the corresponding AS element are expressed directly in the shadow object construction, since there is no constructor through which inherited computations could be shared.
\textbf{Reusable computations:} With OCL as the host language for our internal DSL, we can factor out more complex, reusable expressions into new operation definitions. These operations can then be reused, simply by introducing operation call expressions, across the computations of different AS element properties. For example, at line 13 of Listing~\ref{lst:exampleCS2ASdesc}, \emph{commonType} is a reusable operation
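Because \emph{ast()} is a side-effect-free query, an executing tool is free to cache its result per CS element. The Python sketch below additionally assumes that repeated calls should yield the \emph{same} AS object (so that, for instance, a \emph{VariableExp} and its \emph{LetExp} refer to one \emph{Variable}), and achieves this by memoizing \verb$ast()$ on the CS object with \verb$functools.cache$. This identity policy is our assumption about one possible implementation, not something the DSL prescribes; the snake-case \verb$var_name$ stands in for the CS property \emph{varName}.

\begin{lstlisting}[language=Python]
import functools
from dataclasses import dataclass

@dataclass
class Variable:              # AS element
    name: str

@dataclass(eq=False)         # eq=False keeps identity-based hashing for the cache
class VariableDeclarationCS:
    var_name: str

    @functools.cache          # assumption: one AS object per CS object
    def ast(self) -> Variable:
        return Variable(name=self.var_name)

decl = VariableDeclarationCS("var")
first_call = decl.ast()
second_call = decl.ast()
print(first_call is second_call)   # True: one AS element per CS element
\end{lstlisting}

A cache keyed on the CS object keeps that object alive for the lifetime of the cache, which is acceptable for a single conversion run but would need revisiting in a long-lived editor.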
%\footnote{The body of the operation is not relevant, hence, not described in this paper}
to compute the common supertype of source and argument types.
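The body of \emph{commonType} is not given here. As a hedged illustration of what such a reusable operation computes, the Python sketch below models a fragment of the OCL type hierarchy as classes (\emph{Integer} conforms to \emph{Real}, and every type conforms to \emph{OclAny}) and returns the most specific common supertype by walking the method resolution order. Modeling OCL types as Python classes, and the helper name \verb$common_type$, are assumptions for illustration; the walk is only guaranteed to find the least common supertype under single inheritance.

\begin{lstlisting}[language=Python]
# Hypothetical fragment of the OCL type hierarchy, modeled as Python classes.
class OclAny: pass
class Real(OclAny): pass
class Integer(Real): pass
class String(OclAny): pass

def common_type(a: type, b: type) -> type:
    """Most specific supertype shared by a and b (single inheritance assumed)."""
    for candidate in a.__mro__:          # a itself, then its ancestors
        if issubclass(b, candidate):
            return candidate
    return object

print(common_type(Integer, Real).__name__)    # Real
print(common_type(Integer, String).__name__)  # OclAny
\end{lstlisting}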
%\textbf{Name resolution:} As a special case of reusable computation, we want to highlight one which links with Subsection~\ref{subsec:nameReso}: name resolution. It's common a situation in which an AS element refers to a different one based a named-based lookup. In our example, at line 11, we have such scenarios, so that in the context of a \emph{VariableExp} element, there is a lookup activity triggered by the \emph{lookupVariable} OCE, which aim to find a \emph{Variable} to be referred by that \emph{VariableExp}.
%\textbf{Disambiguation:} A CS element doesn't necessarily need to be bridged to just one type of AS element. It's a common situation that preliminary ambiguous CS elements are disambiguated to different AS elements. Subsection~\ref{subsec:disamb} elaborates on this topic. %We can find varied examples in the OCL specification but \emph{CollectionLiteralPartCS} is an easy to understand one. To describe this scenario in our DSL, we will cascade a set of \emph{IfExp} in the body of the \emph{ast} operation: every condition would comprise a disambiguation rule; the \emph{then} and \emph{else} expressions comprise the shadow type expressions denoting the AS element to which the disambiguation would take place. In our running example, lines 23-33 depicts the mentioned scenario and more details about the called \emph{mapsToCollectionItem} operation will be given in Subsection~\ref{subsec:disamb}: disambiguation.
\subsection{Name resolution description}
In this subsection, we explain how name resolution is described when defining CS2AS bridges by means of our OCL-based internal DSL. A name resolution activity typically involves two main roles:
\begin{itemize}
\item a \emph{producer} provides a name-to-element map for all possible elements in its producing scope;
\item a \emph{consumer} looks up a specific element corresponding to a name in its consuming context.
\end{itemize}
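The two roles can be sketched in a few lines of Python, assuming a flat dictionary as the producing scope: the producer publishes a name-to-element map for the variables it declares, and the consumer looks up a single name, failing loudly when no producer supplied it. The helper names \verb$produce$ and \verb$consume$ are hypothetical, and nesting of scopes is deliberately left out.

\begin{lstlisting}[language=Python]
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Variable:                  # element produced by a declaration
    name: str
    type: str

def produce(variables: List[Variable]) -> Dict[str, Variable]:
    """Producer role: publish a name-to-element map for the producing scope."""
    return {v.name: v for v in variables}

def consume(scope: Dict[str, Variable], name: str) -> Variable:
    """Consumer role: look up the element for one name in the consuming context."""
    if name not in scope:
        raise LookupError(f"unresolved name: {name!r}")
    return scope[name]

scope = produce([Variable("var", "String")])
print(consume(scope, "var"))   # Variable(name='var', type='String')
\end{lstlisting}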
Our previous example had no need to resolve names, so we now introduce a new example with a name producer and a consumer. The listing in Figure~\ref{fig:LetExpAS} is an example of a \textit{let expression} that declares and initializes a variable named \emph{var} for use within the \emph{`in'} part of the \textit{let expression}. In this example the \emph{`in'} part comprises just a \textit{variable expression} that references \emph{var}. The adjacent diagram shows the corresponding AS metamodel elements. A \emph{LetExp} contains the produced \emph{Variable} and an arbitrary \emph{OclExpression} as its \emph{`in'}; for our simple example the \emph{`in'} is just a \emph{VariableExp}. The complexity of the example lies in initializing the consuming \emph{VariableExp.referredVariable} so that it references the producing \emph{LetExp.variable}.
\begin{lstlisting}[label=lst:letExpExample, language=OCL]
let var : String = 'something'
in var
\end{lstlisting}
% \caption{LetExpCS and VariableDeclarationCS grammar excerpts}
% \label{fig:LetExpCS:a}
% \caption{LetExpCS metamodel excerpt}
% \label{fig:LetExpCS:b}
\caption{LetExp/VariableExp Example and partial AS Metamodel}
%The \textit{let expression} declares a new variable \emph{var}, and the \emph{'in'} expression contains a \textit{variable expression} referring to that variable. This is represented in the AS by the reference from VariableExp to Variable in Figure~\ref{fig:exampleAS}).
%This expression is interesting because it's related to one of the main activities of the CS2AS DSL: name resolution. Whilst a let expression declares a new variable, the \emph{'in'} expression contains a variable expression referring to that variable (note the reference from VariableExp to Variable in the AS excerpt depicted by Figure~\ref{fig:exampleAS}). This activity of creating references between elements of the AS based on a name lookup is called name resolution (Subsection~\ref{subsec:nameReso}).
Figure~\ref{fig:LetExpCS} shows the corresponding grammar and CS definitions\footnote{The complexity of multiple comma-separated variables has been removed, because it is not needed to explain how name resolution is described in our internal DSL.}. A \emph{LetExpCS} contains a \emph{VariableDeclarationCS} and an \emph{OclExpressionCS}, which for our example is just a \emph{VariableExpCS}.
\begin{lstlisting}[label=lst:letExpEBNF, language=Xtext]
'let' VariableDeclarationCS
'in' OclExpressionCS
simpleName (':' TypeCS)?
('=' OclExpressionCS)?
simpleName | 'self'
% \caption{LetExpCS and VariableDeclarationCS grammar excerpts}
% \label{fig:LetExpCS:a}
% \caption{LetExpCS metamodel excerpt}
% \label{fig:LetExpCS:b}
\caption{LetExpCS/VariableExpCS Grammar and partial CS Metamodel}
%\subsubsection{Variable expression}
%A \textit{variable expression} in OCL references a variable defined elsewhere. The reference may explicitly or implictly refer to the \emph{self} context variable or to an iterator variable, or explicitly to a variable defined in an outer let expression or to a parameter of an operation definition. Discovering and prioritizing the candidate definitions is performed by the name resolution activity. The \emph{varName} provided by a \emph{VariableExpCS} in the concrete syntax is used to look up a \emph{Variable} in the AS.
In typical programming languages every use of a variable has a corresponding declaration. The variable declaration is the producer of a name-to-variable mapping. The variable usage consumes the variable by referencing its name. Name resolution searches the hierarchy of producing contexts that surround the consuming context to locate a name-element mapping for the required name.
In our example, the required cross-reference in the AS is represented in the CS by the distinct \emph{VariableDeclarationCS.varName} and \emph{VariableExpCS.varName} properties. These are both parsed with the value \emph{var} and so, when consumption of the \emph{VariableExpCS.varName} is analyzed, the analysis must discover the corresponding \emph{VariableDeclarationCS.varName} production.
The semi-formal approach adopted by the OCL specification re-uses the containment hierarchy of the CS as the scope hierarchy for its `inherited attributes'. The name-to-element mappings are maintained in an \emph{Environment} hierarchy. The mappings flow down from the root CS element towards the leaf elements, accumulating additional name-to-element mappings and/or nested environments at each intermediate element of the CS tree.
In Section~\ref{sec:semi-formal-solution} we saw the very simple unmodified flow-down for a \emph{CollectionLiteralPart}. The equivalent exposition for a \emph{LetExp} in the OCL specification is complicated because it maps multiple comma-separated let-variables with respect to the CS rather than the AS. We therefore present its logical equivalent in Listing~\ref{lst:semi-formal-letexpcs}.
\begin{lstlisting}[caption=Semi-formal LetExpCS equivalent, label=lst:semi-formal-letexpcs, language=OCL]
LetExpCS ::= 'let' VariableDeclarationCS 'in' OclExpressionCS
VariableDeclarationCS.env = LetExpCS.env
OclExpressionCS.env = LetExpCS.env.nestedEnvironment().addElement(VariableDeclarationCS.ast)
The environment of the LetExpCS is passed unchanged to the VariableDeclarationCS so that name resolution within the VariableDeclarationCS initializer sees the same names as the LetExpCS.
The environment for the OclExpressionCS is more interesting. A nested Environment is created containing the name-to-variable mapping for the let-variable. The use of a nested environment ensures that the let-variable name occludes any same-named mapping in the surrounding environment.
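To make the two semi-formal rules concrete, the following Java sketch evaluates them on a single let expression. It is a hypothetical illustration, not part of the OCL specification or of any tool: \emph{Env} models an immutable environment whose \emph{addElement} returns a copy, the CS classes keep only the fields the rules use, the AS variable is a placeholder string, and the \emph{'in'} expression is simplified to a \emph{VariableExpCS} held in a field named \emph{body}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable environment: local name-to-element mappings plus an
// optional parent; operations that "modify" it return a new Env.
final class Env {
    final Env parent;
    final Map<String, Object> local;

    Env(Env parent, Map<String, Object> local) {
        this.parent = parent;
        this.local = Map.copyOf(local);
    }

    static Env empty() { return new Env(null, Map.of()); }

    // Opens a new, initially empty scope nested inside this environment.
    Env nestedEnvironment() { return new Env(this, Map.of()); }

    // Returns a copy with one extra mapping in the innermost scope.
    Env addElement(String name, Object element) {
        Map<String, Object> copy = new HashMap<>(local);
        copy.put(name, element);
        return new Env(parent, copy);
    }

    // Innermost scope first, so a nested mapping occludes an outer one.
    Object lookup(String name) {
        for (Env e = this; e != null; e = e.parent) {
            if (e.local.containsKey(name)) return e.local.get(name);
        }
        return null;
    }
}

// Hypothetical CS nodes; 'env' holds the inherited attribute set by the parent.
class VariableDeclarationCS {
    final String varName;
    Env env;
    VariableDeclarationCS(String varName) { this.varName = varName; }
    // Placeholder for VariableDeclarationCS.ast (the AS Variable).
    Object ast() { return "Variable(" + varName + ")"; }
}

class VariableExpCS {
    final String varName;
    Env env;
    VariableExpCS(String varName) { this.varName = varName; }
    Object resolve() { return env.lookup(varName); }
}

class LetExpCS {
    final VariableDeclarationCS variable;
    final VariableExpCS body; // the OclExpressionCS after 'in', simplified
    Env env;

    LetExpCS(VariableDeclarationCS variable, VariableExpCS body) {
        this.variable = variable;
        this.body = body;
    }

    // Applies the two attribution rules of the semi-formal LetExpCS listing.
    void attribute(Env inherited) {
        env = inherited;
        variable.env = env;                         // VariableDeclarationCS.env = LetExpCS.env
        body.env = env.nestedEnvironment()          // OclExpressionCS.env = LetExpCS.env.nestedEnvironment()
                .addElement(variable.varName, variable.ast()); // .addElement(VariableDeclarationCS.ast)
    }
}
```

If the surrounding environment already maps \emph{var}, the body resolves to the let-variable, while the initializer and the surrounding environment itself still see the outer mapping.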
%A imperative-declarative approach discussion doesn´t seem to resemble to this. Not needed. Remove
%Although the use of the nestedEnvironment() and addElement() appears to be imperative, a declarative interpretation is possible since addElement() returns a copy of its immutable source environment with an additional name-to-element mapping.
Our modeled approach is very similar but re-uses the AS tree rather than the CS tree as the scope hierarchy.
%ASBH Very vague. I don´t even know what you want to say. Not needed. Rescued an argument commented in related work section
%This avoids the complications that arise when syntax sugar such as comma-separated let-variables also needs resolution.
The rationale is that we need to look up AS elements for which there may be no corresponding CS, such as elements of the OCL standard library or of user models (classes, properties, operations, etc.).
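The benefit of an AS-based scope hierarchy can be sketched in Java: an environment over AS elements treats elements parsed from text and elements loaded without any CS in exactly the same way. This is a hypothetical illustration; \emph{AsElement} and \emph{AsEnvironment} are invented names, and seeding the root with a pre-loaded library element (here a class named \emph{String}) is our assumption, since the listing below starts from an empty root environment.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical AS element; 'hasCs' records whether parsed text (CS) produced it.
record AsElement(String name, String kind, boolean hasCs) {}

// Immutable chain of AS-level scopes.
final class AsEnvironment {
    private final AsEnvironment parent;
    private final List<AsElement> elements;

    private AsEnvironment(AsEnvironment parent, List<AsElement> elements) {
        this.parent = parent;
        this.elements = List.copyOf(elements);
    }

    // Assumption: the root is seeded with library or user-model elements
    // that were loaded directly as AS and have no CS counterpart.
    static AsEnvironment root(List<AsElement> preloaded) {
        return new AsEnvironment(null, preloaded);
    }

    AsEnvironment nestedWith(AsElement element) {
        return new AsEnvironment(this, List.of(element));
    }

    // The search ignores whether an element came from CS or not.
    Optional<AsElement> lookup(String name, String kind) {
        for (AsEnvironment e = this; e != null; e = e.parent) {
            for (AsElement candidate : e.elements) {
                if (candidate.name().equals(name) && candidate.kind().equals(kind)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }
}
```

A CS-scoped environment could not offer the library class at all, because no CS element exists to carry it.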
%In the OCL specification, Clause 9, we might find some hints related to the names resolution activity: On one hand, in Clause 9.4, an \emph{Environment} definition is provided, including some operations to deal with this concept of environment (i.e. a list of named elements); on the other hand in Clause 9.3, the attribute grammar makes uses of those definitions so the environments can be modified, propagated and queried in the CS2AS bridge declaration in order to perform the lookup activities. In the OCL attribute grammar we can spot how the consumers and producers of the lookup activity are interfaced: when an \emph{Environment::addElement/s} operation is invoked we are dealing with a producer, whilst when any form of \emph{lookup} operation is invoked we are dealing with a consumer.
%In essence, an environment comprises a list of named element which can be looked up in other parts of textual and they can be nested so a child environment can occlude a contribution of a named element with the same name as another contribution done in the parent environment.
%We now explain how name resolution is described in our OCL-based CS2AS DSL. Listing~\ref{lst:exampleNameResodesc} is the name resolution description\footnote{Due to space constraints, just context definitions are included} for our running example; it will be used as a reference when explaining the language design decisions and rationale.
\begin{lstlisting}[caption=Name resolution producers, label=lst:exampleNameResodesc, language=OCL]
context OclAny
def : env : env::Environment =
if oclContainer() <> null
then oclContainer().childEnv(self)
else env::Environment{}
def : childEnv(child : OclAny) : env::Environment =
context LetExp
def : childEnv(child : OclAny) : env::Environment =
if child = variable
then env
else env.nestedEnv().addElement(variable)
Listing~\ref{lst:exampleNameResodesc} presents the name resolution description written in our OCL-based internal DSL. Line 2 declares an \emph{env} property to hold the immutable \emph{Environment} of the AS element. The \emph{env} property is defined recursively over the containment tree using \emph{oclContainer()}\footnote{oclContainer() returns the containing element, which is null at the root.}. Line 5 provides an empty environment at the root; otherwise, Line 4 uses \emph{childEnv(child)} to ask the parent to compute the child-specific environment.
The default definition of \emph{childEnv(child)} on lines 8-9 passes the prevailing environment unchanged to all children. This default is inherited by the many AS elements that do not extend the environment.
The \emph{LetExp} override of \emph{childEnv(child)} on lines 12-16 uses the \emph{child} argument to compute different environments for the \emph{Variable} and \emph{OclExpression} children. As we saw for the semi-formal approach, the environment for the \emph{Variable} is unmodified, whereas the environment for the \emph{OclExpression} is extended with the variable in a nested environment.
Consumers use the environment to convert a textual name into the corresponding model element. The conversion comprises three steps:
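For readers more familiar with general-purpose languages, the following Java sketch mirrors the \emph{env} and \emph{childEnv(child)} definitions of Listing~\ref{lst:exampleNameResodesc}. The classes are simplified and hypothetical: an environment is reduced to an immutable list of scopes of names, and names stand in for the added AS elements. Only the control flow is intended to match: the root yields an empty environment, the default \emph{childEnv} passes the parent's environment through, and \emph{LetExp} distinguishes its two children.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified immutable environment: a list of scopes of names, innermost last.
final class Environment {
    final List<List<String>> scopes;

    Environment(List<List<String>> scopes) { this.scopes = List.copyOf(scopes); }

    static Environment empty() { return new Environment(List.of(List.<String>of())); }

    Environment nestedEnv() {
        List<List<String>> s = new ArrayList<>(scopes);
        s.add(List.of());
        return new Environment(s);
    }

    Environment addElement(String name) {
        List<List<String>> s = new ArrayList<>(scopes);
        List<String> innermost = new ArrayList<>(s.get(s.size() - 1));
        innermost.add(name);
        s.set(s.size() - 1, List.copyOf(innermost));
        return new Environment(s);
    }

    boolean contains(String name) {
        return scopes.stream().anyMatch(scope -> scope.contains(name));
    }
}

// Mirrors 'context OclAny': every AS element may have a container.
abstract class AsElement {
    AsElement container; // what oclContainer() would return; null at the root

    // env: container.childEnv(self) if there is a container, else an empty Environment.
    // Recomputed on every call here; a real implementation would cache it.
    Environment env() {
        return container != null ? container.childEnv(this) : Environment.empty();
    }

    // Default childEnv: pass the prevailing environment unchanged to every child.
    Environment childEnv(AsElement child) { return env(); }
}

class Variable extends AsElement {
    final String name;
    Variable(String name) { this.name = name; }
}

class VariableExp extends AsElement {}

class LetExp extends AsElement {
    final Variable variable;
    final AsElement in;

    LetExp(Variable variable, AsElement in) {
        this.variable = variable;
        this.in = in;
        variable.container = this;
        in.container = this;
    }

    // LetExp override: unmodified environment for the Variable, and a nested
    // environment extended with the variable for the 'in' expression.
    @Override
    Environment childEnv(AsElement child) {
        return child == variable ? env() : env().nestedEnv().addElement(variable.name);
    }
}
```

Because each element asks its container, no explicit top-down traversal is needed; the environment of any AS element can be demanded on its own.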
\item locate all candidate elements
\item apply a filtering predicate to select only the candidates of interest
\item return the selected candidate or candidates
The first stage is performed by the environment propagation described above.
The filtering predicate always selects just those elements whose name matches the required name. It often discriminates further, for example by considering only Variables, Properties or Namespaces. For operations, the predicate may also match the argument list against the parameter list.
The final stage returns the single selected candidate, which is the only possibility for a well-formed conversion. In practical tools, a lookup may fail to find a candidate or may find several ambiguous candidates, in which case the tool should provide helpful diagnostics to the user. %This may use proprietary approaches such as invalid or error objects. %In the example below unsatisfactory candidates are replaced by a null value.
The specification is more readable if the three stages are wrapped up in helper functions such as \emph{lookupVariable} or \emph{lookupProperty}\footnote{A practical implementation may provide alternative helper implementations that exploit the symmetry of the declarative exposition to search up through the containment hierarchy, examining only candidates that satisfy the filtering predicate. This avoids the cost of flowing complete environments down to every AS leaf element when at most one element of the environment is of interest.}.
Listing~\ref{lst:VariableExpCSast} shows the polymorphic \emph{ast()} operation that maps a \emph{VariableExpCS} to a \emph{VariableExp}. The \emph{lookupVariable} helper function is used to discover the appropriate variable to be referenced by \emph{referredVariable}.
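The three stages can be sketched as a Java helper in the spirit of \emph{lookupVariable}. This is a hypothetical illustration, not code from the specification or from Eclipse OCL: \emph{Named}, \emph{Scope} and \emph{LookupResult} are invented for the example. The helper applies the stages scope by scope, innermost first, so outer scopes are examined only when no inner candidate survives the filter; this lazy ordering is closer in spirit to the search described in the footnote than to materializing the complete environment first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical named AS element; 'kind' plays the role of its metaclass.
record Named(String kind, String name) {}

// One scope of an environment; 'parent' is the enclosing scope, or null at the root.
record Scope(List<Named> elements, Scope parent) {}

// Exactly one of 'element' and 'diagnostic' is non-null.
record LookupResult(Named element, String diagnostic) {}

final class Lookup {
    private Lookup() {}

    static LookupResult lookupVariable(Scope env, String name) {
        for (Scope scope = env; scope != null; scope = scope.parent()) {
            // Stage 1: candidates contributed by this scope.
            List<Named> candidates = scope.elements();
            // Stage 2: filtering predicate on kind and name.
            List<Named> selected = new ArrayList<>();
            for (Named candidate : candidates) {
                if (candidate.kind().equals("Variable") && candidate.name().equals(name)) {
                    selected.add(candidate);
                }
            }
            // Stage 3: a unique match is the well-formed result; the innermost scope wins.
            if (selected.size() == 1) {
                return new LookupResult(selected.get(0), null);
            }
            if (selected.size() > 1) {
                return new LookupResult(null, "ambiguous variable '" + name + "'");
            }
        }
        return new LookupResult(null, "unresolved variable '" + name + "'");
    }
}
```

Returning a diagnostic instead of a bare null is one way a practical tool might report failed or ambiguous lookups; the specification itself only needs the successful case.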
\begin{lstlisting}[caption=CS2AS bridge for VariableExpCS to VariableExp, label=lst:VariableExpCSast, language=OCL]
context VariableExpCS
def : ast() : ocl::VariableExp =
let variable = ast().lookupVariable(varName)
in ocl::VariableExp {
name = varName,
referredVariable = variable,
type = if variable = null
then null
else variable.type
%\begin{lstlisting}[caption=Name resolution description for running example, label=lst:exampleNameResodesc, language=OCL]
%-- Producers (Environment computation)
%context OclAny
%def : env() : env::Environment =
% _env(null)
%def : _env(child : OclAny) : env::Environment =
% parentEnv()
%def : parentEnv() : env::Environment =
% let parent = oclContainer()
% in if parent = null
% then env::Environment { }
% else parent._env(self)
% endif
%context VariableExp
%def : _env(child : ocl::OclAny) : env::Environment =
% parentEnv() -- By default, the computed environment is always parentEnv()
% -- (see line 5) so this declaration might be suppressed
%-- Consumers(Lookup computation).
%context OclAny
%def : _lookupVariables(env : env::Environment, vName : String) : OrderedSet(Variable) =
% let foundVs = env.namedElements->selectByKind(Variable)->select(name=vName)
% in if foundVs->isEmpty() and not env.parentEnv = null
% then _lookupVariables(env.parentEnv, vName)
% else foundVs
% endif
%def : _lookupVariable(vName : String) : Variable =
% let foundVs = _lookupVs(env(), vName)
% in if foundVs->isEmpty()
% then null
% else foundVs->first()
% endif
%context VariableExp
%def : lookupVariable(varExpCS : oclcs::VariableExpCS) : Variable =
% if varExpCS.varName = null
% then null
% else _lookupVariable(varExpCS.varName)
% endif
%-- Environment related ops
%context Environment
%def : nestedEnv() : Environment =
% Environment {
% parentEnv = self
% }
%def : addElements(elements : Collection(ocl::NamedElement)) : Environment =
% Environment {
% parentEnv = parentEnv,
% namedElements = namedElements->includingAll(elements)
% }
%def : addElement(element : ocl::NamedElement) : Environment =
% Environment {
% parentEnv = parentEnv,
% namedElements = namedElements->including(element)
% }
%ASBH. This looked like lost paragraphs. Not needed after all
%\textbf{Declarative}: The definition of consumers and producers of named element lookups is declarative.
%\textbf{Top down / Bottom up:} The semi-formal inherited attributes specify a top down environment flow. Our modeled solution uses a similar approach but also supports a more efficient implementation that uses a bottom up search.
%Whereas in the OCL specification the proposed grammar exposes a top down environment computation (in the form of inherited attributes specification), for our internal DSL, the environment computation exposes a bottom up approach. Although declaratively speaking, the approach is irrelevant, in practice the bottom up exposition enables the interpretation or generation of more efficient implementations: in essence, we don't need to carry down the computation of environments along with the entire containment hierarchy (from parents to children); we just need to enable the computation of the environment from the consumer looking for producers above in the containment hierarchy (from children to parents). Lines 2-12 show how an environment is computed by default for an arbitrary element: the bottom line of the algorithm is that, by default, the environment of an element will be the parent (container) element's environment. For a root element (no parent), by default, its environment will comprise an empty list of named elements.
%\textbf{Producer contributions:} In our DSL, a producer just specifies an \emph{childEnv} operation to declare how it contributes named elements to the environment of each of its children, using the environment \emph{addElement/s} operations to add contributions. %, being the argument expression the one to express how the contributions are obtained from the producer. Lines 18-23 shows how a \emph{LetExp} producer contributes a \emph{Variable} element to the environment. The \emph{child} parameter of the \emph{\_env} operation is important, since it represents the child element from which a bottom up lookup is being performed. Although not all \emph{\_env} need it, in this case, it is used for \emph{LetExp} at line 20, because we only want to add a \emph{Variable} to the environment in the case that the lookup is performed from the inner \emph{in} expression.
%\textbf{Nested environments:} As we can find in many programming languages, a common situation when achieving name resolution is the creation of scopes. Scopes allow name producers to contribute names which might already be defined in outer scopes. In OCL, scopes are represented by the own environments, but we need a way to specify when we want to open a new scope, i.e. create a new nested environment. In our example, \emph{LetExp} will create a new lookup scope from its parent environment, so that a \emph{nestedEnv} operation call is used at line 21. The \emph{nestEnv} operation is defined at line 50. This producer contribution definition would let variable declarations occlude variables declared by outer \emph{LetExp}, so that the expression in Listing~\ref{lst:nestedEnvExample} would be valid and would return 4 as a result.
%\begin{lstlisting}[caption=LetExp variables occlude outer variables, label=lst:nestedEnvExample, language=OCL]
%context OclAny
%def : a() : Integer =
% let a = 1
% in let a = 2
% in a + a
%\textbf{Consumer lookups:} \emph{lookup} operations are defined in the context of the consumers for which a lookup needs to be performed (see lines 41-46). Those operations will receive, as an argument, the corresponding CS element from which syntactic information will be retrieved to perform a lookup. In our example, the lookup input corresponds to the String valued \emph{VarExpCS::varName}. The usual lookup inputs will normally be String values, but they can be more complex CS structures such as a PathNameCS (Clause 9.3.7 of \cite{omg2013ocl}) which is used in the OCL specification to perform qualified name lookups.
%The actual lookup is triggered when invoking generic (defined on any OclAny) \emph{\_lookup} operations, which basically consist in computing the environment for the consumer and filtering the resulting list of named elements with the lookup input and the kind of element to look up. In our running example, when looking up \emph{Variable}s, the environment is computed at line 35 and the resulting list of named elements is filtered at line 28.
%\textbf{Bottom up lookup computations:} As the reader might have noted, the \emph{\_lookup} operations are designed to be split into two different operations. The rationale is that although we only compute the environment once, an indeterminate number of nested environments might be created. If the looked up element is not found in the list of named elements of the most deep environment, the search must be (transitively) performed in the list of named elements of its parent environment. In our example, the transitive search is exposed at line 30.
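Read as ordinary code, the \emph{ast()} definition of Listing~\ref{lst:VariableExpCSast} constructs a \emph{VariableExp} whose \emph{type} is guarded against a failed lookup. The Java sketch below is a hypothetical transliteration: the records keep only the properties used by the listing, and a plain map stands in for the environment that \emph{ast().lookupVariable(varName)} would consult, which also avoids the reference to \emph{ast()} inside its own definition.

```java
import java.util.Map;

// Hypothetical AS records, reduced to the properties used by the mapping.
record Variable(String name, String type) {}

record VariableExp(String name, Variable referredVariable, String type) {}

// Hypothetical CS record; 'visibleVariables' stands in for the environment
// that ast().lookupVariable(varName) would search.
record VariableExpCS(String varName, Map<String, Variable> visibleVariables) {
    VariableExp ast() {
        Variable variable = visibleVariables.get(varName); // null when the lookup fails
        return new VariableExp(
                varName,                                    // name = varName
                variable,                                   // referredVariable = variable
                variable == null ? null : variable.type()); // type guarded against a null variable
    }
}
```

An unresolved name therefore yields a \emph{VariableExp} with null \emph{referredVariable} and \emph{type}, rather than an evaluation failure.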
As mentioned in the introduction, CS disambiguation is another important concern that the CS2AS bridge needs to address. To explain the need for disambiguation rules, we consider the simple OCL expression \verb$x.y$.
At first glance, the expression accesses the \emph{'y'} property of the \emph{'x'} variable, using a \textit{property call expression} whose source is a \textit{variable expression}. However, \emph{'x'} is not necessarily a variable name: there may be no \emph{'x'} variable at all. Rather, \emph{'x'} may be a property of the implicit source variable, \emph{self}, since the original expression could be a short form for \emph{self.x.y}. Semantic resolution is required to disambiguate the alternatives and arbitrate any conflict.
The OCL specification provides disambiguation rules to `resolve' grammar ambiguities. Clause 9.1 states: ``Some of the production rules are syntactically ambiguous. For such productions disambiguating rules have been defined. Using these rules, each production and thus the complete grammar becomes nonambiguous.'' Figure~\ref{fig:varExpGrammar} and Figure~\ref{fig:propCallExpGrammar} are extracted from the OCL specification. They show that a \emph{simpleNameCS} with no following \verb$@pre$ matches both the \verb$[A]$ production of a \emph{VariableExpCS} and the \verb$[B]$ production of a \emph{PropertyCallExpCS}.
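The two readings of \verb$x.y$ can be made concrete with a small Java sketch. It is purely illustrative: the AS is rendered as strings, the hypothetical \emph{DotExample} class assumes that the names of visible variables and of the properties of \emph{self} have already been computed, and a variable is preferred when both readings are possible, which is the same priority adopted by the \emph{NameExpCS} disambiguation rule later in this section.

```java
import java.util.Set;

// Hypothetical sketch: choose the AS for the source 'x' of 'x.y' from the
// names visible as variables and as properties of the implicit 'self'.
final class DotExample {
    private DotExample() {}

    static String resolveSource(String name, Set<String> variables, Set<String> selfProperties) {
        if (variables.contains(name)) {
            return "VariableExp(" + name + ")";
        }
        if (selfProperties.contains(name)) {
            // Short form: 'x.y' abbreviates 'self.x.y'.
            return "PropertyCallExp(VariableExp(self), " + name + ")";
        }
        return "unresolved(" + name + ")";
    }

    static String resolveDot(String source, String property,
                             Set<String> variables, Set<String> selfProperties) {
        return "PropertyCallExp(" + resolveSource(source, variables, selfProperties)
                + ", " + property + ")";
    }
}
```

The outer property call is the same in both readings; only the source of \emph{'y'} depends on semantic information.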
\caption{Partial OCL Specification for VariableExpCS to VariableExp}
% \centering
\caption{Partial OCL Specification for PropertyCallExpCS to PropertyCallExp}
The disambiguation rule for \emph{VariableExpCS} is relatively simple: it delegates to the \emph{lookup} helper and constrains the result to be a \emph{VariableDeclaration}. This is potentially correct, but unfortunately the specification omits the definition of \emph{VariableDeclaration} as the supertype of \emph{Variable} and \emph{Parameter}.
The disambiguation rule for \emph{PropertyCallExpCS} has some ambiguous wording and many details that do not correspond to its ``In OCL'' formulation. The implementor must therefore rely on intuition, and may also wish to consider how the rules apply to implicit opposite properties in EMOF.
Both of these disambiguation rules require semantic information that is not available at the time the syntactic parser needs it. The problem can be avoided by unifying the ambiguous alternatives into unambiguous productions that can be parsed to create a unified CS tree. Once parsing has completed, semantic analysis of the unified CS can resolve the unified elements into their disambiguated forms.
We therefore introduce additional unifying CS elements that can be resolved without semantic information. A unifying \emph{NameExpCS} element replaces \emph{PropertyCallExpCS} and \emph{VariableExpCS}.
Figure~\ref{fig:NameExpCS} shows the new unifying CS element.
%\footnote{Just CS information related to Variable ExpCS [A] and PropertyCallExpCS [B] is depicted} related to an ambiguous name expression.
%For this particular OCL example, the specification needs to introduce a concept such as \emph{NameExpCS} which simply comprises a name.
\begin{lstlisting}[label=lst:NameExpEBNF, language=Xtext]
simpleName isMarkedPreCS?
% \caption{VariableExpCS grammar excerpt}
% \label{fig:VariableExpCS:a}
% \caption{VariableExp metamodel excerpt}
% \label{fig:VariableExpCS:b}
\caption{NameExpCS Grammar and partial CS Metamodel}
%Then, additional disambiguation rules can make use of information from the CS and/or AS, in order to discern which AS element is created from that \emph{NameExpCS} i.e a \emph{VariableExp}, otherwise a \emph{PropertyCallExp}. For instance,
Listing~\ref{lst:nameExpDisambiguation} shows the definition of the CS2AS mapping of a \emph{NameExpCS}, in which the call of \emph{isAVariableExp()} at line 3 invokes the operation that provides the disambiguation rule. Its result determines whether the \emph{NameExpCS} is mapped to a \emph{VariableExp} (lines 5-13) or to a \emph{PropertyCallExp} (lines 15-23).
%\item VariableExpCS and PropertyCallExpCS metaclasses are not needed any more. The corresponding VariableExp and PropertyCallExp are obtained from this ambiguous NameExpCS
%\item The ambiguous grammar issue commented before vanishes, since we now have only one grammar rule for NameExpCS. In other words, we have created a CS disambiguation to be resolved in the CS2AS bridge, rather than having an ambiguous grammar.
%NB ASBH. Problem with this definition, is that we need to define a lookupProperty. It is incorrect/incomplete , and we might to get into a new mess. Omitting the then/else expression might an alternative, but I don't like it....
\begin{lstlisting}[caption=CS2AS description for an ambiguous name expression, label=lst:nameExpDisambiguation, language=OCL]
context NameExpCS
def : ast() : ocl::OclExpression =
if isAVariableExp()
let variable = ast().lookupVariable(name)
in ocl::VariableExp {
name = name,
referredVariable = variable,
type = if variable = null
then null
else variable.type
let property = ast().lookupProperty(name)
in ocl::PropertyCallExp {
name = name,
referredProperty = property,
type = if property = null
then null
else property.type
This approach localizes the disambiguation in the \emph{isAVariableExp()} operation, and so makes \emph{VariableExpCS} and \emph{PropertyCallExpCS} redundant. The simple two-way disambiguation decision is shown in Listing~\ref{lst:nameExpDisambiguationRule}.
\begin{lstlisting}[caption=NameExpCS disambiguation rule, label=lst:nameExpDisambiguationRule, language=OCL]
context NameExpCS
def : isAVariableExp() : Boolean =
let variable = ast().lookupVariable(name)
in variable <> null
Simple choices, such as those between the various forms of \emph{CollectionLiteralPartCS}, can be resolved syntactically, whereas the unified name example above requires a semantic decision.
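By contrast with \emph{isAVariableExp()}, a syntactic rule inspects only the CS. The hypothetical Java sketch below follows the \emph{CollectionLiteralPartCS} case, where a part maps to a \emph{CollectionItem} when it has no \emph{last} expression and to a \emph{CollectionRange} otherwise; expressions are represented as plain strings for brevity.

```java
// Hypothetical CS element: 'last' is null for a single item such as '1',
// and non-null for a range such as '1..5'.
record CollectionLiteralPartCS(String first, String last) {}

final class SyntacticDisambiguation {
    private SyntacticDisambiguation() {}

    // Mirrors a rule of the form 'mapsToCollectionItem() = self.last = null':
    // no name lookup or other AS information is needed.
    static boolean mapsToCollectionItem(CollectionLiteralPartCS cs) {
        return cs.last() == null;
    }

    static String toAs(CollectionLiteralPartCS cs) {
        return mapsToCollectionItem(cs)
                ? "CollectionItem(" + cs.first() + ")"
                : "CollectionRange(" + cs.first() + ", " + cs.last() + ")";
    }
}
```

Because the decision depends only on the shape of the parsed element, it could equally be made by the parser; the name case cannot.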
%ASBH. Some vagueness here, but let it go
The conflicts between the use of parentheses for template arguments, operation calls and iteration calls can be resolved in the same way but with a more complex semantic decision tree.
%Disambiguation rules might be simple ones which rely on CS information -- syntactic information --, or more complex ones which also involve AS information -- semantic information --. In the introduced example, we need to rely on AS information to conclude if \emph{'x'} corresponds to a \emph{VariableExp} or to a \emph{PropertyCallExp}. Specifically, a lookup using the name comprised by the \emph{NameExpCS} is needed. If we find a \emph{Variable} with that name, we disambiguate \emph{NameExpCS} to a \emph{VariableExp}; otherwise to a \emph{PropertyCallExp}.
%Listing~\ref{lst:nameExpDisambiguationRule} shows the definition of the disambiguation rule for an ambiguous CS element such as \emph{NameExpCS}. The reader should note that we don't consider the case where \emph{'x'} is not neither a variable nor a property of the implicit \emph{self} context variable (an element is not found in the lookup process). We want to provide a declarative and clean CS2AS bridge for the OMG specification. A strict interpretation of the proposed CS2AS bridge for \emph{NameExpCS}, ends up with a \emph{PropertyCallExp} with a null value for the referred property.
%To conclude the explanation of the proposed OCL based internal DSL, we briefly discuss the CS disambiguation rules which let us produce different AS elements from ambiguous CS element. Disambiguation is a concern different to name resolution, but it is similarly used across the CS2AS mappings definition. The OCL-based descriptions can be defined in their own file and therefore comprise the disambiguation rules for the OMG specification.
%As we saw in subsection~\ref{subsec:mappings}, in our running example we identified the \emph{mapsToCollectionItem} operation call as the condition required to disambiguate a \emph{CollectionLiteralPartCS} towards either a \emph{CollectionItem} or a \emph{CollectionRange}. Now, Listing~\ref{lst:CS2ASdisambiguation} shows the definition of that \emph{mapsToCollectionItem}, exposing how the rule to disambiguate a \emph{CollectionLiteralPartCS} will depend on the presence (or not) of the \emph{last} expression.
%\begin{lstlisting}[caption=CS disambiguation rule of the running example, label=lst:CS2ASdisambiguation, language=OCL]
%context CollectionLiteralPartCS
%def : mapsToCollectionItem() : Boolean =
% self.last = null
%As the reader might note, this disambiguation scenario is a trivial one because very little CS information is required to decide if we disambiguate a \emph{CollectionLiteralPartCS} towards either a \emph{CollectionItem} or a \emph{CollectionRange}. Potentially, a more elaborated grammar definition would allow us to prevent any need of a disambiguation rule in this case. However:
%\item The grammar excerpt corresponding to the running example is the one proposed by the specification. Our goal is to provide a flexible DSL that lets us tackle any CS2AS bridging scenario in which we might encounter big CS2AS gaps in favour of more concise grammars (which comprise more ambiguous CS elements when compared with more elaborated grammars).
%\item We can also find more complex disambiguation scenarios in which not only CS (syntactic) information is required, but also AS (semantic) information is needed to disambiguate an ambiguous CS element in a given context. A typical scenario in OCL is when lookups of AS named elements are needed to know if a simple name preceding a \emph{'.'} corresponds to either a variable (hence, VariableExp is the disambiguation result) or to a property of the implicit self variable (hence, PropertyCallExp is the disambiguation result). We can find other related examples in \cite{willink2010oclXtext}.
\section{Related work}
In this section we briefly discuss how the proposed OCL-based CS2AS bridge relates to previous work. To the best of our knowledge, there is no DSL approach based on OMG specifications for describing bridges between CS and AS. The approach based on Complete OCL documents was introduced in \cite{sanchez2014enhancingXtext}; this paper explains the whole approach (i.e. the internal DSL). Recently, OCLT \cite{jouault2015oclt} has been proposed as a functional transformation language for model transformations. Apart from being too recent to be considered in this work, OCLT is not domain specific and needs additional constructs (e.g. pattern matching) to cover more complex transformation scenarios.
Other languages have been conceived to address CS2AS bridges in other contexts, namely within specific tools. %Although our OCL-based internal DSL is appropriate to be used in OMG specifications, we admit that some related works create more convenient constructs for instance, when defining names resolution, or when dealing with reusable bridges defined on abstract classes in the corresponding CS/AS classes hierarchy.
We highlight two of them:
\textbf{NaBL \cite{konat2013decNameRes} \& Stratego \cite{visser2004stratego}:} These are two separate languages with different purposes, both used by the Spoofax language workbench \cite{spoofaxOnline}. The former is used to declare name resolution and the latter to declare syntax rewrites (tree-based structure transformations). The main difference with respect to our approach is that these languages are completely unrelated: the former is integrated into the parsing activities in order to resolve cross-references when producing the CS tree, whereas the latter is a general-purpose program transformation language subsequently used to obtain the potentially different AS tree. In our approach, name resolution is integrated into a separate CS2AS activity, which runs once the parsing activity has produced a CS tree. As discussed in Section~\ref{subsec:nameReso}, the name lookups are performed on AS elements rather than on CS ones.
\textbf{Gra2Mol \cite{canovas2012gra2mol}:} Gra2Mol is closer in objective to the approach presented in this paper. It is a domain specific transformation language conceived to define such bridges and, as in our approach, name resolution is declared as part of the transformation language. However, whereas their name resolution relies on explicitly specifying a direct search (thus, the name consumer needs to know where the name producer is located in the syntax tree), our approach to specifying name resolution is more declarative, being based on independent declarations of name producers and consumers (thus, the name consumer does not need to know where the producer is located in the syntax tree). Another difference is that, whereas we use OCL as the expression language to express the bridges, they define a structure-shy\footnote{XPath is an example of this kind of language.} query language instead. They claim that their query language is more compact and less verbose than OCL expressions. However, such languages are not suitable from the point of view of OMG specifications. Moreover, structure-shy languages are more error prone and more sensitive to changes in the involved metamodels (metamodel evolution): with a statically typed language such as OCL, supporting tools can better assist with metamodel evolution.
\section{Limitations and shortcomings}
From the point of view of the OMG specification, we do not see any limitations of the proposed internal DSL. Having OCL as the host language is a good solution for OMG specifications, because instances of the DSL can be directly ported to those specifications in order to precisely define the corresponding CS2AS bridges. Likewise, the flexibility and modularity provided by Complete OCL documents are promising for addressing very large CS2AS gaps.
On the other hand, from the point of view of the end user (i.e. the user of the DSL), and especially when compared with related work, we perceive that an external DSL designed specifically for concepts such as name resolution (e.g. NaBL) or disambiguation may be more convenient. We discuss this further in the next section, on future work.
Another shortcoming is that the DSL relies on shadow type expressions, which are not yet part of the OCL specification, although they are planned to be included in the next OCL version (2.5) \cite{brucker2013aachenReport}\footnote{The report refers to them as type construction expressions, Section 3.1.}. The number of OCL tools that can currently be used to validate the CS2AS bridges is therefore limited (we are using Eclipse OCL~\cite{eclipseOclOnline}, which prototypes some proposed OCL 2.5 features).
\section{Ongoing and future work}
Apart from using this OCL-based internal DSL to define CS2AS bridges, we are also producing the Java source code responsible for obtaining AS models from CS ones. This ongoing work follows the direction outlined in the introduction, which highlights that the CS2AS internal DSL can be exploited by tool implementers. Although we are unable to go into further detail in this paper, we can point the reader to some JUnit test cases\footnote{
compiler.tests/src/org/eclipse/qvtd/cs2as/compiler/tests/} working on small examples, which demonstrate that instances of the CS2AS internal DSL can be transformed into executable code that resolves the CS2AS gap of a language.
In terms of future work, we highlight the following.
\item \textbf{Definition of CS2AS bridges for OCL and QVT.} We will apply the proposed OCL-based internal DSL to provide complete CS2AS bridge descriptions for the whole of OCL and for the three QVT languages. We expect these CS2AS bridge specifications to be included as part of future OCL and QVT specifications. Likewise, we expect code auto-generated from these bridge specifications to be used in future releases of the Eclipse OCL and QVTd projects. This should eliminate errors attributable to hand-written conversion source code.
\item \textbf{Incremental CS2AS bridges.} Since generating code from the declarative CS2AS bridges requires a detailed dependency analysis to identify a valid conversion schedule, we plan to exploit this analysis to synthesize incremental code for interactive contexts such as OCL editors. This should improve both accuracy and performance significantly, since accurate and efficient incremental code is particularly hard to write manually, and the pessimistic simplifications made to improve accuracy are not always sound.
\item \textbf{Creation of an external DSL.} By combining the strengths of related languages such as NaBL and Gra2Mol, we plan to create an external DSL that is more concise and offers a higher level of abstraction than the one presented here, further easing the creation of these bridges. This external DSL could embed the OCL expression language, and its supporting tooling could include a code generator that modularly produces instances of the internal DSL presented in this paper.
\item \textbf{Integration with existing language workbenches.} To add value to the DSL and to provide further evidence of how tool vendors may benefit from it (not covered in this paper), we plan to exploit the proposed DSL in the context of Xtext, a modern language workbench.
%\item \textbf{OMG specification generation.} Due to the fact CS2AS bridges
%OMG specifications, comprising textual languages, can be improved by providing DSLs to express how the CS can be bridged to the AS. Although some specifications attempt to define those bridges, for instance, by the means of an attribute grammar formalism, these CS2AS bridge definitions contain errors or are incomplete, introducing the motivation to improve the existing means to define them. We have proposed an OCL-based internal DSL for that purpose, and explained, along with a running example, the different aspects of the language. Given the flexibility, modularity and reuse facilities the OCL host language provides, we have showed how CS2AS mappings, name resolution and CS disambiguation can be described in a declarative, modular and sound way. To conclude, we have mentioned all the potential work that this CS2AS internal DSL can provide, and pointed out some publicly available examples in which this CS2AS bridges were exercised, including ongoing work about the generation of executable source code to perform the CS2AS gap resolution. We claim that those CS2AS bridge descriptions are free of typos, because an existing OCL tool was used to specify them, which will increase the quality of OMG specifications as soon as those descriptions are contributed.
We have introduced a Concrete Syntax to Abstract Syntax bridge that is:
\item \textbf{Sound}. We have shown how aspects that the current OCL specification describes only intuitively can be formalized by OCL definitions, and how its faults can be corrected.
\item \textbf{Executable}. We can use the dependencies behind the OCL definitions to establish an execution schedule.
\item \textbf{Extensible}. We can reuse the formalization of the OCL bridge in a QVT bridge.
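The \emph{Executable} property relies on ordering the OCL definitions so that each is evaluated only after the definitions it depends on. The Java fragment below is a minimal sketch of that idea under stated assumptions; it is not the scheduler of our prototype. It assumes that each definition is identified by a string (the names \texttt{LetExpCS.ast}, \texttt{VariableCS.ast} and \texttt{NameExpCS.referredVariable} are hypothetical) and that its dependencies are already known, and it substitutes a standard topological sort (Kahn's algorithm) for the more detailed dependency analysis required in practice. A cycle means that no valid schedule exists.

\begin{lstlisting}[language=Java]
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: derive an execution schedule for CS2AS definitions
// from their (assumed, precomputed) dependencies.
final class ScheduleSketch {

    // Orders definitions so that each appears after every definition it depends on
    // (Kahn's topological sort). Throws if the dependencies are cyclic.
    static List<String> schedule(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> pending = new HashMap<>();         // unmet dependencies per definition
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, Set<String>> entry : dependsOn.entrySet()) {
            pending.putIfAbsent(entry.getKey(), 0);
            for (String dep : entry.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Definitions with no unmet dependencies; TreeSet makes the order deterministic.
        TreeSet<String> ready = new TreeSet<>();
        pending.forEach((def, count) -> { if (count == 0) ready.add(def); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String next = ready.pollFirst();
            order.add(next);
            // Evaluating 'next' satisfies one dependency of each of its dependents.
            for (String d : dependents.getOrDefault(next, List.of())) {
                if (pending.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d);
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("cyclic dependencies: no valid schedule exists");
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical definitions: resolving a name needs the AS variable,
        // which in turn needs the enclosing let expression to be mapped.
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("LetExpCS.ast", Set.of());
        deps.put("VariableCS.ast", Set.of("LetExpCS.ast"));
        deps.put("NameExpCS.referredVariable", Set.of("VariableCS.ast"));
        System.out.println(schedule(deps)); // [LetExpCS.ast, VariableCS.ast, NameExpCS.referredVariable]
    }
}
\end{lstlisting}

Running \texttt{main} prints \texttt{[LetExpCS.ast, VariableCS.ast, NameExpCS.referredVariable]}. An incremental implementation could consult the same dependency graph to determine which definitions to re-evaluate after an edit.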
Our bridge modularizes and separates the specification concerns:
\item \textbf{Mapping}. An OCL operation hierarchy maps CS artifacts to the AS.
\item \textbf{Name Resolution}. An OCL operation hierarchy propagates the visible names down to the point of access.
\item \textbf{Disambiguation}. Unified CS artifacts, plus CS disambiguation rules, avoid the need for semantic resolution within a syntactic parser.
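The \emph{Name Resolution} concern can likewise be sketched outside OCL. In the hypothetical Java fragment below (the classes \texttt{Env}, \texttt{CSNode}, \texttt{LetExpCS} and \texttt{NameExpCS} and the methods \texttt{envForChild} and \texttt{nestedEnv} are illustrative, not part of our DSL), each CS element obtains its lookup environment from its container, and a container may extend that environment before exposing it to a particular child, much as an inherited attribute does in an attribute grammar. Overriding \texttt{envForChild} in a subclass plays the role of the OCL operation hierarchy, and a string stands in for the AS element that a name resolves to.

\begin{lstlisting}[language=Java]
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: visible names flow down from containers to the point of access.
final class NameResolutionSketch {

    // Lookup environment: local names shadow those inherited from the parent.
    static final class Env {
        static final Env EMPTY = new Env(Map.of(), null);
        private final Map<String, String> names;
        private final Env parent;

        private Env(Map<String, String> names, Env parent) {
            this.names = names;
            this.parent = parent;
        }

        // A child environment that adds the given declarations on top of this one.
        Env nestedEnv(Map<String, String> declared) {
            return new Env(declared, this);
        }

        // Searches from the innermost scope outwards.
        Optional<String> lookup(String name) {
            for (Env e = this; e != null; e = e.parent) {
                String found = e.names.get(name);
                if (found != null) {
                    return Optional.of(found);
                }
            }
            return Optional.empty();
        }
    }

    // Minimal CS element: its environment is whatever its container exposes to it.
    abstract static class CSNode {
        CSNode container;

        // Environment a container exposes to one of its children; by default its own.
        Env envForChild(CSNode child) {
            return env();
        }

        // Environment visible at this element (the root sees the empty environment).
        Env env() {
            return container == null ? Env.EMPTY : container.envForChild(this);
        }
    }

    // "let var = init in body": var is visible in body but not in init.
    static final class LetExpCS extends CSNode {
        final String varName;
        final CSNode init;
        final CSNode body;

        LetExpCS(String varName, CSNode init, CSNode body) {
            this.varName = varName;
            this.init = init;
            this.body = body;
            init.container = this;
            body.container = this;
        }

        @Override
        Env envForChild(CSNode child) {
            Env inherited = env();
            return child == body ? inherited.nestedEnv(Map.of(varName, "Variable " + varName)) : inherited;
        }
    }

    // A name occurrence; resolving it consults the environment flowed down to it.
    static final class NameExpCS extends CSNode {
        final String name;

        NameExpCS(String name) {
            this.name = name;
        }

        Optional<String> resolve() {
            return env().lookup(name);
        }
    }

    public static void main(String[] args) {
        // let x = x in x: only the occurrence in the body sees the let variable.
        NameExpCS initRef = new NameExpCS("x");
        NameExpCS bodyRef = new NameExpCS("x");
        new LetExpCS("x", initRef, bodyRef);
        System.out.println(bodyRef.resolve()); // Optional[Variable x]
        System.out.println(initRef.resolve()); // Optional.empty
    }
}
\end{lstlisting}

For \texttt{let x = x in x}, \texttt{main} prints \texttt{Optional[Variable x]} and then \texttt{Optional.empty}: the let variable is visible in the \texttt{in} part but not in its own initializer.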
Our bridge is ready for use: it works on test examples. It will now be applied to replace the hand-written tooling in Eclipse OCL and QVT with tooling generated directly from the prospective OCL 2.5 specification models.
We gratefully acknowledge the support of the UK Engineering and Physical Sciences Research Council, via the LSCITS initiative,