h1. Transforming A Large Source Model with ETL
This project provides a performance test case for ETL. The models and metamodels are taken from the "Model Transformations for Program Understanding: A Reengineering Challenge" case study from the "2011 edition of the TTC workshop":http://planet-mde.org/ttc2011/. (All of the resources from the TTC case are included under the *case* folder.) For the large model, Epsilon is about 10 times slower than the reference GReTL solution (*case/solution/ExtractStateMachines.gretl*).
The ETL solution, a transformation called @Java2StateMachine.etl@, consumes a model of a Java program (conforming to *case/jamopp/java.ecore*) and produces a StateMachine model (conforming to *case/statemachine/StateMachine.ecore*). The transformation utilises conventions in the Java code to construct the state machine. For example, an instance of @State@ is created for every non-abstract @Class@ that is a subtype of the @Class@ with name @"State"@. A detailed discussion of all the Java code conventions can be found in the "case description":case/doc/ttc2011-reengineering-case.pdf.
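As a rough illustration, a rule of that shape might look as follows in ETL. This is a sketch only: the rule name, the @abstract@ and @superClass@ properties and the @extendsStateClass()@ helper are assumptions made for the example, not necessarily what @Java2StateMachine.etl@ or the JaMoPP metamodel actually use.

bc. // Sketch only: property and helper names are assumed, not taken from
// Java2StateMachine.etl or the JaMoPP metamodel.
rule Class2State
    transform c : Java!Class
    to s : StateMachine!State {
    // Match only concrete classes that (transitively) extend the class named "State"
    guard : (not c.abstract) and c.extendsStateClass()
    s.name = c.name;
}
operation Java!Class extendsStateClass() : Boolean {
    // Assumed "superClass" reference: walk the superclass chain looking for "State"
    if (self.superClass.isUndefined()) { return false; }
    if (self.superClass.name = "State") { return true; }
    return self.superClass.extendsStateClass();
}
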
h2. Running the transformation
The *run.xml* Ant script provides a simple way to load the models, run the transformation, and dispose the models. To aid profiling the ETL solution, timestamps are emitted around each of these three phases. Executing *run.xml* produces a state machine model, *StateMachine.model*, that is equivalent to the reference state machine solution from the case (*case/statemachine/reference.xmi*).
h2. Current findings regarding ETL performance
12/05/2011 (Louis)
I was using:
* SVN revision 1410 of Epsilon (pre 0.9.1)
* Eclipse Helios
* Java 1.6.0_24 (Mac OS)
* 2.4GHz Intel Core 2 Duo, 4GB 1067 MHz DDR3 RAM, Mac OS X 10.6.7
Processing the large model end-to-end takes approximately 4 minutes:
* ~14 seconds to load the model
* ~23 seconds to transform the model
* ~3 minutes to dispose the model
The GReTL solution *case/solution/ExtractStateMachines.gretl* runs in about 2 seconds, according to "this forum post":http://planet-research20.org/ttc2011/index.php?option=com_community&view=groups&task=viewdiscussion&groupid=10&topicid=30&Itemid=150, and the "GrGen.NET solution runs roughly 10 times quicker than the GReTL solution":http://planet-research20.org/ttc2011/index.php?option=com_community&view=groups&task=viewdiscussion&groupid=10&topicid=36&Itemid=150.
h3. Observation 1: Disposing the (large) source model takes about 10 times longer than loading it
Disposal seems to take a long time due to @EmfModelResourceFactory@'s call to @Resource#unload()@. This method "proxifies" all of the model elements in the resource and removes them from memory. Replacing the call to @Resource#unload()@ with a call to @Resource#getResourceSet()#getResources()#remove()@ drastically reduces the amount of time required to dispose the models (from ~3 minutes to ~30 milliseconds), but I'm not sure whether this truly frees up the memory occupied by the models. I've included a patch for this change to @EmfModelResourceFactory@ in *patches/FasterDispose.patch*.
h3. Observation 2: Using a separate @IdentifierReference@ -> @Transition@ rule is far slower than inlining it
Using separate rules for @Class@ -> @State@ and @IdentifierReference@ -> @Transition@ seems to slow down the transformation significantly (see @Java2StateMachine-SeparateRules.etl@). I think this is because:
* There are far more instances of @IdentifierReference@ (198,141) than of @Class@ (4,586) in the large model, and only those instances of @IdentifierReference@ contained in a @Class@ that is a @State@ (just 11 of the 4,586 classes) need to be transformed. With separate rules, ETL checks whether each of the 198,141 instances of @IdentifierReference@ should be transformed to a @Transition@; with the combined rule, ETL only considers the instances of @IdentifierReference@ contained in one of those 11 classes. (A sketch contrasting the two rule shapes follows this list.)
* Annotating the @IdentifierRule@ with @lazy does not seem to improve matters, possibly because ETL invokes @DefaultTransformationStrategy@ (rather than @FastTransformationStrategy@) for transformations that contain any rule annotated with @lazy.
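To make the difference concrete, here is a hedged sketch of the two rule shapes. The guard expressions and the helper operations (@representsTransition()@, @transitionReferences()@, and @extendsStateClass()@ from the earlier sketch) are assumptions with placeholder bodies; the real rules are in @Java2StateMachine-SeparateRules.etl@ and @Java2StateMachine.etl@.

bc. // Separate-rule shape (sketch): ETL evaluates this guard once per
// IdentifierReference, i.e. 198,141 times for the large model.
rule IdentifierReference2Transition
    transform ir : Java!IdentifierReference
    to t : StateMachine!Transition {
    guard : ir.representsTransition()
    // populate t from ir here (details omitted in this sketch)
}
// Inlined shape (sketch): Transitions are created while transforming the 11
// State classes, so the other IdentifierReference instances are never inspected.
rule Class2State
    transform c : Java!Class
    to s : StateMachine!State {
    guard : (not c.abstract) and c.extendsStateClass() // as in the earlier sketch
    s.name = c.name;
    for (ir in c.transitionReferences()) {
        var t : new StateMachine!Transition;
        // populate t from ir here (details omitted in this sketch)
    }
}
// Placeholder helpers: the real implementations would encode the Java coding
// conventions described in the case; their details are omitted here.
operation Java!IdentifierReference representsTransition() : Boolean {
    return false;
}
operation Java!Class transitionReferences() : Sequence {
    return Sequence{};
}
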
h2. Profiling
I found the Epsilon profiling tool very useful for investigating the performance of each operation in the transformation. In fact, I hacked @EolOperation@ so that every operation execution was included in the profiling results. The patch for this functionality is in *patches/AutomaticallyProfileOperations.patch*.