h1. Transforming A Large Source Model with ETL
This project provides a performance test case for ETL. The models and metamodels are taken from the "Model Transformations for Program Understanding: A Reengineering Challenge" case study from the "2011 edition of the TTC workshop":http://planet-mde.org/ttc2011/. (All of the resources from the TTC case are included under the *case* folder.) For the large model, Epsilon is about 10 times slower than the reference GReTL solution (*case/solution/ExtractStateMachines.gretl*).
The ETL solution, a transformation called @Java2StateMachine.etl@, consumes a model of a Java program (conforming to *case/jamopp/java.ecore*) and produces a StateMachine model (conforming to *case/statemachine/StateMachine.ecore*). The transformation utilises conventions in the Java code to construct the state machine. For example, an instance of @State@ is created for every non-abstract @Class@ that is a subtype of the @Class@ with name @"State"@. A detailed discussion of all the Java code conventions can be found in the "case description":case/doc/ttc2011-reengineering-case.pdf.
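As a rough illustration, a rule of that shape might look as follows in ETL. This is a sketch only: the rule name, the @abstract@ and @superClass@ properties and the @extendsStateClass()@ helper are assumptions made for the example, not necessarily what @Java2StateMachine.etl@ or the JaMoPP metamodel actually use.

bc. // Sketch only: property and helper names are assumed, not taken from
// Java2StateMachine.etl or the JaMoPP metamodel.
rule Class2State
    transform c : Java!Class
    to s : StateMachine!State {
    // Match only concrete classes that (transitively) extend the class named "State"
    guard : (not c.abstract) and c.extendsStateClass()
    s.name = c.name;
}
operation Java!Class extendsStateClass() : Boolean {
    // Assumed "superClass" reference: walk the superclass chain looking for "State"
    if (self.superClass.isUndefined()) { return false; }
    if (self.superClass.name = "State") { return true; }
    return self.superClass.extendsStateClass();
}
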
h2. Running the transformation
The *run.xml* Ant script provides a simple way to load the models, run the transformation, and dispose the models. To aid profiling the ETL solution, timestamps are emitted around each of these three phases. Executing *run.xml* produces a state machine model, *StateMachine.model*, that is equivalent to the reference state machine solution from the case (*case/statemachine/reference.xmi*).
h2. Current findings regarding ETL performance
12/05/2011 (Louis)
I was using:
* SVN revision 1410 of Epsilon (pre 0.9.1)
* Eclipse Helios
* Java 1.6.0_24 (Mac OS)
* 2.4GHz Intel Core 2 Duo, 4GB 1067 MHz DDR3 RAM, Mac OS X 10.6.7
Processing the large model end-to-end takes approximately 4 minutes:
* ~14 seconds to load the model
* ~23 seconds to transform the model
* ~3 minutes to dispose the model
The GReTL solution *case/solution/ExtractStateMachines.gretl* runs in about 2 seconds, according to "this forum post":http://planet-research20.org/ttc2011/index.php?option=com_community&view=groups&task=viewdiscussion&groupid=10&topicid=30&Itemid=150, and the "GrGen.NET solution runs roughly 10 times quicker than the GReTL solution":http://planet-research20.org/ttc2011/index.php?option=com_community&view=groups&task=viewdiscussion&groupid=10&topicid=36&Itemid=150.
h3. Observation 1: Disposing the (large) source model takes about 10 times longer than loading it
Disposal seems to take a long time due to @EmfModelResourceFactory@'s call to @Resource#unload()@. This method "proxifies" all of the model elements in the resource and removes them from memory. Replacing the call to @Resource#unload()@ with a call to @Resource#getResourceSet()#getResources()#remove()@ drastically reduces the amount of time required to dispose the models (from ~3 minutes to ~30 milliseconds), but I'm not sure whether this truly frees up the memory occupied by the models. I've included a patch for this change to @EmfModelResourceFactory@ in *patches/FasterDispose.patch*.
h3. Observation 2: Using a separate @IdentifierReference@ -> @Transition@ rule is far slower than inlining it
Using separate rules for @Class@ -> @State@ and @IdentifierReference@ -> @Transition@ seems to slow down the transformation significantly (see @Java2StateMachine-SeparateRules.etl@). I think this is because:
* There are far more instances of @IdentifierReference@ (198,141) than of @Class@ (4,586) in the large model, and only those instances of @IdentifierReference@ contained in a @Class@ that is a @State@ (just 11 of the 4,586 classes) need to be transformed. With separate rules, ETL checks whether each of the 198,141 instances of @IdentifierReference@ should be transformed to a @Transition@; with the combined rule, ETL only considers the instances of @IdentifierReference@ contained in one of those 11 classes. (A sketch contrasting the two rule shapes follows this list.)
* Annotating the @IdentifierRule@ with @lazy does not seem to improve matters, possibly because ETL invokes @DefaultTransformationStrategy@ (rather than @FastTransformationStrategy@) for transformations that contain any rule annotated with @lazy.
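To make the difference concrete, here is a hedged sketch of the two rule shapes. The guard expressions and the helper operations (@representsTransition()@, @transitionReferences()@, and @extendsStateClass()@ from the earlier sketch) are assumptions with placeholder bodies; the real rules are in @Java2StateMachine-SeparateRules.etl@ and @Java2StateMachine.etl@.

bc. // Separate-rule shape (sketch): ETL evaluates this guard once per
// IdentifierReference, i.e. 198,141 times for the large model.
rule IdentifierReference2Transition
    transform ir : Java!IdentifierReference
    to t : StateMachine!Transition {
    guard : ir.representsTransition()
    // populate t from ir here (details omitted in this sketch)
}
// Inlined shape (sketch): Transitions are created while transforming the 11
// State classes, so the other IdentifierReference instances are never inspected.
rule Class2State
    transform c : Java!Class
    to s : StateMachine!State {
    guard : (not c.abstract) and c.extendsStateClass() // as in the earlier sketch
    s.name = c.name;
    for (ir in c.transitionReferences()) {
        var t : new StateMachine!Transition;
        // populate t from ir here (details omitted in this sketch)
    }
}
// Placeholder helpers: the real implementations would encode the Java coding
// conventions described in the case; their details are omitted here.
operation Java!IdentifierReference representsTransition() : Boolean {
    return false;
}
operation Java!Class transitionReferences() : Sequence {
    return Sequence{};
}
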
h2. Profiling
I found the Epsilon profiling tool very useful for investigating the performance of each operation in the transformation. In fact, I hacked @EolOperation@ so that every operation execution was included in the profiling results. The patch for this functionality is in *patches/AutomaticallyProfileOperations.patch*.