h3. Timing in Amalthea Primer
An Amalthea model per se is a static, structural model of a hardware/software system. The basic structural model consists of software model elements (tasks, runnables), hardware model elements (processing units, memories, connection handlers), stimuli that are used to activate execution, and mappings of software model elements to hardware model elements. The semantics of the model then allow for a definite and clear interpretation of this static, structural model regarding its behavior over time.
h4. Different Levels of Model Detail
Amalthea provides a meta-model suitable for many different purposes in the design of hardware/software systems. Consequently, there is not _the_ level of Amalthea model detail - modeling is often purpose-driven. Regarding timing analysis, we discuss three different levels of detail by example to clarify this aspect of Amalthea. Note that we focus on the level of detail of the hardware model and assume the other parts of the model (software, mapping, etc.) to be fixed.
Essential software model elements are runnables, tasks, and data elements (labels and channels). The runnable items of a runnable specify its runtime behavior. You may specify the runnable items as a directed acyclic graph (DAG). Amalthea has different categories of runnable items; we focus on the following four (see the sketch after this list):
* Items that *branch paths* of the DAG (runnable mode switch, runnable probability switch).
* Items that *signal* or call other parts of the model (custom event trigger, runnable call, etc.).
* Items that specify *data access* of data elements (channel receive/send, label access).
* Items that specify *execution time* (ticks, execution needs).
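To make these categories concrete, the following is a minimal, illustrative Java sketch of runnable items as a type hierarchy; all names are our own stand-ins, not the actual Amalthea (APP4MC) API.

bc.. 
import java.util.List;

// Illustrative stand-ins for the four runnable item categories discussed
// above; not the actual Amalthea API.
interface RunnableItem { }

// Branch: forks paths of the DAG (cf. runnable probability switch).
record ProbabilitySwitch(List<Branch> branches) implements RunnableItem { }
record Branch(double probability, List<RunnableItem> items) { }

// Signal: triggers or calls another part of the model
// (cf. custom event trigger, runnable call).
record EventTrigger(String eventName) implements RunnableItem { }

// Data access: read or write of a data element (cf. label access).
record LabelAccess(String label, boolean isWrite) implements RunnableItem { }

// Execution time: abstract processing effort (cf. ticks).
record Ticks(long ticks) implements RunnableItem { }

p. A runnable is then a directed acyclic graph over such items: branch items fork paths, all other items are nodes along a path.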
Tasks may call runnables in an activity graph. Note that a runnable can be called by several tasks. We then map tasks to task schedulers in the operating system model, and we map every task scheduler to one or more processing units of the hardware model. Further, we map data elements to memories of the hardware model.
This coarse level of hardware model detail - the hardware model consists only of *mapping targets*, without routing or timing for data accesses - may already be sufficient for analyses focusing on event chains or scheduling.
As the second example of model detail, we now add access elements to all processing units of the hardware model, *modeling data access latencies or data rates* when accessing data in memories - still ignoring routing and contention. This level of detail is sufficient, for example, for optimizing data placement in microcontrollers using static timing analysis.
A more detailed hardware model, our third and last example of model detail, will contain information about *data routing and contention handling*. To this end, we add connection handlers to the hardware model for every possible contention point, we add ports to the hardware elements, and we add connections between ports. For every access element of the processing units, we add an access path, modeling the access route over different connections and connection handlers. The combination of all access elements is able to represent the full address space of a processing unit. This level of detail is well suited for dynamic timing simulation considering arbitration effects for data accesses.
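As an illustration of this third level of detail, the following Java sketch shows how access elements and access paths could be represented; these are simplified stand-ins, not the Amalthea meta-model (ports and bit widths are omitted for brevity).

bc.. 
import java.util.List;

// Simplified stand-ins for the hardware model elements discussed above.
interface PathElement { }

record Connection(String name, long readLatency, long writeLatency)
        implements PathElement { }
record ConnectionHandler(String name, long readLatency, long writeLatency)
        implements PathElement { }
record Memory(String name, long accessLatency) { }

// One access element per reachable memory; its access path lists the hops
// (connections and connection handlers) in order.
record AccessElement(Memory target, List<PathElement> accessPath) { }
record ProcessingUnit(String name, double frequencyHz,
        List<AccessElement> accessElements) { }

p. The union of all access elements of a processing unit then covers its full address space, as stated above.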
In the following, we discuss the 'timing' of an Amalthea model in the context of the level of detail of this last example and discrete-event simulation.
h4. Discrete-Event Simulation
For dynamic timing analysis, the _initial_ state of the static system model is the starting point, and a series of _state changes_ is the subject of model analysis. The _state_ of a model mainly consists of the states of hardware elements (processing units and connection handlers). During analysis, a state change is then, for example, a processing unit changing from _idle_ to _execute_.
When we are interested in the _timing_ of a model, a common approach is discrete-event simulation. In discrete-event simulation, a series of _events_ changes the state of the system and a simulated clock. Such a simulation event is, for instance, a _stimuli_ event for a task to execute on a processing unit, which in turn may change the _state_ of this processing unit from _idle_ to _execute_.
Note that every event occurs at a specified point in simulated time; for instance, think of a new _stimuli event_ that shall activate a task 10ms from the current value of the simulated clock. Unprocessed (future) events are stored in an event list. The simulator then continuously processes the event that occurs next in simulated time, advances the simulated clock correspondingly, and removes the event from the event list. Note that this is a very simplified description. For instance, multiple events at the same point in simulated time are possible, and processing events may lead to the generation of new events. For instance, processing a _task execution end_ event may lead to a new _stimuli event_, to occur 5ms from the current value of the simulated clock, being added to the event list.
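A minimal sketch of such an event loop in Java, assuming nothing but a simulated clock and a time-ordered event list (all names are hypothetical, not taken from Amalthea or a specific simulation tool):

bc.. 
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Minimal discrete-event simulation core: a simulated clock plus a
// time-ordered event list.
final class Simulator {
    // An event occurs at a fixed point in simulated time and, when processed,
    // may schedule new (future) events.
    record Event(double time, Consumer<Simulator> action) { }

    private final PriorityQueue<Event> eventList =
            new PriorityQueue<>(Comparator.comparingDouble(Event::time));
    private double clock = 0.0; // simulated time, e.g. in seconds

    double now() { return clock; }

    // Schedule an event 'delay' time units after the current simulated time.
    void schedule(double delay, Consumer<Simulator> action) {
        eventList.add(new Event(clock + delay, action));
    }

    // Continuously process the event occurring next in simulated time,
    // advancing the clock and removing the event from the list.
    void run() {
        while (!eventList.isEmpty()) {
            Event next = eventList.poll();
            clock = next.time();
            next.action().accept(this);
        }
    }
}

p. A stimulus that shall activate a task 10ms from now then simply becomes @sim.schedule(0.010, s -> ...)@; processing that event may in turn schedule further events, exactly as described above.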
After sketching the basic idea of discrete-event simulation, we can be more precise with the term *Amalthea timing*: we call the trace of state changes and events over time the _dynamic behavior_ or simply the _timing_ of the Amalthea model. This timing of the model then may be further analyzed, for instance, regarding timing constraints.
Stimulation of task execution with corresponding stimuli events, and scheduling in general, are not further discussed here. In the following, we focus on the timing of execution at processing units and of data accesses.
h4. Execution Time
The basic mechanism to specify execution time at a processing unit is the modeling element _ticks_. _Ticks_ are a generic abstraction of time for software, independent of hardware. Regarding hardware, one may think of _ticks_ as clock ticks or cycles of a processing unit. You can specify _ticks_ at several places in the model, most prominently as a runnable item of a runnable. The _ticks_ value together with a mapping to a specific processing unit that has a defined frequency then allows the computation of an _execution time_. For instance, 100 ticks require 62.5ns of simulated time at a processing unit running at 1.6GHz, while 100 ticks require approximately 41.7ns at a processing unit running at 2.4GHz.
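The underlying computation is simply execution time = ticks / frequency. A small sketch with explicit unit handling (class and method names are ours):

bc.. 
public class TicksToTime {
    // Execution time in nanoseconds: ticks / frequency, scaled to ns.
    static double executionTimeNs(long ticks, double frequencyHz) {
        return ticks / frequencyHz * 1e9;
    }

    public static void main(String[] args) {
        System.out.println(executionTimeNs(100, 1.6e9)); // 62.5 ns
        System.out.println(executionTimeNs(100, 2.4e9)); // ~41.67 ns
    }
}

p. The same 100 ticks thus lead to different execution times on different processing units - this is exactly the hardware independence of _ticks_.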
When the discrete-event simulator simulates the execution of a runnable at a processing unit, it actually processes runnable items and translates their semantics into simulation events. We already discussed the runnable item _ticks_: when _ticks_ are processed, we compute a corresponding simulation time value based on the executing processing unit's frequency and store a simulation event, in the list of simulation events, for the point in time when the execution will finish.
In that sense, ticks translate into a fixed or *static timing behavior* - when execution starts, it is always clear when this execution will end. Note that the current version of Amalthea (0.9.3) also prepares an additional concept for the specification of execution timing besides _ticks_: _execution needs_. Execution needs will allow for sophisticated ways of specifying execution time, as required for heterogeneous systems. _Execution needs_ define the number of usages of user-defined needs; a later version of Amalthea (> 0.9.3) will then introduce _recipes_ that translate such execution needs into ticks, taking _hardware features_ of the executing processing unit into account. Note that, by definition, a sound model for timing simulation always allows computing ticks from execution needs. Consequently, for timing analysis using discrete-event simulation as described above, we first translate execution needs into ticks, resulting in a model we call a _ticks-only model_. Thus, we can ignore _execution needs_ for timing analysis.
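Since _recipes_ are not yet part of Amalthea 0.9.3, the following is only a speculative sketch of such a needs-to-ticks translation, using an invented per-processing-unit table of tick costs per need:

bc.. 
import java.util.Map;

public class NeedsToTicks {
    // Hypothetical 'recipe': ticks per single usage of a named need on one
    // concrete processing unit (all values invented for illustration).
    static final Map<String, Long> TICKS_PER_NEED =
            Map.of("Instructions", 1L, "FloatOps", 4L);

    // Translate execution needs (need -> number of usages) into plain ticks.
    static long toTicks(Map<String, Long> executionNeeds) {
        return executionNeeds.entrySet().stream()
                .mapToLong(e -> TICKS_PER_NEED.getOrDefault(e.getKey(), 0L)
                        * e.getValue())
                .sum();
    }

    public static void main(String[] args) {
        // 1000 instructions and 50 float operations -> 1000*1 + 50*4 = 1200 ticks
        System.out.println(toTicks(Map.of("Instructions", 1000L, "FloatOps", 50L)));
    }
}

p. After this translation, the resulting _ticks-only model_ can be simulated exactly as described above.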
h4. Data Accesses
For data accesses, in contrast to ticks, the duration in simulation time is not always clear. A single data access may result in a series of simulation events. The occurrence of these events in simulation time depends on several other model elements, for example, access paths, the mapping of data to memory, and the state of connection handlers. Thus, a data access may result in *dynamic timing behavior*. Note that there are plenty of options for hardware modeling in Amalthea; consequently, the options for modeling and simulation of data accesses are manifold (see _Different Levels of Model Detail_ above). In the following, we discuss some modeling patterns for data accesses.
Consider a hardware model consisting of two processing units, a connection handler, and a single memory. We add a read and a write latency to all connections and to the connection handler. Additionally, we add an access latency to the memory. No read or write latencies are added to the access elements of the processing units. Only the access paths specify the routes from each processing unit over the connection handler to the memory; see the following screenshot.
!(scale)../pictures/user_timing_model_example.png!
As described at the beginning of this section, a single data access may result in a series of events. The expected simulation behavior is as follows: When the discrete-event simulator encounters a runnable item for a data read access at CPU0, we add an event for one tick later to the event queue of the simulator, denoting passing the connection from CPU0 to the connection handler MyConnectionHandler. (For simplicity, we do not use time durations calculated from CPU0's frequency here, which would be required to determine the correct point in simulated time for the event.) After passing the connection, the state of MyConnectionHandler is relevant for the next event: either MyConnectionHandler is occupied by a data access (from CPU1), a data access arrives at the same time, or MyConnectionHandler is available.
* If MyConnectionHandler is available, we add an event for three ticks later - this is the read latency of the connection handler.
* If a data access from CPU1 arrives at the same time, the connection handler handles the data accesses based on the selected arbitration mechanism (attribute not shown in the screenshot - we assume priority-based arbitration with a higher priority for CPU1). We add an event for three ticks later, when MyConnectionHandler is available again. We then check again for data accesses from CPU1 arriving at the same time and set a corresponding event.
* If MyConnectionHandler is occupied, we add an event for the point in time when the connection handler is no longer occupied. We then check for data accesses from CPU1 arriving at the same time and set a corresponding event.
Eventually, MyConnectionHandler may handle the read access from CPU0, and when the simulator reacts to the corresponding event, we add an event for one tick later, as this is the read latency of the connection between the connection handler and the memory. The final event for this read data access from CPU0 results from the access latency of the memory (twelve ticks). When the simulator reacts to that final event, the read access from CPU0 is completed, and the simulator can handle the next runnable item at CPU0, if available.
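Summing up the uncontended case: the read access from CPU0 completes after 1 + 3 + 1 + 12 = 17 ticks. The following sketch replays this event chain; the contention cases from the list above would merely defer the second event.

bc.. 
public class ReadAccessChain {
    // Latencies (in ticks) from the example hardware model above.
    static final long CONN_CPU0_TO_HANDLER = 1;  // connection CPU0 -> MyConnectionHandler
    static final long HANDLER_READ         = 3;  // read latency of MyConnectionHandler
    static final long CONN_HANDLER_TO_MEM  = 1;  // connection MyConnectionHandler -> memory
    static final long MEMORY_ACCESS        = 12; // access latency of the memory

    public static void main(String[] args) {
        long t = 0; // simulated time in ticks; the read access starts at t = 0

        // Each hop of the access path becomes one simulation event.
        t += CONN_CPU0_TO_HANDLER; // event 1: arrived at MyConnectionHandler (t = 1)
        // If MyConnectionHandler were occupied, the next event would be
        // deferred until it becomes available again (not modeled here).
        t += HANDLER_READ;         // event 2: handler done (t = 4)
        t += CONN_HANDLER_TO_MEM;  // event 3: arrived at the memory (t = 5)
        t += MEMORY_ACCESS;        // event 4: read access completed (t = 17)

        System.out.println("read access from CPU0 completes after " + t + " ticks");
    }
}

p. Each increment corresponds to one simulation event in the event list of the simulator.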
Note that in the above example there is only one contention point; we thus could reduce the number of events for an optimized simulation. Further, note that we ignore the size of data elements in this example. We could use *data rates* instead of latencies at connections, the connection handler, and the memory to respect data sizes in the timing simulation, or work with the bit widths of ports. Furthermore, besides constant latency values, it is also possible to use distributions; in this case, the simulator rolls the dice according to the distribution (see the sketch below). As a last note, adding to the discussion of different detail levels: depending on use case, modeling purpose, and timing analysis tool, there may be best practices defined for modeling. For instance, one tool may rely on data rates, while other tools require latencies, but only at memories and connection handlers, not at connections. Tool-specific model transformations and validation rules should handle and define such restrictions.
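As a sketch of these two alternatives - data rates instead of latencies, and latencies drawn from a distribution - consider the following; all numbers are invented for illustration.

bc.. 
import java.util.Random;

public class DataRateAndDistribution {
    // With data rates, the transfer duration follows from the data size:
    // duration = size / rate.
    static double transferSeconds(long sizeBits, double bitsPerSecond) {
        return sizeBits / bitsPerSecond;
    }

    public static void main(String[] args) {
        // 64 bytes over a connection with a 3.2 Gbit/s data rate (invented).
        System.out.println(transferSeconds(64 * 8, 3.2e9)); // 1.6e-7 s = 160 ns

        // A latency drawn from a distribution instead of a constant: here a
        // uniform distribution over 2, 3, or 4 ticks - 'rolling the dice'.
        Random dice = new Random();
        long latencyTicks = 2 + dice.nextInt(3);
        System.out.println("sampled latency: " + latencyTicks + " ticks");
    }
}

p. A simulator supporting distributions would draw such a sample each time the corresponding latency is needed.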