<!doctype html public "-//w3c//dtd html 4.0 transitional//en"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <title>ptp design</title> <link rel="stylesheet" href="../../default_style.css" type="text/css"> </head> <body text="#000000" bgcolor="#ffffff" link="#0000ee" vlink="#551a8b" alink="#ff0000"> <!-- Document Header --> <table cellpadding="2" width="100%" border="0"> <tbody> <tr> <td align="left" width="72%"> <font class="indextop">ptp design document</font><br> <font class="indexsub">parallel tools platform subproject</font><span class="indexsub"><br> </span></td> <td width="28%"><img height="86" src="../../images/Idea.jpg" width="120" alt=""> </td> </tr> </tbody> </table> <table width="331"> <tbody> <tr> <td width="100">Author</td> <td width="253"> : <a href="">Greg Watson </a></td> </tr> <tr> <td>Revision Date</td> <td> : 8 April 2005 - Version: 0.2.0 </td> </tr> <tr> <td>Change History</td> <td> : 0.1.0 - Document Creation</td> </tr> <tr> <td>&nbsp;</td> <td>: 0.2.0 - Update references </td> </tr> </tbody> </table> <br> <div class="section">Table of Contents </div> <br> <div class="indent"><a href="#OVR">1. Overview</a></div> <div class="indent"><a href="#PEE">2. Parallel Execution Environment </a> <div class="indent"><a href="#2.1">2.1 Parallel Runtime Model </a></div> <div class="indent"><a href="#2.2">2.2 Parallel Runtime Controller </a> <div class="indent"><a href="#2.2.1">2.2.1 Parallel Runtime API </a></div> </div> <div class="indent"><a href="#2.3">2.3 Parallel User Interface </a></div> <div class="indent"><a href="#2.4">2.4 Parallel Launch Configuration</a></div> </div> <div class="indent"> <div class="indent"><a href="#2.5">2.5 Future Work </a> <div class="indent"><a href="#2.5.1">2.5.1 Pre/Post Execution Data File Management</a></div> <div class="indent"><a href="#2.5.2">2.5.2 Remote Build/Launch Capability</a></div> </div> <a href="#PDB">3. 
Parallel Debugger</a> <div class="indent"><a href="#3.1">3.1 Parallel Debug Model </a></div> <div class="indent"><a href="#3.2">3.2 Parallel Debug User Interface </a></div> <div class="indent"><a href="#3.3">3.3 Parallel Debug Controller </a> <div class="indent"><a href="#3.3.1">3.3.1 Parallel Debug API </a></div> </div> <div class="indent"><a href="#3.4">3.4 Scalable Debug Manager </a></div> <div class="indent"><a href="#3.5">3.5 Debug Data Model </a></div> <div class="indent"><a href="#3.6">3.6 Future Work </a></div> <a href="#PDB"> </a></div> <div class="indent"><a href="#PTI">4. Tool Integration </a></div> <div class="indent"> <div class="indent"><a href="#4.1">4.1 Parallel Runtime Services </a></div> <div class="indent"><a href="#4.2">4.2 Parallel Debug Services</a></div> <div class="indent"><a href="#4.3">4.3 Parallel User Interface Components </a></div> <div class="indent"><a href="#4.4">4.4 Future Work </a></div> <a href="#EUS">5. End-User Support</a></div> <div class="indent"><a href="#FDT">6. Fortran Development Tools </a> <div class="indent"><a href="#6.1">6.1 Future Work </a></div> </div> <br> <div class="section"><a name="OVR"></a>1. Overview </div> <p>The Parallel Tools Platform (PTP) is a portable, scalable, standards-based integrated development environment specifically suited for application development for parallel computer architectures. The PTP combines existing functionality in the Eclipse Platform, the C/C++ Development Tools, new services specifically designed to interface with parallel computing systems, and new Fortran language support, to enable the development of parallel programs suitable for a range of scientific, engineering and commercial applications. </p> <p>This document describes the major design elements of the Parallel Tools Platform and outlines the major objectives for achieving the first release of software. </p> <div class="section"><a name="PEE" id="PEE">2. 
Parallel Execution Environment </a></div> <p>The parallel execution environment provides the interface between Eclipse and a parallel runtime system that is used to execute programs on a parallel machine. Unlike launching a sequential program, launching a parallel program is complicated by the number of different parallel architectures, each with its own specialized commands for managing the execution of programs. Although there is some standardization in how parallel codes are written (such as MPI), there is little standardization in how to launch, control and interact with a parallel program. To further complicate matters, many parallel systems employ some form of resource allocation system, such as a job scheduler. In many cases, execution of a parallel program must be managed by the resource allocation system, rather than by direct invocation by the user.</p> <p>Due to the complex nature of interfacing with many different parallel runtime systems, our plan is to use an abstract parallel model within the Eclipse environment and provide a <em>single</em> parallel runtime interface to the outside world. Architecture-specific details of the parallel system(s) will then be managed by a middleware layer that will provide a common set of services for interacting with arbitrary parallel runtime and resource allocation systems. This middleware layer will initially use Open RTE (a separate and independent component of <a href="">Open MPI</a>). If necessary, other middleware components can be added at a later time. 
Open RTE has planned support for a wide range of legacy message passing systems, and can also be used for parallel programs that use the shared memory model.</p> <p>The following diagram shows the proposed architecture.</p> <p align="center"><img src="images/runtime.png" width="406" height="519"></p> <p align="left">The execution environment comprises five main components: abstract parallel model, parallel runtime controller, parallel runtime API, parallel user interface, and parallel launch configuration. Each of these is discussed in more detail below. </p> <div class="subsection"> <a name="2.1" id="2.1"></a>2.1 Parallel Runtime Model</div> <p>Unlike execution support for sequential languages, the parallel execution environment must maintain an internal model that represents the state of external components, such as parallel machines, resource allocation systems, and the executing programs themselves. The following is a conceptual diagram of the model.</p> <p align="center"><img src="images/model2.png" width="512" height="353"> </p> <p align="left">The <em>universe</em> is the topmost object for managing the execution environment. There can be any number of <em>machines</em> and <em>jobs </em>in a universe. Each machine is composed of an arbitrary number of <em>nodes</em>. A node is where computation is undertaken, and may be a remote system in the case of a distributed memory architecture, or a local processor in the case of an SMP machine. A job is a unit of work that is suitable for a resource allocation system. Once a job is scheduled for execution, it causes <em>processes</em> to be started on the appropriate nodes of one or more machines. A process is an instruction stream that performs some computation. The job then provides a reference point for each process that is participating in the parallel execution. Machines, nodes, processes, and jobs all have attributes that reflect the status of the particular component. 
</p> <div class="subsection"><a name="2.2" id="2.2"></a>2.2 Parallel Runtime Controller </div> <p>The parallel runtime controller is responsible for controlling interaction between the parallel tools platform components and the external runtime. It provides services to support the following actions:</p> <ul> <li>starting and stopping the external runtime interface</li> <li>reconnection to existing parallel jobs </li> <li>launching a parallel job</li> <li>terminating a parallel job </li> <li>notification of status changes</li> <li>managing I/O streams</li> </ul> <p>Communication between the runtime controller and the external parallel runtime systems is via the parallel runtime API. </p> <div class="subsection"> <a name="2.2.1" id="2.2.1"></a>2.2.1 Parallel Runtime API </div> <p align="left">The parallel runtime API provides a generic interface to external parallel runtime systems. The services supported by the API include:</p> <ul> <li>remote process management</li> <li>event notification</li> <li>job connection/reconnection</li> <li>remote I/O management</li> <li>resource management</li> <li>state of health services</li> </ul> <p>These services are provided using an architecture-neutral API that vastly simplifies the interface between Eclipse and the many parallel runtime systems. Communication between Eclipse and the external runtime using this API employs Java Native Interface calls to a C library. This leads to a clean, efficient API that is suitable for use with other runtime systems in the future. The initial version of the API will provide support for Open RTE. </p> <div class="subsection"> <a name="2.3" id="2.3"></a>2.3 Parallel User Interface</div> <p align="left">Another significant difference between runtime support for sequential and parallel programs is the need to monitor the status of the machine during program execution. 
In addition, where a resource allocation system is employed, a visual indication of the status of jobs as they progress through the queues is required. To accomplish this, our intention is to develop a number of user interface elements that, by utilizing services supplied by the middleware layer, will provide the user with the ability to monitor and control system and job status. A prototype user interface element for displaying machine, node and process status information is shown below.</p> <blockquote> <table width="64%" border="0" align="center" cellpadding="5" cellspacing="5"> <tr> <td height="528"><img src="images/node_status.png" width="312" height="528"></td> <td align="left" valign="top"><div align="left"><img src="images/legend.png" width="191" height="340"></div></td> </tr> </table> </blockquote> <p>User interface elements that are required for the execution environment are:</p> <ul> <li>machine, node and process status</li> <li>job status</li> <li>resource allocation status</li> <li>launch configuration</li> <li>process details</li> <li>process standard I/O</li> <li><em>others?</em></li> </ul> <div class="subsection"> <a name="2.4" id="2.4"></a>2.4 Parallel Launch Configuration</div> <p>The parallel launch configuration uses the launch framework to manage the execution of a parallel program. The parallel launch configuration allows the user to specify the resource requirements necessary for correct execution of the program. This might include resources such as the number of processes, the type of network to use for interprocess communication, the amount of memory required, and the amount of time required to execute the program. The image below shows an early implementation of the parallel launch configuration dialog. </p> <p align="center"><img src="images/run.png" height="528"> </p> <p>After the resource information has been specified, the job is ready to be launched. 
Pressing the 'run' button will cause the launch configuration to pass the job information to the runtime controller, which will schedule the job for execution by the external runtime. Jobs that require resources to be allocated before execution will be placed in the appropriate queue, and the user will be notified once the job begins execution. Jobs that can be run interactively will be scheduled for immediate execution by the external runtime system. </p> <div class="subsection"><a name="2.5" id="2.5"></a>2.5 Future Work </div> <p>The following sections identify additional components of the parallel execution environment that will be considered for future releases. </p> <div class="subsection"><a name="2.5.1" id="2.5.1"></a>2.5.1 Pre/Post Execution Data File Management</div> <p>Parallel programs typically require one or more input data files, and may generate one or more output data files. Unlike sequential execution where the program executes on the local machine, parallel programs will normally execute on one or more remote machines. This complicates data file management, because the correct data file must be available to each process prior to execution. In addition, if the parallel processes produce individual output data files, these may need to be collected from remote systems, then post-processed in some manner. There are a variety of methods for managing input/output data files, including network file systems, scripts, etc. Ideally, however, the execution environment would provide some standard mechanisms for managing data files in a range of different environments. </p> <div class="subsection"><a name="2.5.2" id="2.5.2"></a>2.5.2 Remote Build/Launch Capability</div> <p>Although it is possible to run Eclipse in the same environment as the parallel programs (e.g. using X-Windows), it would be much more convenient if Eclipse were running on the user's local workstation (laptop, etc.). 
Open RTE supports this model of parallel execution; however, there are a number of other issues that also need to be considered. In particular, it is unlikely that the user's workstation contains the necessary tool chain, libraries and header files that are required to build the parallel program. To overcome this, it will be necessary to provide a remote build environment in addition to the remote execution environment.</p> <div class="section"><a name="PDB" id="PDB"></a>3. Parallel Debugger </div> <p>The parallel debugger is a key component of the Parallel Tools Platform. The debugger relies on the services of the execution environment to launch a parallel program so that each process is individually controlled by the debugger. The user is then able to control the processes, either individually, or as groups, by setting breakpoints, single stepping, etc. The debugger also provides a user interface that allows the user to examine process state information, and view variables within the executing processes. Since a parallel program can consist of many thousands of processes, the control mechanism and user interface must be implemented in such a way that it is scalable and efficient. The following diagram shows the architecture of the parallel debugger:</p> <p align="center"><img src="images/debug.png" width="365" height="510"></p> <p align="left">The main components of the parallel debugger are the parallel debug model, the parallel debug user interface, the parallel debug controller and debug API, and the scalable debug interface. An additional component is the debug data model, which is not shown in the diagram. Each of these components is described in more detail below. </p> <div class="subsection"><span class="section"><a name="3.1" id="3.1"></a></span>3.1 Parallel Debug Model</div> <p align="left">The parallel debug model is an extension of the platform debug model. 
As it is currently implemented, the platform debug model supports the notion of multiple threads, but only a single executing process. The parallel debug model extends this to support the notion of multiple processes, each of which can have multiple threads of execution. The following diagram shows these extensions:</p> <p align="center"><img src="images/debug_model.png" width="580" height="311"> </p> <p align="left">By preserving the existing interfaces, it should be possible to implement these extensions with minimal impact on the platform.<br> </p> <div class="subsection"><a name="3.2" id="3.2"></a>3.2 Parallel Debug User Interface</div> <p>In addition to extending the debug model, the parallel debugger also provides a number of additional user interface elements to the standard debug user interface. These elements are required in order to manage large numbers of objects (processes, variables, etc.) in a way that avoids overwhelming the IDE. These new user interface elements include: </p> <ul> <li>a new launch viewer that is able to display a large number of processes in a compact and scalable manner</li> <li>a new variables viewer that is able to display variables from many processes in a compact and scalable manner </li> <li>the ability to define process sets, and apply commands to sets of processes </li> <li>the ability to display and manage events from a large number of sources </li> <li>new methods for displaying data (e.g. a spreadsheet viewer for arrays)</li> </ul> <div class="subsection"><a name="3.3" id="3.3"></a>3.3 Parallel Debug Controller </div> <p>The parallel debug controller is responsible for managing the interactions between the parallel debugger and the external runtime and scalable debug interface. 
The main functions performed by the parallel debug controller include: </p> <ul> <li>launch a parallel program under the control of a debugger</li> <li>update the parallel user interface and debug model in response to debug events</li> <li> translate user interface actions into debug commands</li> <li>manage clean process termination and shutdown of a debug session </li> </ul> <p>Communication between the debug controller and the external components is via the parallel debug API.</p> <div class="subsection"><a name="3.3.1" id="3.3.1"></a>3.3.1 Parallel Debug API </div> <p>The parallel debug API provides an architecture-neutral interface to the external parallel debugger components. The API supports a range of high-level debugger concepts, including:</p> <ul> <li>process management (start, stop, single step, etc.)</li> <li>breakpoint management (set, display, etc.) </li> <li>data management (variable display, expression evaluation, etc.) </li> <li>stack and thread management (display, navigation, etc.) </li> <li>source code and file management</li> </ul> <p>Most of the API functionality is directly implemented by the scalable debug manager. </p> <div class="subsection"> <a name="3.4" id="3.4"></a>3.4 Scalable Debug Manager </div> <p>The scalable debug manager is an external component of the parallel debugger architecture that is responsible for coordinating a debug session involving large numbers of cooperating processes. This manager utilizes the services of the external runtime to launch the remote processes under debug control, then manages all communication between the debugger and the remote processes. The primary goal of this component is to achieve scalability when a debug session may involve thousands of processes. 
The main functions of the scalable debug manager include: </p> <ul> <li>efficiently start a large number of processes under the control of the debugger</li> <li>manage communication between debugger and processes in a scalable manner</li> <li>manage large numbers of debug events in order to prevent overwhelming the IDE</li> <li>cleanly terminate the debug engines and parallel program being debugged </li> </ul> <div class="subsection"><a name="3.5" id="3.5"></a>3.5 Debug Data Model </div> <p>The existing platform debug model provides two generic interfaces for dealing with debug data:</p> <ul> <li>IVariable - represents a visible data structure on a stack frame</li> <li>IValue - represents the value contained in a variable</li> </ul> <p>The expectation is that a particular debugger implementation will extend these interfaces to provide language-specific functionality. Indeed, this is done in the CDT plugin with the ICVariable and ICValue interfaces, and CDT also adds another interface ICType that represents the type of a variable. In both the platform debug model and the CDT debug model, the usage model is that variables, values and types are predominantly for the display of program data structures in the debug user interface. Indeed, the structure of these interfaces matches how a user interacts with the debug user interface. For example, the only way to display the contents of an array is to manually select each element in turn. While this approach is adequate for the current suite of languages and debuggers, it presents some limitations for debugging parallel programs, and for the adoption of highly integrated, sophisticated parallel tools. </p> <p>In contrast, for debugging parallel programs, and in order to support the type of parallel tools we envision being integrated into the Eclipse platform, a more sophisticated data model is required. 
One possible model is shown in the diagram below.</p> <p align="center"><img src="images/debug_data.png" width="642" height="443"></p> <p>In this model, we propose to maintain both type information and value information within a single object. In addition, the type information will completely describe the data format of the value, and will be flexible enough to represent <em>any</em> data type in a language-neutral manner. Both the type and value information will be accessible in the model, and the data value will implement a lazy evaluation scheme that tailors the extraction of data from an executing program to how the data is used within the platform. In addition, the model will provide a range of methods for manipulation of the data (arithmetic operations, array operations, etc.) while in the intermediate format, and for conversion between the model and Java native data types.</p> <div class="subsection"> <a name="3.6" id="3.6"></a>3.6 Future Work</div> <p>T.B.D.</p> <div class="section"><a name="PTI" id="PTI"></a>4. Tool Integration</div> <p>The tool integration component of the parallel tools platform provides a range of services that support the integration of parallel tools into the Eclipse platform. Initially, these services will comprise well-defined interfaces to the models, user interface components, and controllers that are contributed by other components of the parallel tools platform. However, as tool integration progresses, it is expected that other core services that can be shared between tool implementations will be identified and defined. The following sections describe services that will be available in the initial implementation. <br> </p> <div class="subsection"> <a name="4.1" id="4.1"></a>4.1 Parallel Runtime Services </div> <p>The parallel runtime services will provide integrated tools with information about, and control of, the parallel environment. 
These services include:</p> <ul> <li> resource discovery, such as available parallel machines</li> <li>parallel machine architecture information </li> <li>node counts and node information, such as the number of physical processors, memory, etc.</li> <li> status information about the physical systems</li> <li>job and resource allocation status</li> <li>process status information</li> <li>status change event notification</li> <li>job launch facilities </li> </ul> <p>This information can be used by parallel tools to determine the operational environment, deploy tool-specific functions on parallel machines, and interact with running programs. <br> </p> <div class="subsection"> <a name="4.2" id="4.2"></a>4.2 Parallel Debug Services</div> <p>The parallel debug services allow integrated parallel tools to interact directly with processes of an executing parallel program. This includes:</p> <ul> <li>launching parallel programs under debugger control</li> <li>programmatic access to a range of typical debug commands</li> <li>the ability to extract data from an executing program</li> <li>the ability to manipulate data from a program in a language-independent manner</li> <li>notification of debugger events </li> </ul> <p>Parallel tools can use these services to access a wide range of functionality that is normally only available to a debugger. </p> <div class="subsection"><a name="4.3" id="4.3"></a>4.3 Parallel User Interface Components </div> <p>A range of user interface components are available for use by integrated parallel tools. These components provide functionality that is specifically designed for the compact and efficient display of large numbers of objects. 
Examples of such components include:</p> <ul> <li>resource discovery view</li> <li>parallel machine status view </li> <li>process status view</li> <li>resource allocation status view</li> <li>data visualization</li> </ul> <p>By utilizing these components, parallel tool developers are relieved of the necessity to develop their own components to provide similar functionality. In addition, the user benefits from a consistent look and feel across a range of different tools. The model is also designed to encourage contribution of new user interface components by tool developers, who are able to minimize their support requirements for common user interface features, while concentrating on their core tool functionality. </p> <div class="subsection"><a name="4.4" id="4.4"></a>4.4 Future Work </div> <p>T.B.D.</p> <div class="section"><a name="EUS" id="EUS"></a>5. End-User Support </div> <p>The end-user support component of the parallel tools platform aims to produce a platform that is designed to assist the <em>end-users</em> of parallel programs (as opposed to the <em>developers </em>of parallel programs) to work effectively in a parallel programming environment. It is envisaged that this component will take advantage of the rich client platform capability of Eclipse to provide a non-IDE application that will retain core components of the parallel tools platform. The resulting application will provide the following functionality: </p> <ul> <li>the ability to launch parallel programs</li> <li>specification of resource requirements</li> <li>monitoring progress through resource management systems</li> <li>monitoring machine status information</li> <li>monitoring program status information</li> <li>notification of special events</li> <li>access to visualization and end-user tool services <br> </li> </ul> <div class="section"><a name="FDT"></a>6. 
Fortran Development Tools </div> <p>The Fortran Development Tools is a stand-alone component, since it is not specifically required for parallel tools platform support. The aim is to provide Fortran language support for the Eclipse IDE with a similar level of integration to that provided by the C/C++ Development Tools. The initial version of FDT will focus on providing the following features:</p> <ul> <li>Fortran project wizard to assist in creating Fortran projects </li> <li>Fortran nature to recognize Fortran file extensions </li> <li>Support for Fortran with standard and managed builders </li> <li>Tool chain support for popular Fortran compilers, initially to include: <ul> <li>GNU gfortran (to be released with gcc 4.0.0) </li> <li>IBM xlf (v9.1)</li> <li>Intel ifc (v8.1) </li> </ul> </li> <li>Fortran debugger support (including gdb)</li> <li>Launching Fortran programs</li> <li>Fortran editor with simple syntax highlighting</li> <li>Integration with the parallel tools platform to provide support for parallel Fortran programs </li> </ul> <div class="subsection"> <p><a name="6.1" id="6.1"></a>6.1 Future Work </p> </div> <p>The main item of future work will be the provision of a Fortran parser, along with enhanced search functionality, and content assist functionality. In addition, support for Fortran in mixed-language environments will be provided, with a wizard to automatically provide C interfaces to Fortran procedures using the Fortran 2003 C-interop standard. <br> </p> <p><div class="indexsub">LAUR-05-0325</div></p> </body> </html>