 <?php
 /*******************************************************************************
  * Copyright (c) 2015 Eclipse Foundation and others.
  * All rights reserved. This program and the accompanying materials
  * are made available under the terms of the Eclipse Public License v1.0
  * which accompanies this distribution, and is available at
  * http://eclipse.org/legal/epl-v10.html
  *
  * Contributors:
  *    Eric Poirier (Eclipse Foundation) - Initial implementation
  *******************************************************************************/
 ?>

 <h1 class="article-title"><?php echo $pageTitle; ?></h1>
     <h2>John D. McGregor, J. Yates Monteith, John E. Ingram</h2>
     <p>
       <i>Strategic Software Engineering Research Group<br> Clemson
         University<br> Clemson, SC 29634<br> {johnmc, jymonte,
         jei}@clemson.edu
       </i>
     </p>

     <p>The science research enterprise – including organizations such as
       universities, companies, and federal agencies – supports the
       development of a large amount of software. In some cases, a large
       community of scientific users comes to depend on the continued
       availability of one of these software systems or one of its
       constituent parts. Examples include Hadoop, R, Eclipse, and many
       more. In the case of open-source software, much of the software
       developed for scientific users depends on numerous other packages,
       some with a long lineage of “parent” projects. The future of a
       system under development depends on these components being
       maintained, but there are far too many open-source systems for
       universities, companies, or federal agencies to support them all.
       The science research enterprise must therefore choose strategically
       which software systems to develop, support, and maintain itself
       and which to petition the original producers to maintain.</p>
     <p>When a software tool becomes popular outside the research group
       that developed it, continued reliance on that tool becomes a point
       of risk for the advancement of scientific goals. Scientific
       outcomes depend not just on the continued support of the target
       software package but also on the continued maintenance of the
       ecosystem of software packages on which that package depends.
       When decisions must be made about continued funding for these
       research projects, they should be based partly on the quality and
       availability of the supporting software infrastructure and on the
       proposed software’s future impact on its scientific community.</p>
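     <p>To make the dependency problem concrete, the following is a minimal
       Python sketch, not a tool described in this article, of how a project
       could list every package it depends on directly or indirectly and flag
       the ones whose maintenance is in doubt. The package names, the
       <code>DEPENDS_ON</code> graph, and the <code>UNMAINTAINED</code> set are
       hypothetical; in practice they would have to be gathered from package
       manifests and from project activity.</p>
<pre><code class="language-python">from collections import deque

# Hypothetical dependency graph: each package maps to the packages it uses directly.
DEPENDS_ON = {
    "analysis-tool": ["numerics-lib", "plotting-lib"],
    "numerics-lib": ["blas-wrapper"],
    "plotting-lib": ["numerics-lib", "font-lib"],
    "blas-wrapper": [],
    "font-lib": [],
}

# Hypothetical maintenance status, e.g. derived from recent project activity.
UNMAINTAINED = {"font-lib"}


def transitive_dependencies(package, graph):
    """Return every package reachable from `package`, not counting itself."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # shared dependency already visited
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    seen.discard(package)  # a dependency cycle could lead back to the start
    return seen


def at_risk_dependencies(package, graph, unmaintained):
    """Sorted list of transitive dependencies whose maintenance is in doubt."""
    return sorted(transitive_dependencies(package, graph).intersection(unmaintained))
</code></pre>
     <p>A breadth-first walk visits each package once even when several
       components share a dependency (here, <code>numerics-lib</code> is
       reached both directly and through <code>plotting-lib</code>). On the
       sample data, <code>at_risk_dependencies("analysis-tool", DEPENDS_ON, UNMAINTAINED)</code>
       returns <code>['font-lib']</code>, a transitive dependency that the
       top-level project never names directly.</p>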
     <p>Our operating premise is that software supported by a healthy
       ecosystem [8] will be nurtured and sustained. This is easier to
       achieve for “Big Science” projects [3] that involve professional
       staff than for projects with one or two senior investigators and a
       few graduate students. GitHub and similar development-support
       infrastructure facilitate some mechanical tasks, but small groups
       may not have a computing specialist and may find it hard to
       identify a robust infrastructure and understand how to use it.
       There is a substantial difference between a warehouse such as
       GitHub, which stores discrete pieces of software, and a
       development community, which stores software that contributes to
       the specific products the community develops.</p>

     <p>
       The National Science Foundation (NSF) report <b><u>A VISION AND
           STRATEGY FOR SOFTWARE FOR SCIENCE, ENGINEERING, AND EDUCATION</u></b>
       [9] recommends that NSF “Support the creation and maintenance of
       an innovative, integrated, reliable, sustainable and accessible
       ecosystem of software and services that advances scientific
       inquiry and application at unprecedented complexity and scale.”
       Taking a software ecosystem approach addresses both organizational
       and technical issues [2, 6]. An analysis of the ecosystem
       surrounding a project could help in evaluating requests for
       funding software development. Such an analysis should include an
       evaluation of the strength of the community’s support for the
       software and of the software’s fit with the larger context defined
       by the ecosystem’s architecture. The community’s contributions to
       the software through add-ons, testing, and other continuing
       activities are an important factor [5]. The analysis might also
       include an evaluation of the product itself: the quality of its
       code and architecture and of supporting elements such as automated
       test cases [5].
     </p>
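     <p>One way to make such an analysis repeatable is to reduce each factor
       to a number and combine the numbers. The Python sketch below is only an
       illustration under stated assumptions; it is not the metric set
       proposed in [6]. The fields of <code>EcosystemObservation</code>, the
       saturation <code>TARGETS</code>, and the <code>WEIGHTS</code> are
       hypothetical choices that a review panel would have to set for its own
       discipline.</p>
<pre><code class="language-python">from dataclasses import dataclass


@dataclass
class EcosystemObservation:
    """Hypothetical raw measurements for one project and its community."""
    active_contributors: int   # people who contributed in the last year
    community_add_ons: int     # extensions maintained by third parties
    test_coverage: float       # fraction of code run by automated tests, 0.0 to 1.0
    code_quality: float        # static-analysis score rescaled to 0.0 to 1.0


# Hypothetical saturation points: reaching a target earns full marks for that factor.
TARGETS = {"active_contributors": 20, "community_add_ons": 10}

# Hypothetical weights (they sum to 1.0): community factors first, product factors second.
WEIGHTS = {
    "active_contributors": 0.30,
    "community_add_ons": 0.20,
    "test_coverage": 0.25,
    "code_quality": 0.25,
}


def clamp01(value):
    """Limit a value to the closed interval [0.0, 1.0]."""
    return max(0.0, min(float(value), 1.0))


def normalized_factors(obs):
    """Map every measurement onto 0.0 to 1.0 so the factors can be combined."""
    return {
        "active_contributors": clamp01(obs.active_contributors / TARGETS["active_contributors"]),
        "community_add_ons": clamp01(obs.community_add_ons / TARGETS["community_add_ons"]),
        "test_coverage": clamp01(obs.test_coverage),
        "code_quality": clamp01(obs.code_quality),
    }


def health_score(obs):
    """Weighted sum of the normalized factors; 1.0 means every factor meets its target."""
    factors = normalized_factors(obs)
    return sum(WEIGHTS[name] * value for name, value in factors.items())
</code></pre>
     <p>For a project with 10 active contributors, 10 community add-ons, 60%
       test coverage, and a code-quality score of 0.8, the factors normalize
       to 0.5, 1.0, 0.6, and 0.8, giving a weighted score of about 0.7.
       Capping each factor at 1.0 keeps one very strong factor, such as a
       large contributor base, from masking weak automated testing.</p>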
     <p>This strategic view can be difficult to motivate for basic
       scientific research projects, where the return on investment is
       even more indirect than it is for an open-source product. The
       impacts of a research project and its intellectual merit should be
       considered in the context of a value chain analysis that makes the
       balance between cost and value explicit. Evaluating the potential
       for start-up companies and for patents that secure research results
       would also strengthen the business case.</p>
     <p>This is not simply an economic issue. Scientific research must be
       reproducible. Changes to a library somewhere in the supply chain
       may affect results and be virtually impossible to trace, so access
       to the entire supply chain is essential to reproducibility. A
       scientific software ecosystem should support reproducibility, as a
       commercial product development environment does, by providing
       metadata that identifies the exact tool chain and software
       component chains used to produce a specific set of results.</p>
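     <p>As a small illustration of such metadata, the Python sketch below
       records, next to a set of result files, the interpreter and platform
       that produced them, the version of every installed Python package, and
       a SHA-256 hash of each result file. It uses only the standard library
       (<code>importlib.metadata</code>, <code>platform</code>,
       <code>hashlib</code>); the record layout and the function names
       <code>provenance_record</code> and <code>save_provenance</code> are
       hypothetical, and a real ecosystem would also need to capture
       compilers, native libraries, and input data, which this sketch does
       not.</p>
<pre><code class="language-python">import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def sha256_of(path):
    """Hash a result file so the record refers to its exact contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def installed_components():
    """Name and version of each installed Python distribution, sorted by name."""
    components = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            components[name] = dist.version
    return dict(sorted(components.items(), key=lambda item: item[0].lower()))


def provenance_record(result_files):
    """Describe the tool chain and component chain behind a set of results."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "python_implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "components": installed_components(),
        "results": {path: sha256_of(path) for path in result_files},
    }


def save_provenance(result_files, out_path):
    """Write the provenance record as JSON alongside the results."""
    record = provenance_record(result_files)
    with open(out_path, "w", encoding="utf-8") as out:
        json.dump(record, out, indent=2)
    return record
</code></pre>
     <p>Calling <code>save_provenance(["results.csv"], "results.provenance.json")</code>
       after an analysis run leaves a JSON file that can be archived with the
       results; comparing two such files shows which component versions
       changed between runs.</p>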
     <p>
       There are numerous other issues regarding the sustainability of
       scientific research software. Many of them have been raised at the
       Workshop on Sustainable Software for Science: Practice and
       Experiences (WSSSPE) series (<a
         target="_blank"
         href="http://wssspe.researchcomputing.org.uk/wssspe2/cfp/">http://wssspe.researchcomputing.org.uk/wssspe2/cfp/</a>).
       For example, Allen and Schmidt pointed out issues with
       establishing a repository of code for a discipline, including the
       need for metadata curation and giving the repository sufficient
       within the discipline. They state that “the greatest inhibitors
       relate to human nature, including the unwillingness of scientists
       to share their codes openly, the effect of the lack of an adequate
       reward system for software authorship, and the competitive
       environment in astronomy” [1]. Habermann et al. [4] look at
       sustainability from the point of view of data: “In order to be
       sustainable in the long-term, data must be preserved in
       well-documented, self-describing formats accessible on multiple
       platforms using many programming languages.”
     </p>
     <p>Clemson University, a longtime member of the Eclipse Foundation,
       joined the Eclipse Science Working Group with the goal of
       participating in the formation of a model ecosystem that sustains
       scientific research software for a domain. As part of a National
       Science Foundation-funded project, we have already produced
       several studies and modified our ecosystem modeling technique to
       make it easier to understand the software available within an
       ecosystem [6, 7]. We look forward to helping grow and mature the
       community and to raising awareness of the issues in, and potential
       solutions for, developing long-lived scientific research
       software.</p>
     <p>This work was partially funded by the National Science Foundation
       grant #ACI-1343033.</p>
     <ol>
       <li>Alice Allen and Judy Schmidt. Looking before leaping: Creating
         a software registry. 2014. <a target="_blank"
         href="http://arxiv.org/abs/1407.5378">http://arxiv.org/abs/1407.5378</a>.</li>
       <li>G. Chastek and J. D. McGregor, “It takes an ecosystem,” SSTC,
         2012.</li>
       <li>The CRASH Report - 2011/12 (CAST Report on Application
         Software Health), <a target="_blank"
         href="http://www.castsoftware.com/resources/resource/whitepapers/cast-report-on-application-software-health?gad=otd">http://www.castsoftware.com/resources/resource/whitepapers/cast-report-on-application-software-health?gad=otd</a>.
       </li>
       <li>Habermann, Ted; Collette, Andrew; Vincena, Steve; Billings,
         Jay Jay; Gerring, Matt; Hinsen, Konrad; Benger, Werner; Maia,
         Filipe RNC; Byna, Suren; de Buyl, Pierre (2014): The
         Hierarchical Data Format (HDF): A Foundation for Sustainable
         Data and Software. <a target="_blank"
         href="http://figshare.com/articles/The_Hierarchical_Data_Format_HDF_A_Foundation_for_Sustainable_Data_and_Software/1112485">http://dx.doi.org/10.6084/m9.figshare.1112485</a>.
       </li>
       <li>John D. McGregor. A method for analyzing software product line
         ecosystems. In First International Workshop on Software Ecosystems,
         73-80, 2008.</li>
       <li>John Yates Monteith, John D. McGregor, and John E. Ingram.
         2014. Proposed metrics on ecosystem health. In Proceedings of
         the 2014 ACM international workshop on Software-defined
         ecosystems (BigSystem '14). ACM, New York, NY, USA, 33-36.
         DOI=10.1145/2609441.2609643 <a target="_blank"
         href="http://dl.acm.org/citation.cfm?doid=2609441.2609643">http://doi.acm.org/10.1145/2609441.2609643</a>.
       </li>
       <li>J. Yates Monteith, John D. McGregor, and John E. Ingram. 2014.
         Scientific Research Software Ecosystems. In Proceedings of the
         2014 European Conference on Software Architecture Workshops
         (ECSAW '14). ACM, New York, NY, USA, Article 9, 6 pages.
         DOI=10.1145/2642803.2642812 <a target="_blank"
         href="http://dl.acm.org/citation.cfm?doid=2642803.2642812">http://doi.acm.org/10.1145/2642803.2642812</a>.
       </li>
       <li>David G. Messerschmitt and Clemens Szyperski (2003). Software
         Ecosystem: Understanding an Indispensable Technology and
         Industry. Cambridge, MA, USA: MIT Press.</li>
       <li>National Science Foundation, A VISION AND STRATEGY FOR
         SOFTWARE FOR SCIENCE, ENGINEERING, AND EDUCATION:
         CYBERINFRASTRUCTURE FRAMEWORK FOR THE 21ST CENTURY, <a
         target="_blank"
         href="http://www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf">www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf</a>.
       </li>
     </ol>

 <div class="bottomitem">
   <h3>About the Authors</h3>

   <div class="row">
     <div class="col-sm-12">
       <div class="row">
         <div class="col-sm-8">

         </div>
         <div class="col-sm-16">
           <p class="author-name">
         John D. McGregor<br />
         <a target="_blank" href="http://www.clemson.edu/">Clemson
           University</a>
       </p>
       <ul class="author-link">
       </ul>
         </div>
       </div>
     </div>
   </div>
 </div>