<?php
/*******************************************************************************
* Copyright (c) 2015 Eclipse Foundation and others.
* All rights reserved. This program and the accompanying materials
* are made available under the terms of the Eclipse Public License v1.0
* which accompanies this distribution, and is available at
* http://eclipse.org/legal/epl-v10.html
*
* Contributors:
* Eric Poirier (Eclipse Foundation) - Initial implementation
*******************************************************************************/
?>
<h1 class="article-title"><?php echo $pageTitle; ?></h1>
<h2>John D. McGregor, J. Yates Monteith, John E. Ingram</h2>
<p>
<i>Strategic Software Engineering Research Group<br> Clemson
University<br> Clemson, SC 29634<br> {johnmc, jymonte,
jei}@clemson.edu
</i>
</p>
<p>The science research enterprise – including organizations such as
universities, companies, and federal agencies – supports the
development of a large amount of software. In some cases, a large
community of scientific users comes to depend on the continued
availability of one of these software systems or one of its
constituent parts. Examples of these systems include Hadoop, R,
Eclipse, and many more. In the case of open-source software, much
of the software and software systems developed for scientific
users depends on numerous software packages, some with a long
lineage of “parent” software projects. The future of the system
being developed depends on these components being maintained, but
there are simply too many open source software systems for
universities, companies, or federal agencies to support all of
them. The science research enterprise must strategically choose
which software systems to develop, support, and maintain and which
to petition the original producers to maintain.</p>
<p>When a software tool becomes popular outside the research group
that developed it, the continued use of the software system is a
point of risk for the advancement of scientific goals. Scientific
outcomes depend not just on the continued support of the
target software package but also on the continued maintenance of
the ecosystem of software packages upon which a product depends.
When decisions must be made about continued funding for these
research projects, these decisions should be partially based on
the quality and availability of the supporting software
infrastructure and the proposed software’s future impact on its
scientific community.</p>
<p>Our operating premise is that software that is supported by a
healthy ecosystem [8] will be nurtured and sustained. This is
easier for “Big Science” projects [3] that involve professional
staff than it is for projects with one or two senior investigators
and a few graduate students. GitHub and similar development
support infrastructure facilitate some mechanical tasks, but small
groups may not have a computing specialist and may have a hard
time identifying and understanding how to use a robust
infrastructure. There is a substantial difference between a
warehouse such as GitHub, which stores discrete pieces of
software, and a development community, which stores software that
contributes to the specific products developed by the community.</p>
<p>
The National Science Foundation (NSF) report <b><u>A VISION AND
STRATEGY FOR SOFTWARE FOR SCIENCE, ENGINEERING, AND EDUCATION</u></b>
[9] recommends that NSF “Support the creation and maintenance of
an innovative, integrated, reliable, sustainable and accessible
ecosystem of software and services that advances scientific
inquiry and application at unprecedented complexity and scale.”
Taking a software ecosystem approach addresses both organizational
issues and technical issues [2, 6]. An analysis of the ecosystem
surrounding a project could assist in evaluating requests for
funding software development. Such an analysis should include an
evaluation of the strength of the community support for the
software and the software’s fit with the larger context as defined
by the ecosystem’s architecture. The community’s contributions to
the software through add-ons, testing, and other continuing
activities are an important factor [5]. The analysis might also
include an evaluation of the product itself through the quality of
the code, architecture, and supporting elements such as automated
test cases [5].
</p>
<p>This strategic view can be difficult to motivate in basic
scientific research projects where the return on investment is
even more indirect than for an open source product. The impacts of
a research project and its intellectual merit should be considered
in the context of value chain analysis to point out the balance
between cost and value. Evaluating the potential of start-up
companies and patents securing research results would also
strengthen the business case.</p>
<p>This is not simply an economic issue. Scientific research must be
reproducible. Changes to libraries somewhere in the supply chain
may affect results and be virtually impossible to trace. Having
access to the entire supply chain is essential to reproducibility.
A scientific software ecosystem should support reproducibility, as
does a commercial product development environment, by providing
meta-data that identifies the exact tool chain and software
component chains used to produce a specific set of results.</p>
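<p>As a minimal sketch of the kind of meta-data described above, the
following Python example records the interpreter, the platform, and
the installed versions of a list of components alongside a hash of
the results they produced. The function name
build_provenance_record, the field names in the record, and the
example package list are illustrative assumptions, not part of any
existing ecosystem tooling.</p>

```python
import hashlib
import json
import platform
from importlib import metadata


def build_provenance_record(package_names, result_bytes):
    """Describe the tool chain and components behind one set of results.

    Hypothetical helper: the field names below are assumptions chosen
    for illustration, not an established meta-data standard.
    """
    components = {}
    for name in package_names:
        try:
            # Exact installed version of each software component.
            components[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing components explicitly instead of skipping them.
            components[name] = None
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "components": components,
        # The hash ties this record to one specific set of results.
        "result_sha256": hashlib.sha256(result_bytes).hexdigest(),
    }


if __name__ == "__main__":
    record = build_provenance_record(["pip", "numpy"], b"example results")
    print(json.dumps(record, indent=2, sort_keys=True))
```

<p>Storing a record like this next to each published result would let
a later reader check which component versions were in use, though a
complete solution would also need to cover compilers, system
libraries, and transitive dependencies.</p>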
<p>
There are numerous other issues regarding the sustainability of
scientific research software. Many of these issues have
surfaced at the Workshop on Sustainable Software for Science:
Practice and Experiences (WSSSPE) series (<a
target="_blank"
href="http://wssspe.researchcomputing.org.uk/wssspe2/cfp/">http://wssspe.researchcomputing.org.uk/wssspe2/cfp/</a>).
For example, Allen and Schmidt pointed out issues with
establishing a repository of code for a discipline including the
need for meta-data curation and giving the repository sufficient
within the discipline. They state that “the greatest inhibitors
relate to human nature, including the unwillingness of scientists
to share their codes openly, the effect of the lack of an adequate
reward system for software authorship, and the competitive
environment in astronomy” [1]. Habermann et al. [4] look at
sustainability from the point of view of data: “In order to be
sustainable in the long-term, data must be preserved in
well-documented, self-describing formats accessible on multiple
platforms using many programming languages.”
</p>
<p>Clemson University, a longtime member of the Eclipse Foundation,
joined the Eclipse Science Working Group with the goal of
participating in the formation of a model ecosystem that sustains
scientific research software for a domain. As part of a National
Science Foundation-funded project, we have already produced
several studies and modified our ecosystem modeling technique to
facilitate understanding the available software within an
ecosystem [6, 7]. We look forward to participating in growing and
maturing the community and to raising awareness of the issues and
potential solutions for developing long-lived scientific research
software.</p>
<p>This work was partially funded by the National Science Foundation
grant #ACI-1343033.</p>
<ol>
<li>Alice Allen and Judy Schmidt. Looking before leaping: Creating
a software registry. http://arxiv.org/abs/1407.5378, 2014.</li>
<li>G. Chastek and J. D. McGregor, “It takes an ecosystem,” SSTC,
2012.</li>
<li>The CRASH Report - 2011/12 (CAST Report on Application
Software Health), <a target="_blank"
href="http://www.castsoftware.com/resources/resource/whitepapers/cast-report-on-application-software-health?gad=otd">http://www.castsoftware.com/resources/resource/whitepapers/cast-report-on-application-software-health?gad=otd</a>.
</li>
<li>Habermann, Ted; Collette, Andrew; Vincena, Steve; Billings,
Jay Jay; Gerring, Matt; Hinsen, Konrad; Benger, Werner; Maia,
Filipe RNC; Byna, Suren; de Buyl, Pierre (2014): The
Hierarchical Data Format (HDF): A Foundation for Sustainable
Data and Software. <a target="_blank"
href="http://figshare.com/articles/The_Hierarchical_Data_Format_HDF_A_Foundation_for_Sustainable_Data_and_Software/1112485">http://dx.doi.org/10.6084/m9.figshare.1112485</a>.
</li>
<li>John D. McGregor: A method for analyzing software product line
ecosystems: First International Workshop on Software Ecosystems,
73-80, 2008.</li>
<li>John Yates Monteith, John D. McGregor, and John E. Ingram.
2014. Proposed metrics on ecosystem health. In Proceedings of
the 2014 ACM international workshop on Software-defined
ecosystems (BigSystem '14). ACM, New York, NY, USA, 33-36.
DOI=10.1145/2609441.2609643 <a target="_blank"
href="http://dl.acm.org/citation.cfm?doid=2609441.2609643">http://doi.acm.org/10.1145/2609441.2609643</a>.
</li>
<li>J. Yates Monteith, John D. McGregor, and John E. Ingram. 2014.
Scientific Research Software Ecosystems. In Proceedings of the
2014 European Conference on Software Architecture Workshops
(ECSAW '14). ACM, New York, NY, USA, Article 9, 6 pages.
DOI=10.1145/2642803.2642812 <a target="_blank"
href="http://dl.acm.org/citation.cfm?doid=2642803.2642812">http://doi.acm.org/10.1145/2642803.2642812</a>.
</li>
<li>David G. Messerschmitt and Clemens Szyperski (2003). Software
Ecosystem: Understanding an Indispensable Technology and
Industry. Cambridge, MA, USA: MIT Press.</li>
<li>National Science Foundation, A VISION AND STRATEGY FOR
SOFTWARE FOR SCIENCE, ENGINEERING, AND EDUCATION
CYBERINFRASTRUCTURE FRAMEWORK FOR THE 21ST CENTURY, <a
target="_blank"
href="http://www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf">www.nsf.gov/pubs/2012/nsf12113/nsf12113.pdf</a>.
</li>
</ol>
<div class="bottomitem">
<h3>About the Authors</h3>
<div class="row">
<div class="col-sm-12">
<div class="row">
<div class="col-sm-8">
</div>
<div class="col-sm-16">
<p class="author-name">
John D. McGregor<br />
<a target="_blank" href="http://www.clemson.edu/">Clemson
University</a>
</p>
<ul class="author-link">
</ul>
</div>
</div>
</div>
</div>
</div>