<!--
This document is provided as a template along with some guidance for creating
your project proposal. This is just a template. Feel free to change it as
you see fit (add sections, remove section). We feel, however, that the
suggestions represented in this document represent the reasonable minimum
amount of information to move forward.
Please keep the formatting in this document simple. Please do not edit
this document in Microsoft Word as it adds huge piles of markup that make
it difficult to restyle.
More information is available here:
http://wiki.eclipse.org/Development_Resources/HOWTO/Pre-Proposal_Phase
Direct any questions about this template to emo@eclipse.org
-->
<html>
<head>
<!--
Include the title here. We will parse it out of here and include it on the
rendered webpage. Do not duplicate the title within the text of your page.
-->
<title>Code Recommenders</title>
</head>
<!--
We make use of the 'classic' HTML Definition List (dl) tag to specify
committers. I know... you haven't seen this tag in a long while...
-->
<style>
dt {
display: list-item;
list-style-position: outside;
list-style-image:
url(/eclipse.org-common/themes/Phoenix/images/arrow.gif);
margin-left: 16px;
}
dd {
margin-left: 25px;
margin-bottom: 5px;
}
</style>
<body>
<p>Code Recommenders is a proposed open source project under the <a
href="http://www.eclipse.org/projects/project_summary.php?projectid=technology">Eclipse
Technology Container Project</a>.</p>
<!--
The communication channel must be specified. Typically, this is the
"Proposals" forum. In general, you don't need to change this.
-->
<p>This proposal is in the Project Proposal Phase (as defined in the
Eclipse Development Process) and is written to declare its intent and
scope. We solicit additional participation and input from the Eclipse
community. Please send all feedback to the <a
href="http://www.eclipse.org/forums/eclipse.proposals">Eclipse
Proposals</a> Forum.</p>
<p>This proposal is structured as follows. Section "Background"
gives the motivation of the project and provides some background
information about the origins of the proposed project, namely, the Code
Recommenders Project developed at Darmstadt University of Technology.
Section "Scope" outlines the initial set of tools and platforms this
project aims to deliver to its users; Section "Initial Contributions"
describes the current state of the project and the initial contributions
that will be made. Section "Description" gives a few more details on the
intermediate goals. Section "Related Eclipse Projects" describes
potential future connections between current Eclipse Projects and the
Code Recommenders project as well as likely collaborations. The
remaining sections (Committers, Mentors, Interested Parties, Additional
Information) describe what their names suggest.</p>
<h2>Background</h2>
<blockquote><i>Under the right circumstances, groups
are remarkably intelligent and are often better than the smartest person
in them.&nbsp;&nbsp;&nbsp; - James Surowiecki: "The Wisdom of Crowds"</i></blockquote>
<br>
<p>Application frameworks have become an integral part of today's
software development - this is hardly surprising given their promised
benefits such as reduced costs, higher quality, and shorter time to
market. But using an application framework is not free of cost. Before
frameworks can be used efficiently, software developers have to learn
their correct usage, which often results in high initial training costs.
</p>
<p>To reduce these training costs, framework developers provide
diverse documentation addressing different information needs. Tutorials,
for instance, describe typical usage scenarios, and thus give the
application developer an initial insight into the workings of the
framework. However, their benefit quickly disappears when problems have
to be solved that differ from standard usage scenarios. At that point, API
documentation becomes the most important resource for software
developers. Documentation is scanned for hints relevant to the
problem at hand, but if it does not provide the required information, the
most costly part of the research begins: the source code of other
programs that successfully used the framework in a similar way is
investigated. But learning correct framework usage from these real-world
examples is difficult, because they also contain application-specific
code that obscures what is really important for using the framework.
This significantly complicates understanding and makes the training a
challenging and time-consuming task again. Nevertheless, the source code
of other applications seems to be a valuable source of information.
Code-search engines like Google Codesearch or Krugle owe their
popularity not least to the fact that existing framework documentation
seems insufficient to support developers in their daily work.</p>
<p>But despite their widespread use, it's an open question whether
code-search engines solve the problem of missing documentation in a
satisfactory manner. When looking at how developers use code-search
engines, it turns out that they rarely create a single query and study
just a single example; instead, they typically refine their queries
several times, investigate a number of examples, compare them to each
other and try to extract a pattern that underlies all these examples,
i.e., a common way to use the API in question.</p>
<p>Although this task is very time-consuming, analyzing example code
seems worth doing. Apparently, example code provides important
insights into how to use a given API. Given this observation, the question
arises whether such information can be extracted from example code
automatically, i.e., without large manual effort. Furthermore, if
valuable information can be found, how can these findings be made
accessible to support developers in their daily work?</p>
<p>The <b><i>Code Recommenders</i></b> project developed at
Darmstadt University of Technology investigates exactly these two
questions. In a nutshell, tools are developed that automatically analyze
large-scale code repositories, extract various interesting data from them,
and integrate this information back into the IDE, where it is reused by
developers in their daily work. The vision of the project is to create a
context-sensitive IDE that learns from its users what is relevant in a
given code situation and, in turn, gives this knowledge back to other
users. You may think of it as a collaborative way of sharing knowledge
over the IDE.</p>
<p>This Eclipse proposal is the next step towards the goal of building
the next generation of collaborative IDE services, which we call "the IDE
2.0", inspired by the success of Web 2.0. The complete vision and an
explanation of the IDE 2.0 to Web 2.0 analogy are given in <a
href="http://code-recommenders.blogspot.com/2010/08/eclipse-and-academia-briding-gap.html">
IDE 2.0: Collective Intelligence in Software Development</a>, published at
the Working Conference on the "Future of Software Engineering Research
(FoSER) 2010".</p>
<h2>Scope</h2>
One of the major goals of this project is to make a new generation of
tool ideas accessible and usable by the Eclipse community, to further
improve these tools based on user feedback, and even to build
completely new tools based on experience and developer needs. So
far, a couple of steps towards IDE 2.0 have been accomplished, some of
which we describe briefly in Section "Initial Contributions". These
tools, however, still have to prove themselves useful. To allow this
evaluation, this project aims to (i) provide a platform for innovative
IDE features that leverage the wisdom of the crowds, (ii) build a
vibrant community around IDE 2.0 services based on Eclipse, and (iii)
provide an open platform allowing every community member to actively
contribute to these services and to build and evaluate new tools based
on the data contributed by the community itself. The initial scope of
this project is to provide tools for the following topics:
<ol>
<li><b>Intelligent Code Completion Systems</b>:<br>
Code completion systems are pretty good at showing a developer all
possible completions in a given context. However, the sheer number of
proposals can be overwhelming for novice developers. The goal of this
project is to develop completion engines that leverage information
about how other developers used certain types in similar contexts and
are thus able to filter or rearrange proposals according to some
relevance criterion (similar to Mylyn's context model, but learning this
relevance judgment from how thousands of users used a given API). <a
href="http://code-recommenders.blogspot.com/2010/05/its-all-about-intelligent-code.html">read
more...</a>
<li><b>Smart Template Engines</b>:<br>
The well-known SWT Templates are pretty helpful for developers not
familiar with all the details of SWT. Unfortunately, creating such
templates is a tedious and time-consuming task. Consequently, the number
of such code templates is rather small. However, the code of existing
applications contains hundreds of frequently recurring code snippets
that can be extracted and shared among developers. This project will
provide tools that help developers find (for instance) method call
chains for situations like "How do I get an instance of
IStatusLineManager inside a ViewPart" and will allow them to share such
templates with other developers.
<li><b>Crowd-sourced and Usage-Driven API Documentation</b>:<br>
API documentation, regardless of how much time has been spent on
writing it, lacks information about how developers actually use the
APIs. This information, however, can be easily extracted from code that
uses the APIs in question, and could thus be used to enrich existing
API documentation with real, usage-driven documentation. Code
Recommenders aims to develop tools for finding and sharing this kind of
knowledge among developers. <a
href="http://code-recommenders.blogspot.com/2010/03/problem-of-incomplete-javadocs.html">read
more...</a>
<li><b>Stacktrace Search Engine</b>:<br>
Exceptions occur. Apache Maven, for instance, reflects this reality by
providing wiki pages for frequently occurring build exceptions, which
aim to explain <i>why</i> these exceptions may have occurred during a
Maven build and <i>how</i> to fix them. This concept is a neat idea,
but its potential is not exhausted yet. Currently, the matching
between an exception occurring during a build and a wiki page is
based on the <i>type</i> of the exception (e.g., <i>BuildException</i>,
<i>IllegalArgumentException</i>, etc.). This matching is rather
coarse-grained and neglects the fact that the same exception might
occur in many different locations and for many different reasons.
First experimental results have shown that leveraging more
information, such as the stack frame elements and exception messages,
yields a system that is capable of finding very similar exceptions
and thus allows building a new kind of search engine for stacktraces.
This project aims to develop such a stacktrace search engine and
to integrate it into existing web platforms such as the Eclipse
forums.
<li><b>API Misuse / Bug Detector</b>:<br>
When using unfamiliar APIs, we often misuse them, i.e., we
forget to call certain methods, pass wrong parameters to a method
call, etc. These mistakes are hard to find and debug. Tools like PMD and
FindBugs do a great job of finding issues such as null pointer
dereferences, or recommending that hashCode be overridden along with
equals, but they are of little help if framework-specific usage rules
are violated. However, research tools exist that are capable of finding
<i>strange</i> API uses, i.e., usages that differ significantly from how
<i>most people used a certain API</i> and thus may indicate possible
bugs in code. This project aims to provide an evaluation of such tools
and will provide an initial system as a baseline. <a href="http://tinyurl.com/34lz56c">read more</a>
</ol>
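<p>As a rough illustration of the intelligent code completion idea from
the list above, the following Python sketch reorders completion proposals
by how often each method was observed on a receiver type in example code.
The function names and the sample data are made up for illustration and
are not part of the Code Recommenders tools; a real engine would also
take the surrounding code context into account.</p>
<pre>
```python
from collections import Counter


def build_usage_model(observed_calls):
    """Count how often each method was called on each receiver type.

    observed_calls is an iterable of (receiver_type, method_name) pairs,
    standing in for data mined from example applications.
    """
    model = {}
    for receiver_type, method_name in observed_calls:
        model.setdefault(receiver_type, Counter())[method_name] += 1
    return model


def rank_proposals(model, receiver_type, proposals):
    """Order proposals by observed usage frequency, most frequent first.

    Proposals never seen in the mined data keep their original relative
    order at the end of the list, because sorted() is stable.
    """
    counts = model.get(receiver_type, Counter())
    return sorted(proposals, key=lambda name: -counts[name])


# Hypothetical mined data: calls observed on an SWT Text widget.
observed = [
    ("Text", "setText"), ("Text", "setText"), ("Text", "setText"),
    ("Text", "addModifyListener"), ("Text", "addModifyListener"),
    ("Text", "setLayoutData"),
]
model = build_usage_model(observed)

# The IDE would normally offer these in alphabetical order.
alphabetical = ["addModifyListener", "computeSize", "setLayoutData", "setText"]
ranked = rank_proposals(model, "Text", alphabetical)
print(ranked)
```
</pre>
<p>Proposals that never occur in the mined data are not removed here,
only moved to the end; filtering them out instead would be an equally
plausible design.</p>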
<p>However, the scope of the Recommenders project is not limited to
these kinds of tools, and we encourage the community to discuss new ideas
for tools that might be helpful for software engineers.</p>
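<p>The stacktrace search engine described in the list above matches on
more than the exception type. A minimal Python sketch of that idea,
assuming a simple Jaccard similarity over exception type, stack frames,
and message words, might look as follows. The similarity measure, the
function names, and the sample traces are illustrative assumptions, not
the approach used in the experiments mentioned above.</p>
<pre>
```python
def trace_features(exception_type, message, frames):
    """Turn a stacktrace into a set of comparable features."""
    features = {"type:" + exception_type}
    features.update("frame:" + frame for frame in frames)
    features.update("word:" + word.lower() for word in message.split())
    return features


def jaccard(a, b):
    """Jaccard similarity: shared features divided by all features."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def most_similar(query, known_traces):
    """Return the label of the known trace most similar to the query."""
    return max(known_traces, key=lambda label: jaccard(query, known_traces[label]))


# Made-up knowledge base: both entries share the same exception type,
# so matching on the type alone could not tell them apart.
known = {
    "null-widget-parent": trace_features(
        "IllegalArgumentException",
        "Argument cannot be null",
        ["org.eclipse.swt.SWT.error",
         "org.eclipse.swt.widgets.Widget.checkParent"],
    ),
    "bad-version-string": trace_features(
        "IllegalArgumentException",
        "invalid version format",
        ["org.osgi.framework.Version.parseVersion"],
    ),
}

# Made-up query trace with one additional application frame.
query = trace_features(
    "IllegalArgumentException",
    "Argument cannot be null",
    ["org.eclipse.swt.SWT.error",
     "org.eclipse.swt.widgets.Widget.checkParent",
     "com.example.MyView.createPartControl"],
)

best = most_similar(query, known)
similarity = jaccard(query, known[best])
print(best, similarity)
```
</pre>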
<h2>Initial Contributions</h2>
There are dozens of (research) projects that leverage collective
intelligence in one way or another, and the Code Recommenders project
developed at Darmstadt University of Technology is just one of them.
An open, vendor-neutral Eclipse project may be a perfect place
for these tools to contribute to Eclipse and to evaluate their
approaches within a vibrant user community. Every Eclipse incubator
project has to start with an initial contribution; ours will consist of
two existing recommender components. Each component has been described
in detail in its own blog post, and we refer interested parties to these
blog posts and to the forum for further discussion of these tools.
<ol>
<li><a
href="http://code-recommenders.blogspot.com/2010/05/its-all-about-intelligent-code.html">Intelligent
Code Completion</a>
<li><a
href="http://code-recommenders.blogspot.com/2010/03/problem-of-incomplete-javadocs.html">Extended,
usage-driven Javadoc</a>
</ol>
<p>Components like the <i>Stacktrace search engine</i> or the <i>API
Usage bug detector</i> are still under development and will follow when ready.
</p>
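<p>Although the API usage bug detector is not part of the initial
contribution, its basic idea can be sketched in a few lines of Python:
flag calls that most example usages of a type make but a given usage
omits. The function name, the threshold value, and the sample data are
assumptions made for illustration only.</p>
<pre>
```python
from collections import Counter


def missing_calls(example_usages, usage, threshold=0.8):
    """Report calls that most example usages make but `usage` does not.

    example_usages is a list of sets of method names called on one type.
    threshold is the share of examples that must contain a call before
    its absence counts as suspicious (an arbitrary illustrative value).
    """
    counts = Counter(call for example in example_usages for call in set(example))
    total = len(example_usages)
    return sorted(
        call for call, seen in counts.items()
        if seen / total >= threshold and call not in usage
    )


# Made-up example usages of an SWT GC object: every one disposes it.
examples = [
    {"drawLine", "dispose"},
    {"drawText", "dispose"},
    {"drawLine", "setForeground", "dispose"},
    {"fillRectangle", "dispose"},
    {"drawLine", "dispose"},
]

# A usage that never calls dispose() deviates from the majority.
suspicious = missing_calls(examples, {"setForeground", "drawLine"})
print(suspicious)
```
</pre>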
<p>The proposed namespace of the project will be <code>org.eclipse.recommenders.*</code>.
</p>
<h2>Description</h2>
<p>The goal of the (Code) Recommenders project is to build IDE tools
such as intelligent code completion and extended API docs that
continuously improve themselves by leveraging implicit and explicit
knowledge about how APIs are used by their clients and, in turn, give
this information back to other developers to ease their work with new
and unfamiliar frameworks and development environments.</p>
<p>In the current state of the initial contribution, these systems
are fed more or less manually by an administrator who collects example
applications from large code repositories like EclipseSource's Yoxos and
then starts the analysis and data extraction process to build new
models. This approach may be further automated by leveraging the
existing infrastructure of the Eclipse Marketplace and P2 to
continuously scan and update API usages and build up-to-date models for
the Eclipse APIs.</p>
<p>Unfortunately, such a manual approach does not scale well if
potentially thousands of (non-Eclipse-based) frameworks are to be
supported. It is simply too difficult to find enough example
applications to make this approach work. Thus, in the long term this
manual data collection process should be replaced by a community-driven
approach in which users can voluntarily share their knowledge
about how to use these APIs by giving either explicit or implicit feedback
(cf. the <a
href="http://code-recommenders.blogspot.com/2010/08/eclipse-and-academia-briding-gap.html">position
paper about user feedback and information sharing</a>). Clearly, special
requirements for privacy have to be met so that no individual's private
or company's critical data is collected or published. Different models
of data sharing have to be developed and discussed with the community.</p>
<p>As one of the first steps, a platform allowing developers to
share knowledge will be developed, and the existing tools (i.e.,
intelligent code completion and usage-driven Javadocs) will be based on
these concepts as a proof of concept. A community-driven approach may
follow.</p>
<!--
no legal issues
<h2>Legal Issues</h2>
--> <!--
Please describe any potential legal issues in this section. Does somebody else
own the trademark to the project name? Is there some issue that prevents you
from licensing the project under the Eclipse Public License? Are parts of the
code available under some other license? Are there any LGPL/GPL bits that you
absolutely require?
-->
<h2>Committers</h2>
<!--
List any initial committers that should be provisioned along with the
new project. Include affiliation, but do not include email addresses at
this point.
-->
<p>The following individuals are proposed as initial committers to
the project:</p>
The Code Recommenders project is developed at Darmstadt University of
Technology. The project is led by Marcel Bruch and advised by Mira
Mezini. Although the number of initial committers is low, we expect this
set to grow quickly. In the past, the project has been supported by more
than 50 students through various hands-on trainings and bachelor's and
master's theses; future contributions will be made directly under the
proposed project. The initial committers will be:
<dl>
<dt>Marcel Bruch, Darmstadt University of Technology</dt>
<dd>Project Lead</dd>
<dt>Mira Mezini, Darmstadt University of Technology</dt>
<dd>Project Management</dd>
<dt>Eric Bodden, Darmstadt University of Technology</dt>
<dd>Committer</dd>
<dt>Johannes Lerch</dt>
<dd>Committer</dd>
<dt>Dennis S&auml;nger</dt>
<dd>Committer</dd>
<dt>Sebastian Proksch</dt>
<dd>Committer</dd>
</dl>
<p>We welcome additional committers and contributions.</p>
<!--
Describe any initial contributions of code that will be brought to the
project. If there is no existing code, just remove this section.
-->
<h2>Mentors</h2>
<!--
New Eclipse projects require a minimum of two mentors from the Architecture
Council. You need to identify two mentors before the project is created. The
proposal can be posted before this section is filled in (it's a little easier
to find a mentor when the proposal itself is public).
-->
<p>The following Architecture Council members will mentor this
project:</p>
<ul>
<li><a href="http://aniszczyk.org">Chris Aniszczyk</a>, Red Hat</li>
<li><a href="http://eclipsesource.com">Jochen Krause</a>,
EclipseSource</li>
</ul>
<h2>Interested Parties</h2>
<!--
Provide a list of individuals, organisations, companies, and other Eclipse
projects that are interested in this project. This list will provide some
insight into who your project's community will ultimately include. Where
possible, include affiliations. Do not include email addresses.
-->
<p>The following individuals, organisations, companies and projects
have expressed interest in this project:</p>
<ul>
<li>Chris Aniszczyk, Red Hat</li>
<li>Fabian Steeg, University of Cologne</li>
<li>Benjamin Muskalla, Tasktop</li>
<li>Beyhan Veliev, EclipseSource</li>
<li>Holger Staudacher, EclipseSource</li>
<li>Zviki Cohen, nWire Software</li>
<li>Martin Robillard, McGill University, Montreal, Canada</li>
<li>Stefan Lay, SAP AG</li>
<li>Matthias Sohn, SAP AG</li>
<li>Frederic Madiot, Eclipse MoDisco</li>
<li>Maxime Jeanmart, JITT Consulting</li>
</ul>
<!--
<h2>Project Scheduling</h2>
--> <!--
Describe, in rough terms, what the basic scheduling of the project will
be. You might, for example, include an indication of when an initial contribution
should be expected, when your first build will be ready, etc. Exact
dates are not required.
-->
<h2>Additional Information</h2>
<ul>
<li><b>Blog</b>: <a href="http://code-recommenders.blogspot.com/">http://code-recommenders.blogspot.com</a>
<li><b>Project Homepage</b>: <a
href="http://www.stg.tu-darmstadt.de/research/core/">http://www.stg.tu-darmstadt.de/research/core/</a>
<li><b>Eclipselabs Project</b>: <a
href="http://eclipselabs.org/p/code-recommenders/">http://eclipselabs.org/p/code-recommenders/</a>
</ul>
<h2>Changes to this Document</h2>
<!--
List any changes that have occurred in the document here.
You only need to document changes that have occurred after the document
has been posted live for the community to view and comment.
-->
<table>
<tr>
<th>Date</th>
<th>Change</th>
</tr>
<tr>
<td>28-October-2010</td>
<td>Document created</td>
</tr>
<tr>
<td>22-November-2010</td>
<td>Updated Initial Contributions (added proposed namespace),
Interested Parties (added new interested parties), Mentors (added
second mentor), Committers (added three initial committers)</td>
</tr>
</table>
</body>
</html>