<!DOCTYPE html>
<html lang='en' dir='auto'><head>
<meta charset='utf-8'>
<meta name='viewport' content='width=device-width, initial-scale=1'>
<meta name='description' content=''>
<meta name='theme-color' content='#ffcd00'>
<meta property='og:title' content='AERI Stacktraces • Eclipse DataEggs'>
<meta property='og:description' content=''>
<meta property='og:url' content='https://www.eclipse.org/dataeggs/aeri_stacktraces/'>
<meta property='og:site_name' content='Eclipse DataEggs'>
<meta property='og:type' content='website'><meta name='twitter:card' content='summary'>
<meta name="generator" content="Hugo 0.80.0" />
<title>AERI Stacktraces • Eclipse DataEggs</title>
<link rel='canonical' href='https://www.eclipse.org/dataeggs/aeri_stacktraces/'>
<link href="https://www.eclipse.org/dataeggs/aeri_stacktraces/index.xml" rel="alternate" type="application/rss+xml" title="Eclipse DataEggs" />
<link rel='icon' href='/dataeggs/favicon.ico'>
<link rel='stylesheet' href='/dataeggs/assets/css/main.ab98e12b.css'><link rel='stylesheet' href='/dataeggs/css/custom.css'><style>
:root{--color-accent:#ffcd00;}
</style>
<script type="application/javascript">
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-3675452-15', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
</head>
<body class='section type-aeri_stacktraces has-sidebar'>
<div class='site'><div id='sidebar' class='sidebar'>
<a class='screen-reader-text' href='#main-menu'>Skip to Main Menu</a>
<div class='container'><section class='widget widget-about sep-after'>
<header>
<div class='logo'>
<a href='/dataeggs/'>
<img src='/dataeggs/images/dataeggs-menu.png'>
</a>
</div>
<div class='desc'>
Open. Safe. Easy.
</div>
</header>
</section>
<section class='widget widget-search sep-after'>
<header>
<h4 class='title widget-title'>Search</h4>
</header>
<form action='/dataeggs/search' id='search-form' class='search-form'>
<label>
<span class='screen-reader-text'>Search</span>
<input id='search-term' class='search-term' type='search' name='q' placeholder='Search&hellip;'>
</label></form>
</section>
<section class='widget widget-sidebar_menu sep-after'><nav id='sidebar-menu' class='menu sidebar-menu' aria-label='Sidebar Menu'>
<div class='container'>
<ul><li class='item'>
<a href='/dataeggs/'>Home</a></li><li class='item'>
<a href='/dataeggs/privacy/'>Privacy</a></li><li class='item current'>
<a aria-current='page' href='/dataeggs/aeri_stacktraces/'>AERI</a></li><li class='item'>
<a href='/dataeggs/eclipse_mls/'>MLS</a></li><li class='item has-children'>
<a href=''>projects</a><button class='sub-menu-toggler'>
<span class='screen-reader-text'>expand sub menu</span>
<span class='sign'></span>
</button>
<ul class='sub-menu'><li class='item'>
<a href='/dataeggs/projects/ecd.che/datasets_report/'>ecd.che</a></li><li class='item'>
<a href='/dataeggs/projects/ee4j.glassfish/datasets_report/'>ee4j.glassfish</a></li><li class='item'>
<a href='/dataeggs/projects/modeling.emf-parsley/datasets_report/'>modeling.emf-parsley</a></li><li class='item'>
<a href='/dataeggs/projects/modeling.emfcompare/datasets_report/'>modeling.emfcompare</a></li><li class='item'>
<a href='/dataeggs/projects/modeling.epsilon/datasets_report/'>modeling.epsilon</a></li><li class='item'>
<a href='/dataeggs/projects/modeling.gendoc/datasets_report/'>modeling.gendoc</a></li><li class='item'>
<a href='/dataeggs/projects/modeling.m2t.acceleo/datasets_report/'>modeling.m2t.acceleo</a></li><li class='item'>
<a href='/dataeggs/projects/modeling.mdt.ocl/datasets_report/'>modeling.mdt.ocl</a></li><li class='item'>
<a href='/dataeggs/projects/modeling.sirius/datasets_report/'>modeling.sirius</a></li><li class='item'>
<a href='/dataeggs/projects/technology.apogy/datasets_report/'>technology.apogy</a></li><li class='item'>
<a href='/dataeggs/projects/technology.app4mc/datasets_report/'>technology.app4mc</a></li><li class='item'>
<a href='/dataeggs/projects/technology.collections/datasets_report/'>technology.collections</a></li><li class='item'>
<a href='/dataeggs/projects/technology.ease/datasets_report/'>technology.ease</a></li><li class='item'>
<a href='/dataeggs/projects/technology.egit/datasets_report/'>technology.egit</a></li><li class='item'>
<a href='/dataeggs/projects/technology.epf/datasets_report/'>technology.epf</a></li><li class='item'>
<a href='/dataeggs/projects/technology.jgit/datasets_report/'>technology.jgit</a></li><li class='item'>
<a href='/dataeggs/projects/technology.paho/datasets_report/'>technology.paho</a></li><li class='item'>
<a href='/dataeggs/projects/tools.cdt/datasets_report/'>tools.cdt</a></li><li class='item'>
<a href='/dataeggs/projects/tools.tracecompass/datasets_report/'>tools.tracecompass</a></li></ul></li></ul>
</div>
</nav>
</section><section class='widget widget-social_menu sep-after'><nav aria-label='Social Menu'>
<ul><li>
<a href='https://gitlab.eclipse.org/eclipse/dataeggs/dataeggs' target='_blank' rel='noopener me'>
<span class='screen-reader-text'>Open Gitlab account in new tab</span><svg class='icon' xmlns='http://www.w3.org/2000/svg' viewbox='0 0 24 24' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' aria-hidden='true'>
<title>GitLab icon</title> <path d="M22.65 14.39L12 22.13 1.35 14.39a.84.84 0 0 1-.3-.94l1.22-3.78 2.44-7.51A.42.42 0 0 1 4.82 2a.43.43 0 0 1 .58 0 .42.42 0 0 1 .11.18l2.44 7.49h8.1l2.44-7.51A.42.42 0 0 1 18.6 2a.43.43 0 0 1 .58 0 .42.42 0 0 1 .11.18l2.44 7.51L23 13.45a.84.84 0 0 1-.35.94z"/>
</svg>
</a>
</li><li>
<a href='mailto:boris@chrysalice.org' target='_blank' rel='noopener me'>
<span class='screen-reader-text'>Contact via Email</span><svg class='icon' xmlns='http://www.w3.org/2000/svg' viewbox='0 0 24 24' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' aria-hidden='true'>
<path d="M4 4h16c1.1 0 2 .9 2 2v12c0 1.1-.9 2-2 2H4c-1.1 0-2-.9-2-2V6c0-1.1.9-2 2-2z"></path><polyline points="22,6 12,13 2,6"></polyline>
</svg>
</a>
</li></ul>
</nav>
</section></div>
<div class='sidebar-overlay'></div>
</div><div class='main'><a class='screen-reader-text' href='#content'>Skip to Content</a>
<button id='sidebar-toggler' class='sidebar-toggler' aria-controls='sidebar'>
<span class='screen-reader-text'>Toggle Sidebar</span>
<span class='open'><svg class='icon' xmlns='http://www.w3.org/2000/svg' viewbox='0 0 24 24' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' aria-hidden='true'>
<line x1="3" y1="12" x2="21" y2="12" />
<line x1="3" y1="6" x2="21" y2="6" />
<line x1="3" y1="18" x2="21" y2="18" />
</svg>
</span>
<span class='close'><svg class='icon' xmlns='http://www.w3.org/2000/svg' viewbox='0 0 24 24' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' aria-hidden='true'>
<line x1="18" y1="6" x2="6" y2="18" />
<line x1="6" y1="6" x2="18" y2="18" />
</svg>
</span>
</button><div class='header-widgets'>
<div class='container'>
<style>.widget-breadcrumbs li:after{content:'\2f '}</style>
<section class='widget widget-breadcrumbs sep-after'>
<nav id='breadcrumbs'>
<ol><li><a href='/dataeggs/'>Home</a></li><li><span>AERI</span></li></ol>
</nav>
</section></div>
</div>
<header id='header' class='header site-header'>
<div class='container sep-after'>
<div class='header-info'><p class='site-title title'>Eclipse DataEggs</p><p class='desc site-desc'></p>
</div>
</div>
</header>
<main id='content'>
<header class='header'>
<div class='container sep-after'>
<div class='header-info'>
<h1 class='title'>AERI Stacktraces</h1>
</div>
</div>
</header>
<div class='entry'>
<div class='container entry-content'>
<p>The <strong>AERI stacktraces dataset</strong> is a list of exceptions encountered by users in the Eclipse IDE, as collected by the AERI system. The <a href="https://wiki.eclipse.org/EPP/Logging">Automated Error Reporting</a> (AERI) system was developed by <a href="https://www.codetrails.com/">Codetrails</a> and collects information about exceptions. It was installed by default in the Eclipse IDE and has helped hundreds of projects better support their users and resolve bugs. This dataset is a dump of all records over a couple of years, with useful information about the exceptions and their environment.
The dataset was last updated on 2018-02-11.</p>
<p>Structure:</p>
<ul>
<li><strong>Incidents</strong> When an exception occurs and is trapped by the AERI system, it constitutes an incident (or error report). An incident can be reported by several different people, can be reported multiple times, and can be linked to different environments.</li>
<li><strong>Problems</strong> As soon as an error report arrives on the server, it will be analyzed and subsequently assigned to one or more problems. A problem thus represents a set of (similar) error reports which usually have the same root cause – for example a bug in your software. (Extract from the AERI system documentation.)</li>
</ul>
<p>This dataset is published under the <a href="https://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International licence</a>.</p>
<h2 id="downloads">Downloads</h2>
<ul>
<li><strong>Problems full</strong> [ <a href="https://download.eclipse.org/dataeggs/aeri_stacktraces/problems_full.tar.bz2">Download JSON</a> ] &ndash; A list of all problems, exported as JSON (one problem per file).
<ul>
<li>Content: 125250 entries, 22 attributes</li>
<li>Size: 38M compressed, 904M raw</li>
</ul>
</li>
<li><strong>Problems extract</strong> [ <a href="https://download.eclipse.org/dataeggs/aeri_stacktraces/problems_extract.csv.bz2">Download CSV</a> ] &ndash; A list of all problems, exported as CSV (one big file).
<ul>
<li>Content: 125250 entries, 22 attributes</li>
<li>Size: 1.5M compressed, 14M raw</li>
</ul>
</li>
<li><strong>Incidents full</strong> [ <a href="https://download.eclipse.org/dataeggs/aeri_stacktraces/incidents_full.tar.bz2">Download JSON</a> ] &ndash; A list of all incidents, exported as JSON (one incident per file).
<ul>
<li>Content: 2084363 entries, 22 attributes</li>
<li>Size: 820M compressed, 19G raw</li>
</ul>
</li>
<li><strong>Incidents extract</strong> [ <a href="https://download.eclipse.org/dataeggs/aeri_stacktraces/incidents_extract.csv.bz2">Download CSV</a> ] &ndash; A list of all incidents, exported as CSV (one big file).
<ul>
<li>Content: 2084045 entries, 20 attributes</li>
<li>Size: 141M compressed, 778M raw</li>
</ul>
</li>
<li><strong>Incidents Bundles</strong> [ <a href="https://download.eclipse.org/dataeggs/aeri_stacktraces/incidents_bundles_extract.csv.bz2">Download CSV</a> ] &ndash; A list of all bundles found in incidents, exported as CSV. Attributes are bundle_name, bundle_version, and number of occurrences.
<ul>
<li>Content: 29709 entries, 3 attributes</li>
<li>Size: 220K compressed, 1.5M raw</li>
</ul>
</li>
</ul>
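<p>As a minimal sketch of how the CSV extracts could be read, the Python snippet below streams rows straight from a <code>.csv.bz2</code> file without unpacking it to disk first. It assumes the file is UTF-8 encoded, comma-separated and starts with a header row; none of this is stated above, so check it against the actual download. The function names <code>iter_rows</code> and <code>count_rows</code> are illustrative only.</p>
<pre><code>import bz2
import csv


def iter_rows(path):
    # Stream rows from a bz2-compressed CSV, one dict per data row.
    # Assumption: the first row is a header; its cells become the dict keys.
    with bz2.open(path, mode='rt', encoding='utf-8', newline='') as handle:
        for row in csv.DictReader(handle):
            yield row


def count_rows(path):
    # Count data rows (header excluded) in a single pass over the file.
    return sum(1 for _ in iter_rows(path))
</code></pre>
<p>Because rows are yielded one at a time, memory use stays flat even for the larger incidents extract.</p>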
<h2 id="documentation">Documentation</h2>
<ul>
<li><strong>Stacktraces Problems analysis document</strong> [ <a href="problems_analysis.pdf">Download PDF</a> | <a href="problems_analysis.rmd">Download Rmd</a> ] &ndash; An R Markdown document that analyses the Stacktraces problems dataset, with a description of the actual content and usage examples.</li>
<li><strong>Stacktraces Incidents analysis document</strong> [ <a href="incidents_analysis.pdf">Download PDF</a> | <a href="incidents_analysis.rmd">Download Rmd</a> ] &ndash; An R Markdown document that analyses the Stacktraces incidents dataset, with a description of the actual content and usage examples.</li>
</ul>
<h2 id="privacy-concerns">Privacy concerns</h2>
<p>See also the documentation about <a href="../privacy">privacy in our datasets</a>.</p>
<p>The result contains no email address, user id or machine id. Rather than trying to remove sensitive information (we could not be sure of removing everything required), we decided to simply pick the relevant information from the file and push it into the output.</p>
<p>End users have an option to keep their own class names private. We presently have no simple means of knowing which stacktraces in the database extraction should be kept private, so we decided to play it safe and hide class names whose packages don&rsquo;t start with known prefixes [1]. All private class names have been replaced by the HIDDEN keyword.</p>
<p>[1] <code>&quot;ch.qos.*&quot;, &quot;com.cforcoding.*&quot;, &quot;com.google.*&quot;, &quot;com.gradleware.tooling.*&quot;, &quot;com.mountainminds.eclemma.*&quot;, &quot;com.naef.*&quot;, &quot;com.sun.*&quot;, &quot;java.*&quot;, &quot;javafx.*&quot;, &quot;javax.*&quot;, &quot;org.apache.*&quot;, &quot;org.eclipse.*&quot;, &quot;org.fordiac.*&quot;, &quot;org.gradle.*&quot;, &quot;org.jacoco.*&quot;, &quot;org.osgi.*&quot;, &quot;org.slf4j.*&quot;, &quot;sun.*&quot; </code></p>
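<p>The rule above can be illustrated with a short Python sketch. It is not the code used to produce the dataset: it assumes each pattern in [1] is a plain package prefix (so <code>java.*</code> means any class name starting with <code>java.</code>) and that only the class name field (<code>cN</code>, see the stacktrace format below) is replaced. The function names are hypothetical.</p>
<pre><code># Prefixes taken from list [1]; the trailing '*' of each pattern is dropped.
KNOWN_PREFIXES = (
    'ch.qos.', 'com.cforcoding.', 'com.google.', 'com.gradleware.tooling.',
    'com.mountainminds.eclemma.', 'com.naef.', 'com.sun.', 'java.', 'javafx.',
    'javax.', 'org.apache.', 'org.eclipse.', 'org.fordiac.', 'org.gradle.',
    'org.jacoco.', 'org.osgi.', 'org.slf4j.', 'sun.',
)


def anonymize_class_name(class_name):
    # Keep class names from known packages, hide everything else.
    if class_name.startswith(KNOWN_PREFIXES):
        return class_name
    return 'HIDDEN'


def anonymize_stacktrace(frames):
    # Return a copy of the frames with private class names replaced.
    # Assumption: only the 'cN' field is touched; other fields are copied as is.
    cleaned = []
    for frame in frames:
        copy = dict(frame)
        if 'cN' in copy:
            copy['cN'] = anonymize_class_name(copy['cN'])
        cleaned.append(copy)
    return cleaned
</code></pre>
<p>Working from a list of allowed prefixes, rather than a list of forbidden ones, means an unknown package is hidden by default, which matches the play-it-safe choice described above.</p>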
<h2 id="format-problems">Format: problems</h2>
<pre><code>{
&quot;summary&quot;: &quot;&quot;,
&quot;osgiArch&quot;: &quot;&quot;,
&quot;osgiOs&quot;: &quot;&quot;,
&quot;osgiOsVersion&quot;: &quot;&quot;,
&quot;osgiWs&quot;: &quot;&quot;,
&quot;eclipseBuildId&quot;: &quot;&quot;,
&quot;eclipseProduct&quot;: &quot;&quot;,
&quot;javaRuntimeVersion&quot;: &quot;&quot;,
&quot;numberOfIncidents&quot;: 0,
&quot;numberOfReporters&quot;: 74,
&quot;stacktraces&quot;: [
[ &quot;stacktrace for incident&quot; ],
[ &quot;stacktrace for cause&quot; ],
[ &quot;stacktrace for exception&quot; ]
]
}
</code></pre>
<h2 id="format-incidents">Format: incidents</h2>
<pre><code>{
&quot;eclipseBuildId&quot;:&quot;4.6.1.M20160907-1200&quot;,
&quot;eclipseProduct&quot;:&quot;org.eclipse.epp.package.jee.product&quot;,
&quot;javaRuntimeVersion&quot;:&quot;1.8.0_112-b15&quot;,
&quot;osgiArch&quot;:&quot;x86_64&quot;,
&quot;osgiOs&quot;:&quot;Windows7&quot;,
&quot;osgiOsVersion&quot;:&quot;6.1.0&quot;,
&quot;osgiWs&quot;:&quot;win32&quot;,
&quot;stacktraces&quot;:[
[ &quot;stacktrace&quot; ]
],
&quot;summary&quot;: &quot;Failed to retrieve default libraries for jre1.8.0_111&quot;
}
</code></pre>
<h2 id="format-stacktraces">Format: Stacktraces</h2>
<p>The structure used in MongoDB for stacktraces has been kept as is: each line of the stacktrace is an object whose fields hold all the information relevant to that line. Each stacktrace is an array of such objects, as shown below:</p>
<pre><code>[
{
&quot;cN&quot;: &quot;sun.net.www.http.HttpClient&quot;,
&quot;mN&quot;: &quot;parseHTTPHeader&quot;,
&quot;fN&quot;: &quot;HttpClient.java&quot;,
&quot;lN&quot;: 786
}
]
</code></pre>
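<p>To make the frame structure concrete, here is a hedged Python sketch that renders such an array in the familiar Java stack trace layout. Judging from the example values, <code>cN</code>, <code>mN</code>, <code>fN</code> and <code>lN</code> appear to be the class name, method name, file name and line number; that reading is an assumption, and the function names are illustrative.</p>
<pre><code>def format_frame(frame):
    # Render one frame object as a Java-style 'at ...' line.
    # Assumed meaning: cN = class, mN = method, fN = file, lN = line number.
    return 'at {cN}.{mN}({fN}:{lN})'.format(**frame)


def format_stacktrace(frames):
    # Join all frames of one stacktrace, one line per frame.
    return '\n'.join(format_frame(frame) for frame in frames)
</code></pre>
<p>Applied to the frame shown above, <code>format_frame</code> gives <code>at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:786)</code>.</p>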
<h2 id="generation">Generation</h2>
<p>The database dump is composed of several MongoDB collections and uses the BSON format. Only two collections contain stack traces: <code>problems</code> and <code>incidents</code>.</p>
<p>The BSON files can be read using the bsondump utility, provided with the MongoDB client package (mongodb-clients on Debian).</p>
<pre><code>bsondump problems.bson --type json &gt; problems.json
</code></pre><p>After conversion the two files are quite big: 37GB for incidents and 2.1GB for problems.</p>
<p>Unfortunately the utility mixes some progress information into the output, which needs to be removed:</p>
<pre><code>grep -v 'Progress: ' problems.json &gt; problems_clean.json
</code></pre><p>We also had to remove a few lines (roughly a dozen) because they embed unparseable source code, characters or Asian/binary/utf8/16/256 text. The script tries to JSON-decode each line in turn and, on failure, simply moves on to the next line.</p>
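<p>The decode-or-skip behaviour can be sketched in a few lines of Python. This is an illustration of the idea only, not the extraction script itself; the name <code>iter_records</code> is hypothetical.</p>
<pre><code>import json


def iter_records(lines):
    # Yield one decoded JSON value per line, silently skipping bad lines.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError:
            # an unparseable line is dropped and processing goes on.
            continue
</code></pre>
<p>Since <code>iter_records</code> is a generator, it can be fed an open file object and will never hold more than one line in memory.</p>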
<p>For <code>problems</code> (the file is reasonably small) the script generates a separate JSON file for each line, containing only the information related to that line. The script for problems extraction is <code>parse_json_problems.pl</code>. Output is 820MB and processing time is roughly 45 minutes.</p>
<p>For <code>incidents</code> (the file is 37GB) the script likewise generates a separate JSON file for each line, containing only the information related to that line. For the record, generating a single file instead would require at least twice the size of the file in RAM/swap (i.e. roughly 74GB). The output contains 2084328 files, for a total of 17GB. The script for incidents extraction is <code>parse_json_incidents.pl</code>. To give an idea of the resources required, the final incidents extraction took roughly 16 hours on a fairly powerful machine.</p>
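<p>The one-file-per-record approach can be sketched in Python as follows. This is not a port of the Perl scripts named above: the list of kept fields is an assumption based on the formats documented on this page, and the output file naming (<code>record_0000000.json</code> and so on) is made up for the example. The input is any iterable of already decoded records.</p>
<pre><code>import json
import os

# Assumption: fields to copy, taken from the formats documented on this page.
KEPT_FIELDS = (
    'summary', 'osgiArch', 'osgiOs', 'osgiOsVersion', 'osgiWs',
    'eclipseBuildId', 'eclipseProduct', 'javaRuntimeVersion', 'stacktraces',
)


def export_records(records, out_dir):
    # Write each record to its own JSON file, keeping only the chosen fields.
    # Records are consumed one at a time, so memory use does not grow
    # with the size of the input.
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    for record in records:
        picked = {key: record[key] for key in KEPT_FIELDS if key in record}
        target = os.path.join(out_dir, 'record_{:07d}.json'.format(written))
        with open(target, 'w', encoding='utf-8') as handle:
            json.dump(picked, handle)
        written += 1
    return written
</code></pre>
<p>Copying a fixed set of fields, instead of deleting unwanted ones, means that any field not listed (an email address, for instance) never reaches the output.</p>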
</div>
</div>
<div class='container list-container'>
<ul class='list'>
</ul>
</div>
</main>
<footer id='footer' class='footer'>
<div class='container sep-before'><div class="row">
<div class="column">
<a href="http://www.eclipse.org/" target="_blank"><img src="/dataeggs/images/logo-eclipse-foundation.png" alt="Eclipse Foundation logo"></a>
</div>
<div class="column">
<p></p>
<p id="copyright">Copyright © 2021 Eclipse Foundation, Inc.<br>All Rights Reserved.</p>
</div>
</div>
<div class="row">
<p><a href="http://www.eclipse.org/legal/privacy.php" target="_blank">Privacy Policy</a> /
<a href="http://eclipse.org/" target="_blank">Eclipse</a> /
<a href="http://www.eclipse.org/legal/termsofuse.php" target="_blank">Terms of Use</a> /
<a href="http://www.eclipse.org/legal/copyright.php" target="_blank">Copyright Agent</a> /
<a href="http://www.eclipse.org/legal/" target="_blank">Legal</a> /
<a href="http://www.eclipse.org/org/foundation/contact.php" target="_blank"> Contact Us</a></p>
</div>
</div>
</footer>
</div>
</div><script>window.__assets_js_src="/dataeggs/assets/js/"</script>
<script src='/dataeggs/assets/js/main.c3bcf2df.js'></script><script src='/dataeggs/js/custom.js'></script>
</body>
</html>