blob: 71068765461aa4adf7f3fc4744cc1feb18e22598 [file] [log] [blame]
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<title>Eclipse Platform/Core</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="http://dev.eclipse.org/default_style.css" type="text/css">
</head>
<body>
<center>
<font class=indextop>core</font><br>
<font class=indexsub>the foundation of the platform</font><p></p>
<a href="../main.html">[home]</a>
<a href="../documents.html">[documents]</a>
<a href="../downloads.html">[downloads]</a>
<a href="../resources.html">[resources]</a>
<a href="../planning.html">[planning]</a>
<a href="../testing.html">[testing]</a>
</center>
<br>
<table BORDER=0 CELLPADDING=2 WIDTH="100%" >
<tr>
<td ALIGN=LEFT VALIGN=TOP COLSPAN="2" BGCOLOR="#0080C0"><b><font face="Arial,Helvetica" color="#FFFFFF">History Store Modifications</font></b> </td>
</tr>
<tr>
<td ALIGN=RIGHT VALIGN=TOP WIDTH="2%"><img SRC="../images/Adarrow.gif" BORDER=0 height=16 width=16></td>
<td WIDTH="98%"><b>Summary</b>
<p> The history store current implementation is said to be problem-prone
(data corruption) and hard to maintain. There is an ongoing effort to
replace it with something simpler. One of the options would be the adoption
of a third-party provided storage engine, such as a embedded database
or a b-tree engine. Another is to provide a very simple home-made solution
with as little (conceptual/memory/execution) overhead as possible. </p>
<ul>
<li><p><em>October 29th</em> - tested a file system based solution (<em>take
1</em>). Every project has its own history area. History for a file
will be stored in a relative directory similar to where the file is.
The difference is that, in order to prevent problems with path length
limitations on Windows, every segment is translated to a short fixed
length string (using a hash function). This opens opportunity for colisions,
and it is an one-way translation, so we need to remember the original
path for every state. We did this by keeping an additional .info file
for each file state to maintain that information. Result: adding new
state is really fast, but any tree traversals are very costly (we have
to open all .info files in every directory we look).</p></li>
<li><p><em>November 2nd</em> - tested an improved solution (<em>take 2</em>)
based on the previous approach. Instead of keeping an individual .info
file showing the original location for every state, we keep a index
table per directory. This drastically reduces the number of file openings
(not mentioning it takes much less space in the file system), having
a better performance for tree based operations. The poor performance
for copying history is understandable since we are actually copying
the states as well, whereas the original implementation links them (maybe
we should do the same). The bad performance when clearing the history
for a given resource is due to having to delete multiple files/directories
in a more complex directory structure than the original implementation
(a two-level tree, directories are never deleted). We could improve
that by postponing deleting directories to shutdown (making shutdown
slower instead).</p></li>
<li><p><em>November 17th</em> - using some ideas from the previous tries
and the existing code, we got a solution (<em>take 3</em>) that seems
to have all the characteristics we have been looking for: it is much
simpler than the current story, and generally performs (sometimes much)
better than the current implementation. We are using the existing BlobStore
(as the existing implementation) and per-bucket indexes, as in the previous
solutions. These indexes are just dictionaries having the file path
as key and a pair &lt;UUID,timestamps&gt; for every state belonging
to the file as value.</p></li>
<li><p><em>December 13th</em> - the solution above (known as <em>take 3</em>)
has been part of recent integration builds and will be in for M4. We are currently
investigating its use as a replacement for the previous property store implementation as well.
There are concerns w.r.t. scalability in using a single file to store all properties
for all files/folders under a given directory (since properties can be as big as 2K).
Some sort of segmentation strategy seems necessary to avoid such problems.
</p></li>
</ul>
</td>
</tr>
<tr>
<td ALIGN=RIGHT VALIGN=TOP WIDTH="2%"><img SRC="../images/Adarrow.gif" BORDER=0 height=16 width=16></td>
<td WIDTH="98%"><b>Performance Comparisons</b>
<p>Following is a comparison of performance (execution time) for the most
common use cases:</p>
<table width="70%" border="1">
<tr>
<td width="60%"><strong>
<div align="center">Use cases</div>
</strong></td>
<td width="20%"><strong>
<div align="center">Original (ms)</div>
</strong></td>
<td width="20%"><strong>
<div align="center">Take 3 (ms)</div>
</strong></td>
</tr>
<tr>
<td><em>add state</em></td>
<td align="center">
<div align="center">498</div></td>
<td align="center">
<div align="center">481</div></td>
</tr>
<tr>
<td><em>get history for file</em></td>
<td align="center">
<div align="center">1280</div></td>
<td align="center">
<div align="center">16</div></td>
</tr>
<tr>
<td><em>find deleted members (100 files, 4 states per file)</em></td>
<td align="center">914</td>
<td align="center">46</td>
</tr>
<tr>
<td><em>find deleted members (20 files, 20 states per file)</em></td>
<td align="center">906</td>
<td align="center">15</td>
</tr>
<tr>
<td><em>find deleted members (4 files, 100 states per file)</em></td>
<td align="center">
<div align="center">882</div></td>
<td align="center">
<div align="center">15</div></td>
</tr>
<tr>
<td><em>clear history of a directory tree (100 files, 4 states per file)</em></td>
<td align="center">367</td>
<td align="center">42</td>
</tr>
<tr>
<td><em>clear history of a directory tree (20 files, 20 states per file)</em></td>
<td align="center">332</td>
<td align="center">187</td>
</tr>
<tr>
<td><em>clear history of a directory tree (4 files, 100 states per file)</em></td>
<td align="center">
<div align="center">379</div></td>
<td align="center">
<div align="center">179</div></td>
</tr>
<tr>
<td><em>copy history for a directory tree (100 files, 4 states per file)</em></td>
<td align="center">3350</td>
<td align="center">3250</td>
</tr>
<tr>
<td><em>copy history for a directory tree (20 files, 20 states per file)</em></td>
<td align="center">2210</td>
<td align="center">868</td>
</tr>
<tr>
<td><em>copy history for a directory tree (4 files, 100 states per file)</em></td>
<td align="center">4720</td>
<td align="center">453</td>
</tr>
<tr>
<td><em>apply cleaning policy</em></td>
<td align="center">
<div align="center">tbd</div></td>
<td align="center">
<div align="center">tbd</div></td>
</tr>
</table> </td>
</tr>
</table>
</body>
</html>