blob: 64f00c3ae090b169e83a24437aa5c3f1aff0f06d [file] [log] [blame]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Eclipse Platform/Core</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="http://dev.eclipse.org/default_style.css" type="text/css">
</head>
<body>
<center>
<font class=indextop>core</font><br>
<font class=indexsub>the foundation of the platform</font><p></p>
<a href="../main.html">[home]</a>
<a href="../documents.html">[documents]</a>
<a href="../downloads.html">[downloads]</a>
<a href="../resources.html">[resources]</a>
<a href="../planning.html">[planning]</a>
<a href="../testing.html">[testing]</a>
</center>
<br>
<table BORDER=0 CELLPADDING=2 WIDTH="100%" >
<tr>
<td ALIGN=LEFT VALIGN=TOP COLSPAN="2" BGCOLOR="#0080C0"><b><font face="Arial,Helvetica" color="#FFFFFF">History Store Modifications</font></b> </td>
</tr>
<tr>
<td ALIGN=RIGHT VALIGN=TOP WIDTH="2%"><img SRC="../images/Adarrow.gif" BORDER=0 height=16 width=16></td>
<td WIDTH="98%"><b>Summary</b>
<p> The history store current implementation is said to be problem-prone
(data corruption) and hard to maintain. There is an ongoing effort to
replace it with something simpler. One of the options would be the adoption
of a third-party provided storage engine, such as a embedded database
or a b-tree engine. Another is to provide a very simple home-made solution
with as little (conceptual/memory/execution) overhead as possible. </p>
<ul>
<li>
<p><em>2004/10/29</em> - tested a file system based solution (<em>take
1</em>). Every project has its own history area. History for a file
will be stored in a relative directory similar to where the file is.
The difference is that, in order to prevent problems with path length
limitations on Windows, every segment is translated to a short fixed
length string (using a hash function). This opens opportunity for
colisions, and it is an one-way translation, so we need to remember
the original path for every state. We did this by keeping an additional
.info file for each file state to maintain that information. Result:
adding new state is really fast, but any tree traversals are very
costly (we have to open all .info files in every directory we look).</p>
</li>
<li>
<p><em>2004/11/02</em> - tested an improved solution (<em>take 2</em>)
based on the previous approach. Instead of keeping an individual .info
file showing the original location for every state, we keep a index
table per directory. This drastically reduces the number of file openings
(not mentioning it takes much less space in the file system), having
a better performance for tree based operations. The poor performance
for copying history is understandable since we are actually copying
the states as well, whereas the original implementation links them
(maybe we should do the same). The bad performance when clearing the
history for a given resource is due to having to delete multiple files/directories
in a more complex directory structure than the original implementation
(a two-level tree, directories are never deleted). We could improve
that by postponing deleting directories to shutdown (making shutdown
slower instead).</p>
</li>
<li>
<p><em>2004/11/17</em> - using some ideas from the previous tries and
the existing code, we got a solution (<em>take 3</em>) that seems
to have all the characteristics we have been looking for: it is much
simpler than the current story, and generally performs (sometimes
much) better than the current implementation. We are using the existing
BlobStore (as the existing implementation) and per-bucket indexes,
as in the previous solutions. These indexes are just dictionaries
having the file path as key and a pair &lt;UUID,timestamps&gt; for
every state belonging to the file as value.</p>
</li>
<li>
<p><em>2004/12/13</em> - the solution above (known as <em>take 3</em>)
has been part of recent integration builds and will be in for M4.
We are currently investigating its use as a replacement for the previous
property store implementation as well. There are concerns w.r.t. scalability
in using a single file to store all properties for all files/folders
under a given directory (since properties can be as big as 2K). Some
sort of segmentation strategy seems necessary to avoid such problems.</p>
</li>
<li><em>2005/02/02</em> - Eclipse 3.1 M5 - both history store and property
store implementations were moved to a new infrastructure. The existing
indexed store has been moved to a separate fragment, only used for supporting
backward compatibility with old workspaces. Performance tests are part
of the org.eclipse.core.tests.resources project.</li>
</ul>
</td>
</tr>
<tr>
<td ALIGN=RIGHT VALIGN=TOP WIDTH="2%"><img SRC="../images/Adarrow.gif" BORDER=0 height=16 width=16></td>
<td WIDTH="98%"><b>Performance Comparisons</b>
<p>Following is a comparison of performance (execution time) for the most
common use cases:</p>
<table width="70%" border="1">
<tr>
<td width="60%"><strong>
<div align="center">Use cases</div>
</strong></td>
<td width="20%"><strong>
<div align="center">Original (ms)</div>
</strong></td>
<td width="20%"><strong>
<div align="center">Take 3 (ms)</div>
</strong></td>
</tr>
<tr>
<td><em>add state</em></td>
<td align="center">
<div align="center">498</div></td>
<td align="center">
<div align="center">481</div></td>
</tr>
<tr>
<td><em>get history for file</em></td>
<td align="center">
<div align="center">1280</div></td>
<td align="center">
<div align="center">16</div></td>
</tr>
<tr>
<td><em>find deleted members (100 files, 4 states per file)</em></td>
<td align="center">914</td>
<td align="center">46</td>
</tr>
<tr>
<td><em>find deleted members (20 files, 20 states per file)</em></td>
<td align="center">906</td>
<td align="center">15</td>
</tr>
<tr>
<td><em>find deleted members (4 files, 100 states per file)</em></td>
<td align="center">
<div align="center">882</div></td>
<td align="center">
<div align="center">15</div></td>
</tr>
<tr>
<td><em>clear history of a directory tree (100 files, 4 states per file)</em></td>
<td align="center">367</td>
<td align="center">42</td>
</tr>
<tr>
<td><em>clear history of a directory tree (20 files, 20 states per file)</em></td>
<td align="center">332</td>
<td align="center">187</td>
</tr>
<tr>
<td><em>clear history of a directory tree (4 files, 100 states per file)</em></td>
<td align="center">
<div align="center">379</div></td>
<td align="center">
<div align="center">179</div></td>
</tr>
<tr>
<td><em>copy history for a directory tree (100 files, 4 states per file)</em></td>
<td align="center">3350</td>
<td align="center">3250</td>
</tr>
<tr>
<td><em>copy history for a directory tree (20 files, 20 states per file)</em></td>
<td align="center">2210</td>
<td align="center">868</td>
</tr>
<tr>
<td><em>copy history for a directory tree (4 files, 100 states per file)</em></td>
<td align="center">4720</td>
<td align="center">453</td>
</tr>
<tr>
<td><em>apply cleaning policy</em></td>
<td align="center">
<div align="center">tbd</div></td>
<td align="center">
<div align="center">tbd</div></td>
</tr>
</table> </td>
</tr>
</table>
</body>
</html>