| <!doctype html public "-//w3c//dtd html 4.0 transitional//en"> |
| <html> |
| <head> |
| <title>Eclipse Platform/Core</title> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
| <link rel="stylesheet" href="http://dev.eclipse.org/default_style.css" type="text/css"> |
| </head> |
| <body> |
| <center> |
| <font class=indextop>core</font><br> |
| <font class=indexsub>the foundation of the platform</font><p></p> |
| <a href="../main.html">[home]</a> |
| <a href="../documents.html">[documents]</a> |
| <a href="../downloads.html">[downloads]</a> |
| <a href="../resources.html">[resources]</a> |
| <a href="../planning.html">[planning]</a> |
| <a href="../testing.html">[testing]</a> |
| </center> |
| <br> |
| <table BORDER=0 CELLPADDING=2 WIDTH="100%" > |
| <tr> |
| <td ALIGN=LEFT VALIGN=TOP COLSPAN="2" BGCOLOR="#0080C0"><b><font face="Arial,Helvetica" color="#FFFFFF">History Store Modifications</font></b> </td> |
| </tr> |
| <tr> |
| <td ALIGN=RIGHT VALIGN=TOP WIDTH="2%"><img SRC="../images/Adarrow.gif" BORDER=0 height=16 width=16></td> |
| <td WIDTH="98%"><b>Summary</b> |
| <p> The history store current implementation is said to be problem-prone |
| (data corruption) and hard to maintain. There is an ongoing effort to |
| replace it with something simpler. One of the options would be the adoption |
| of a third-party provided storage engine, such as a embedded database |
| or a b-tree engine. Another is to provide a very simple home-made solution |
| with as little (conceptual/memory/execution) overhead as possible. </p> |
| |
| <ul> |
| <li><p><em>October 29th</em> - tested a file system based solution (<em>take |
| 1</em>). Every project has its own history area. History for a file |
| will be stored in a relative directory similar to where the file is. |
| The difference is that, in order to prevent problems with path length |
| limitations on Windows, every segment is translated to a short fixed |
| length string (using a hash function). This opens opportunity for colisions, |
| and it is an one-way translation, so we need to remember the original |
| path for every state. We did this by keeping an additional .info file |
| for each file state to maintain that information. Result: adding new |
| state is really fast, but any tree traversals are very costly (we have |
| to open all .info files in every directory we look).</p></li> |
| <li><p><em>November 2nd</em> - tested an improved solution (<em>take 2</em>) |
| based on the previous approach. Instead of keeping an individual .info |
| file showing the original location for every state, we keep a index |
| table per directory. This drastically reduces the number of file openings |
| (not mentioning it takes much less space in the file system), having |
| a better performance for tree based operations. The poor performance |
| for copying history is understandable since we are actually copying |
| the states as well, whereas the original implementation links them (maybe |
| we should do the same). The bad performance when clearing the history |
| for a given resource is due to having to delete multiple files/directories |
| in a more complex directory structure than the original implementation |
| (a two-level tree, directories are never deleted). We could improve |
| that by postponing deleting directories to shutdown (making shutdown |
| slower instead).</p></li> |
| <li><p><em>November 17th</em> - using some ideas from the previous tries |
| and the existing code, we got a solution (<em>take 3</em>) that seems |
| to have all the characteristics we have been looking for: it is much |
| simpler than the current story, and generally performs (sometimes much) |
| better than the current implementation. We are using the existing BlobStore |
| (as the existing implementation) and per-bucket indexes, as in the previous |
| solutions. These indexes are just dictionaries having the file path |
| as key and a pair <UUID,timestamps> for every state belonging |
| to the file as value.</p></li> |
| <li><p><em>December 13th</em> - the solution above (known as <em>take 3</em>) |
| has been part of recent integration builds and will be in for M4. We are currently |
| investigating its use as a replacement for the previous property store implementation as well. |
| There are concerns w.r.t. scalability in using a single file to store all properties |
| for all files/folders under a given directory (since properties can be as big as 2K). |
| Some sort of segmentation strategy seems necessary to avoid such problems. |
| </p></li> |
| </ul> |
| </td> |
| </tr> |
| <tr> |
| <td ALIGN=RIGHT VALIGN=TOP WIDTH="2%"><img SRC="../images/Adarrow.gif" BORDER=0 height=16 width=16></td> |
| |
| <td WIDTH="98%"><b>Performance Comparisons</b> |
| <p>Following is a comparison of performance (execution time) for the most |
| common use cases:</p> |
| |
| <table width="70%" border="1"> |
| <tr> |
| <td width="60%"><strong> |
| <div align="center">Use cases</div> |
| </strong></td> |
| <td width="20%"><strong> |
| <div align="center">Original (ms)</div> |
| </strong></td> |
| <td width="20%"><strong> |
| <div align="center">Take 3 (ms)</div> |
| </strong></td> |
| </tr> |
| <tr> |
| <td><em>add state</em></td> |
| <td align="center"> |
| <div align="center">498</div></td> |
| <td align="center"> |
| <div align="center">481</div></td> |
| </tr> |
| <tr> |
| <td><em>get history for file</em></td> |
| <td align="center"> |
| <div align="center">1280</div></td> |
| <td align="center"> |
| <div align="center">16</div></td> |
| </tr> |
| <tr> |
| <td><em>find deleted members (100 files, 4 states per file)</em></td> |
| <td align="center">914</td> |
| <td align="center">46</td> |
| </tr> |
| <tr> |
| <td><em>find deleted members (20 files, 20 states per file)</em></td> |
| <td align="center">906</td> |
| <td align="center">15</td> |
| </tr> |
| <tr> |
| <td><em>find deleted members (4 files, 100 states per file)</em></td> |
| <td align="center"> |
| <div align="center">882</div></td> |
| <td align="center"> |
| <div align="center">15</div></td> |
| </tr> |
| <tr> |
| <td><em>clear history of a directory tree (100 files, 4 states per file)</em></td> |
| <td align="center">367</td> |
| <td align="center">42</td> |
| </tr> |
| <tr> |
| <td><em>clear history of a directory tree (20 files, 20 states per file)</em></td> |
| <td align="center">332</td> |
| <td align="center">187</td> |
| </tr> |
| <tr> |
| <td><em>clear history of a directory tree (4 files, 100 states per file)</em></td> |
| <td align="center"> |
| <div align="center">379</div></td> |
| <td align="center"> |
| <div align="center">179</div></td> |
| </tr> |
| <tr> |
| <td><em>copy history for a directory tree (100 files, 4 states per file)</em></td> |
| <td align="center">3350</td> |
| <td align="center">3250</td> |
| </tr> |
| <tr> |
| <td><em>copy history for a directory tree (20 files, 20 states per file)</em></td> |
| <td align="center">2210</td> |
| <td align="center">868</td> |
| </tr> |
| <tr> |
| <td><em>copy history for a directory tree (4 files, 100 states per file)</em></td> |
| <td align="center">4720</td> |
| <td align="center">453</td> |
| </tr> |
| <tr> |
| <td><em>apply cleaning policy</em></td> |
| <td align="center"> |
| <div align="center">tbd</div></td> |
| <td align="center"> |
| <div align="center">tbd</div></td> |
| </tr> |
| </table> </td> |
| </tr> |
| </table> |
| |
| |
| </body> |
| </html> |