|  | <html> | 
|  | <head> | 
|  | </head> | 
|  | <body> | 
|  | <h1>Workspace lock granularity</h1> | 
|  | <h2>The problem</h2> | 
|  | <ul> | 
|  | <li>The current workspace lock is heavy handed.  There is a single write lock for the entire | 
|  | workspace, and it is generally acquired for significant periods of time.  This can block | 
|  | other processes for several minutes.  When the lock is acquired, no other threads | 
|  | can write to the workspace in any way. The write lock does not interfere with workspace | 
|  | readers, which are always allowed to proceed. | 
|  | </li> | 
|  | <p> | 
|  | <li> | 
|  | Processes often don't know in advance whether writes are necessary, so often the | 
|  | lock is acquired for long periods even when no writing occurs (for example on searches). | 
|  | </li> | 
|  | <p> | 
|  | <li> | 
|  | The workspace locking mechanism is currently tied to the mechanism for batching | 
|  | workspace changes.  We widely advertise that compound workspace changes should | 
|  | be done within an IWorkspaceRunnable.  The workspace is always locked for the | 
|  | duration of the IWorkspaceRunnable even if it is not making changes to the workspace. | 
|  | </li> | 
|  | <p> | 
|  | <li> | 
|  | In the absence of other sync mechanisms, other clients use the workspace lock | 
|  | for working with operations that don't necessarily even operate on resources. | 
|  | </li> | 
|  | <p> | 
|  | <li> | 
|  | To avoid deadlock, many clients acquire the workspace lock "just in case", before | 
|  | acquiring their own private lock mechanism. | 
|  | </li> | 
|  | <p> | 
|  | <li> | 
|  | Although this problem is most evident with the workspace lock, there are other | 
|  | heavy-handed locks in Eclipse with which the same issues appear (the CVS lock is | 
|  | another example). | 
|  | </li> | 
|  | </ul> | 
|  |  | 
|  | <h2>Proposed Solution</h2> | 
|  | <p> | 
|  | The current workspace lock is very coarse-grained both in how long the lock is held, | 
|  | and the scope of the resources that are locked by it.  To achieve better concurrency | 
|  | in the workspace, we need to attack the lock's physical granularity, its temporal | 
|  | granularity, or some combination of the two. | 
|  | </p> | 
|  | <p> | 
|  | We can improve the lock's temporal granularity by removing any public, general way | 
|  | of locking for an arbitrary period of time. I.e., we make the simple change of not | 
|  | acquiring the lock for IWorkspaceRunnables. (or even better, deprecate IWorkspaceRunnable entirely). | 
|  | </p> | 
|  | <p> | 
|  | Internally, the workspace would continue to use a single lock in critical regions to ensure | 
|  | the integrity of individual API calls.  However, a systematic effort would be made to | 
|  | break down the duration of locking within those calls, where possible.  For example, | 
|  | a best-effort copy method does not need to lock the workspace for its entire duration, | 
|  | but only for each unit of work where a new resource is created at the destination. | 
|  | </p> | 
|  | <p> | 
|  | The underlying design rule is to prevent an atomic operation from locking | 
|  | the workspace for a long period of time.  Without the presence of long term locks, | 
|  | long running concurrent operations will interleave their access to the workspace, | 
|  | enabling several workspace-modifying operations to run at the same time. | 
|  | </p> | 
|  | <p> | 
|  | This solution is optimistic. It assumes that concurrent writes to the same set | 
|  | of resources in a conflicting manner are relatively rare. One could in fact go further, and | 
|  | say that if concurrent operations are modifying the same resources at the same time, | 
|  | then it is either a user or programming error, and the concurrent tasks should just fail. | 
|  | Serves them right for trying to do contradictory things on the same set of resources. | 
|  | </p> | 
|  | <p> | 
|  | However, this philosophy is not acceptable without a very strong visual indication | 
|  | to the user of what is happening and when it will be finished.  I.e., if we had a progress | 
|  | dialog saying, "Resource X is being modified, and I will let you know when I'm done", | 
|  | then it might be acceptable to blame the user if they start a conflicting task before | 
|  | it finishes.  Some of the drawbacks of this approach are: | 
|  | <ol style="a"> | 
|  | <li>The dialog is a distraction to the user in the cases where the user | 
|  | doesn't care when the task ends and is not waiting for its result</li> | 
|  | <li>This interaction style puts the onus on the user to avoid making mistakes.  </li> | 
|  | <li>In many cases the user doesn't know what resources are modified by a given | 
|  | operation. I.e., it is unrealistic to assume that users can compute the intersection | 
|  | between the set of resources that all current background threads may modify | 
|  | and the set of resources that might be modified by another task they are thinking | 
|  | of starting.</li> | 
|  | <li>The penalty to the user for making a mistake can be severe.  For example, | 
|  | if the user starts a CVS commit operation, and then, thinking the commit is about done, | 
|  | decides to delete the project, they will be unhappy if the deletion started before | 
|  | the commit finished.</li> | 
|  | </ol> | 
|  | <p> | 
|  | So, how do we schedule concurrent jobs in a way that prevents conflicts without | 
|  | employing a single long term lock? We can introduce a scheme where | 
|  | jobs can specify in advance whether they need exclusive access to a resource. | 
|  | That is, each job can optionally supply a <i>scheduling rule</i> that is used by | 
|  | the job scheduler when making decisions about which jobs to run.  The API for these | 
|  | rules would look something like this: | 
|  | <pre> | 
|  | public interface ISchedulingRule { | 
|  | public boolean isConflicting(ISchedulingRule); | 
|  | } | 
|  | </pre> | 
|  | <p> | 
|  | While these rules would remain generic at the job scheduling level, the workspace | 
|  | can introduce some standard rules.  For example, a rule could request an array | 
|  | of resources, or the entire workspace.  In this way, finer-grained portions of the workspace | 
|  | can be effectively locked by a job. | 
|  | </p> | 
|  | <p> | 
|  | The contract on these rules would be as follows: the job scheduling mechanism | 
|  | guarantees that a job won't be started if there is a job currently running that conflicts | 
|  | with its scheduling rule.  This scheduling rule would be orthogonal to any locking | 
|  | mechanism, thus avoiding some of the problems discussed earlier with regard | 
|  | to pre-specification of locks. We still need to revisit our previous objections to | 
|  | pre-specified locks to see how they apply to scheduling rules: | 
|  | <ul> | 
|  | <li>Scheduling performance. The scheduling rules need to be resolved every time | 
|  | a job is to be run.  It is true that this will still impose an overhead on scheduling. | 
|  | Most of the time, however, there will be few or no jobs running so very little rule | 
|  | analysis is needed.  As the number of concurrent operations grows, the cost will | 
|  | increase.  However, this has the nice property that it is fast most of the time, and | 
|  | only starts to slow down in the uncommon case where the system is already very busy anyway. | 
|  | </li> | 
|  | <li>How to know what rules to use?  In some cases it is hard to predict what resources | 
|  | will be changed by a given unit of work.  When third party API is called, there is often | 
|  | no contract saying what will be touched.  This is still an issue.  In the worst case, | 
|  | clients will err on the side of caution and specify that the entire workspace is needed. | 
|  | We will need to educate users to specify rules carefully, and if necessary to break | 
|  | jobs up into separate units with different rules.  We will introduce a way of creating | 
|  | composite jobs that will be run in a specific order, but each job can have its | 
|  | own scheduling rules. | 
|  | </li> | 
|  | <li>How to enforce that rules are followed?  The proposal is that we don't enforce | 
|  | anything.  That is, a job can request arbitrary locks regardless of what rules were | 
|  | stated.  To avoid deadlock, lock ordering with release and wait (described earlier) | 
|  | will be used.  The theory is that if jobs are well behaved and specify their rules | 
|  | correctly, deadlock will be rare, so release and wait will rarely be necessary.  To put | 
|  | it another way, the penalty for not specifying your rules correctly is that you might | 
|  | have some locks taken away from you when you don't expect it.  This is a much | 
|  | more palatable failure mode than throwing an exception, for example. | 
|  | </li> | 
|  | <li>Pre-allocation can be wasteful if resources are only actually needed for a short | 
|  | period of time.  This is still a potential problem with scheduling rules.  Job writers | 
|  | will have to take this into consideration when designing jobs and their rules.  If a | 
|  | job only needs access to a resource briefly at the end of a long operation, it can | 
|  | be coded as two jobs with different rules.  This is an area where scheduling rules | 
|  | introduce a new complication that clients need to be aware of. | 
|  | </li> | 
|  | </ul> | 
|  | <p> | 
|  | Scheduling rules may not be necessary at all in cases where contention is not likely, | 
|  | or where the job is written in such a way that concurrent changes are tolerated. | 
|  | if clients are confident that no contention is likely, they don't need to specify any | 
|  | rules.  A good example for this is search.  Search may create search result markers | 
|  | on arbitrary resources.  However, search could be implemented to not have | 
|  | any scheduling rules, and it could be tolerant of concurrent changes and deletion. | 
|  | Since it only creates search markers, it doesn't care if those markers are changed | 
|  | or deleted after they are created.  Thus it is possible that search can be implemented without | 
|  | using any scheduling rules at all, even though it may potentially make modifications | 
|  | to an arbitrary set of resources.  Another example of this is CVS metadata files.  Since | 
|  | the CVS client is the only one that ever views or modifies the CVS metadata files, | 
|  | it may not need to create a scheduling rule for them. | 
|  | </p> | 
|  | <p> | 
|  | Finally, when there is contention between jobs, we need a | 
|  | mechanism for giving more value to jobs initiated by users versus background jobs | 
|  | that the user is not waiting on results for.  Each job belongs to a priority class that | 
|  | can be used to manage this interaction.  User initiated jobs belong to the INTERACTIVE | 
|  | priority class. To avoid blocking interactive jobs for unacceptable periods of time, we | 
|  | can employ various policies to ensure the job gets run, such as: | 
|  | <ul> | 
|  | <li>Only start non-interactive jobs when NO interactive jobs are running</li> | 
|  | <li>If an interactive job is waiting to be run, and its rule conflicts with a currently | 
|  | running non-interactive job, the non-interactive job may be canceled to let the interactive | 
|  | job proceed.</li> | 
|  | <li>If an interactive job is waiting on a lock held by a non-interactive job, | 
|  | the non-interactive job may be canceled to let the interactive job proceed.</li> | 
|  | </ul> | 
|  |  | 
|  | <h3>Locking issues</h3> | 
|  | <h4>Is the UI thread allowed to acquire locks?</h4> | 
|  | <b>Reasons for "No":</b><br> | 
|  | <p> | 
|  | Clearly there is a deadlock risk due to the interaction with syncExec. | 
|  | We can handle this risk in the same way that we handle it in Eclipse 2.1.  By ensuring | 
|  | the core locking mechanism and the SWT synchronizer are aware of each other, we | 
|  | can avoid potential deadlocks by servicing syncExecs in the UI thread in the case | 
|  | where the UI thread is waiting on an acquired lock. | 
|  | </p> | 
|  | <p> | 
|  | Not allowing locks in the UI thread would also help improve UI responsiveness. | 
|  | If the UI thread is waiting on a lock, it cannot be processing the event queue, and thus | 
|  | the UI will fail to paint. | 
|  | </p> | 
|  | <p> | 
|  | If we don't allow locks in the UI thread, then we can easily add the extra restriction | 
|  | that locks must be acquired from within the context of currently running jobs.  This | 
|  | would give us a story for cleaning up misbehaving locks.  I.e., if someone acquires | 
|  | a lock but fails to release it, we can release it automatically when the job completes. | 
|  | </p> | 
|  | <b>Reasons for "Yes":</b><br> | 
|  | <p> | 
|  | The main drawback is that this would be unweildy to program with.  Some operations | 
|  | that acquire locks may actually be very fast.  Forcing the UI to fork a | 
|  | job every time a lock is needed may be overkill in many cases.  If third party code is | 
|  | called from the UI, the caller may not know if locks will be acquired deep down in the | 
|  | call stack.  To be defensive, the UI would have to fork jobs whenever third party | 
|  | code is called. | 
|  | </p> | 
|  | <p> | 
|  | Another problem is how this rule would be enforced.  Throwing an exception when | 
|  | the UI thread attempts to acquire a lock would result in unacceptable runtime errors. | 
|  | </p> | 
|  | <h4>Do we expose locks?</h4> | 
|  | <p> | 
|  | Do we allow third party clients to directly acquire, for example, the workspace lock? | 
|  | The answer to this question must be "No", if we don't have the restriction that jobs | 
|  | be acquired within running jobs.  Otherwise, we would have no way of cleaning up | 
|  | locks that clients acquire but fail to release.  Is this acceptable?  Do we know of | 
|  | any cases where clients will need to acquire the workspace lock, given that we have | 
|  | a system of rules to prevent conflicting jobs from running concurrently? | 
|  | </p> | 
|  |  | 
|  |  | 
|  |  | 
|  | </body> | 
|  | </html> |