examples/org.eclipse.ui.examples.job/doc/workspace-lock.html - gerrit/platform/eclipse.platform.ui - Git at Google

 <html>
 <head>
 </head>
 <body>
 <h1>Workspace lock granularity</h1>
 <h2>The problem</h2>
 <ul>
 <li>The current workspace lock is heavy handed.  There is a single write lock for the entire
 workspace, and it is generally acquired for significant periods of time.  This can block
 other processes for several minutes.  When the lock is acquired, no other threads
 can write to the workspace in any way. The write lock does not interfere with workspace
 readers, which are always allowed to proceed.
 </li>
 <p>
 <li>
 Processes often don't know in advance whether writes are necessary, so often the
 lock is acquired for long periods even when no writing occurs (for example on searches).
 </li>
 <p>
 <li>
 The workspace locking mechanism is currently tied to the mechanism for batching
 workspace changes.  We widely advertise that compound workspace changes should
 be done within an IWorkspaceRunnable.  The workspace is always locked for the
 duration of the IWorkspaceRunnable even if it is not making changes to the workspace.
 </li>
 <p>
 <li>
 In the absence of other sync mechanisms, other clients use the workspace lock
 for working with operations that don't necessarily even operate on resources.
 </li>
 <p>
 <li>
 To avoid deadlock, many clients acquire the workspace lock "just in case", before
 acquiring their own private lock mechanism.
 </li>
 <p>
 <li>
 Although this problem is most evident with the workspace lock, there are other
 heavy-handed locks in Eclipse with which the same issues appear (the CVS lock is
 another example).
 </li>
 </ul>

 <h2>Proposed Solution</h2>
 <p>
 The current workspace lock is very coarse-grained both in how long the lock is held,
 and the scope of the resources that are locked by it.  To achieve better concurrency
 in the workspace, we need to attack the lock's physical granularity, its temporal
 granularity, or some combination of the two.
 </p>
 <p>
 We can improve the lock's temporal granularity by removing any public, general way
 of locking for an arbitrary period of time. I.e., we make the simple change of not
 acquiring the lock for IWorkspaceRunnables. (or even better, deprecate IWorkspaceRunnable entirely).
 </p>
 <p>
 Internally, the workspace would continue to use a single lock in critical regions to ensure
 the integrity of individual API calls.  However, a systematic effort would be made to
 break down the duration of locking within those calls, where possible.  For example,
 a best-effort copy method does not need to lock the workspace for its entire duration,
 but only for each unit of work where a new resource is created at the destination.
 </p>
 <p>
 The underlying design rule is to prevent an atomic operation from locking
 the workspace for a long period of time.  Without the presence of long term locks,
 long running concurrent operations will interleave their access to the workspace,
 enabling several workspace-modifying operations to run at the same time.
 </p>
 <p>
 This solution is optimistic. It assumes that concurrent writes to the same set
 of resources in a conflicting manner are relatively rare. One could in fact go further, and
 say that if concurrent operations are modifying the same resources at the same time,
 then it is either a user or programming error, and the concurrent tasks should just fail.
 Serves them right for trying to do contradictory things on the same set of resources.
 </p>
 <p>
 However, this philosophy is not acceptable without a very strong visual indication
 to the user of what is happening and when it will be finished.  I.e., if we had a progress
 dialog saying, "Resource X is being modified, and I will let you know when I'm done",
 then it might be acceptable to blame the user if they start a conflicting task before
 it finishes.  Some of the drawbacks of this approach are:
 <ol style="a">
 <li>The dialog is a distraction to the user in the cases where the user
 doesn't care when the task ends and is not waiting for its result</li>
 <li>This interaction style puts the onus on the user to avoid making mistakes.  </li>
 <li>In many cases the user doesn't know what resources are modified by a given
 operation. I.e., it is unrealistic to assume that users can compute the intersection
 between the set of resources that all current background threads may modify
 and the set of resources that might be modified by another task they are thinking
 of starting.</li>
 <li>The penalty to the user for making a mistake can be severe.  For example,
 if the user starts a CVS commit operation, and then, thinking the commit is about done,
 decides to delete the project, they will be unhappy if the deletion started before
 the commit finished.</li>
 </ol>
 <p>
 So, how do we schedule concurrent jobs in a way that prevents conflicts without
 employing a single long term lock? We can introduce a scheme where
 jobs can specify in advance whether they need exclusive access to a resource.
 That is, each job can optionally supply a <i>scheduling rule</i> that is used by
 the job scheduler when making decisions about which jobs to run.  The API for these
 rules would look something like this:
 <pre>
 public interface ISchedulingRule {
 	public boolean isConflicting(ISchedulingRule);
 }
 </pre>
 <p>
 While these rules would remain generic at the job scheduling level, the workspace
 can introduce some standard rules.  For example, a rule could request an array
 of resources, or the entire workspace.  In this way, finer-grained portions of the workspace
 can be effectively locked by a job.
 </p>
 <p>
 The contract on these rules would be as follows: the job scheduling mechanism
 guarantees that a job won't be started if there is a job currently running that conflicts
 with its scheduling rule.  This scheduling rule would be orthogonal to any locking
 mechanism, thus avoiding some of the problems discussed earlier with regard
 to pre-specification of locks. We still need to revisit our previous objections to
 pre-specified locks to see how they apply to scheduling rules:
 <ul>
 <li>Scheduling performance. The scheduling rules need to be resolved every time
 a job is to be run.  It is true that this will still impose an overhead on scheduling.
 Most of the time, however, there will be few or no jobs running so very little rule
 analysis is needed.  As the number of concurrent operations grows, the cost will
 increase.  However, this has the nice property that it is fast most of the time, and
 only starts to slow down in the uncommon case where the system is already very busy anyway.
 </li>
 <li>How to know what rules to use?  In some cases it is hard to predict what resources
 will be changed by a given unit of work.  When third party API is called, there is often
 no contract saying what will be touched.  This is still an issue.  In the worst case,
 clients will err on the side of caution and specify that the entire workspace is needed.
 We will need to educate users to specify rules carefully, and if necessary to break
 jobs up into separate units with different rules.  We will introduce a way of creating
 composite jobs that will be run in a specific order, but each job can have its
 own scheduling rules.
 </li>
 <li>How to enforce that rules are followed?  The proposal is that we don't enforce
 anything.  That is, a job can request arbitrary locks regardless of what rules were
 stated.  To avoid deadlock, lock ordering with release and wait (described earlier)
 will be used.  The theory is that if jobs are well behaved and specify their rules
 correctly, deadlock will be rare, so release and wait will rarely be necessary.  To put
 it another way, the penalty for not specifying your rules correctly is that you might
 have some locks taken away from you when you don't expect it.  This is a much
 more palatable failure mode than throwing an exception, for example.
 </li>
 <li>Pre-allocation can be wasteful if resources are only actually needed for a short
 period of time.  This is still a potential problem with scheduling rules.  Job writers
 will have to take this into consideration when designing jobs and their rules.  If a
 job only needs access to a resource briefly at the end of a long operation, it can
 be coded as two jobs with different rules.  This is an area where scheduling rules
 introduce a new complication that clients need to be aware of.
 </li>
 </ul>
 <p>
 Scheduling rules may not be necessary at all in cases where contention is not likely,
 or where the job is written in such a way that concurrent changes are tolerated.
 if clients are confident that no contention is likely, they don't need to specify any
 rules.  A good example for this is search.  Search may create search result markers
 on arbitrary resources.  However, search could be implemented to not have
 any scheduling rules, and it could be tolerant of concurrent changes and deletion.
 Since it only creates search markers, it doesn't care if those markers are changed
 or deleted after they are created.  Thus it is possible that search can be implemented without
 using any scheduling rules at all, even though it may potentially make modifications
 to an arbitrary set of resources.  Another example of this is CVS metadata files.  Since
 the CVS client is the only one that ever views or modifies the CVS metadata files,
 it may not need to create a scheduling rule for them.
 </p>
 <p>
 Finally, when there is contention between jobs, we need a
 mechanism for giving more value to jobs initiated by users versus background jobs
 that the user is not waiting on results for.  Each job belongs to a priority class that
 can be used to manage this interaction.  User initiated jobs belong to the INTERACTIVE
 priority class. To avoid blocking interactive jobs for unacceptable periods of time, we
 can employ various policies to ensure the job gets run, such as:
 <ul>
 <li>Only start non-interactive jobs when NO interactive jobs are running</li>
 <li>If an interactive job is waiting to be run, and its rule conflicts with a currently
 running non-interactive job, the non-interactive job may be canceled to let the interactive
 job proceed.</li>
 <li>If an interactive job is waiting on a lock held by a non-interactive job,
 the non-interactive job may be canceled to let the interactive job proceed.</li>
 </ul>

 <h3>Locking issues</h3>
 <h4>Is the UI thread allowed to acquire locks?</h4>
 <b>Reasons for "No":</b><br>
 <p>
 Clearly there is a deadlock risk due to the interaction with syncExec.
 We can handle this risk in the same way that we handle it in Eclipse 2.1.  By ensuring
 the core locking mechanism and the SWT synchronizer are aware of each other, we
 can avoid potential deadlocks by servicing syncExecs in the UI thread in the case
 where the UI thread is waiting on an acquired lock.
 </p>
 <p>
 Not allowing locks in the UI thread would also help improve UI responsiveness.
 If the UI thread is waiting on a lock, it cannot be processing the event queue, and thus
 the UI will fail to paint.
 </p>
 <p>
 If we don't allow locks in the UI thread, then we can easily add the extra restriction
 that locks must be acquired from within the context of currently running jobs.  This
 would give us a story for cleaning up misbehaving locks.  I.e., if someone acquires
 a lock but fails to release it, we can release it automatically when the job completes.
 </p>
 <b>Reasons for "Yes":</b><br>
 <p>
 The main drawback is that this would be unweildy to program with.  Some operations
 that acquire locks may actually be very fast.  Forcing the UI to fork a
 job every time a lock is needed may be overkill in many cases.  If third party code is
 called from the UI, the caller may not know if locks will be acquired deep down in the
 call stack.  To be defensive, the UI would have to fork jobs whenever third party
 code is called.
 </p>
 <p>
 Another problem is how this rule would be enforced.  Throwing an exception when
 the UI thread attempts to acquire a lock would result in unacceptable runtime errors.
 </p>
 <h4>Do we expose locks?</h4>
 <p>
 Do we allow third party clients to directly acquire, for example, the workspace lock?
 The answer to this question must be "No", if we don't have the restriction that jobs
 be acquired within running jobs.  Otherwise, we would have no way of cleaning up
 locks that clients acquire but fail to release.  Is this acceptable?  Do we know of
 any cases where clients will need to acquire the workspace lock, given that we have
 a system of rules to prevent conflicting jobs from running concurrently?
 </p>


 </body>
 </html>
	<html>
	<head>
	</head>
	<body>
	<h1>Workspace lock granularity</h1>
	<h2>The problem</h2>
	<ul>
	<li>The current workspace lock is heavy handed. There is a single write lock for the entire
	workspace, and it is generally acquired for significant periods of time. This can block
	other processes for several minutes. When the lock is acquired, no other threads
	can write to the workspace in any way. The write lock does not interfere with workspace
	readers, which are always allowed to proceed.
	</li>
	<p>
	<li>
	Processes often don't know in advance whether writes are necessary, so often the
	lock is acquired for long periods even when no writing occurs (for example on searches).
	</li>
	<p>
	<li>
	The workspace locking mechanism is currently tied to the mechanism for batching
	workspace changes. We widely advertise that compound workspace changes should
	be done within an IWorkspaceRunnable. The workspace is always locked for the
	duration of the IWorkspaceRunnable even if it is not making changes to the workspace.
	</li>
	<p>
	<li>
	In the absence of other sync mechanisms, other clients use the workspace lock
	for working with operations that don't necessarily even operate on resources.
	</li>
	<p>
	<li>
	To avoid deadlock, many clients acquire the workspace lock "just in case", before
	acquiring their own private lock mechanism.
	</li>
	<p>
	<li>
	Although this problem is most evident with the workspace lock, there are other
	heavy-handed locks in Eclipse with which the same issues appear (the CVS lock is
	another example).
	</li>
	</ul>

	<h2>Proposed Solution</h2>
	<p>
	The current workspace lock is very coarse-grained both in how long the lock is held,
	and the scope of the resources that are locked by it. To achieve better concurrency
	in the workspace, we need to attack the lock's physical granularity, its temporal
	granularity, or some combination of the two.
	</p>
	<p>
	We can improve the lock's temporal granularity by removing any public, general way
	of locking for an arbitrary period of time. I.e., we make the simple change of not
	acquiring the lock for IWorkspaceRunnables. (or even better, deprecate IWorkspaceRunnable entirely).
	</p>
	<p>
	Internally, the workspace would continue to use a single lock in critical regions to ensure
	the integrity of individual API calls. However, a systematic effort would be made to
	break down the duration of locking within those calls, where possible. For example,
	a best-effort copy method does not need to lock the workspace for its entire duration,
	but only for each unit of work where a new resource is created at the destination.
	</p>
	<p>
	The underlying design rule is to prevent an atomic operation from locking
	the workspace for a long period of time. Without the presence of long term locks,
	long running concurrent operations will interleave their access to the workspace,
	enabling several workspace-modifying operations to run at the same time.
	</p>
	<p>
	This solution is optimistic. It assumes that concurrent writes to the same set
	of resources in a conflicting manner are relatively rare. One could in fact go further, and
	say that if concurrent operations are modifying the same resources at the same time,
	then it is either a user or programming error, and the concurrent tasks should just fail.
	Serves them right for trying to do contradictory things on the same set of resources.
	</p>
	<p>
	However, this philosophy is not acceptable without a very strong visual indication
	to the user of what is happening and when it will be finished. I.e., if we had a progress
	dialog saying, "Resource X is being modified, and I will let you know when I'm done",
	then it might be acceptable to blame the user if they start a conflicting task before
	it finishes. Some of the drawbacks of this approach are:
	<ol style="a">
	<li>The dialog is a distraction to the user in the cases where the user
	doesn't care when the task ends and is not waiting for its result</li>
	<li>This interaction style puts the onus on the user to avoid making mistakes. </li>
	<li>In many cases the user doesn't know what resources are modified by a given
	operation. I.e., it is unrealistic to assume that users can compute the intersection
	between the set of resources that all current background threads may modify
	and the set of resources that might be modified by another task they are thinking
	of starting.</li>
	<li>The penalty to the user for making a mistake can be severe. For example,
	if the user starts a CVS commit operation, and then, thinking the commit is about done,
	decides to delete the project, they will be unhappy if the deletion started before
	the commit finished.</li>
	</ol>
	<p>
	So, how do we schedule concurrent jobs in a way that prevents conflicts without
	employing a single long term lock? We can introduce a scheme where
	jobs can specify in advance whether they need exclusive access to a resource.
	That is, each job can optionally supply a <i>scheduling rule</i> that is used by
	the job scheduler when making decisions about which jobs to run. The API for these
	rules would look something like this:
	<pre>
	public interface ISchedulingRule {
	public boolean isConflicting(ISchedulingRule);
	}
	</pre>
	<p>
	While these rules would remain generic at the job scheduling level, the workspace
	can introduce some standard rules. For example, a rule could request an array
	of resources, or the entire workspace. In this way, finer-grained portions of the workspace
	can be effectively locked by a job.
	</p>
	<p>
	The contract on these rules would be as follows: the job scheduling mechanism
	guarantees that a job won't be started if there is a job currently running that conflicts
	with its scheduling rule. This scheduling rule would be orthogonal to any locking
	mechanism, thus avoiding some of the problems discussed earlier with regard
	to pre-specification of locks. We still need to revisit our previous objections to
	pre-specified locks to see how they apply to scheduling rules:
	<ul>
	<li>Scheduling performance. The scheduling rules need to be resolved every time
	a job is to be run. It is true that this will still impose an overhead on scheduling.
	Most of the time, however, there will be few or no jobs running so very little rule
	analysis is needed. As the number of concurrent operations grows, the cost will
	increase. However, this has the nice property that it is fast most of the time, and
	only starts to slow down in the uncommon case where the system is already very busy anyway.
	</li>
	<li>How to know what rules to use? In some cases it is hard to predict what resources
	will be changed by a given unit of work. When third party API is called, there is often
	no contract saying what will be touched. This is still an issue. In the worst case,
	clients will err on the side of caution and specify that the entire workspace is needed.
	We will need to educate users to specify rules carefully, and if necessary to break
	jobs up into separate units with different rules. We will introduce a way of creating
	composite jobs that will be run in a specific order, but each job can have its
	own scheduling rules.
	</li>
	<li>How to enforce that rules are followed? The proposal is that we don't enforce
	anything. That is, a job can request arbitrary locks regardless of what rules were
	stated. To avoid deadlock, lock ordering with release and wait (described earlier)
	will be used. The theory is that if jobs are well behaved and specify their rules
	correctly, deadlock will be rare, so release and wait will rarely be necessary. To put
	it another way, the penalty for not specifying your rules correctly is that you might
	have some locks taken away from you when you don't expect it. This is a much
	more palatable failure mode than throwing an exception, for example.
	</li>
	<li>Pre-allocation can be wasteful if resources are only actually needed for a short
	period of time. This is still a potential problem with scheduling rules. Job writers
	will have to take this into consideration when designing jobs and their rules. If a
	job only needs access to a resource briefly at the end of a long operation, it can
	be coded as two jobs with different rules. This is an area where scheduling rules
	introduce a new complication that clients need to be aware of.
	</li>
	</ul>
	<p>
	Scheduling rules may not be necessary at all in cases where contention is not likely,
	or where the job is written in such a way that concurrent changes are tolerated.
	if clients are confident that no contention is likely, they don't need to specify any
	rules. A good example for this is search. Search may create search result markers
	on arbitrary resources. However, search could be implemented to not have
	any scheduling rules, and it could be tolerant of concurrent changes and deletion.
	Since it only creates search markers, it doesn't care if those markers are changed
	or deleted after they are created. Thus it is possible that search can be implemented without
	using any scheduling rules at all, even though it may potentially make modifications
	to an arbitrary set of resources. Another example of this is CVS metadata files. Since
	the CVS client is the only one that ever views or modifies the CVS metadata files,
	it may not need to create a scheduling rule for them.
	</p>
	<p>
	Finally, when there is contention between jobs, we need a
	mechanism for giving more value to jobs initiated by users versus background jobs
	that the user is not waiting on results for. Each job belongs to a priority class that
	can be used to manage this interaction. User initiated jobs belong to the INTERACTIVE
	priority class. To avoid blocking interactive jobs for unacceptable periods of time, we
	can employ various policies to ensure the job gets run, such as:
	<ul>
	<li>Only start non-interactive jobs when NO interactive jobs are running</li>
	<li>If an interactive job is waiting to be run, and its rule conflicts with a currently
	running non-interactive job, the non-interactive job may be canceled to let the interactive
	job proceed.</li>
	<li>If an interactive job is waiting on a lock held by a non-interactive job,
	the non-interactive job may be canceled to let the interactive job proceed.</li>
	</ul>

	<h3>Locking issues</h3>
	<h4>Is the UI thread allowed to acquire locks?</h4>
	<b>Reasons for "No":</b><br>
	<p>
	Clearly there is a deadlock risk due to the interaction with syncExec.
	We can handle this risk in the same way that we handle it in Eclipse 2.1. By ensuring
	the core locking mechanism and the SWT synchronizer are aware of each other, we
	can avoid potential deadlocks by servicing syncExecs in the UI thread in the case
	where the UI thread is waiting on an acquired lock.
	</p>
	<p>
	Not allowing locks in the UI thread would also help improve UI responsiveness.
	If the UI thread is waiting on a lock, it cannot be processing the event queue, and thus
	the UI will fail to paint.
	</p>
	<p>
	If we don't allow locks in the UI thread, then we can easily add the extra restriction
	that locks must be acquired from within the context of currently running jobs. This
	would give us a story for cleaning up misbehaving locks. I.e., if someone acquires
	a lock but fails to release it, we can release it automatically when the job completes.
	</p>
	<b>Reasons for "Yes":</b><br>
	<p>
	The main drawback is that this would be unweildy to program with. Some operations
	that acquire locks may actually be very fast. Forcing the UI to fork a
	job every time a lock is needed may be overkill in many cases. If third party code is
	called from the UI, the caller may not know if locks will be acquired deep down in the
	call stack. To be defensive, the UI would have to fork jobs whenever third party
	code is called.
	</p>
	<p>
	Another problem is how this rule would be enforced. Throwing an exception when
	the UI thread attempts to acquire a lock would result in unacceptable runtime errors.
	</p>
	<h4>Do we expose locks?</h4>
	<p>
	Do we allow third party clients to directly acquire, for example, the workspace lock?
	The answer to this question must be "No", if we don't have the restriction that jobs
	be acquired within running jobs. Otherwise, we would have no way of cleaning up
	locks that clients acquire but fail to release. Is this acceptable? Do we know of
	any cases where clients will need to acquire the workspace lock, given that we have
	a system of rules to prevent conflicting jobs from running concurrently?
	</p>



	</body>
	</html>