<html>
<head>
</head>
<body>
<h1>Workspace lock granularity</h1>
<h2>The problem</h2>
<ul>
<li>The current workspace lock is heavy handed. There is a single write lock for the entire
workspace, and it is generally acquired for significant periods of time. This can block
other processes for several minutes. When the lock is acquired, no other threads
can write to the workspace in any way. The write lock does not interfere with workspace
readers, which are always allowed to proceed.
</li>
<li>
Processes often don't know in advance whether writes will be necessary, so the
lock is often acquired for long periods even when no writing occurs (for example, during searches).
</li>
<li>
The workspace locking mechanism is currently tied to the mechanism for batching
workspace changes. We widely advertise that compound workspace changes should
be done within an IWorkspaceRunnable. The workspace is always locked for the
duration of the IWorkspaceRunnable, even if it makes no changes to the workspace.
</li>
<li>
In the absence of other synchronization mechanisms, other clients use the workspace lock
to coordinate operations that don't necessarily even operate on resources.
</li>
<li>
To avoid deadlock, many clients acquire the workspace lock "just in case", before
acquiring their own private lock mechanism.
</li>
<li>
Although this problem is most evident with the workspace lock, there are other
heavy-handed locks in Eclipse that exhibit the same issues (the CVS lock is
another example).
</li>
</ul>
<h2>Proposed Solution</h2>
<p>
The current workspace lock is very coarse-grained both in how long the lock is held,
and the scope of the resources that are locked by it. To achieve better concurrency
in the workspace, we need to attack the lock's physical granularity, its temporal
granularity, or some combination of the two.
</p>
<p>
We can improve the lock's temporal granularity by removing any public, general way
of locking for an arbitrary period of time. That is, we make the simple change of not
acquiring the lock for IWorkspaceRunnables (or, better still, deprecating IWorkspaceRunnable entirely).
</p>
<p>
Internally, the workspace would continue to use a single lock in critical regions to ensure
the integrity of individual API calls. However, a systematic effort would be made to
break down the duration of locking within those calls, where possible. For example,
a best-effort copy method does not need to lock the workspace for its entire duration,
but only for each unit of work where a new resource is created at the destination.
</p>
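<p>
The temporal breakdown can be illustrated with a minimal sketch in plain Java (not Eclipse
API; the lock and the copyOne step are hypothetical stand-ins). The coarse version holds
the lock across the whole loop, while the fine-grained version holds it only for each unit
of work, letting other writers interleave between units:
</p>

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: "workspaceLock" and "copyOne" are illustrative stand-ins,
// not Eclipse API. The point is lock scope: one acquire/release per unit
// of work instead of one lock around the whole operation.
public class FineGrainedCopy {
    static final ReentrantLock workspaceLock = new ReentrantLock();

    // Coarse version (what we want to avoid): lock held for the whole copy.
    static void copyAllCoarse(List<String> sources, StringBuilder log) {
        workspaceLock.lock();
        try {
            for (String s : sources) copyOne(s, log);
        } finally {
            workspaceLock.unlock();
        }
    }

    // Fine-grained version: lock held only while each destination resource
    // is created, so other writers can proceed between units of work.
    static void copyAllFine(List<String> sources, StringBuilder log) {
        for (String s : sources) {
            workspaceLock.lock();
            try {
                copyOne(s, log);
            } finally {
                workspaceLock.unlock();
            }
        }
    }

    // stand-in for the actual per-resource copy step
    static void copyOne(String source, StringBuilder log) {
        log.append("copied ").append(source).append('\n');
    }
}
```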
<p>
The underlying design rule is to prevent an atomic operation from locking
the workspace for a long period of time. Without long-term locks,
long-running concurrent operations will interleave their access to the workspace,
enabling several workspace-modifying operations to run at the same time.
</p>
<p>
This solution is optimistic. It assumes that concurrent writes to the same set
of resources in a conflicting manner are relatively rare. One could in fact go further, and
say that if concurrent operations are modifying the same resources at the same time,
then it is either a user or programming error, and the concurrent tasks should just fail.
Serves them right for trying to do contradictory things on the same set of resources.
</p>
<p>
However, this philosophy is not acceptable without a very strong visual indication
to the user of what is happening and when it will be finished. I.e., if we had a progress
dialog saying, "Resource X is being modified, and I will let you know when I'm done",
then it might be acceptable to blame the user if they start a conflicting task before
it finishes. Some of the drawbacks of this approach are:
<ol type="a">
<li>The dialog is a distraction to the user in the cases where the user
doesn't care when the task ends and is not waiting for its result</li>
<li>This interaction style puts the onus on the user to avoid making mistakes. </li>
<li>In many cases the user doesn't know what resources are modified by a given
operation. I.e., it is unrealistic to assume that users can compute the intersection
between the set of resources that all current background threads may modify
and the set of resources that might be modified by another task they are thinking
of starting.</li>
<li>The penalty to the user for making a mistake can be severe. For example,
if the user starts a CVS commit operation, and then, thinking the commit is about done,
decides to delete the project, they will be unhappy if the deletion started before
the commit finished.</li>
</ol>
<p>
So, how do we schedule concurrent jobs in a way that prevents conflicts without
employing a single long-term lock? We can introduce a scheme where
jobs can specify in advance whether they need exclusive access to a resource.
That is, each job can optionally supply a <i>scheduling rule</i> that is used by
the job scheduler when making decisions about which jobs to run. The API for these
rules would look something like this:
<pre>
public interface ISchedulingRule {
	public boolean isConflicting(ISchedulingRule rule);
}
</pre>
<p>
While these rules would remain generic at the job scheduling level, the workspace
can introduce some standard rules. For example, a rule could request an array
of resources, or the entire workspace. In this way, finer-grained portions of the workspace
can be effectively locked by a job.
</p>
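<p>
As a sketch of what a resource-based rule might look like (the ResourceRule class and its
string-path representation are illustrative, not platform API; the interface is repeated
here to keep the sketch self-contained), two rules could be considered conflicting when
any path in one is the same as, or an ancestor of, a path in the other:
</p>

```java
// Hypothetical sketch of a path-based scheduling rule; IResource and the
// real platform rule factory are not modeled here.
interface ISchedulingRule {
    boolean isConflicting(ISchedulingRule rule);
}

class ResourceRule implements ISchedulingRule {
    final String[] paths; // workspace-relative paths, e.g. "/proj/src"

    ResourceRule(String... paths) {
        this.paths = paths;
    }

    // conflict when any path in one rule overlaps any path in the other
    public boolean isConflicting(ISchedulingRule rule) {
        if (!(rule instanceof ResourceRule))
            return false;
        for (String mine : paths)
            for (String theirs : ((ResourceRule) rule).paths)
                if (overlaps(mine, theirs))
                    return true;
        return false;
    }

    // true if the paths are equal or one is an ancestor of the other
    static boolean overlaps(String a, String b) {
        return a.equals(b) || a.startsWith(b + "/") || b.startsWith(a + "/");
    }
}
```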
<p>
The contract on these rules would be as follows: the job scheduling mechanism
guarantees that a job won't be started if there is a job currently running that conflicts
with its scheduling rule. This scheduling rule would be orthogonal to any locking
mechanism, thus avoiding some of the problems discussed earlier with regard
to pre-specification of locks. We still need to revisit our previous objections to
pre-specified locks to see how they apply to scheduling rules:
<ul>
<li>Scheduling performance. The scheduling rules need to be resolved every time
a job is to be run. It is true that this will still impose an overhead on scheduling.
Most of the time, however, there will be few or no jobs running so very little rule
analysis is needed. As the number of concurrent operations grows, the cost will
increase. However, this has the nice property that it is fast most of the time, and
only starts to slow down in the uncommon case where the system is already very busy anyway.
</li>
<li>How to know what rules to use? In some cases it is hard to predict what resources
will be changed by a given unit of work. When third party API is called, there is often
no contract saying what will be touched. This is still an issue. In the worst case,
clients will err on the side of caution and specify that the entire workspace is needed.
We will need to educate users to specify rules carefully, and if necessary to break
jobs up into separate units with different rules. We will introduce a way of creating
composite jobs that will be run in a specific order, but each job can have its
own scheduling rules.
</li>
<li>How to enforce that rules are followed? The proposal is that we don't enforce
anything. That is, a job can request arbitrary locks regardless of what rules were
stated. To avoid deadlock, lock ordering with release and wait (described earlier)
will be used. The theory is that if jobs are well behaved and specify their rules
correctly, deadlock will be rare, so release and wait will rarely be necessary. To put
it another way, the penalty for not specifying your rules correctly is that you might
have some locks taken away from you when you don't expect it. This is a much
more palatable failure mode than throwing an exception, for example.
</li>
<li>Pre-allocation can be wasteful if resources are only actually needed for a short
period of time. This is still a potential problem with scheduling rules. Job writers
will have to take this into consideration when designing jobs and their rules. If a
job only needs access to a resource briefly at the end of a long operation, it can
be coded as two jobs with different rules. This is an area where scheduling rules
introduce a new complication that clients need to be aware of.
</li>
</ul>
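<p>
The scheduling contract can be sketched as a toy scheduler (the Job and Rule classes and
the list-based queues are illustrative stand-ins, not the platform implementation): a
waiting job is started only when its rule conflicts with no running job's rule, and
finishing a job re-checks the queue:
</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of the scheduling contract, not platform API: a job
// waits in the queue until no currently running job's rule conflicts with
// its own rule.
public class RuleScheduler {
    interface Rule {
        boolean isConflicting(Rule other);
    }

    static class Job {
        final String name;
        final Rule rule; // may be null: a job with no rule never conflicts
        Job(String name, Rule rule) {
            this.name = name;
            this.rule = rule;
        }
    }

    final List<Job> waiting = new ArrayList<>();
    final List<Job> running = new ArrayList<>();

    void schedule(Job job) {
        waiting.add(job);
        startEligible();
    }

    void finished(Job job) {
        running.remove(job);
        startEligible();
    }

    // start every waiting job whose rule conflicts with no running job
    private void startEligible() {
        for (Iterator<Job> it = waiting.iterator(); it.hasNext();) {
            Job j = it.next();
            if (conflictsWithRunning(j))
                continue;
            it.remove();
            running.add(j);
        }
    }

    private boolean conflictsWithRunning(Job j) {
        if (j.rule == null)
            return false;
        for (Job r : running)
            if (r.rule != null && r.rule.isConflicting(j.rule))
                return true;
        return false;
    }
}
```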
<p>
Scheduling rules may not be necessary at all in cases where contention is unlikely,
or where the job is written in such a way that concurrent changes are tolerated.
If clients are confident that no contention is likely, they don't need to specify any
rules. A good example of this is search. Search may create search result markers
on arbitrary resources. However, search could be implemented to not have
any scheduling rules, and it could be tolerant of concurrent changes and deletion.
Since it only creates search markers, it doesn't care if those markers are changed
or deleted after they are created. Thus it is possible that search can be implemented without
using any scheduling rules at all, even though it may potentially make modifications
to an arbitrary set of resources. Another example of this is CVS metadata files. Since
the CVS client is the only one that ever views or modifies the CVS metadata files,
it may not need to create a scheduling rule for them.
</p>
<p>
Finally, when there is contention between jobs, we need a
mechanism for giving more weight to jobs initiated by users than to background jobs
whose results the user is not waiting for. Each job belongs to a priority class that
can be used to manage this interaction. User initiated jobs belong to the INTERACTIVE
priority class. To avoid blocking interactive jobs for unacceptable periods of time, we
can employ various policies to ensure the job gets run, such as:
<ul>
<li>Only start non-interactive jobs when NO interactive jobs are running</li>
<li>If an interactive job is waiting to be run, and its rule conflicts with a currently
running non-interactive job, the non-interactive job may be canceled to let the interactive
job proceed.</li>
<li>If an interactive job is waiting on a lock held by a non-interactive job,
the non-interactive job may be canceled to let the interactive job proceed.</li>
</ul>
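<p>
The second policy above might be sketched as follows (the Priority enum, Job class, and
the single-resource stand-in for a scheduling rule are all illustrative): a waiting
interactive job cancels any conflicting background jobs, but must still wait behind a
conflicting interactive job:
</p>

```java
import java.util.List;

// Sketch of one priority policy, not platform API: when an interactive
// job is waiting and its rule conflicts with a running non-interactive
// job, the non-interactive job is canceled so the interactive job can run.
public class PriorityPolicy {
    enum Priority { INTERACTIVE, BACKGROUND }

    static class Job {
        final Priority priority;
        final String resource; // crude single-resource stand-in for a rule
        boolean canceled;
        Job(Priority priority, String resource) {
            this.priority = priority;
            this.resource = resource;
        }
    }

    // Returns true if the waiting interactive job may start, marking
    // conflicting background jobs as canceled along the way. (A real
    // framework would then remove canceled jobs from the running set.)
    static boolean admitInteractive(Job waiting, List<Job> running) {
        if (waiting.priority != Priority.INTERACTIVE)
            return false;
        for (Job r : running) {
            if (!r.resource.equals(waiting.resource))
                continue; // no conflict with this job
            if (r.priority == Priority.BACKGROUND)
                r.canceled = true; // cancel to let the interactive job in
            else
                return false; // conflicting interactive job: must wait
        }
        return true;
    }
}
```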
<h3>Locking issues</h3>
<h4>Is the UI thread allowed to acquire locks?</h4>
<b>Reasons for "No":</b><br>
<p>
Clearly there is a deadlock risk due to the interaction with syncExec.
We can handle this risk in the same way that we handle it in Eclipse 2.1. By ensuring
the core locking mechanism and the SWT synchronizer are aware of each other, we
can avoid potential deadlocks by servicing syncExecs in the UI thread in the case
where the UI thread is waiting on an acquired lock.
</p>
<p>
Not allowing locks in the UI thread would also help improve UI responsiveness.
If the UI thread is waiting on a lock, it cannot be processing the event queue, and thus
the UI will fail to paint.
</p>
<p>
If we don't allow locks in the UI thread, then we can easily add the extra restriction
that locks must be acquired from within the context of currently running jobs. This
would give us a story for cleaning up misbehaving locks. I.e., if someone acquires
a lock but fails to release it, we can release it automatically when the job completes.
</p>
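<p>
The cleanup story could be sketched like this (the JobContext class is a hypothetical
stand-in for the job framework's lock tracking): locks acquired through the job's
context are recorded, so any lock a misbehaving job fails to release is released
automatically when the job completes:
</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch, not platform API: the framework routes lock
// acquisition through a per-job context so it can release leaked locks
// when the job's run method returns.
public class JobLockCleanup {
    static class JobContext {
        private final List<ReentrantLock> held = new ArrayList<>();

        void acquire(ReentrantLock lock) {
            lock.lock();
            held.add(lock);
        }

        void release(ReentrantLock lock) {
            if (held.remove(lock))
                lock.unlock();
        }

        // called by the framework after the job completes: release
        // anything the job acquired but failed to release itself
        void cleanup() {
            for (ReentrantLock lock : held)
                lock.unlock();
            held.clear();
        }
    }
}
```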
<b>Reasons for "Yes":</b><br>
<p>
The main drawback is that this would be unwieldy to program with. Some operations
that acquire locks may actually be very fast. Forcing the UI to fork a
job every time a lock is needed may be overkill in many cases. If third party code is
called from the UI, the caller may not know if locks will be acquired deep down in the
call stack. To be defensive, the UI would have to fork jobs whenever third party
code is called.
</p>
<p>
Another problem is how this rule would be enforced. Throwing an exception when
the UI thread attempts to acquire a lock would result in unacceptable runtime errors.
</p>
<h4>Do we expose locks?</h4>
<p>
Do we allow third party clients to directly acquire, for example, the workspace lock?
The answer to this question must be "No" if we don't have the restriction that locks
be acquired within running jobs. Otherwise, we would have no way of cleaning up
locks that clients acquire but fail to release. Is this acceptable? Do we know of
any cases where clients will need to acquire the workspace lock, given that we have
a system of rules to prevent conflicting jobs from running concurrently?
</p>
</body>
</html>