Concurrent Affairs





If multiple threads concurrently execute code that writes to or modifies a resource, then obviously the resource must be protected with a thread synchronization lock to ensure that the resource doesn't get corrupted. However, it is common to have a resource that is occasionally written to but frequently read from. If multiple threads are concurrently reading a resource, using a mutually exclusive lock hurts performance significantly because only one thread at a time is allowed to read from the resource. It's far more efficient to allow all the threads to read the resource simultaneously, and it's fine to do this if all of the threads treat the resource as read-only and do not attempt to write to or modify it.

A reader/writer synchronization lock can and should be used to improve performance and scalability. A reader/writer lock ensures that only one thread can write to a resource at any one time and it allows multiple threads to read from a resource simultaneously as long as no thread is writing at the same time.

The Microsoft® .NET Framework Class Library includes a ReaderWriterLock class in the System.Threading namespace that lets you obtain multiple-reader/single-writer semantics. While it is nice that a class like this exists, there are several problems with its implementation and I recommend you do not use it:

Performance  Even when there is no contention for a ReaderWriterLock, its performance is very slow. For example, a call to its AcquireReaderLock method takes about five times longer to execute than a call to Monitor's Enter method.
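If you want to see this disparity for yourself, a rough micro-benchmark along these lines will do. The iteration count is arbitrary, the absolute numbers will vary by machine, and this is my illustrative test harness, not an official benchmark:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class LockBenchmark {
   const int Iterations = 1000000;

   static void Main() {
      Object monitorLock = new Object();
      ReaderWriterLock rwLock = new ReaderWriterLock();

      // Time uncontended Monitor.Enter/Exit.
      Stopwatch sw = Stopwatch.StartNew();
      for (int i = 0; i < Iterations; i++) {
         Monitor.Enter(monitorLock);
         Monitor.Exit(monitorLock);
      }
      sw.Stop();
      Console.WriteLine("Monitor:          {0} ms", sw.ElapsedMilliseconds);

      // Time uncontended AcquireReaderLock/ReleaseReaderLock.
      sw = Stopwatch.StartNew();
      for (int i = 0; i < Iterations; i++) {
         rwLock.AcquireReaderLock(Timeout.Infinite);
         rwLock.ReleaseReaderLock();
      }
      sw.Stop();
      Console.WriteLine("ReaderWriterLock: {0} ms", sw.ElapsedMilliseconds);
   }
}
```

Comparing the two printed times should show the kind of gap described above, even though no other thread ever contends for either lock.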

Policy  If a thread completes writing and there are both reader and writer threads waiting, the ReaderWriterLock releases the reader threads instead of the writer threads. But if you have a resource that is always being written to, you should use a mutually exclusive lock to guard access. And if you have a resource that's always being read from, you don't need a thread synchronization lock at all. So, the reason to use a reader/writer lock is if you expect a lot of reader threads and very few writer threads. Because the ReaderWriterLock releases any waiting readers when a writer releases the lock, writer threads may get queued up and take an unusually long time to get serviced. In fact, I know of some Web sites that were architected around this lock and had problems. Users would go to a Web page, submit a form to modify their profile data, then on the server the thread would call the ReaderWriterLock's AcquireWriterLock method to update the profile data. But other threads were reading from the data, so the writer thread wasn't allowed access in a reasonable amount of time and the client's browser would time out.

On the other hand, the policy of releasing reader threads when a writer thread releases the lock ensures that threads are always making progress and do eventually get serviced. Imagine a reader/writer lock that has a different policy: when a writer thread releases the lock, another waiting writer thread is allowed to own the lock. With this policy, if a lot of writer threads happen to show up, then reader threads are starved. I generally prefer a reader/writer lock that favors writers and if I expect a lot of writers, I'd use a mutually exclusive lock instead. In my opinion, the ReaderWriterLock that ships with the .NET Framework defeats the main purpose of using a reader/writer lock.

Recursion  The ReaderWriterLock class supports recursion. This means it remembers which threads currently own the lock and if an owning thread attempts to acquire the lock recursively, it allows the thread to acquire the lock and increments a counter for the thread's ownership. The thread must then release the lock the same number of times so that the thread doesn't own the lock anymore. Although this seems like a nice feature, it comes at a very high cost. First, since multiple reader threads can own the lock simultaneously, the lock must keep the counter on a per-thread basis and this requires additional memory and time to update the counter. This feature contributes significantly to the ReaderWriterLock's poor performance. Second, it is sometimes useful to design an architecture where you need to acquire a lock in one thread and release the lock in another thread. Because of its recursion feature, the ReaderWriterLock prohibits this kind of application architecture.
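The recursion behavior is easy to demonstrate in a few lines (using, just this once, the very class I'm advising against):

```csharp
using System;
using System.Threading;

class RecursionDemo {
   static void Main() {
      ReaderWriterLock rwl = new ReaderWriterLock();

      rwl.AcquireReaderLock(Timeout.Infinite);
      rwl.AcquireReaderLock(Timeout.Infinite); // Recursive acquire: per-thread count goes to 2
      Console.WriteLine(rwl.IsReaderLockHeld); // True

      rwl.ReleaseReaderLock();                 // Count drops to 1; thread still owns the lock
      Console.WriteLine(rwl.IsReaderLockHeld); // True

      rwl.ReleaseReaderLock();                 // Count drops to 0; ownership released
      Console.WriteLine(rwl.IsReaderLockHeld); // False
   }
}
```

The per-thread count that makes this output possible is exactly the bookkeeping that costs memory and time on every acquire and release.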

Resource Leak  Prior to version 2.0 of the .NET Framework, the ReaderWriterLock had a bug that caused it to leak some of the kernel objects it was using. These objects would be reclaimed only after process termination. Fortunately, this bug has been fixed in the .NET Framework 2.0.

For all these reasons, I never use the ReaderWriterLock that ships with the .NET Framework. Instead, I have created my own reader/writer lock that is super fast, favors writer threads over reader threads, and does not support recursion. My lock is called OneManyResourceLock because you can use it either to allow just one thread or to allow many threads access to a resource. The object model of my class looks like this:

public sealed class OneManyResourceLock : ResourceLock {
   public IDisposable WaitToRead()  { ... }
   public void        DoneReading() { ... }
   public IDisposable WaitToWrite() { ... }
   public void        DoneWriting() { ... }
}

 

I've written many reader/writer locks and all of them exhibited the convoy problem I discussed in my March 2006 column. I decided to fix this in my new OneManyResourceLock. Implementing my new version turned out to be fairly complicated. To give you a sense of my new lock, I created some state diagrams showing how WaitToWrite (Figure 1), WaitToRead (Figure 2), DoneWriting (Figure 3), and DoneReading (Figure 4) work.

Figure 1 WaitToWrite

To understand the state diagrams, you first need to know that the OneManyResourceLock has an Int32 field that maintains the state of the lock. An instance of the lock also holds one semaphore that waiting reader threads wait on and another that waiting writer threads wait on. The bytes of the Int32 lock state field represent different parts of the lock's state and are always manipulated using the interlocked methods I described in my first Concurrent Affairs column. The first byte represents the number of writers waiting (WW), the second represents the number of readers waiting (RW), the third represents the number of readers reading (RR), and the last byte represents the current disposition of the lock, which can be any of the following:

  • Free: no thread owns the lock.
  • Owned by writer (OBW): one writer thread owns the lock.
  • Owned by readers (OBR): one or more reader threads own the lock.
  • Owned by readers and a writer is pending (OBRAWP): one or more reader threads own the lock but a writer thread is waiting. In this state, a new reader thread cannot own the lock.
  • Reserved for writer (RFW): the lock enters this state when a writer thread is waiting, when one leaves the lock with another writer thread waiting, or when the last reader thread leaves the lock and a writer thread is waiting.
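To make the packed-Int32 idea concrete, here is a minimal sketch of manipulating one of these byte fields with an interlocked compare-and-swap loop. The masks, shifts, and helper names are mine for illustration only; they are not the actual OneManyResourceLock source:

```csharp
using System;
using System.Threading;

static class LockState {
   // Layout of the Int32 state word, one byte per field:
   //   bits  0-7:  writers waiting (WW)
   //   bits  8-15: readers waiting (RW)
   //   bits 16-23: readers reading (RR)
   //   bits 24-31: disposition (Free, OBW, OBR, OBRAWP, RFW)
   const int RRShift = 16, RRMask = 0xFF;

   static int s_state; // All four fields start at 0 (disposition == Free)

   public static int ReadersReading {
      get { return (s_state >> RRShift) & RRMask; }
   }

   // Atomically increment the readers-reading byte using the
   // classic compare-and-swap loop: read, compute, attempt to
   // publish, and retry if another thread changed the state first.
   public static void IncrementReadersReading() {
      int oldState, newState;
      do {
         oldState = s_state;
         newState = oldState + (1 << RRShift);
      } while (Interlocked.CompareExchange(ref s_state, newState, oldState) != oldState);
   }
}
```

Because the entire state lives in one Int32, every transition in the state diagrams can be attempted as a single atomic compare-and-swap, with no inner lock protecting the lock's own bookkeeping.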

 

Figure 2 WaitToRead

The OneManyResourceLock class is much more complex than the Optex class described in my previous column, but it is quite useful and can greatly improve the performance and scalability of your application code.

Figure 3 DoneWriting

Notice that the WaitToRead and WaitToWrite methods have a return type of IDisposable. I did this to offer the convenience of calling these methods with a C# using statement. In other words, you can write code like this:

public void ModifyResource() {
   using (m_OneManyResourceLock.WaitToWrite()) {
      // The code to modify the resource goes here
   }
}

 

Figure 4 DoneReading

and the C# compiler will compile the code as if you had written code like this:

public void ModifyResource() {
   IDisposable temp = m_OneManyResourceLock.WaitToWrite();
   try {
      // The code to modify the resource goes here
   }
   finally {
      if (temp != null) temp.Dispose();
   }
}

 

When WaitToWrite (or WaitToRead) is called, a reference to an object that implements IDisposable is returned. When Dispose is called on this object, the Dispose method internally calls DoneWriting (or DoneReading). This makes it very convenient to use any of the ResourceLock-derived types in your code. Note that I create just one Disposable object for any given lock instance; I don't create a new IDisposable object every time WaitToRead or WaitToWrite is called, thereby improving performance and decreasing memory consumption.
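Here is a sketch of that cached-disposable pattern with illustrative names (the real ResourceLock types are more complete; the public counter exists only to make the behavior visible):

```csharp
using System;

public class ResourceLockSketch {
   // One helper object per lock instance, created once and reused,
   // so WaitToWrite never allocates on the hot path.
   private readonly IDisposable m_writeDoneDisposer;

   public int DoneWritingCalls; // For demonstration only

   public ResourceLockSketch() {
      m_writeDoneDisposer = new Disposer(this);
   }

   public IDisposable WaitToWrite() {
      // ... acquire the write lock here ...
      return m_writeDoneDisposer; // The same object on every call
   }

   public void DoneWriting() {
      // ... release the write lock here ...
      DoneWritingCalls++;
   }

   // The helper simply forwards Dispose to DoneWriting, which is
   // what lets callers use the C# using statement.
   private sealed class Disposer : IDisposable {
      private readonly ResourceLockSketch m_lock;
      public Disposer(ResourceLockSketch l) { m_lock = l; }
      public void Dispose() { m_lock.DoneWriting(); }
   }
}
```

Since the helper is stateless apart from its back-reference, returning the same instance from every WaitToWrite call is safe, and no garbage is generated per acquisition.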

It would be pretty easy to add additional features to the OneManyResourceLock class. For example, you could add recursion so that an owning thread could acquire the lock multiple times. You could add TryWaitToRead/TryWaitToWrite methods that take timeout values rather than having the calling threads wait infinitely to gain access to the resource. You could change the policy of the lock to release any waiting readers when a writer releases the lock. In fact, you could record the time of the longest waiting reader and use this information to make the policy decision very intelligent. Of course, you could modify the WaitToRead/WaitToWrite methods so that they spin in user mode a few times before transitioning to kernel mode to wait on the semaphore.

In addition, if you know that the work performed by the threads once they own the lock is very short and of a finite duration, you could even create a reader/writer lock that doesn't need semaphores and never transitions into kernel mode. It spins entirely in user mode until the lock can be obtained by the calling thread. I have implemented a lock like this that I call the OneManySpinResourceLock. Be careful when you use it though, because calling threads never relinquish the CPU, so if there is lengthy contention, you can waste a lot of CPU time.
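To give you the flavor, here is a heavily simplified, user-mode-only reader/writer lock of my own sketching. It is not the actual OneManySpinResourceLock source and makes no attempt at the writer-favoring policy discussed earlier:

```csharp
using System.Threading;

public sealed class SpinReaderWriterSketch {
   // -1 = owned by a writer, 0 = free, >0 = number of active readers.
   private int m_state;

   public void WaitToRead() {
      while (true) {
         int s = m_state;
         // Enter only if no writer owns the lock; publish atomically.
         if (s >= 0 && Interlocked.CompareExchange(ref m_state, s + 1, s) == s)
            return;
         Thread.SpinWait(1); // Stay in user mode; never touch a kernel object
      }
   }

   public void DoneReading() { Interlocked.Decrement(ref m_state); }

   public void WaitToWrite() {
      // A writer can enter only when the lock is completely free.
      while (Interlocked.CompareExchange(ref m_state, -1, 0) != 0)
         Thread.SpinWait(1);
   }

   public void DoneWriting() { Interlocked.Exchange(ref m_state, 0); }
}
```

Notice there is no semaphore anywhere: a thread that can't get in just burns CPU until it can, which is exactly why a lock like this is only appropriate when the guarded work is known to be very short.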


My ResourceLock Library

By now, you're aware of the various locks that ship with the .NET Framework that can be used to allow threads to access a shared resource in a thread-safe way: Monitor, ReaderWriterLock, Mutex, Semaphore, and EventWaitHandle. I've introduced a few others: SpinLock, Optex, OneManyResourceLock, and OneManySpinResourceLock. So now, the question is, with all these thread synchronization locks available, how do you know which is best for any given situation? The answer is that you probably don't know. In fact, it really depends on how your application is being used in the field. It could be that one customer typically writes to the data and only infrequently reads it. Another customer might write to the data initially and subsequently consider the data read-only.

Is the resource generally accessed for a long or short period of time? Does the machine running your application have one CPU or more than one? Are the CPUs hyper-threaded? Are they multicore? All of these variables have some impact on your application and should influence the exact lock implementation you use in your code. But, how can you code your application so the thread synchronization locks are selectable at run time? Well, I have a proposal—and an implementation.

I have defined an abstract base class, called ResourceLock. This class has a bunch of virtual methods: WaitToRead, DoneReading, WaitToWrite, and DoneWriting. It offers Close and Dispose methods as well. ResourceLock is the base of many concrete classes I've also defined. The table in Figure 5 shows the set of classes (sorted alphabetically) I've derived from my ResourceLock base class.

Each type represents an existing thread synchronization lock you should already be familiar with. But, the class wraps the locks so they all offer the same reader/writer programming interface. You'll find the code for all of these classes in my PowerThreading library, which can be downloaded from my Web site (see my biographical information at the end of this column).
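For instance, wrapping Monitor in the reader/writer interface takes only a few lines. This sketch conveys the idea (the real MonitorResourceLock in the library derives from ResourceLock and returns IDisposable objects as shown earlier):

```csharp
using System.Threading;

// An exclusive lock exposed through the reader/writer interface:
// readers and writers alike take the Monitor, so only one thread
// at a time touches the resource regardless of which pair of
// methods the caller uses.
public sealed class MonitorLockSketch {
   private readonly object m_syncObj = new object();

   public void WaitToRead()  { Monitor.Enter(m_syncObj); }
   public void DoneReading() { Monitor.Exit(m_syncObj);  }
   public void WaitToWrite() { Monitor.Enter(m_syncObj); }
   public void DoneWriting() { Monitor.Exit(m_syncObj);  }
}
```

Because every wrapper exposes the same four methods, the code that uses the lock never has to know which concrete lock is behind them.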

When your application starts up, you construct an instance of the ResourceLock-derived type you desire and assign the object's reference to a variable of type ResourceLock (the abstract base class). In your source code, always use reader/writer semantics and call the WaitToRead, DoneReading, WaitToWrite, DoneWriting methods using the ResourceLock variable. Note that the table shows the different capabilities of each lock.

  • A lock with a check mark in the Exclusive column identifies a lock that allows only one thread at a time to access the resource. So, even if two threads call WaitToRead, the lock will only let one thread at a time access the resource.
  • A lock with a check mark in the Recursion column identifies a lock that allows a thread to own the lock multiple times. The thread must release the lock the same number of times before it completely releases the lock.
  • A lock with a check mark in the Spin column identifies a lock that spins the calling thread in user mode while it waits to own the lock. Calling threads never transition to kernel mode. Locks that spin should be used with great caution and only when the guarded work performed by the thread is known to be of very short duration. You also need to make sure that all waiting threads are of the same priority and that priority boosting is disabled for these threads by calling the Win32® SetProcessPriorityBoost or SetThreadPriorityBoost functions.
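Incidentally, you can reach the priority-boost setting from managed code without writing P/Invoke signatures yourself; the System.Diagnostics classes wrap the Win32 functions:

```csharp
using System.Diagnostics;

class DisableBoost {
   static void Main() {
      // Disable dynamic priority boosting for the whole process
      // (this property wraps SetProcessPriorityBoost).
      Process current = Process.GetCurrentProcess();
      current.PriorityBoostEnabled = false;

      // Per-thread boosting can likewise be disabled via the
      // ProcessThread.PriorityBoostEnabled property on each
      // thread of interest.
   }
}
```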

It's clear from the table that some permutations of Exclusive, Recursion, and Spin do not have corresponding locks. While it would be possible to create such locks, I haven't found them necessary in my own work so I haven't gone to the trouble. If you're doing any cross-AppDomain or cross-process synchronization, though, you really need a lock that is a wrapper around a kernel object such as a mutex or semaphore. My MutexResourceLock and SemaphoreResourceLock classes don't offer constructors that let you set a string name for the underlying mutex and semaphore objects, but modifying the code to support this should be trivial.

There's one more thing I should point out: the actual class you construct when your application starts can be determined dynamically. For example, when your application starts up, it could determine how many CPUs are in the machine and whether those CPUs are hyper-threaded or multicore and then use this information to decide which specific ResourceLock-derived class to construct. You could also have an application setting in an XML file or equivalent that you use to determine what class to create, which would let you try different locks without modifying and recompiling the source code. In addition, you could instrument your application's code and monitor the behavior the application exhibits when deployed in the field. Based on the results, the application could choose a specific lock for its next run, allowing your application to fine-tune itself to each customer!
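Here is a sketch of such a start-up decision. The configuration key name is hypothetical; the ResourceLock-derived class names are the ones from my library, so this assumes the PowerThreading assembly is referenced:

```csharp
using System;
using System.Configuration; // For ConfigurationManager (System.Configuration.dll)

static class ResourceLockFactory {
   // Pick a lock type at run time: an explicit app setting wins;
   // otherwise fall back on a simple machine-based heuristic.
   public static ResourceLock Create() {
      string choice = ConfigurationManager.AppSettings["LockKind"]; // hypothetical key
      switch (choice) {
         case "Null":    return new NullResourceLock();
         case "Monitor": return new MonitorResourceLock();
         case "OneMany": return new OneManyResourceLock();
      }
      // No setting present: a uniprocessor machine gains nothing
      // from reader concurrency tricks, so prefer a simple
      // exclusive lock there.
      return (Environment.ProcessorCount == 1)
         ? (ResourceLock)new MonitorResourceLock()
         : new OneManyResourceLock();
   }
}
```

The rest of the application calls WaitToRead/WaitToWrite through the returned ResourceLock reference and never knows which concrete lock was chosen.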

Just for fun, I decided to run some performance tests on these different locks. My test application had just one thread that first called WaitToRead/DoneReading two million times and then called WaitToWrite/DoneWriting two million times. In these tests there was no contention for the lock because only one thread was accessing the lock object. Then I ran the test again, this time with four threads calling both sets of methods two million times. I used a 2.4GHz AMD Athlon 64 X2 dual-core CPU system for the tests. The results in Figure 6 should give you a feel for the performance variations of the locks. Obviously, the results will be different on differently configured machines.

The rows are sorted using the 1-Thread Reading time, from best performing (NullResourceLock) to worst performing (SemaphoreResourceLock). Notice that, for each lock, the 1-Thread times for reading and writing are about the same. For example, OptexResourceLock shows 0.105 seconds for reading and 0.102 seconds for writing. This is expected because, without contention, there isn't much difference between a thread acquiring/releasing a reader lock versus acquiring/releasing a writer lock.

In the 4-Thread Reading and Writing columns, you'll notice that the time to read versus time to write is also about the same for all the exclusive locks. For example, the MonitorResourceLock shows 0.503 seconds for reading and 0.496 seconds for writing; the 0.007 second difference here is obviously just noise. The nearly identical times are also expected because these locks let only one thread at a time own the lock.

However, notice the disparity between reading and writing using four threads for a nonexclusive lock. For example, the ReaderWriterResourceLock (see Figure 7) shows 1.782 seconds for reading and 2.187 seconds for writing; writing took more than 20 percent longer than reading. That's not surprising. Since multiple threads can access the resource simultaneously for reading whereas only one thread at a time can access the resource for writing, we'd expect the reading time to be substantially less, proving that the lock is doing what it is supposed to be doing.

Finally, my code also includes an IResourceLock interface, which is implemented by my ResourceLock abstract base class. The interface lets you define a type and make that type follow the same reader/writer lock programming pattern discussed here. Suppose, for example, you have a CustomerOrder class that implements my IResourceLock interface. Now you can use reader/writer semantics to access an instance of the CustomerOrder class. When you define your CustomerOrder class, most likely you would have a private ResourceLock field that is initialized to refer to one of the ResourceLock-derived types. Your CustomerOrder class would then implement the IResourceLock methods and, internally, each of these methods would delegate to the corresponding method of the ResourceLock field. Here is an example:

public sealed class CustomerOrder : IResourceLock {
   private ResourceLock m_rl = new OneManyResourceLock();
   public IDisposable WaitToRead()  { return m_rl.WaitToRead(); }
   public void        DoneReading() { m_rl.DoneReading(); }
   public IDisposable WaitToWrite() { return m_rl.WaitToWrite(); }
   public void        DoneWriting() { m_rl.DoneWriting(); }
   ...
}

 

I've been using my library now for several years with great results. I love the fact that it lets me separate the kind of locking I need to do from the exact lock I decide to use. Now, when I write code, I always think about reader/writer locking for a shared resource. After seeing how the code performs, I decide exactly which lock to use. In fact, I can even use the NullResourceLock to turn off locking altogether if I determine it's not necessary to have concurrent access to a resource.


Send your questions and comments for Jeffrey to  mmsync@microsoft.com.

Jeffrey Richter is a cofounder of Wintellect (www.Wintellect.com), a training and consulting firm. He is the author of several books, including Applied Microsoft .NET Framework Programming (Microsoft Press, 2002). Jeffrey is also a contributing editor to MSDN Magazine and has been consulting with Microsoft since 1990.
Posted on 2007-04-29 19:32 by 彭帅