Inter-Process Communication and Cooperation: Locks
Unreliable Guide To Locking
Introduction
Welcome, to Rusty’s Remarkably Unreliable Guide to Kernel Locking issues. This document describes the locking systems in the Linux Kernel in 2.6.
With the wide availability of HyperThreading, and preemption in the Linux Kernel, everyone hacking on the kernel needs to know the fundamentals of concurrency and locking for SMP.
The Problem With Concurrency
(Skip this if you know what a Race Condition is).
In a normal program, you can increment a counter like so:
very_important_count++;
This is what they would expect to happen:
| Instance 1 | Instance 2 |
| --- | --- |
| read very_important_count (5) | |
| add 1 (6) | |
| write very_important_count (6) | |
| | read very_important_count (6) |
| | add 1 (7) |
| | write very_important_count (7) |
This is what might happen:
| Instance 1 | Instance 2 |
| --- | --- |
| read very_important_count (5) | |
| | read very_important_count (5) |
| add 1 (6) | |
| | add 1 (6) |
| write very_important_count (6) | |
| | write very_important_count (6) |
Race Conditions and Critical Regions
This overlap, where the result depends on the relative timing of multiple tasks, is called a race condition. The piece of code containing the concurrency issue is called a critical region. And especially since Linux started running on SMP machines, race conditions became one of the major issues in kernel design and implementation.
Preemption can have the same effect, even if there is only one CPU: by preempting one task during the critical region, we have exactly the same race condition. In this case the thread which preempts might run the critical region itself.
The solution is to recognize when these simultaneous accesses occur, and use locks to make sure that only one instance can enter the critical region at any time. There are many friendly primitives in the Linux kernel to help you do this. And then there are the unfriendly primitives, but I’ll pretend they don’t exist.
Locking in the Linux Kernel
If I could give you one piece of advice: never sleep with anyone crazier than yourself. But if I had to give you advice on locking: keep it simple.
Be reluctant to introduce new locks.
Strangely enough, this last one is the exact reverse of my advice when you have slept with someone crazier than yourself. And you should think about getting a big dog.
Two Main Types of Kernel Locks: Spinlocks and Mutexes
There are two main types of kernel locks. The fundamental type is the spinlock (include/asm/spinlock.h), which is a very simple single-holder lock: if you can’t get the spinlock, you keep trying (spinning) until you can. Spinlocks are very small and fast, and can be used anywhere.

The second type is a mutex (include/linux/mutex.h): it is like a spinlock, but you may block holding a mutex. If you can’t lock a mutex, your task will suspend itself, and be woken up when the mutex is released. This means the CPU can do something else while you are waiting. There are many cases when you simply can’t sleep (see What Functions Are Safe To Call From Interrupts?), and so have to use a spinlock instead.
Neither type of lock is recursive: see Deadlock: Simple and Advanced.
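To make this concrete, here is a minimal sketch (the lock names and helper functions are hypothetical, not taken from the kernel source): the counter from the race example above is guarded by a spinlock, and a separate sleepable resource is guarded by a mutex. mutex_lock_interruptible() is discussed under Locking Only In User Context below.

```c
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/string.h>
#include <linux/errno.h>

/* The counter from the race example, now guarded by a spinlock: only
 * one holder at a time can be inside the critical region. */
static unsigned long very_important_count;
static DEFINE_SPINLOCK(vic_lock);

/* Hypothetical increment usable from any context (the holder never sleeps). */
static void vic_inc(void)
{
	spin_lock(&vic_lock);
	very_important_count++;		/* the critical region */
	spin_unlock(&vic_lock);
}

/* A hypothetical mutex-protected resource: the holder of a mutex may
 * sleep, so this is only usable from contexts that are allowed to sleep. */
static char config_name[64];
static DEFINE_MUTEX(config_mutex);

static int set_config_name(const char *name)
{
	if (mutex_lock_interruptible(&config_mutex))
		return -EINTR;		/* a signal arrived while waiting */
	strscpy(config_name, name, sizeof(config_name));
	mutex_unlock(&config_mutex);
	return 0;
}
```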
Locks and Uniprocessor Kernels
For kernels compiled without CONFIG_SMP, and without CONFIG_PREEMPT, spinlocks do not exist at all. This is an excellent design decision: when no-one else can run at the same time, there is no reason to have a lock.

If the kernel is compiled without CONFIG_SMP, but CONFIG_PREEMPT is set, then spinlocks simply disable preemption, which is sufficient to prevent any races. For most purposes, we can think of preemption as equivalent to SMP, and not worry about it separately.
You should always test your locking code with CONFIG_SMP and CONFIG_PREEMPT enabled, even if you don’t have an SMP test box, because it will still catch some kinds of locking bugs.
Mutexes still exist, because they are required for synchronization between user contexts, as we will see below.
Locking Only In User Context
If you have a data structure which is only ever accessed from user context, then you can use a simple mutex (include/linux/mutex.h) to protect it. This is the most trivial case: you initialize the mutex. Then you can call mutex_lock_interruptible() to grab the mutex, and mutex_unlock() to release it. There is also a mutex_lock(), which should be avoided, because it will not return if a signal is received.
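A minimal sketch of this trivial case, with hypothetical names for the list, the lock and the registration function:

```c
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/errno.h>

/* Hypothetical registration list, only ever touched from user context. */
static LIST_HEAD(my_registrations);
static DEFINE_MUTEX(my_registration_mutex);

struct my_registration {
	struct list_head list;
	int id;
};

/* Hypothetical registration function, called only from user context. */
static int my_register(int id)
{
	struct my_registration *reg;

	reg = kmalloc(sizeof(*reg), GFP_KERNEL);
	if (!reg)
		return -ENOMEM;
	reg->id = id;

	/* Returns non-zero if a signal arrived while we were waiting. */
	if (mutex_lock_interruptible(&my_registration_mutex)) {
		kfree(reg);
		return -EINTR;
	}
	list_add(&reg->list, &my_registrations);
	mutex_unlock(&my_registration_mutex);
	return 0;
}
```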
Example: net/netfilter/nf_sockopt.c allows registration of new setsockopt() and getsockopt() calls, with nf_register_sockopt(). Registration and de-registration are only done on module load and unload (and boot time, where there is no concurrency), and the list of registrations is only consulted for an unknown setsockopt() or getsockopt() system call. The nf_sockopt_mutex is perfect to protect this, especially since the setsockopt and getsockopt calls may well sleep.
Locking Between User Context and Softirqs
If a softirq shares data with user context, you have two problems. Firstly, the current user context can be interrupted by a softirq, and secondly, the critical region could be entered from another CPU. This is where spin_lock_bh() (include/linux/spinlock.h) is used. It disables softirqs on that CPU, then grabs the lock. spin_unlock_bh() does the reverse. (The ‘_bh’ suffix is a historical reference to “Bottom Halves”, the old name for software interrupts. It should really be called spin_lock_softirq() in a perfect world.)

Note that you can also use spin_lock_irq() or spin_lock_irqsave() here, which stop hardware interrupts as well: see Hard IRQ Context.

This works perfectly for UP as well: the spin lock vanishes, and this macro simply becomes local_bh_disable() (include/linux/interrupt.h), which protects you from the softirq being run.
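A sketch of this pattern with hypothetical names, assuming only user context and this softirq touch the list: user context takes the lock with spin_lock_bh() so the softirq cannot run on this CPU in the middle of an update, while the softirq side only needs plain spin_lock(), since user context cannot interrupt it on the same CPU.

```c
#include <linux/spinlock.h>
#include <linux/list.h>

/* Hypothetical queue shared between user context and a softirq. */
static LIST_HEAD(pending_work);
static DEFINE_SPINLOCK(pending_lock);

struct pending_item {
	struct list_head list;
	int value;
};

/* Called from user context (e.g. a syscall): block softirqs on this CPU
 * and take the lock, so the softirq never sees a half-updated list. */
static void queue_item(struct pending_item *item)
{
	spin_lock_bh(&pending_lock);
	list_add_tail(&item->list, &pending_work);
	spin_unlock_bh(&pending_lock);
}

/* Called from the softirq: plain spin_lock() is enough, because user
 * context cannot interrupt a softirq on this CPU; the lock only guards
 * against other CPUs. */
static struct pending_item *dequeue_item(void)
{
	struct pending_item *item = NULL;

	spin_lock(&pending_lock);
	if (!list_empty(&pending_work)) {
		item = list_first_entry(&pending_work,
					struct pending_item, list);
		list_del(&item->list);
	}
	spin_unlock(&pending_lock);
	return item;
}
```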
Locking Between User Context and Tasklets
This is exactly the same as above, because tasklets are actually run from a softirq.
Locking Between User Context and Timers
This, too, is exactly the same as above, because timers are actually run from a softirq. From a locking point of view, tasklets and timers are identical.
Locking Between Tasklets/Timers
Sometimes a tasklet or timer might want to share data with another tasklet or timer.
The Same Tasklet/Timer
Since a tasklet is never run on two CPUs at once, you don’t need to worry about your tasklet being reentrant (running twice at once), even on SMP.
Different Tasklets/Timers
If another tasklet/timer wants to share data with your tasklet or timer, you will both need to use spin_lock() and spin_unlock() calls. spin_lock_bh() is unnecessary here, as you are already in a tasklet, and none will be run on the same CPU.
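For illustration, a sketch with hypothetical names, using the v4.18-era tasklet API that this document describes (the callback signature changed in later kernels):

```c
#include <linux/spinlock.h>
#include <linux/interrupt.h>

/* Hypothetical counter shared by two different tasklets. */
static unsigned long shared_events;
static DEFINE_SPINLOCK(shared_events_lock);

/* Plain spin_lock() is enough in both tasklets: a tasklet is never
 * interrupted by another tasklet on the same CPU, so the lock only has
 * to keep out tasklets running on other CPUs. */
static void first_tasklet_fn(unsigned long data)
{
	spin_lock(&shared_events_lock);
	shared_events++;
	spin_unlock(&shared_events_lock);
}

static void second_tasklet_fn(unsigned long data)
{
	spin_lock(&shared_events_lock);
	shared_events += 2;
	spin_unlock(&shared_events_lock);
}

/* v4.18-era declarations; newer kernels take a tasklet_struct callback. */
static DECLARE_TASKLET(first_tasklet, first_tasklet_fn, 0);
static DECLARE_TASKLET(second_tasklet, second_tasklet_fn, 0);
```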
Locking Between Softirqs
Often a softirq might want to share data with itself or a tasklet/timer.
The Same Softirq
The same softirq can run on the other CPUs: you can use a per-CPU array (see Per-CPU Data) for better performance. If you’re going so far as to use a softirq, you probably care about scalable performance enough to justify the extra complexity.
You’ll need to use spin_lock() and spin_unlock() for shared data.
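A sketch of both options with hypothetical names: a per-CPU counter that needs no lock because each CPU touches only its own copy, and a truly shared counter that does need spin_lock().

```c
#include <linux/percpu.h>
#include <linux/spinlock.h>

/* Option 1: per-CPU data needs no lock against the same softirq running
 * on another CPU, because each CPU only updates its own copy. */
static DEFINE_PER_CPU(unsigned long, softirq_hits);

static void count_hit_percpu(void)
{
	this_cpu_inc(softirq_hits);
}

/* Option 2: truly shared data needs spin_lock()/spin_unlock(), since the
 * same softirq may be running simultaneously on another CPU. */
static unsigned long total_hits;
static DEFINE_SPINLOCK(total_hits_lock);

static void count_hit_shared(void)
{
	spin_lock(&total_hits_lock);
	total_hits++;
	spin_unlock(&total_hits_lock);
}
```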
Different Softirqs
You’ll need to use spin_lock() and spin_unlock() for shared data, whether it be a timer, tasklet, different softirq or the same or another softirq: any of them could be running on a different CPU.
Hard IRQ Context
Hardware interrupts usually communicate with a tasklet or softirq. Frequently this involves putting work in a queue, which the softirq will take out.
Locking Between Hard IRQ and Softirqs/Tasklets
If a hardware irq handler shares data with a softirq, you have two concerns. Firstly, the softirq processing can be interrupted by a hardware interrupt, and secondly, the critical region could be entered by a hardware interrupt on another CPU. This is where spin_lock_irq() is used. It is defined to disable interrupts on that CPU, then grab the lock. spin_unlock_irq() does the reverse.

The irq handler does not need to use spin_lock_irq(), because the softirq cannot run while the irq handler is running: it can use spin_lock(), which is slightly faster. The only exception would be if a different hardware irq handler uses the same lock: spin_lock_irq() will stop that from interrupting us.
This works perfectly for UP as well: the spin lock vanishes, and this macro simply becomes local_irq_disable() (include/asm/smp.h), which protects you from the softirq/tasklet/BH being run.
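A sketch of the queue pattern described above, with hypothetical names: the hard irq handler adds items under plain spin_lock(), and the softirq drains the queue under spin_lock_irq().

```c
#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/interrupt.h>
#include <linux/slab.h>

/* Hypothetical queue filled by a hard irq handler, drained by a softirq. */
static LIST_HEAD(rx_queue);
static DEFINE_SPINLOCK(rx_lock);

struct rx_item {
	struct list_head list;
};

/* Softirq side: disable interrupts on this CPU and take the lock, so the
 * irq handler cannot run here while the queue is being modified. */
static void drain_rx_queue(void)
{
	LIST_HEAD(todo);
	struct rx_item *item, *tmp;

	/* Move everything to a private list under the lock ... */
	spin_lock_irq(&rx_lock);
	list_splice_init(&rx_queue, &todo);
	spin_unlock_irq(&rx_lock);

	/* ... then process (here: just free) the items with the lock dropped. */
	list_for_each_entry_safe(item, tmp, &todo, list) {
		list_del(&item->list);
		kfree(item);
	}
}

/* Hard irq handler: plain spin_lock() is enough, because the softirq
 * cannot run on this CPU while the handler is running. */
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	struct rx_item *item = kmalloc(sizeof(*item), GFP_ATOMIC);

	if (!item)
		return IRQ_HANDLED;	/* drop the event on allocation failure */

	spin_lock(&rx_lock);
	list_add_tail(&item->list, &rx_queue);
	spin_unlock(&rx_lock);
	return IRQ_HANDLED;
}
```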
spin_lock_irqsave() (include/linux/spinlock.h) is a variant which saves whether interrupts were on or off in a flags word, which is passed to spin_unlock_irqrestore(). This means that the same code can be used inside a hard irq handler (where interrupts are already off) and in softirqs (where the irq disabling is required).

Note that softirqs (and hence tasklets and timers) are run on return from hardware interrupts, so spin_lock_irq() also stops these. In that sense, spin_lock_irqsave() is the most general and powerful locking function.
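A minimal sketch (hypothetical names) of a helper that can be called from user context, softirq or hard irq context precisely because it saves and restores the interrupt state:

```c
#include <linux/spinlock.h>

static unsigned long device_status;
static DEFINE_SPINLOCK(status_lock);

/* spin_lock_irqsave() remembers whether interrupts were already disabled,
 * and spin_unlock_irqrestore() puts things back exactly as they were, so
 * this helper is safe no matter which context calls it. */
static void update_status(unsigned long bits)
{
	unsigned long flags;

	spin_lock_irqsave(&status_lock, flags);
	device_status |= bits;
	spin_unlock_irqrestore(&status_lock, flags);
}
```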
Locking Between Two Hard IRQ Handlers
It is rare to have to share data between two IRQ handlers, but if you do, spin_lock_irqsave() should be used: it is architecture-specific whether all interrupts are disabled inside irq handlers themselves.
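For illustration, a sketch with hypothetical device names, where the handlers of two different devices update shared statistics under one lock:

```c
#include <linux/spinlock.h>
#include <linux/interrupt.h>

/* Hypothetical statistics shared by the handlers of two different devices. */
static unsigned long total_interrupts;
static DEFINE_SPINLOCK(irq_stats_lock);

/* spin_lock_irqsave() is safe regardless of whether this architecture runs
 * irq handlers with all other interrupts disabled or not. */
static irqreturn_t first_device_irq(int irq, void *dev_id)
{
	unsigned long flags;

	spin_lock_irqsave(&irq_stats_lock, flags);
	total_interrupts++;
	spin_unlock_irqrestore(&irq_stats_lock, flags);
	return IRQ_HANDLED;
}

static irqreturn_t second_device_irq(int irq, void *dev_id)
{
	unsigned long flags;

	spin_lock_irqsave(&irq_stats_lock, flags);
	total_interrupts++;
	spin_unlock_irqrestore(&irq_stats_lock, flags);
	return IRQ_HANDLED;
}
```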
Cheat Sheet For Locking
Pete Zaitcev gives the following summary:
- If you are in a process context (any syscall) and want to lock other processes out, use a mutex. You can take a mutex and sleep (copy_from_user*() or kmalloc(x, GFP_KERNEL)); see the sketch after this list.
- Otherwise (== data can be touched in an interrupt), use spin_lock_irqsave() and spin_unlock_irqrestore().
- Avoid holding a spinlock for more than 5 lines of code and across any function call (except accessors like readb()).
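As a sketch of the first rule (the function and buffer names here are hypothetical), a process-context write path can take a mutex and then call copy_from_user(), which may sleep, with the mutex still held:

```c
#include <linux/mutex.h>
#include <linux/uaccess.h>
#include <linux/types.h>
#include <linux/errno.h>

#define MY_BUF_SIZE 128

/* Hypothetical buffer, written only from process context (a syscall). */
static char my_buf[MY_BUF_SIZE];
static DEFINE_MUTEX(my_buf_mutex);

/* Sleeping (e.g. in copy_from_user(), which may fault) is fine while
 * holding a mutex, but never while holding a spinlock. */
static long my_write(const char __user *user_data, size_t len)
{
	long ret = 0;

	if (len > MY_BUF_SIZE)
		return -EINVAL;

	if (mutex_lock_interruptible(&my_buf_mutex))
		return -EINTR;

	if (copy_from_user(my_buf, user_data, len))
		ret = -EFAULT;

	mutex_unlock(&my_buf_mutex);
	return ret;
}
```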
Table of Minimum Requirements
The following table lists the minimum locking requirements between various contexts. In some cases, the same context can only be running on one CPU at a time, so no locking is required for that context (e.g. a particular thread can only run on one CPU at a time, but if it needs to share data with another thread, locking is required).
Remember the advice above: you can always use spin_lock_irqsave(), which is a superset of all other spinlock primitives.
| . | IRQ Handler A | IRQ Handler B | Softirq A | Softirq B | Tasklet A | Tasklet B | Timer A | Timer B | User Context A | User Context B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IRQ Handler A | None | | | | | | | | | |
| IRQ Handler B | SLIS | None | | | | | | | | |
| Softirq A | SLI | SLI | SL | | | | | | | |
| Softirq B | SLI | SLI | SL | SL | | | | | | |
| Tasklet A | SLI | SLI | SL | SL | None | | | | | |
| Tasklet B | SLI | SLI | SL | SL | SL | None | | | | |
| Timer A | SLI | SLI | SL | SL | SL | SL | None | | | |
| Timer B | SLI | SLI | SL | SL | SL | SL | SL | None | | |
| User Context A | SLI | SLI | SLBH | SLBH | SLBH | SLBH | SLBH | SLBH | None | |
| User Context B | SLI | SLI | SLBH | SLBH | SLBH | SLBH | SLBH | SLBH | MLI | None |
Table: Table of Locking Requirements
| Abbreviation | Primitive |
| --- | --- |
| SLIS | spin_lock_irqsave |
| SLI | spin_lock_irq |
| SL | spin_lock |
| SLBH | spin_lock_bh |
| MLI | mutex_lock_interruptible |
Table: Legend for Locking Requirements Table
The trylock Functions
There are functions that try to acquire a lock only once and immediately return a value telling about success or failure to acquire the lock. They can be used if you need no access to the data protected with the lock when some other thread is holding the lock. You should acquire the lock later if you then need access to the data protected with the lock.
spin_trylock() does not spin but returns non-zero if it acquires the spinlock on the first try or 0 if not. This function can be used in all contexts like spin_lock(): you must have disabled the contexts that might interrupt you and acquire the spin lock.

mutex_trylock() does not suspend your task but returns non-zero if it could lock the mutex on the first try or 0 if not. This function cannot be safely used in hardware or software interrupt contexts despite not sleeping.
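A sketch of both trylock variants with hypothetical names; the fallback paths simply give up rather than waiting:

```c
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/errno.h>

static unsigned long best_effort_count;
static DEFINE_SPINLOCK(count_lock);
static DEFINE_MUTEX(config_mutex);

/* Hypothetical fast path: if someone else holds the lock, skip this update
 * instead of spinning (assumes no interrupt-context users of the lock,
 * just as with plain spin_lock()). */
static void maybe_count(void)
{
	if (!spin_trylock(&count_lock))
		return;			/* contended: give up, try again later */
	best_effort_count++;
	spin_unlock(&count_lock);
}

/* Hypothetical user-context path: report -EBUSY instead of sleeping if
 * another task is already reconfiguring. */
static int try_reconfigure(void)
{
	if (!mutex_trylock(&config_mutex))
		return -EBUSY;
	/* ... modify the configuration ... */
	mutex_unlock(&config_mutex);
	return 0;
}
```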
Excerpted from: https://www.kernel.org/doc/html/v4.18/kernel-hacking/locking.html#sleeping-things