[Reading the Source] absl SpinLock / Mutex
SpinLock and Mutex are two different kinds of locks. Both exist to provide mutually exclusive access to a critical section. Ignoring optimizations, a SpinLock keeps the current thread polling within its own time slice until it acquires the lock, while a Mutex is provided by the operating system: if the lock cannot be taken immediately, the thread gives up its time slice to other threads until the lock is acquired.
To better understand the difference between the two and how they are implemented, this article studies their implementations in the absl library.
SpinLock
First, SpinLock. On top of the basic locking functionality, absl also accounts for thread scheduling and wait-time tracking. It stores its state in an atomic<uint32_t>; let's see how that word is encoded.
// Description of lock-word:
// 31..00: [............................3][2][1][0]
//
// [0]: kSpinLockHeld
// [1]: kSpinLockCooperative
// [2]: kSpinLockDisabledScheduling
// [31..3]: ONLY kSpinLockSleeper OR
// Wait time in cycles >> PROFILE_TIMESTAMP_SHIFT
//
// Detailed descriptions:
//
// Bit [0]: The lock is considered held iff kSpinLockHeld is set.
//
// Bit [1]: Eligible waiters (e.g. Fibers) may co-operatively reschedule when
// contended iff kSpinLockCooperative is set.
//
// Bit [2]: This bit is exclusive from bit [1]. It is used only by a
// non-cooperative lock. When set, indicates that scheduling was
// successfully disabled when the lock was acquired. May be unset,
// even if non-cooperative, if a ThreadIdentity did not yet exist at
// time of acquisition.
//
// Bit [3]: If this is the only upper bit ([31..3]) set then this lock was
// acquired without contention, however, at least one waiter exists.
//
// Otherwise, bits [31..3] represent the time spent by the current lock
// holder to acquire the lock. There may be outstanding waiter(s).
static constexpr uint32_t kSpinLockHeld = 1;
static constexpr uint32_t kSpinLockCooperative = 2;
static constexpr uint32_t kSpinLockDisabledScheduling = 4;
static constexpr uint32_t kSpinLockSleeper = 8;
// Includes kSpinLockSleeper.
static constexpr uint32_t kWaitTimeMask =
~(kSpinLockHeld | kSpinLockCooperative | kSpinLockDisabledScheduling);
Bit 0 records whether the lock is currently held; bit 1 records whether a waiter may reschedule on contention; bit 2 records whether scheduling was disabled; the remaining bits record the time the holder spent waiting to acquire the lock.
A nice design. Unfortunately, bits 1 and 2 involve thread scheduling, which is beyond the scope of absl; they are presumably used by other Google projects whose code was never released into absl.
Before diving into the implementation, one more thing worth mentioning is tsan_mutex_interface, which adds a number of TSan (ThreadSanitizer) macros for thread-safety analysis. These can also help us understand the behavior of the code, but for brevity I will strip the annotations in the analysis below.
Lock
Let's start with the implementations of Lock and TryLock.
inline bool TryLockImpl() {
uint32_t lock_value = lockword_.load(std::memory_order_relaxed);
return (TryLockInternal(lock_value, 0) & kSpinLockHeld) == 0;
}
inline void Lock() {
if (!TryLockImpl()) {
SlowLock();
}
}
inline bool TryLock() {
bool res = TryLockImpl();
return res;
}
inline bool IsHeld() const {
return (lockword_.load(std::memory_order_relaxed) & kSpinLockHeld) != 0;
}
Lock embodies the futex idea: try to take the lock first, and only fall into the busy-wait path on failure.
Next, let's look at TryLockInternal and SlowLock.
// If (result & kSpinLockHeld) == 0, then *this was successfully locked.
// Otherwise, returns last observed value for lockword_.
inline uint32_t SpinLock::TryLockInternal(uint32_t lock_value,
uint32_t wait_cycles) {
if ((lock_value & kSpinLockHeld) != 0) {
return lock_value;
}
uint32_t sched_disabled_bit = 0;
if ((lock_value & kSpinLockCooperative) == 0) {
// For non-cooperative locks we must make sure we mark ourselves as
// non-reschedulable before we attempt to CompareAndSwap.
if (base_internal::SchedulingGuard::DisableRescheduling()) {
sched_disabled_bit = kSpinLockDisabledScheduling;
}
}
if (!lockword_.compare_exchange_strong(
lock_value,
kSpinLockHeld | lock_value | wait_cycles | sched_disabled_bit,
std::memory_order_acquire, std::memory_order_relaxed)) {
base_internal::SchedulingGuard::EnableRescheduling(sched_disabled_bit != 0);
}
return lock_value;
}
TryLockInternal takes two parameters: lock_value, the value loaded from lockword_, and wait_cycles, the new wait time.
It first tests whether the lock is already held; if so, someone else owns it, the attempt fails, and we return immediately.
Otherwise the lock may be taken. The scheduling logic is handled first: the constructor only ever sets kSpinLockCooperative, so if kSpinLockCooperative == 0 and rescheduling was indeed disabled just now, we set kSpinLockDisabledScheduling. (Since lock always precedes unlock, this matches the kSpinLockDisabledScheduling logic on the unlock side.)
Then a single CAS installs the held bit, the wait time, and the scheduling-disabled bit all at once, and the observed lock_value is returned. (If the exchange succeeded, lock_value is the unlocked value; if another thread won the race and our attempt failed, lock_value is the already-locked value that thread wrote.)
Next is SlowLock. It may only return once the lock has been acquired, and the actual acquisition is always performed by TryLockInternal.
void SpinLock::SlowLock() {
uint32_t lock_value = SpinLoop();
lock_value = TryLockInternal(lock_value, 0);
if ((lock_value & kSpinLockHeld) == 0) {
return;
}
base_internal::SchedulingMode scheduling_mode;
if ((lock_value & kSpinLockCooperative) != 0) {
scheduling_mode = base_internal::SCHEDULE_COOPERATIVE_AND_KERNEL;
} else {
scheduling_mode = base_internal::SCHEDULE_KERNEL_ONLY;
}
// The lock was not obtained initially, so this thread needs to wait for
// it. Record the current timestamp in the local variable wait_start_time
// so the total wait time can be stored in the lockword once this thread
// obtains the lock.
int64_t wait_start_time = CycleClock::Now();
uint32_t wait_cycles = 0;
int lock_wait_call_count = 0;
while ((lock_value & kSpinLockHeld) != 0) {
// If the lock is currently held, but not marked as having a sleeper, mark
// it as having a sleeper.
if ((lock_value & kWaitTimeMask) == 0) {
// Here, just "mark" that the thread is going to sleep. Don't store the
// lock wait time in the lock -- the lock word stores the amount of time
// that the current holder waited before acquiring the lock, not the wait
// time of any thread currently waiting to acquire it.
if (lockword_.compare_exchange_strong(
lock_value, lock_value | kSpinLockSleeper,
std::memory_order_relaxed, std::memory_order_relaxed)) {
// Successfully transitioned to kSpinLockSleeper. Pass
// kSpinLockSleeper to the SpinLockWait routine to properly indicate
// the last lock_value observed.
lock_value |= kSpinLockSleeper;
} else if ((lock_value & kSpinLockHeld) == 0) {
// Lock is free again, so try and acquire it before sleeping. The
// new lock state will be the number of cycles this thread waited if
// this thread obtains the lock.
lock_value = TryLockInternal(lock_value, wait_cycles);
continue; // Skip the delay at the end of the loop.
} else if ((lock_value & kWaitTimeMask) == 0) {
// The lock is still held, without a waiter being marked, but something
// else about the lock word changed, causing our CAS to fail. For
// example, a new lock holder may have acquired the lock with
// kSpinLockDisabledScheduling set, whereas the previous holder had not
// set that flag. In this case, attempt again to mark ourselves as a
// waiter.
continue;
}
}
// SpinLockDelay() calls into fiber scheduler, we need to see
// synchronization there to avoid false positives.
// Wait for an OS specific delay.
base_internal::SpinLockDelay(&lockword_, lock_value, ++lock_wait_call_count,
scheduling_mode);
// Spin again after returning from the wait routine to give this thread
// some chance of obtaining the lock.
lock_value = SpinLoop();
wait_cycles = EncodeWaitCycles(wait_start_time, CycleClock::Now());
lock_value = TryLockInternal(lock_value, wait_cycles);
}
}
// Monitor the lock to see if its value changes within some time period
// (adaptive_spin_count loop iterations). The last value read from the lock
// is returned from the method.
uint32_t SpinLock::SpinLoop() {
// We are already in the slow path of SpinLock, initialize the
// adaptive_spin_count here.
ABSL_CONST_INIT static absl::once_flag init_adaptive_spin_count;
ABSL_CONST_INIT static int adaptive_spin_count = 0;
base_internal::LowLevelCallOnce(&init_adaptive_spin_count, []() {
adaptive_spin_count = base_internal::NumCPUs() > 1 ? 1000 : 1;
});
int c = adaptive_spin_count;
uint32_t lock_value;
do {
lock_value = lockword_.load(std::memory_order_relaxed);
} while ((lock_value & kSpinLockHeld) != 0 && --c > 0);
return lock_value;
}
The comment here is clear: SpinLoop is one round of busy-waiting, exiting only when the lock is released or a fixed number of iterations have passed, and returning the last observed state of lockword_.
The code above is fairly long, so here is a simplified pseudocode version of SlowLock with my understanding added.
void SpinLock::SlowLock() {
  // Spin for one round first, then try to lock.
  old_value = SpinLoop();
  old_value = TryLockInternal(old_value, 0)
  if (old_value is not held)
    return
  // Locking failed: wait via a system call instead of polling further.
  record the start time
  while (old_value is still held)
    // Before the system call we must set the wait-time marker, i.e. declare
    // that a thread is waiting; otherwise the sleeper may never be woken.
    // When is the marker cleared? Look at the two earlier TryLockInternal
    // calls: if a thread acquired the lock without blocking, no other thread
    // was waiting, so the marker can be cleared.
    if (old_value has no wait-time marker)
      // Install the initial wait-time marker.
      success = compare_and_swap(lockword_, old_value, old_value | kSpinLockSleeper);
      if (success)
        old_value |= kSpinLockSleeper;
      else if (old_value is not held)
        // Already unlocked; try to acquire it.
        old_value = TryLockInternal(old_value, wait_cycles)
        continue
      else if (old_value still has no wait-time marker)
        // The CAS failed for some other reason; go back and retry the marker.
        continue
    // System call: sleep/futex.
    SpinLockDelay(&lockword_, old_value);
    old_value = SpinLoop();
    wait_cycles = elapsed time so far
    old_value = TryLockInternal(old_value, wait_cycles)
}
That concludes the locking code. To summarize:
- Bit 0 indicates whether the lock is held; bits 3-31 record the current wait time.
- Acquisition has three phases: 1. a direct lock attempt, 2. a polling attempt, 3. a loop of system-call waits.
- The actual acquisition is always performed by TryLockInternal.
- Whether any thread is waiting on the lock is judged via the wait-time marker.
UnLock
Next comes Unlock.
inline void Unlock() {
uint32_t lock_value = lockword_.load(std::memory_order_relaxed);
lock_value = lockword_.exchange(lock_value & kSpinLockCooperative,
std::memory_order_release);
if ((lock_value & kSpinLockDisabledScheduling) != 0) {
base_internal::SchedulingGuard::EnableRescheduling(true);
}
if ((lock_value & kWaitTimeMask) != 0) {
// Collect contentionz profile info, and speed the wakeup of any waiter.
// The wait_cycles value indicates how long this thread spent waiting
// for the lock.
SlowUnlock(lock_value);
}
}
Unlock may not be obvious at first glance. The first two lines swap out the old lockword_, keeping only the kSpinLockCooperative bit in the new value. Note the memory_order_release here, which is why the lock path needs a matching memory_order_acquire.
Then the old lockword_ value is checked against kSpinLockDisabledScheduling to run the rescheduling logic. (A question to ponder: if there are multiple SpinLocks, one allowing rescheduling and another not, how is that handled?)
Finally, & kWaitTimeMask: if it is nonzero, another thread is waiting on this lock, and SlowUnlock is called to wake it.
Next is SlowUnlock.
void SpinLock::SlowUnlock(uint32_t lock_value) {
base_internal::SpinLockWake(&lockword_,
false); // wake waiter if necessary
// If our acquisition was contended, collect contentionz profile info. We
// reserve a unitary wait time to represent that a waiter exists without our
// own acquisition having been contended.
if ((lock_value & kWaitTimeMask) != kSpinLockSleeper) {
const uint64_t wait_cycles = DecodeWaitCycles(lock_value);
submit_profile_data(this, wait_cycles);
}
}
It wakes the waiting thread via a system call, and if the recorded wait time exceeds the initial sleeper marker, it submits one contention profile sample.
Mutex
Mutex is considerably more complex than SpinLock. Compared with std::mutex, it also adds the following features:
// * Conditional predicates intrinsic to the `Mutex` object
// * Shared/reader locks, in addition to standard exclusive/writer locks
// * Deadlock detection and debug support.
TODO
Related Reading:
std::mutex
The libcxx I use implements it on top of pthread_mutex / mtx_t.
Performance Benchmark
This chart shows the results of mutex_benchmark:BM_Contended: the time taken by different locks, under different critical-section lengths, as the number of threads grows.
As the chart shows, std::mutex's cost rises noticeably while the thread count is still small, yet past a certain threshold the time required actually drops, and in the end the three locks differ little.
As for absl::Mutex, even with very short critical sections its performance is no worse than absl::base_internal::SpinLock, and overall it performs best. The optimization work is thorough; in ordinary use absl::Mutex should be the first choice.