[Reading the Source] absl SpinLock/Mutex

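SpinLock and Mutex are two different kinds of lock. Both exist to give mutually exclusive access to a critical section. Ignoring optimizations, a SpinLock keeps the current thread polling within its own time slice until the lock is acquired, whereas a Mutex is provided by the operating system: if the lock cannot be taken right now, the thread gives up its time slice so that other threads can run, and it resumes once the lock can be acquired.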

To better understand how the two differ and how they are built, this article studies their implementations in the absl library.

SpinLock

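We start with SpinLock. On top of the usual lock functionality, absl also takes thread scheduling and wait time into account. It stores its state in an atomic<uint32_t>; here is how that word is encoded: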

// Description of lock-word:
//  31..00: [............................3][2][1][0]
//
//     [0]: kSpinLockHeld
//     [1]: kSpinLockCooperative
//     [2]: kSpinLockDisabledScheduling
// [31..3]: ONLY kSpinLockSleeper OR
//          Wait time in cycles >> PROFILE_TIMESTAMP_SHIFT
//
// Detailed descriptions:
//
// Bit [0]: The lock is considered held iff kSpinLockHeld is set.
//
// Bit [1]: Eligible waiters (e.g. Fibers) may co-operatively reschedule when
//          contended iff kSpinLockCooperative is set.
//
// Bit [2]: This bit is exclusive from bit [1].  It is used only by a
//          non-cooperative lock.  When set, indicates that scheduling was
//          successfully disabled when the lock was acquired.  May be unset,
//          even if non-cooperative, if a ThreadIdentity did not yet exist at
//          time of acquisition.
//
// Bit [3]: If this is the only upper bit ([31..3]) set then this lock was
//          acquired without contention, however, at least one waiter exists.
//
//          Otherwise, bits [31..3] represent the time spent by the current lock
//          holder to acquire the lock.  There may be outstanding waiter(s).
static constexpr uint32_t kSpinLockHeld = 1;
static constexpr uint32_t kSpinLockCooperative = 2;
static constexpr uint32_t kSpinLockDisabledScheduling = 4;
static constexpr uint32_t kSpinLockSleeper = 8;
// Includes kSpinLockSleeper.
static constexpr uint32_t kWaitTimeMask =
    ~(kSpinLockHeld | kSpinLockCooperative | kSpinLockDisabledScheduling);

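Bit 0 records whether the lock is held, bit 1 whether waiters may cooperatively reschedule when the lock is contended, and bit 2 whether scheduling was disabled when the lock was acquired. The remaining bits hold either the lone kSpinLockSleeper marker or the time the current holder spent waiting for the lock.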
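To make the layout concrete, here is a minimal sketch that splits a lock word into its fields. The constants are copied from the listing above; LockWordView and DecodeLockWord are hypothetical names introduced only for illustration and are not part of absl.

```cpp
#include <cstdint>

// Constants copied from the absl listing above.
constexpr uint32_t kSpinLockHeld = 1;
constexpr uint32_t kSpinLockCooperative = 2;
constexpr uint32_t kSpinLockDisabledScheduling = 4;
constexpr uint32_t kSpinLockSleeper = 8;
constexpr uint32_t kWaitTimeMask =
    ~(kSpinLockHeld | kSpinLockCooperative | kSpinLockDisabledScheduling);

// Hypothetical helper type, not part of absl.
struct LockWordView {
  bool held;                 // bit 0
  bool cooperative;          // bit 1
  bool scheduling_disabled;  // bit 2
  bool sleeper_only;         // bits [31..3] contain only kSpinLockSleeper
  uint32_t upper_bits;       // bits [31..3], left in place (not shifted)
};

// Splits a raw lock word into the fields described by the layout comment.
inline LockWordView DecodeLockWord(uint32_t word) {
  LockWordView view;
  view.held = (word & kSpinLockHeld) != 0;
  view.cooperative = (word & kSpinLockCooperative) != 0;
  view.scheduling_disabled = (word & kSpinLockDisabledScheduling) != 0;
  view.upper_bits = word & kWaitTimeMask;
  view.sleeper_only = view.upper_bits == kSpinLockSleeper;
  return view;
}
```

A word of kSpinLockHeld | kSpinLockSleeper therefore reads as "held, acquired without contention, at least one waiter", while any other nonzero upper bits are an encoded wait time.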

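The design is neat. Unfortunately, bits 1 and 2 concern thread scheduling, which is outside the scope of absl; the corresponding logic is probably used by other Google projects but was not released as part of absl.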

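Before analyzing the implementation, one more thing worth mentioning is tsan_mutex_interface. The source contains many TSan macro annotations from it, which ThreadSanitizer uses to check thread safety and which can also help us understand what the code is doing. To keep the code short, though, I strip these annotations in the listings below.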

Thread Safety Analysis

Lock

First, let's look at the implementation of Lock and TryLock.


inline bool TryLockImpl() {
  uint32_t lock_value = lockword_.load(std::memory_order_relaxed);
  return (TryLockInternal(lock_value, 0) & kSpinLockHeld) == 0;
}

inline void Lock() {
  if (!TryLockImpl()) {
    SlowLock();
  }
}

inline bool TryLock() {
  bool res = TryLockImpl();
  return res;
}

inline bool IsHeld() const {
  return (lockword_.load(std::memory_order_relaxed) & kSpinLockHeld) != 0;
}

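Lock reflects the idea behind futex: try the lock on a cheap fast path first, and only on failure fall into the slow path, which starts by busy-waiting.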
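The fast-path/slow-path split can be shown with a much smaller lock. The following is a minimal sketch, not absl's code: TinySpinLock is a hypothetical class, it tracks only a held flag, and std::this_thread::yield() stands in for whatever waiting strategy the real slow path uses.

```cpp
#include <atomic>
#include <thread>

// Hypothetical minimal lock illustrating "try first, fall back on failure".
class TinySpinLock {
 public:
  // Fast path: a single atomic test-and-set. Returns true if we got the lock.
  bool TryLock() {
    return !held_.exchange(true, std::memory_order_acquire);
  }

  void Lock() {
    if (!TryLock()) {
      SlowLock();  // Only contended acquisitions pay for the slow path.
    }
  }

  void Unlock() { held_.store(false, std::memory_order_release); }

 private:
  // Slow path: keep retrying, giving up the CPU between attempts.
  void SlowLock() {
    while (!TryLock()) {
      std::this_thread::yield();
    }
  }

  std::atomic<bool> held_{false};
};
```

Keeping the fast path tiny means the uncontended case costs one atomic operation, and everything expensive lives in a separate function that is rarely called.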

Next we look at TryLockInternal and SlowLock.

// If (result & kSpinLockHeld) == 0, then *this was successfully locked.
// Otherwise, returns last observed value for lockword_.
inline uint32_t SpinLock::TryLockInternal(uint32_t lock_value,
                                          uint32_t wait_cycles) {
  if ((lock_value & kSpinLockHeld) != 0) {
    return lock_value;
  }

  uint32_t sched_disabled_bit = 0;
  if ((lock_value & kSpinLockCooperative) == 0) {
    // For non-cooperative locks we must make sure we mark ourselves as
    // non-reschedulable before we attempt to CompareAndSwap.
    if (base_internal::SchedulingGuard::DisableRescheduling()) {
      sched_disabled_bit = kSpinLockDisabledScheduling;
    }
  }

  if (!lockword_.compare_exchange_strong(
          lock_value,
          kSpinLockHeld | lock_value | wait_cycles | sched_disabled_bit,
          std::memory_order_acquire, std::memory_order_relaxed)) {
    base_internal::SchedulingGuard::EnableRescheduling(sched_disabled_bit != 0);
  }

  return lock_value;
}

TryLockInternal takes two parameters: lock_value, the value previously loaded from lockword_, and wait_cycles, the new wait time to record.

It first tests whether the lock is already held. If it is, another thread owns it, so the attempt fails and the function returns immediately.

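Otherwise the lock can be taken. The scheduling logic is handled first: since the constructor only ever sets kSpinLockCooperative, if kSpinLockCooperative == 0 and DisableRescheduling() reports that rescheduling was actually disabled, kSpinLockDisabledScheduling must be set. (Because a lock is always acquired before it is released, this matches the kSpinLockDisabledScheduling handling in Unlock.)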

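Then a single CAS installs the held bit, the wait time, and the scheduling-disabled bit, and the function returns lock_value as left by the CAS. (If the CAS succeeded, lock_value is the old value without the held bit. If another thread got there first and the CAS failed, lock_value has been overwritten with the value that thread wrote, which has the held bit set.)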
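The return convention relies on a property of std::atomic::compare_exchange_strong: on failure it writes the value it actually observed back into its first argument. A minimal sketch of just that mechanism, with the scheduling and wait-time handling left out (TryAcquire and kHeld are names made up for this example):

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t kHeld = 1;

// Returns a value without kHeld if the lock was acquired by this call,
// otherwise the last value observed in `word` (which has kHeld set).
inline uint32_t TryAcquire(std::atomic<uint32_t>& word, uint32_t observed) {
  if ((observed & kHeld) != 0) {
    return observed;  // Already held by someone else: do not even try.
  }
  // On success `observed` is left untouched (still unheld).
  // On failure `observed` is replaced by the current contents of `word`.
  word.compare_exchange_strong(observed, observed | kHeld,
                               std::memory_order_acquire,
                               std::memory_order_relaxed);
  return observed;
}
```

Calling TryAcquire twice with a stale observed value of 0 shows both outcomes: the first call returns 0 (acquired), the second returns a value with kHeld set (lost the race).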

Next is SlowLock. It returns only after the lock has been acquired, and the acquisition itself is always performed by TryLockInternal.

void SpinLock::SlowLock() {
  uint32_t lock_value = SpinLoop();
  lock_value = TryLockInternal(lock_value, 0);
  if ((lock_value & kSpinLockHeld) == 0) {
    return;
  }

  base_internal::SchedulingMode scheduling_mode;
  if ((lock_value & kSpinLockCooperative) != 0) {
    scheduling_mode = base_internal::SCHEDULE_COOPERATIVE_AND_KERNEL;
  } else {
    scheduling_mode = base_internal::SCHEDULE_KERNEL_ONLY;
  }

  // The lock was not obtained initially, so this thread needs to wait for
  // it.  Record the current timestamp in the local variable wait_start_time
  // so the total wait time can be stored in the lockword once this thread
  // obtains the lock.
  int64_t wait_start_time = CycleClock::Now();
  uint32_t wait_cycles = 0;
  int lock_wait_call_count = 0;
  while ((lock_value & kSpinLockHeld) != 0) {
    // If the lock is currently held, but not marked as having a sleeper, mark
    // it as having a sleeper.
    if ((lock_value & kWaitTimeMask) == 0) {
      // Here, just "mark" that the thread is going to sleep.  Don't store the
      // lock wait time in the lock -- the lock word stores the amount of time
      // that the current holder waited before acquiring the lock, not the wait
      // time of any thread currently waiting to acquire it.
      if (lockword_.compare_exchange_strong(
              lock_value, lock_value | kSpinLockSleeper,
              std::memory_order_relaxed, std::memory_order_relaxed)) {
        // Successfully transitioned to kSpinLockSleeper.  Pass
        // kSpinLockSleeper to the SpinLockWait routine to properly indicate
        // the last lock_value observed.
        lock_value |= kSpinLockSleeper;
      } else if ((lock_value & kSpinLockHeld) == 0) {
        // Lock is free again, so try and acquire it before sleeping.  The
        // new lock state will be the number of cycles this thread waited if
        // this thread obtains the lock.
        lock_value = TryLockInternal(lock_value, wait_cycles);
        continue;   // Skip the delay at the end of the loop.
      } else if ((lock_value & kWaitTimeMask) == 0) {
        // The lock is still held, without a waiter being marked, but something
        // else about the lock word changed, causing our CAS to fail. For
        // example, a new lock holder may have acquired the lock with
        // kSpinLockDisabledScheduling set, whereas the previous holder had not
        // set that flag. In this case, attempt again to mark ourselves as a
        // waiter.
        continue;
      }
    }

    // SpinLockDelay() calls into fiber scheduler, we need to see
    // synchronization there to avoid false positives.
    // Wait for an OS specific delay.
    base_internal::SpinLockDelay(&lockword_, lock_value, ++lock_wait_call_count,
                                 scheduling_mode);
    // Spin again after returning from the wait routine to give this thread
    // some chance of obtaining the lock.
    lock_value = SpinLoop();
    wait_cycles = EncodeWaitCycles(wait_start_time, CycleClock::Now());
    lock_value = TryLockInternal(lock_value, wait_cycles);
  }
}

// Monitor the lock to see if its value changes within some time period
// (adaptive_spin_count loop iterations). The last value read from the lock
// is returned from the method.
uint32_t SpinLock::SpinLoop() {
  // We are already in the slow path of SpinLock, initialize the
  // adaptive_spin_count here.
  ABSL_CONST_INIT static absl::once_flag init_adaptive_spin_count;
  ABSL_CONST_INIT static int adaptive_spin_count = 0;
  base_internal::LowLevelCallOnce(&init_adaptive_spin_count, []() {
    adaptive_spin_count = base_internal::NumCPUs() > 1 ? 1000 : 1;
  });

  int c = adaptive_spin_count;
  uint32_t lock_value;
  do {
    lock_value = lockword_.load(std::memory_order_relaxed);
  } while ((lock_value & kSpinLockHeld) != 0 && --c > 0);
  return lock_value;
}

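The comments here are clear: SpinLoop is a busy wait that exits only when the lock is released or after a fixed number of iterations, and it returns the last observed state of lockword_.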
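Stripped of the one-time initialization, the bounded spin looks roughly like the sketch below. BoundedSpin, DefaultSpinCount, and the max_spins parameter are hypothetical; absl derives its count from its own internal NumCPUs(), for which std::thread::hardware_concurrency() is used here as a stand-in.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

constexpr uint32_t kHeldBit = 1;

// Spinning is pointless on a single CPU: the holder cannot make progress
// while we spin. hardware_concurrency() may return 0 when unknown, which
// this sketch treats like a single CPU.
inline int DefaultSpinCount() {
  return std::thread::hardware_concurrency() > 1 ? 1000 : 1;
}

// Polls `word` until the held bit clears or max_spins loads have been made,
// and returns the last value read.
inline uint32_t BoundedSpin(const std::atomic<uint32_t>& word, int max_spins) {
  uint32_t value;
  do {
    value = word.load(std::memory_order_relaxed);
  } while ((value & kHeldBit) != 0 && --max_spins > 0);
  return value;
}
```

The loop only reads the word; the actual acquisition attempt is left to the caller, which mirrors how SlowLock pairs SpinLoop with TryLockInternal.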

SlowLock itself is fairly long, so here it is simplified into pseudocode, annotated with my understanding.

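void SpinLock::SlowLock() {
  // Spin for one round first, then try to take the lock.
  old_value = SpinLoop();
  old_value = TryLockInternal(old_value, 0);
  if (old_value is not held)
    return;
  // Acquisition failed: from here on, wait via a system call instead of
  // spinning indefinitely.
  wait_start_time = now;
  wait_cycles = 0;
  while (old_value is held) {
    // Before the system call, a wait-time mark must be present in the word.
    // It announces that a thread is waiting; without it, Unlock would skip
    // the wakeup and this thread might never be woken.
    // When is the mark cleared? In Unlock, whose exchange resets lockword_
    // to just the kSpinLockCooperative bit. A thread that then acquires the
    // lock without blocking passes wait_cycles = 0, so no mark is written.
    if (old_value has no wait-time mark) {
      // Set the initial mark.
      success = compare_and_swap(lockword_, old_value, old_value | kSpinLockSleeper);
      if (success) {
        old_value |= kSpinLockSleeper;
      } else if (old_value is not held) {
        // The lock was released in the meantime: try to take it.
        old_value = TryLockInternal(old_value, wait_cycles);
        continue;
      } else if (old_value still has no wait-time mark) {
        // The CAS failed because some other bit of lockword_ changed;
        // go back and try to set the mark again.
        continue;
      }
    }
    // System call: sleep/futex.
    SpinLockDelay(&lockword_, old_value);
    old_value = SpinLoop();
    wait_cycles = time elapsed since wait_start_time;
    old_value = TryLockInternal(old_value, wait_cycles);
  }
}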

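That completes the analysis of the locking code. To summarize:

  1. Bit 0 indicates whether the lock is held; bits 3-31 hold the wait time (or just the kSpinLockSleeper marker).
  2. Locking goes through three stages: (a) a direct attempt, (b) spinning, (c) a loop of system-call waits.
  3. The final acquisition is always done by TryLockInternal.
  4. Whether a thread is waiting for the lock is determined from the wait-time mark.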
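The three stages and the sleeper mark can be put together in a small self-contained lock. This is a sketch under loud assumptions, not absl's implementation: SketchLock is a hypothetical class, a std::mutex plus std::condition_variable stands in for the futex-style SpinLockDelay/SpinLockWake pair, the spin count is fixed at 1000, and no wait time is recorded.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical three-stage lock; only the held bit and sleeper mark are kept.
class SketchLock {
 public:
  void Lock() {
    // Stage 1: a single direct attempt.
    uint32_t v = 0;
    if (Acquire(v)) return;
    for (;;) {
      // Stage 2: bounded spin, then retry if the lock looks free.
      v = Spin();
      if ((v & kHeld) == 0) {
        if (Acquire(v)) return;
        continue;
      }
      // Stage 3: publish the sleeper mark, then block until the word changes.
      if ((v & kSleeper) == 0 &&
          !word_.compare_exchange_strong(v, v | kSleeper,
                                         std::memory_order_relaxed,
                                         std::memory_order_relaxed)) {
        continue;  // The word changed under us; look at it again.
      }
      Wait(v | kSleeper);
    }
  }

  void Unlock() {
    // Resetting the whole word also clears the sleeper mark.
    uint32_t old = word_.exchange(0, std::memory_order_release);
    if ((old & kSleeper) != 0) {
      // Taking mu_ orders this notify after any waiter's predicate check.
      std::lock_guard<std::mutex> guard(mu_);
      cv_.notify_all();
    }
  }

 private:
  static constexpr uint32_t kHeld = 1;
  static constexpr uint32_t kSleeper = 8;

  // `v` must be an unheld value, which is always 0 in this sketch.
  bool Acquire(uint32_t& v) {
    return word_.compare_exchange_strong(v, kHeld, std::memory_order_acquire,
                                         std::memory_order_relaxed);
  }

  uint32_t Spin() const {
    uint32_t v;
    int c = 1000;
    do {
      v = word_.load(std::memory_order_relaxed);
    } while ((v & kHeld) != 0 && --c > 0);
    return v;
  }

  // Blocks while the word still equals `expected` (futex-like semantics).
  void Wait(uint32_t expected) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [&] {
      return word_.load(std::memory_order_relaxed) != expected;
    });
  }

  std::atomic<uint32_t> word_{0};
  std::mutex mu_;
  std::condition_variable cv_;
};

// Usage: several threads increment a plain int under the lock.
inline int CountWithThreads(int num_threads, int iterations) {
  SketchLock lock;
  int counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        lock.Lock();
        ++counter;
        lock.Unlock();
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}
```

The point to notice is in Unlock: the releasing thread pays for a wakeup only when the old word carried the sleeper mark, which is exactly why a waiter has to set the mark before it blocks.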

Unlock

Next comes Unlock.


inline void Unlock() {
  uint32_t lock_value = lockword_.load(std::memory_order_relaxed);
  lock_value = lockword_.exchange(lock_value & kSpinLockCooperative,
                                  std::memory_order_release);

  if ((lock_value & kSpinLockDisabledScheduling) != 0) {
    base_internal::SchedulingGuard::EnableRescheduling(true);
  }
  if ((lock_value & kWaitTimeMask) != 0) {
    // Collect contentionz profile info, and speed the wakeup of any waiter.
    // The wait_cycles value indicates how long this thread spent waiting
    // for the lock.
    SlowUnlock(lock_value);
  }
}

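What Unlock does may not be obvious at first glance. The first two lines swap out the old lockword_; the new value keeps only the kSpinLockCooperative bit. The exchange uses memory_order_release, which is why TryLock needs memory_order_acquire to pair with it.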

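The old lockword_ value is then tested against kSpinLockDisabledScheduling to drive the rescheduling logic. (An open question: if there are several SpinLocks, one that allows rescheduling and one that does not, how is that handled?)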

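After that comes the & kWaitTimeMask test. A nonzero result means that another thread has marked itself as waiting for this lock, or that our own acquisition was contended, so SlowUnlock is called to wake a waiter.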
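The release step and the two decisions that follow it can be isolated in a few lines. This is a sketch that only mirrors the bit manipulation shown above; UnlockDecision and ReleaseWord are hypothetical names, and the actual rescheduling and wakeup calls are left out.

```cpp
#include <atomic>
#include <cstdint>

constexpr uint32_t kHeld = 1;
constexpr uint32_t kCooperative = 2;
constexpr uint32_t kDisabledScheduling = 4;
constexpr uint32_t kSleeper = 8;
constexpr uint32_t kWaitTimeMask = ~(kHeld | kCooperative | kDisabledScheduling);

// What the unlocking thread has to do after releasing the word.
struct UnlockDecision {
  bool reenable_scheduling;  // old word had kDisabledScheduling
  bool take_slow_path;       // old word had any of bits [31..3] set
};

// Releases the lock: the new word keeps only the cooperative bit, and the
// old word tells the caller which follow-up work is needed.
inline UnlockDecision ReleaseWord(std::atomic<uint32_t>& word) {
  uint32_t old = word.load(std::memory_order_relaxed);
  old = word.exchange(old & kCooperative, std::memory_order_release);
  return {(old & kDisabledScheduling) != 0, (old & kWaitTimeMask) != 0};
}
```

Because the exchange returns the word as it was at the instant of release, a sleeper mark set by a waiter just before the release is never missed.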

Next is SlowUnlock.

void SpinLock::SlowUnlock(uint32_t lock_value) {
  base_internal::SpinLockWake(&lockword_,
                              false);  // wake waiter if necessary

  // If our acquisition was contended, collect contentionz profile info.  We
  // reserve a unitary wait time to represent that a waiter exists without our
  // own acquisition having been contended.
  if ((lock_value & kWaitTimeMask) != kSpinLockSleeper) {
    const uint64_t wait_cycles = DecodeWaitCycles(lock_value);
    submit_profile_data(this, wait_cycles);
  }
}

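It wakes a waiting thread through a system call. If the upper bits hold more than the bare kSpinLockSleeper marker, meaning our own acquisition was contended, it also reports the decoded wait time as profile data.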
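EncodeWaitCycles and DecodeWaitCycles are not shown in this article, so the following is only a sketch of how a wait time could share bits [31..3] with the sleeper marker. The shift values and the function names EncodeWait, DecodeWait, and AcquisitionWasContended are assumptions made for illustration, not absl's definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

constexpr uint32_t kSleeper = 8;
constexpr uint32_t kWaitTimeMask = ~uint32_t{7};  // bits [31..3]
constexpr int kTimestampShift = 7;  // assumed: drop low-order cycle bits
constexpr int kReservedShift = 3;   // the three low bits are flag bits

// Packs a wait duration into bits [31..3]. Never returns 0, so the result
// always doubles as a "someone waited" mark.
inline uint32_t EncodeWait(int64_t start, int64_t end) {
  assert(end >= start);
  const int64_t kMax = std::numeric_limits<uint32_t>::max() >> kReservedShift;
  const int64_t scaled = (end - start) >> kTimestampShift;
  const uint32_t encoded =
      static_cast<uint32_t>(std::min(scaled, kMax) << kReservedShift);
  if (encoded == 0) return kSleeper;  // Too short to measure.
  // Bump the value so a real wait time never collides with the bare marker.
  if (encoded == kSleeper) return kSleeper + (1u << kReservedShift);
  return encoded;
}

// Recovers an approximate cycle count from a lock word.
inline int64_t DecodeWait(uint32_t word) {
  const int64_t scaled = (word & kWaitTimeMask) >> kReservedShift;
  return scaled << kTimestampShift;
}

// Only meaningful when (old_word & kWaitTimeMask) != 0, as in SlowUnlock.
inline bool AcquisitionWasContended(uint32_t old_word) {
  return (old_word & kWaitTimeMask) != kSleeper;
}
```

Under these assumptions a wait of 1280 cycles round-trips exactly, while very short waits collapse into the bare marker and are not reported.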

Mutex

Mutex is considerably more complex than SpinLock. Compared with std::mutex, it also offers the following additional features:

//   * Conditional predicates intrinsic to the `Mutex` object
//   * Shared/reader locks, in addition to standard exclusive/writer locks
//   * Deadlock detection and debug support.

TODO

Related Reading:

design doc
An article by an expert on Jianshu

std::mutex

The libcxx I use implements it on top of pthread_mutex / mtx_t.

Performance Benchmark

The figure shows the results of mutex_benchmark:BM_Contended: the time taken by each lock, for different critical-section durations, as the number of threads grows.

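As the figure shows, with std::mutex the time cost rises noticeably while the thread count is still small; past a certain point the time required actually drops, and in the end the three locks perform about the same.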

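As for absl::Mutex, even with a very short critical section it is no slower than absl::base_internal::SpinLock, and overall it performs best. It is very well optimized, so absl::Mutex should be the first choice in normal use.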

posted @ 2022-03-08 18:37  新新人類