InnoDB mutexes (1): os_event

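      InnoDB implements two heavily used kinds of lock: mutex_t (exclusive) and rw_lock_t (shared for reads, exclusive for writes). InnoDB built its own versions of both to meet the performance requirements of a database. Because concurrency is one of InnoDB's main selling points, these two primitives play a very important role throughout the code (mutex_t in particular runs through almost the whole system). Before introducing them, we first need to cover an underlying module, os_event, which implements the basic event send/receive mechanism. Both mutex_t and rw_lock_t rely on os_event for their wakeup notifications.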

  Note: InnoDB likes to name the modules that wrap system calls os_xxxxx.

First, here is the event send/receive flow of os_event:

thread A calls os_event_reset(event_1) [start listening for the event]
thread B calls os_event_set(event_1)   [send the event notification]
thread A calls os_event_wait(event_1)  [wait for the event]
thread A finishes waiting

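1. Once thread A has called os_event_reset(), it is already listening for event_1; it does not start receiving notifications only at wait time. In other words, A also receives a signal that is sent between reset and wait (the implementation code further down shows how).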

2. os_event_set notifies in thundering-herd fashion (it calls pthread_cond_broadcast). Waking every waiter certainly costs extra CPU, but it meets the needs of rw_lock_t. The pthread manual explains:

       The pthread_cond_broadcast() function is used whenever the
       shared-variable state has been changed in a way that more than one
       thread can proceed with its task. Consider a single producer/multiple
       consumer problem, where the producer can insert multiple items on a
       list that is accessed one item at a time by the consumers. By calling
       the pthread_cond_broadcast() function, the producer would notify all
       consumers that might be waiting, and thereby the application would
       receive more throughput on a multi-processor. In addition,
       pthread_cond_broadcast() makes it easier to implement a read-write
       lock. The pthread_cond_broadcast() function is needed in order to wake
       up all waiting readers when a writer releases its lock. Finally, the
       two-phase commit algorithm can use this broadcast function to notify
       all clients of an impending transaction commit.

 3. os_event_wait is the usual combination of pthread_mutex and pthread_cond, which is widely documented online.
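The reset/set/wait flow described in the three points above can be tried out with a small pthread program. This is only a minimal sketch, not InnoDB's code: the mini_event_* names and demo_reset_set_wait() are made up for illustration, and the demo joins thread B before A waits so that the "signal sent between reset and wait" case is reproduced deterministically.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature event, modelled on the fields of os_event_struct. */
typedef struct {
    pthread_mutex_t mutex;        /* protects the fields below */
    pthread_cond_t  cond;         /* waiters sleep on this */
    bool            is_set;       /* true while the event is signaled */
    int64_t         signal_count; /* bumped on each unset -> set change */
} mini_event_t;

static void mini_event_init(mini_event_t *ev)
{
    int rc = pthread_mutex_init(&ev->mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&ev->cond, NULL);
    assert(rc == 0);
    (void)rc;
    ev->is_set = false;
    ev->signal_count = 1; /* start at 1 so that 0 can mean "no reset value" */
}

/* Start listening: clear the flag and hand back the current counter. */
static int64_t mini_event_reset(mini_event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->is_set = false;
    int64_t count = ev->signal_count;
    pthread_mutex_unlock(&ev->mutex);
    return count;
}

/* Send the event: wake every waiter if the event was not already set. */
static void mini_event_set(mini_event_t *ev)
{
    pthread_mutex_lock(&ev->mutex);
    if (!ev->is_set) {
        ev->is_set = true;
        ev->signal_count += 1;
        pthread_cond_broadcast(&ev->cond);
    }
    pthread_mutex_unlock(&ev->mutex);
}

/* Block until the event is set or the counter has moved past reset_count. */
static void mini_event_wait(mini_event_t *ev, int64_t reset_count)
{
    pthread_mutex_lock(&ev->mutex);
    if (reset_count == 0) {
        reset_count = ev->signal_count;
    }
    while (!ev->is_set && ev->signal_count == reset_count) {
        pthread_cond_wait(&ev->cond, &ev->mutex);
    }
    pthread_mutex_unlock(&ev->mutex);
}

static void *thread_b(void *arg)
{
    mini_event_set((mini_event_t *)arg);
    return NULL;
}

/* Replays the four-step flow; returns 1 once thread A has finished waiting. */
int demo_reset_set_wait(void)
{
    mini_event_t ev;
    pthread_t b;

    mini_event_init(&ev);
    int64_t old_count = mini_event_reset(&ev);        /* A: reset */
    int rc = pthread_create(&b, NULL, thread_b, &ev); /* B: set */
    assert(rc == 0);
    (void)rc;
    pthread_join(b, NULL);            /* B is done before A starts waiting */
    mini_event_wait(&ev, old_count);  /* A: wait, does not block */

    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.mutex);
    return 1;
}
```

Calling demo_reset_set_wait() from main() returns without hanging, because the set that happened after A's reset is still visible when A finally waits.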

 

Now let's look at how os_event is implemented.

Here is the event structure:

struct os_event_struct {


    os_fast_mutex_t    os_mutex;    /*!< this mutex protects the next
                    fields */
    ibool        is_set;        /*!< this is TRUE when the event is
                    in the signaled state, i.e., a thread
                    does not stop if it tries to wait for
                    this event */
    ib_int64_t    signal_count;    /*!< this is incremented each time
                    the event becomes signaled */
    os_cond_t    cond_var;    /*!< condition variable is used in
                    waiting for the event */
    UT_LIST_NODE_T(os_event_struct_t) os_event_list;
                    /*!< list of all created events */
};

 

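1)  is_set and signal_count together form the state of an event.

When a thread sends the event (event_set) and the event is not already set, is_set becomes TRUE and signal_count is incremented (signal_count only ever increases):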

    os_fast_mutex_lock(&(event->os_mutex));

    if (event->is_set) {
        /* Do nothing */
    } else {
        event->is_set = TRUE;
        event->signal_count += 1;
        os_cond_broadcast(&(event->cond_var));
    }

    os_fast_mutex_unlock(&(event->os_mutex));
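To see the effect of the broadcast in the set path, the following sketch starts several waiters and sets the flag once. It is an illustration under assumptions, not InnoDB code: the bc_* names, N_WAITERS and demo_broadcast() are invented here, and plain pthread calls stand in for the os_fast_mutex/os_cond wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define N_WAITERS 4

/* Hypothetical stand-ins for os_mutex, cond_var and is_set. */
static pthread_mutex_t bc_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  bc_cond  = PTHREAD_COND_INITIALIZER;
static bool bc_is_set = false;
static int  bc_woken  = 0;   /* number of waiters that got past the wait */

static void *bc_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&bc_mutex);
    while (!bc_is_set) {     /* the loop guards against spurious wakeups */
        pthread_cond_wait(&bc_cond, &bc_mutex);
    }
    bc_woken++;
    pthread_mutex_unlock(&bc_mutex);
    return NULL;
}

/* Starts N_WAITERS waiters, sets the event once, returns how many woke up. */
int demo_broadcast(void)
{
    pthread_t threads[N_WAITERS];

    for (int i = 0; i < N_WAITERS; i++) {
        int rc = pthread_create(&threads[i], NULL, bc_waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }

    /* Same shape as the set path above: do nothing if already set. */
    pthread_mutex_lock(&bc_mutex);
    if (!bc_is_set) {
        bc_is_set = true;
        pthread_cond_broadcast(&bc_cond);
    }
    pthread_mutex_unlock(&bc_mutex);

    for (int i = 0; i < N_WAITERS; i++) {
        pthread_join(threads[i], NULL);
    }
    return bc_woken;
}
```

Every waiter gets through: those already sleeping are woken by the broadcast, and those that arrive late see the flag already set and never sleep. With pthread_cond_signal in place of the broadcast, only one of the threads already asleep would be guaranteed to wake, so the joins could hang.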

 

When a thread starts listening for the event (event_reset), is_set is set to FALSE and the current signal_count is returned (assume the caller keeps the return value in old_signal_count):

    os_fast_mutex_lock(&(event->os_mutex));

    if (!event->is_set) {
        /* Do nothing */
    } else {
        event->is_set = FALSE;
    }
    ret = event->signal_count;

    os_fast_mutex_unlock(&(event->os_mutex));

 

 

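(old_signal_count==signal_count && is_set==false) is the test for whether an event has arrived between reset and wait: if the expression is true, no event has arrived and the thread has to wait. (There is also a timeout_wait variant, but its implementation is much the same.)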

    os_fast_mutex_lock(&event->os_mutex);

    /* signal_count starts at 1 when the event is initialized, because
    os_event_wait_low uses 0 as the marker for "ignore notifications
    sent between reset and wait". Hard-coding old_signal_count to 0
    therefore means the thread only starts listening for the event
    at cond_wait. */
    if (!reset_sig_count) {
        reset_sig_count = event->signal_count;
    }

    while (!event->is_set && event->signal_count == reset_sig_count) {
        os_cond_wait(&(event->cond_var), &(event->os_mutex));

        /* Solaris manual said that spurious wakeups may occur: we
        have to check if the event really has been signaled after
        we came here to wait */
    }

    os_fast_mutex_unlock(&event->os_mutex);
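The text above mentions a timeout variant of the wait without showing it. The sketch below is one way such a variant could look on top of pthread_cond_timedwait. It is an assumption for illustration, not the InnoDB source: tw_event_t, tw_event_wait_time() and demo_timed_wait() are hypothetical names, and the timeout is taken in milliseconds purely for convenience.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical event with the same state fields as os_event_struct. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
    int64_t         signal_count;
} tw_event_t;

/* Returns 0 if the event was signaled, ETIMEDOUT if timeout_ms elapsed first. */
int tw_event_wait_time(tw_event_t *ev, int64_t reset_count, long timeout_ms)
{
    struct timespec deadline;

    /* pthread_cond_timedwait expects an absolute deadline, by default
       measured against CLOCK_REALTIME. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ev->mutex);
    if (reset_count == 0) {
        reset_count = ev->signal_count;
    }
    /* Same predicate as the untimed wait; only the sleep call differs. */
    while (!ev->is_set && ev->signal_count == reset_count) {
        int rc = pthread_cond_timedwait(&ev->cond, &ev->mutex, &deadline);
        if (rc != 0) {
            break;  /* ETIMEDOUT or an error: stop waiting */
        }
    }
    /* Decide from the state, so a set that races with the timeout wins. */
    bool signaled = ev->is_set || ev->signal_count != reset_count;
    pthread_mutex_unlock(&ev->mutex);

    return signaled ? 0 : ETIMEDOUT;
}

/* Returns 1 if an unset event times out and a set event returns at once. */
int demo_timed_wait(void)
{
    tw_event_t ev;
    int rc = pthread_mutex_init(&ev.mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&ev.cond, NULL);
    assert(rc == 0);
    (void)rc;
    ev.is_set = false;
    ev.signal_count = 1;

    int first = tw_event_wait_time(&ev, 0, 20);   /* nobody sets the event */

    pthread_mutex_lock(&ev.mutex);                /* now set it by hand */
    ev.is_set = true;
    ev.signal_count += 1;
    pthread_mutex_unlock(&ev.mutex);

    int second = tw_event_wait_time(&ev, 0, 20);  /* already set */

    pthread_cond_destroy(&ev.cond);
    pthread_mutex_destroy(&ev.mutex);
    return first == ETIMEDOUT && second == 0;
}
```

The predicate in the loop is the same one used by the untimed wait, which matches the remark that the two variants differ very little.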

 


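The benefits of this design are as follows.

event_reset sets is_set to FALSE, which masks every notification sent before the reset, so a waiter is not let through by an event_set that happened long ago. That alone is flawed, though. Consider this sequence: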

A:event_reset

B: event_set

C:event_reset

A : event_wait

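Here C accidentally wipes out B's notification. With is_set alone, A would miss the notification and keep waiting. That is why signal_count is checked as well: if A recorded the old value of signal_count at reset time, then even though C has set is_set back to FALSE, (old_signal_count==signal_count && is_set==false) still evaluates to false and A's wait goes through.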
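The A/B/C sequence can be replayed without any threads to compare the two designs. The sketch below keeps only the state fields and the two candidate wait predicates. All names in it (ev_state_t, st_reset, st_set, blocks_flag_only, blocks_with_count, replay_scenario) are hypothetical helpers for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the two state fields of the event; no locking, since the
   sequence is replayed on a single thread. */
typedef struct {
    bool    is_set;
    int64_t signal_count;
} ev_state_t;

static int64_t st_reset(ev_state_t *s)
{
    s->is_set = false;
    return s->signal_count;
}

static void st_set(ev_state_t *s)
{
    if (!s->is_set) {
        s->is_set = true;
        s->signal_count += 1;
    }
}

/* Would a waiter block if only is_set were checked? */
static bool blocks_flag_only(const ev_state_t *s)
{
    return !s->is_set;
}

/* Would a waiter block with the predicate that also compares the counter? */
static bool blocks_with_count(const ev_state_t *s, int64_t old_signal_count)
{
    return !s->is_set && s->signal_count == old_signal_count;
}

/* Replays A:reset, B:set, C:reset, A:wait and reports both outcomes. */
void replay_scenario(bool *flag_only_blocks, bool *with_count_blocks)
{
    ev_state_t s = { false, 1 };

    int64_t a_old = st_reset(&s);          /* A: event_reset */
    st_set(&s);                            /* B: event_set   */
    assert(s.signal_count == a_old + 1);   /* B's set bumped the counter */
    (void)st_reset(&s);                    /* C: event_reset clears is_set */

    /* A: event_wait */
    *flag_only_blocks  = blocks_flag_only(&s);
    *with_count_blocks = blocks_with_count(&s, a_old);
}
```

After replay_scenario(), the flag-only predicate says A would block (B's notification is lost), while the predicate with the counter says A would not block.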

 

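2)  os_mutex keeps concurrent modifications of the os_event members consistent, and it also works together with cond_var to wait for the event (os_fast_mutex_t and os_cond_t are thin wrappers around pthread_mutex and pthread_cond).

3)  Every os_event is added to a global doubly linked list; the os_event_list member is the node that links the event back into that list: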

/* The os_sync_mutex can be NULL because during startup an event
    can be created [ because it's embedded in the mutex/rwlock ] before
    this module has been initialized */
    if (os_sync_mutex != NULL) {
        os_mutex_enter(os_sync_mutex);
    }

    /* Put to the list of events */
    UT_LIST_ADD_FIRST(os_event_list, os_event_list, event);

    os_event_count++;

    if (os_sync_mutex != NULL) {
        os_mutex_exit(os_sync_mutex);
    }
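For readers unfamiliar with the UT_LIST macros, the idea is an intrusive doubly linked list: the prev/next links live inside each event. The following is a rough plain-C sketch of that idea, not the macros themselves. The demo_* types and functions are invented for illustration, and the locking done with os_sync_mutex in the real code is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_event demo_event_t;

/* Hypothetical counterpart of UT_LIST_NODE_T: links embedded in the element. */
typedef struct {
    demo_event_t *prev;
    demo_event_t *next;
} demo_list_node_t;

struct demo_event {
    int              id;
    demo_list_node_t event_list;  /* plays the role of os_event_list */
};

/* Hypothetical counterpart of the global list head. */
typedef struct {
    size_t        count;
    demo_event_t *first;
    demo_event_t *last;
} demo_list_base_t;

/* Insert at the head of the list, as the snippet above does. */
static void demo_list_add_first(demo_list_base_t *base, demo_event_t *ev)
{
    ev->event_list.prev = NULL;
    ev->event_list.next = base->first;
    if (base->first != NULL) {
        base->first->event_list.prev = ev;
    } else {
        base->last = ev;
    }
    base->first = ev;
    base->count++;
}

/* Because the links live inside the event, removal needs no search. */
static void demo_list_remove(demo_list_base_t *base, demo_event_t *ev)
{
    if (ev->event_list.prev != NULL) {
        ev->event_list.prev->event_list.next = ev->event_list.next;
    } else {
        base->first = ev->event_list.next;
    }
    if (ev->event_list.next != NULL) {
        ev->event_list.next->event_list.prev = ev->event_list.prev;
    } else {
        base->last = ev->event_list.prev;
    }
    ev->event_list.prev = NULL;
    ev->event_list.next = NULL;
    base->count--;
}

/* Adds three events, removes the middle one, returns the remaining count. */
int demo_event_list(void)
{
    demo_list_base_t list = { 0, NULL, NULL };
    demo_event_t e1 = { 1, { NULL, NULL } };
    demo_event_t e2 = { 2, { NULL, NULL } };
    demo_event_t e3 = { 3, { NULL, NULL } };

    demo_list_add_first(&list, &e1);
    demo_list_add_first(&list, &e2);
    demo_list_add_first(&list, &e3);
    assert(list.first == &e3 && list.last == &e1);  /* newest is first */

    demo_list_remove(&list, &e2);
    assert(e3.event_list.next == &e1 && e1.event_list.prev == &e3);

    return (int)list.count;
}
```

Embedding the node in the event is what lets an event be unlinked in constant time when it is freed, at the cost of one pair of pointers per event.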

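 That is all there is to os_event's members, and the implementation is fairly simple: the whole event behavior comes down to modifying and checking is_set and signal_count. The rw_lock and mutex covered later are somewhat more complicated.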

 

 

posted @ 2012-06-09 14:27  白帆mvp