Berkeley DB replication mechanism - election algorithm

repmgr_method.c, __repmgr_start_int() 

Two elect threads are started initially.

repmgr_elect.c, __repmgr_init_election()

                      __repmgr_elect_thread()

                     __repmgr_elect_main()

            lease, preferred master mode, 

rep_elect.c,   __rep_elect_int()  (called from __repmgr_elect() in repmgr_elect.c)

        __rep_elect_init()

       lockout,

       if (rep->egen != egen)  // egen changed while we were waiting: bail out

       tiebreaker

 

       /* Use the last commit record as the LSN in the vote */

       __rep_write_egen

        __rep_tally // tally our own vote

       __rep_cmp_vote // provisionally record ourselves as the winner

        __rep_send_vote() // -send vote1, our own vote, REP_VOTE1

        phase1, wait...

         if (rep->sites >= rep->nvotes) { // enough sites heard from: enter phase 2; otherwise give up

rep->sites - sites heard from.

rep->nvotes - Number of votes needed.

        send vote2 to the winner; if we ourselves are the winner, cast a vote for ourselves

        Did we win?

 

rep_record.c, __rep_process_message_int()

case REP_VOTE1:
	ret = __rep_vote1(env, rp, rec, eid);
	break;
case REP_VOTE2:
	ret = __rep_vote2(env, rec, eid);
	break;

 

__rep_vote1()

   If we are the master ourselves, send REP_NEWMASTER and return

    If the vote comes from an earlier egen, send REP_ALIVE

    If the vote comes from a later egen, abandon the current election and update egen

      * Ignore vote1's if we're in phase 2.

     __rep_tally - record the vote; if it is from a site not yet counted, rep->sites++

    __rep_cmp_vote // compare this vote1 against the winner we already have

    If vote1 has arrived from every site, enter phase 2

        - if we are the winner, claim the win; otherwise send vote2 to the winner

    If needed (full election? first vote1 received from this site), resend our vote1 to that site

 

__rep_vote2()

/*
* Record this vote. In a VOTE2, the only valid entry
* in the vote information is the election generation.
*
* There are several things which can go wrong that we
* need to account for:
* 1. If we receive a latent VOTE2 from an earlier election,
* we want to ignore it.
* 2. If we receive a VOTE2 from a site from which we never
* received a VOTE1, we want to record it, because we simply
* may be processing messages out of order or its vote1 got lost,
* but that site got all the votes it needed to send it.
* 3. If we have received a duplicate VOTE2 from this election
* from the same site we want to ignore it.
* 4. If this is from the current election and someone is
* really voting for us, then we finally get to record it.
*/

__rep_tally - if this vote2 comes from a site not yet counted, rep->votes++

 

#define I_HAVE_WON(rep, winner) \
((rep)->votes >= (rep)->nvotes && winner == (rep)->eid)

 

rep->sites - sites heard from.

rep->nvotes - Number of votes needed.

rep->votes - Number of votes for this site.

rep->nsites - Number of sites in group.

 

/*
* We need to check sites == nsites, not more than half
* like we do in __rep_elect and the VOTE2 code. The
* reason is that we want to process all the incoming votes
* and not short-circuit once we reach more than half. The
* real winner's vote may be in the last half.
*/
#define IS_PHASE1_DONE(rep) \
((rep)->sites >= (rep)->nsites && (rep)->winner != DB_EID_INVALID)

 

 

u_int32_t egen; /* Replication election generation. */

 

 

 

 

 

REP_NEWMASTER - I am the new master

REP_MASTER_REQ - who is the master?

 

rep_util.c, __rep_new_master() - synchronize with the new master

 

 

/*
* Election gen file name
* The file contains an egen number for an election this client has NOT
* participated in. I.e. it is the number of a future election. We
* create it when we create the rep region, if it doesn't already exist
* and initialize egen to 1. If it does exist, we read it when we create
* the rep region. We write it immediately before sending our VOTE1 in
* an election. That way, if a client has ever sent a vote for any
* election, the file is already going to be updated to reflect a future
* election, should it crash.
*/
#define REP_EGENNAME "__db.rep.egen"
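The idea in this comment (persist the number of a future election before voting, so a crash can never lead to reusing an egen) can be sketched with plain stdio. This is an illustration under assumptions, not how Berkeley DB does its file I/O: `egen_read`, `egen_write` and `egen_before_vote1` are hypothetical helpers, the on-disk layout is just a raw `uint32_t`, and a real implementation would also need to force the data to stable storage.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read the stored egen; a missing or short file yields the initial value 1. */
uint32_t
egen_read(const char *path)
{
	FILE *fp;
	uint32_t egen = 1;

	assert(path != NULL);
	if ((fp = fopen(path, "rb")) == NULL)
		return (1);
	if (fread(&egen, sizeof(egen), 1, fp) != 1)
		egen = 1;
	fclose(fp);
	return (egen);
}

/* Persist egen; returns 0 on success, -1 on failure. */
int
egen_write(const char *path, uint32_t egen)
{
	FILE *fp;
	int ret = 0;

	assert(path != NULL);
	if ((fp = fopen(path, "wb")) == NULL)
		return (-1);
	if (fwrite(&egen, sizeof(egen), 1, fp) != 1)
		ret = -1;
	if (fclose(fp) != 0)
		ret = -1;
	return (ret);
}

/* Call right before sending VOTE1 for election `egen`: store the next one. */
int
egen_before_vote1(const char *path, uint32_t egen)
{
	return (egen_write(path, egen + 1));
}
```

If the site crashes after voting in election 7, it restarts with `egen_read()` returning 8 and cannot vote in election 7 a second time.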

 

typedef struct {
	u_int32_t egen;	/* Voter's election generation. */
	int eid;	/* Voter's ID. */
} REP_VTALLY;

 

rep_elect.c, __rep_tally()

* Ignore votes from earlier elections (i.e. we've heard
* from this site in this election, but its vote from an
* earlier election got delayed and we received it now).
* However, if we happened to hear from an earlier vote
* and we recorded it and we're now hearin

 

__rep_cmp_vote()

/* Make ourselves the winner to start. */

rep->winner records the best winner known so far

 

 

__rep_elect_done() 

- clears the election flags, resets rep->votes, ..., rep->egen++

posted @ 2016-08-17 13:13  brayden