Redis Source Code Analysis 26: Cluster (Part 2), Key Assignment and Migration

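A Redis Cluster stores the key-value pairs of the database by sharding: every key is mapped by a hash function (CRC16 of the key, modulo 16384) to a slot. The cluster has 16384 slots in total, and each master node is responsible for a subset of them.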
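The key-to-slot mapping can be sketched on its own. The block below is a standalone reimplementation, not the Redis source: the names crc16_xmodem and key_hash_slot are mine, and the CRC is computed bit by bit rather than with the lookup table Redis uses. It assumes the rule from the Redis Cluster specification: CRC16 (XMODEM variant) of the key modulo 16384, where only the text between the first '{' and the next '}' is hashed if that text is non-empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SLOTS 16384

/* CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection. */
static uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) crc = (uint16_t)((crc << 1) ^ 0x1021);
            else crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Map a key to a slot. If the key contains "{tag}" with a non-empty tag,
 * only the tag is hashed, so related keys can be forced into one slot. */
static unsigned int key_hash_slot(const char *key, size_t keylen) {
    size_t s, e;

    assert(key != NULL);
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen)                       /* no '{': hash the whole key */
        return crc16_xmodem(key, keylen) % CLUSTER_SLOTS;

    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;
    if (e == keylen || e == s + 1)         /* no '}' or empty "{}" */
        return crc16_xmodem(key, keylen) % CLUSTER_SLOTS;

    return crc16_xmodem(key + s + 1, e - s - 1) % CLUSTER_SLOTS;
}
```

For example, key_hash_slot("foo", 3) yields 12182, and the keys "{user1000}.following" and "{user1000}.followers" land in the same slot because only "user1000" is hashed.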

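When all 16384 slots are served by some node, the cluster is online; conversely, if even one slot is not served, the cluster is offline.

Key assignment therefore really means assigning slots to cluster nodes, and key migration really means moving slots between cluster nodes.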

I: Data Structures

In clusterState, the central cluster data structure that records the state of the cluster, the slot-related fields are:

clusterNode *slots[16384];
clusterNode *migrating_slots_to[16384];
clusterNode *importing_slots_from[16384];
zskiplist *slots_to_keys;

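The slots array records, for each of the 16384 slots, which cluster node serves it. For example, server.cluster->slots[0] = node means that slot 0 is served by node.

The migrating_slots_to array records, for the slots owned by the current node, which node each slot is being migrated to. For example, server.cluster->migrating_slots_to[0] = node means that slot 0, owned by the current node, is being migrated out to node.

The importing_slots_from array records which node the current node is importing a given slot from. For example, server.cluster->importing_slots_from[0] = node means that the current node is importing slot 0 from node.

With these fields one can quickly find which node serves a slot, and which node the slot is being migrated to or imported from.

slots_to_keys is a skip list sorted by slot number, which is used as the score. Each skip list node stores a slot number (the score) and one key that belongs to that slot. Through this skip list one can quickly find which keys are stored in each slot served by the current node.

In clusterNode, the structure that represents a cluster node, the slot-related fields are: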

unsigned char slots[16384/8];
int numslots;

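slots records which slots the node serves. It is a bit array in which each bit stands for one slot number; a bit set to 1 means the slot is served by this node.

numslots is the total number of slots the node serves.

With these fields one can quickly find which slots a given node serves.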
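The per-node bit array can be illustrated with a few lines of standalone C. This is a sketch with names of my own (slot_owner, owner_set_slot, owner_clear_slot); it assumes, as the later listings suggest for clusterNodeSetSlotBit and clusterNodeClearSlotBit, that each helper returns the previous value of the bit and keeps numslots in sync.

```c
#include <assert.h>
#include <string.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical stand-in for the slot-related part of clusterNode. */
typedef struct slot_owner {
    unsigned char slots[CLUSTER_SLOTS / 8]; /* one bit per slot */
    int numslots;                           /* number of bits set */
} slot_owner;

/* Return 1 if bit 'pos' is set in the bitmap, 0 otherwise. */
static int bitmap_test(const unsigned char *bitmap, int pos) {
    return (bitmap[pos >> 3] >> (pos & 7)) & 1;
}

/* Set the bit for 'slot'; return the old value of the bit. */
static int owner_set_slot(slot_owner *n, int slot) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);
    int old = bitmap_test(n->slots, slot);
    n->slots[slot >> 3] |= (unsigned char)(1 << (slot & 7));
    if (!old) n->numslots++;
    return old;
}

/* Clear the bit for 'slot'; return the old value of the bit. */
static int owner_clear_slot(slot_owner *n, int slot) {
    assert(slot >= 0 && slot < CLUSTER_SLOTS);
    int old = bitmap_test(n->slots, slot);
    n->slots[slot >> 3] &= (unsigned char)~(1 << (slot & 7));
    if (old) n->numslots--;
    return old;
}
```

The whole assignment of a node fits in 2048 bytes, which is what makes it cheap to attach to every heartbeat packet.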

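II: Assigning Slots

When a cluster is first created, the slots served by each master node must be assigned manually. This is done mainly by sending the "CLUSTER ADDSLOTS" command to the node. The format of the command is "CLUSTER ADDSLOTS <slot> [slot] ...".

The handler of the "CLUSTER" command is clusterCommand. The part of that function that handles "CLUSTER ADDSLOTS" is: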

else if ((!strcasecmp(c->argv[1]->ptr,"addslots") ||
               !strcasecmp(c->argv[1]->ptr,"delslots")) && c->argc >= 3)
    {
        /* CLUSTER ADDSLOTS <slot> [slot] ... */
        /* CLUSTER DELSLOTS <slot> [slot] ... */
        int j, slot;
        unsigned char *slots = zmalloc(REDIS_CLUSTER_SLOTS);
        int del = !strcasecmp(c->argv[1]->ptr,"delslots");

        memset(slots,0,REDIS_CLUSTER_SLOTS);
        /* Check that all the arguments are parseable and that all the
         * slots are not already busy. */
        for (j = 2; j < c->argc; j++) {
            if ((slot = getSlotOrReply(c,c->argv[j])) == -1) {
                zfree(slots);
                return;
            }
            if (del && server.cluster->slots[slot] == NULL) {
                addReplyErrorFormat(c,"Slot %d is already unassigned", slot);
                zfree(slots);
                return;
            } else if (!del && server.cluster->slots[slot]) {
                addReplyErrorFormat(c,"Slot %d is already busy", slot);
                zfree(slots);
                return;
            }
            if (slots[slot]++ == 1) {
                addReplyErrorFormat(c,"Slot %d specified multiple times",
                    (int)slot);
                zfree(slots);
                return;
            }
        }
        for (j = 0; j < REDIS_CLUSTER_SLOTS; j++) {
            if (slots[j]) {
                int retval;

                /* If this slot was set as importing we can clear this
                 * state as now we are the real owner of the slot. */
                if (server.cluster->importing_slots_from[j])
                    server.cluster->importing_slots_from[j] = NULL;

                retval = del ? clusterDelSlot(j) :
                               clusterAddSlot(myself,j);
                redisAssertWithInfo(c,NULL,retval == REDIS_OK);
            }
        }
        zfree(slots);
        clusterDoBeforeSleep(CLUSTER_TODO_UPDATE_STATE|CLUSTER_TODO_SAVE_CONFIG);
        addReply(c,shared.ok);
    }

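Here the "CLUSTER ADDSLOTS" and "CLUSTER DELSLOTS" commands share almost the same code. ADDSLOTS assigns slots to the node and DELSLOTS removes slots from it. ADDSLOTS is typically used when creating a cluster, to give each master its slots; DELSLOTS is used for manually fixing the cluster configuration or for debugging, and is rarely needed in practice.

The code first checks each slot number in the arguments in turn. For DELSLOTS, if server.cluster->slots records NULL as the owner of the slot, the client gets an "already unassigned" error. For ADDSLOTS, if server.cluster->slots shows that some node already serves the slot, the client gets an "already busy" error. The slot is then marked in the local slots array; if it was already marked, the same slot number appears more than once in the command, and the client gets a "specified multiple times" error.

Next, the code walks over every slot marked in slots. If server.cluster->importing_slots_from is non-NULL for the slot, it is reset to NULL, because the slot is now owned by this node. Then clusterAddSlot or clusterDelSlot is called, depending on whether the operation is ADDSLOTS or DELSLOTS.

Finally, "OK" is returned to the client.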
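The two-pass structure (validate everything first, apply afterwards) is what guarantees that a command with one bad argument changes nothing. The validation pass can be sketched in isolation as follows. The names and error codes are hypothetical, and a plain owned[] byte array stands in for the check server.cluster->slots[slot] != NULL.

```c
#include <assert.h>
#include <string.h>

#define CLUSTER_SLOTS 16384

enum {
    SLOTS_OK = 0,
    SLOTS_ERR_RANGE,       /* slot number out of range */
    SLOTS_ERR_BUSY,        /* ADDSLOTS on a slot that already has an owner */
    SLOTS_ERR_UNASSIGNED,  /* DELSLOTS on a slot that has no owner */
    SLOTS_ERR_DUPLICATE    /* same slot given twice in one command */
};

/* Validate all slot arguments before touching any state.
 * owned[s] != 0 means slot s currently has an owner.
 * seen[] is a scratch array of CLUSTER_SLOTS bytes; on success it marks
 * exactly the slots that the second pass should add or delete. */
static int validate_slot_args(const int *args, int nargs,
                              const unsigned char *owned, int del,
                              unsigned char *seen) {
    assert(args != NULL && owned != NULL && seen != NULL);
    memset(seen, 0, CLUSTER_SLOTS);
    for (int j = 0; j < nargs; j++) {
        int slot = args[j];
        if (slot < 0 || slot >= CLUSTER_SLOTS) return SLOTS_ERR_RANGE;
        if (del && !owned[slot]) return SLOTS_ERR_UNASSIGNED;
        if (!del && owned[slot]) return SLOTS_ERR_BUSY;
        if (seen[slot]++) return SLOTS_ERR_DUPLICATE;
    }
    return SLOTS_OK;
}
```

Only when this returns SLOTS_OK would a caller loop over seen[] and apply the change to each marked slot.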

So clusterAddSlot is the function that actually assigns a slot. Its implementation is:

int clusterAddSlot(clusterNode *n, int slot) {
    if (server.cluster->slots[slot]) return REDIS_ERR;
    clusterNodeSetSlotBit(n,slot);
    server.cluster->slots[slot] = n;
    return REDIS_OK;
}

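The function is simple: it sets the corresponding bit in the bit array n->slots and updates server.cluster->slots[slot].

First, it looks at server.cluster->slots[slot] to see whether the slot is already assigned to some node; if so, it returns REDIS_ERR immediately.

Then it calls clusterNodeSetSlotBit to set the corresponding bit in n->slots.

Finally, it sets server.cluster->slots[slot] to n.

Together, these steps assign slot to node n.

For completeness, here is the implementation of clusterDelSlot, the function that removes a slot: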

int clusterDelSlot(int slot) {
    clusterNode *n = server.cluster->slots[slot];

    if (!n) return REDIS_ERR;
    redisAssert(clusterNodeClearSlotBit(n,slot) == 1);
    server.cluster->slots[slot] = NULL;
    return REDIS_OK;
}

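This function clears the information of slot and marks it as unassigned. It returns REDIS_OK on success, or REDIS_ERR if the slot was already unassigned.

Its implementation is equally simple: it clears the corresponding bit in n->slots and sets server.cluster->slots[slot] to NULL.

First, it reads the node n that currently serves the slot from server.cluster->slots[slot]; if n is NULL, it returns REDIS_ERR.

Then it calls clusterNodeClearSlotBit to clear the slot from the bit array n->slots.

Finally, it sets server.cluster->slots[slot] to NULL.

Together, these steps put slot into the unassigned state.

When cluster nodes send heartbeat packets, they attach the slot information they currently hold (the slots bit array of their clusterNode structure), so eventually every node in the cluster learns the assignment of all slots.

III: Slot Migration (Resharding)

After the cluster has been stable for a while, a new node may join or an existing node may go offline. This raises the problem of moving the slots of one node to another node.

This process is also manual. Redis ships a helper script, redis-trib.rb; invoking it with the "reshard" argument performs the resharding. In essence, though, the script simply sends a series of commands to the importing node and the migrating node.

The steps of a slot migration are:

1: Send "CLUSTER SETSLOT <slot> IMPORTING <node>" to the importing node

Here <slot> is the slot to import and <node> is the node that currently serves it. In clusterCommand, the code that handles this command is: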

    else if (!strcasecmp(c->argv[1]->ptr,"setslot") && c->argc >= 4) {
        /* SETSLOT 10 MIGRATING <node ID> */
        /* SETSLOT 10 IMPORTING <node ID> */
        /* SETSLOT 10 STABLE */
        /* SETSLOT 10 NODE <node ID> */
        int slot;
        clusterNode *n;

        if ((slot = getSlotOrReply(c,c->argv[2])) == -1) return;

        if (!strcasecmp(c->argv[3]->ptr,"migrating") && c->argc == 5) {
            ...
        } else if (!strcasecmp(c->argv[3]->ptr,"importing") && c->argc == 5) {
            if (server.cluster->slots[slot] == myself) {
                addReplyErrorFormat(c,
                    "I'm already the owner of hash slot %u",slot);
                return;
            }
            if ((n = clusterLookupNode(c->argv[4]->ptr)) == NULL) {
                addReplyErrorFormat(c,"I don't know about node %s",
                    (char*)c->argv[3]->ptr);
                return;
            }
            server.cluster->importing_slots_from[slot] = n;
        } else if (!strcasecmp(c->argv[3]->ptr,"stable") && c->argc == 4) {
            ...
        } else if (!strcasecmp(c->argv[3]->ptr,"node") && c->argc == 5) {
            ...
        } else {
            addReplyError(c,
                "Invalid CLUSTER SETSLOT action or number of arguments");
            return;
        }
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_UPDATE_STATE);
        addReply(c,shared.ok);
    }

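For "CLUSTER SETSLOT", the slot number slot is first parsed from the arguments; on a parse error, an error is sent to the client and the function returns.

If the command is "CLUSTER SETSLOT <slot> IMPORTING <node>", this node is going to import a slot.

So the code first checks whether server.cluster->slots[slot] equals myself. If it does, the slot is already served by this node, so an error is sent to the client and the function returns. It then looks up the source node n in the dictionary server.cluster->nodes using the <node> argument; if the node is not found, an error is sent and the function returns. Finally, server.cluster->importing_slots_from[slot] is set to n.

2: Send "CLUSTER SETSLOT <slot> MIGRATING <node>" to the migrating node

Here <slot> is the slot to migrate out and <node> is the destination node. In clusterCommand, the code that handles this command is: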

    else if (!strcasecmp(c->argv[1]->ptr,"setslot") && c->argc >= 4) {
        /* SETSLOT 10 MIGRATING <node ID> */
        /* SETSLOT 10 IMPORTING <node ID> */
        /* SETSLOT 10 STABLE */
        /* SETSLOT 10 NODE <node ID> */
        int slot;
        clusterNode *n;

        if ((slot = getSlotOrReply(c,c->argv[2])) == -1) return;

        if (!strcasecmp(c->argv[3]->ptr,"migrating") && c->argc == 5) {
            if (server.cluster->slots[slot] != myself) {
                addReplyErrorFormat(c,"I'm not the owner of hash slot %u",slot);
                return;
            }
            if ((n = clusterLookupNode(c->argv[4]->ptr)) == NULL) {
                addReplyErrorFormat(c,"I don't know about node %s",
                    (char*)c->argv[4]->ptr);
                return;
            }
            server.cluster->migrating_slots_to[slot] = n;
        } else if (!strcasecmp(c->argv[3]->ptr,"importing") && c->argc == 5) {
            ...
        } else if (!strcasecmp(c->argv[3]->ptr,"stable") && c->argc == 4) {
            ...
        } else if (!strcasecmp(c->argv[3]->ptr,"node") && c->argc == 5) {
            ...
        } else {
            addReplyError(c,
                "Invalid CLUSTER SETSLOT action or number of arguments");
            return;
        }
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_UPDATE_STATE);
        addReply(c,shared.ok);
    }

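If the command is "CLUSTER SETSLOT <slot> MIGRATING <node>", this node is going to migrate a slot out.

So the code first checks whether server.cluster->slots[slot] equals myself. If it does not, the slot is not served by this node, so an error is sent to the client and the function returns. It then looks up the destination node n in the dictionary server.cluster->nodes using the <node> argument; if the node is not found, an error is sent and the function returns. Finally, server.cluster->migrating_slots_to[slot] is set to n.

3: Send "CLUSTER GETKEYSINSLOT <slot> <count>" to the migrating node

This command returns up to <count> keys from the migrating slot <slot>, so that the next step can migrate them. This command and the key migration step that follows are repeated until no keys remain in <slot>.

This is where the slots_to_keys skip list of the clusterState structure, introduced earlier, comes in. The skip list is sorted by slot number (the score), and each node stores a slot number together with one key of that slot, so the keys of any slot served by the current node can be found quickly.

Whenever a key is added to or removed from the database, a node is added to or removed from the skip list as well. When dbAdd adds a key to the database and the server is in cluster mode, it calls slotToKeyAdd to insert a node into the slots_to_keys skip list. The code of slotToKeyAdd is: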

void slotToKeyAdd(robj *key) {
    unsigned int hashslot = keyHashSlot(key->ptr,sdslen(key->ptr));

    zslInsert(server.cluster->slots_to_keys,hashslot,key);
    incrRefCount(key);
}

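The function is simple: it computes the slot number hashslot of the key, then inserts the key into the skip list server.cluster->slots_to_keys with hashslot as its score.

When dbDelete removes a key from the database and the server is in cluster mode, it calls slotToKeyDel to remove the node from the slots_to_keys skip list. The code of slotToKeyDel is: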

void slotToKeyDel(robj *key) {
    unsigned int hashslot = keyHashSlot(key->ptr,sdslen(key->ptr));
    zslDelete(server.cluster->slots_to_keys,hashslot,key);
}

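This function is simple too: it computes the slot number hashslot of the key, then deletes the key and its slot number from the skip list server.cluster->slots_to_keys.

Back to the "CLUSTER GETKEYSINSLOT" command. In clusterCommand, the code that handles it is: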

    else if (!strcasecmp(c->argv[1]->ptr,"getkeysinslot") && c->argc == 4) {
        /* CLUSTER GETKEYSINSLOT <slot> <count> */
        long long maxkeys, slot;
        unsigned int numkeys, j;
        robj **keys;

        if (getLongLongFromObjectOrReply(c,c->argv[2],&slot,NULL) != REDIS_OK)
            return;
        if (getLongLongFromObjectOrReply(c,c->argv[3],&maxkeys,NULL)
            != REDIS_OK)
            return;
        if (slot < 0 || slot >= REDIS_CLUSTER_SLOTS || maxkeys < 0) {
            addReplyError(c,"Invalid slot or number of keys");
            return;
        }

        keys = zmalloc(sizeof(robj*)*maxkeys);
        numkeys = getKeysInSlot(slot, keys, maxkeys);
        addReplyMultiBulkLen(c,numkeys);
        for (j = 0; j < numkeys; j++) addReplyBulk(c,keys[j]);
        zfree(keys);
    }

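First, the slot number slot and the number of keys to fetch, maxkeys, are parsed from the arguments. If parsing fails or a value is out of range, an error is sent to the client and the function returns.

Then getKeysInSlot is called to fetch at most maxkeys keys of slot from the skip list server.cluster->slots_to_keys into the array keys; getKeysInSlot returns the number of keys actually fetched.

Finally, the number of keys and the keys themselves are sent to the client.

The code of getKeysInSlot is: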

unsigned int getKeysInSlot(unsigned int hashslot, robj **keys, unsigned int count) {
    zskiplistNode *n;
    zrangespec range;
    int j = 0;

    range.min = range.max = hashslot;
    range.minex = range.maxex = 0;

    n = zslFirstInRange(server.cluster->slots_to_keys, &range);
    while(n && n->score == hashslot && count--) {
        keys[j++] = n->obj;
        n = n->level[0].forward;
    }
    return j;
}

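From the slot number, the range to search is [hashslot,hashslot]. The function first calls zslFirstInRange to find the first skip list node in that range, then walks that node and its successors on level 0; as long as a node's score equals hashslot and fewer than count keys have been collected, the node's key is stored in keys.

Finally it returns the number of keys actually collected.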
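The same "find the first entry with this score, then walk forward" pattern works on any index ordered by slot number. The sketch below swaps the skip list for a sorted array with a binary search, purely to keep the example self-contained; slot_key, first_in_slot and keys_in_slot are names of my own and are not part of Redis.

```c
#include <assert.h>
#include <stddef.h>

/* One index entry: a key together with the slot it hashes to.
 * The array is assumed to be sorted by slot, like the skip list. */
typedef struct slot_key {
    unsigned int slot;
    const char *key;
} slot_key;

/* Lower bound: position of the first entry whose slot is >= 'slot'. */
static size_t first_in_slot(const slot_key *index, size_t n, unsigned int slot) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (index[mid].slot < slot) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Copy at most 'count' keys of 'slot' into keys[]; return how many. */
static unsigned int keys_in_slot(const slot_key *index, size_t n,
                                 unsigned int slot,
                                 const char **keys, unsigned int count) {
    unsigned int j = 0;

    assert(index != NULL || n == 0);
    for (size_t i = first_in_slot(index, n, slot);
         i < n && index[i].slot == slot && j < count; i++)
        keys[j++] = index[i].key;
    return j;
}
```

A skip list gives the same logarithmic search while also supporting cheap insertion and deletion, which matters because the index changes on every dbAdd and dbDelete.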

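4: Send "MIGRATE <target_host> <target_port> <key> <target_database> <timeout>" to the migrating node

This command is sent to the migrating node for every key obtained in the previous step. It moves <key> into database <target_database> of the target node. The operation times out after <timeout> milliseconds, in which case an error is returned to the client.

The command can be used to migrate keys between ordinary (non-cluster) instances as well as between cluster nodes. In cluster mode, <target_database> is always 0.

The command moves the key from A to B atomically: both node A and node B block (for a very short time) during the migration, which avoids race conditions.

4.1. Cached connections

Usually many keys have to be moved from A to B. To avoid establishing a new TCP connection between A and B for every key, connections are cached. When the first key is migrated, node A connects to node B and caches the TCP connection; if another key has to be migrated within a certain time, the cached connection is reused and no new connection is needed. A cached connection that stays unused for long enough is released automatically.

The source uses the migrateCachedSocket structure to represent a cached TCP connection. It is defined as: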

typedef struct migrateCachedSocket {
    int fd;
    long last_dbid;
    time_t last_use_time;
} migrateCachedSocket;

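The structure stores the socket descriptor fd, the ID of the database last selected on the target node, and the time the connection was last used.

migrateGetSocket is the function that connects and caches. Its code is: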

migrateCachedSocket* migrateGetSocket(redisClient *c, robj *host, robj *port, long timeout) {
    int fd;
    sds name = sdsempty();
    migrateCachedSocket *cs;

    /* Check if we have an already cached socket for this ip:port pair. */
    name = sdscatlen(name,host->ptr,sdslen(host->ptr));
    name = sdscatlen(name,":",1);
    name = sdscatlen(name,port->ptr,sdslen(port->ptr));
    cs = dictFetchValue(server.migrate_cached_sockets,name);
    if (cs) {
        sdsfree(name);
        cs->last_use_time = server.unixtime;
        return cs;
    }

    /* No cached socket, create one. */
    if (dictSize(server.migrate_cached_sockets) == MIGRATE_SOCKET_CACHE_ITEMS) {
        /* Too many items, drop one at random. */
        dictEntry *de = dictGetRandomKey(server.migrate_cached_sockets);
        cs = dictGetVal(de);
        close(cs->fd);
        zfree(cs);
        dictDelete(server.migrate_cached_sockets,dictGetKey(de));
    }

    /* Create the socket */
    fd = anetTcpNonBlockConnect(server.neterr,c->argv[1]->ptr,
                                atoi(c->argv[2]->ptr));
    if (fd == -1) {
        sdsfree(name);
        addReplyErrorFormat(c,"Can't connect to target node: %s",
            server.neterr);
        return NULL;
    }
    anetEnableTcpNoDelay(server.neterr,fd);

    /* Check if it connects within the specified timeout. */
    if ((aeWait(fd,AE_WRITABLE,timeout) & AE_WRITABLE) == 0) {
        sdsfree(name);
        addReplySds(c,
            sdsnew("-IOERR error or timeout connecting to the client\r\n"));
        close(fd);
        return NULL;
    }

    /* Add to the cache and return it to the caller. */
    cs = zmalloc(sizeof(*cs));
    cs->fd = fd;
    cs->last_dbid = -1;
    cs->last_use_time = server.unixtime;
    dictAdd(server.migrate_cached_sockets,name,cs);
    return cs;
}

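The dictionary server.migrate_cached_sockets is the pool of cached connections. Its keys are "<ip>:<port>" strings of the target nodes and its values are migrateCachedSocket structures. It holds every TCP connection the current node has established for migration.

The function first builds the key from host and port and looks it up in server.migrate_cached_sockets. If a cached connection cs is found, cs->last_use_time is updated to the current time and cs is returned.

If no connection is found, the function checks whether the dictionary has reached its limit of 64 entries. If so, it picks a random entry de, takes its connection cs, closes cs->fd, frees cs and deletes de from the dictionary.

Next, it calls anetTcpNonBlockConnect to start a TCP connection to the remote Redis at the given address. If anetTcpNonBlockConnect returns -1, an error is sent to the client and NULL is returned.

It then sets the TCP_NODELAY option on fd and calls aeWait to wait up to timeout for the socket to become writable. If no writable event fires within that time, the connection attempt has timed out: an error is sent to the client, the socket is closed and NULL is returned. Otherwise the connection is treated as established. (In fact the code does not check whether the connect really succeeded; if it failed, the caller will get an error when it later writes to the socket and will release the connection then.)

Finally, a migrateCachedSocket structure cs is created to hold the socket descriptor, with last_dbid set to -1 and last_use_time set to the current time, and cs is inserted into server.migrate_cached_sockets.

When a connection has not been used for a long time, it must be closed and its cached migrateCachedSocket structure deleted. This is done by migrateCloseTimedoutSockets, which the timer function serverCron calls once per second. Its code is: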

void migrateCloseTimedoutSockets(void) {
    dictIterator *di = dictGetSafeIterator(server.migrate_cached_sockets);
    dictEntry *de;

    while((de = dictNext(di)) != NULL) {
        migrateCachedSocket *cs = dictGetVal(de);

        if ((server.unixtime - cs->last_use_time) > MIGRATE_SOCKET_CACHE_TTL) {
            close(cs->fd);
            zfree(cs);
            dictDelete(server.migrate_cached_sockets,dictGetKey(de));
        }
    }
    dictReleaseIterator(di);
}

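The function iterates over server.migrate_cached_sockets. For each migrateCachedSocket structure cs, if more than 10 seconds have passed since its last use, the socket descriptor is closed, cs is freed and the entry is deleted from the dictionary.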
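The caching policy (look up by "host:port", refresh the timestamp on a hit, evict when full, expire after a TTL) can be sketched without any real sockets. Everything below is a simplified model with assumed names: a small fixed array replaces the dictionary, fd is a fake counter instead of a socket, a full cache evicts entry 0 instead of a random one, and the capacity is 4 rather than 64.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_ITEMS 4   /* illustration only; the listing above uses 64 */
#define CACHE_TTL   10  /* seconds of idleness before a connection is closed */

typedef struct cached_conn {
    char name[64];          /* "host:port"; empty string marks a free entry */
    int fd;                 /* placeholder, no real socket is opened */
    time_t last_use_time;
} cached_conn;

typedef struct conn_cache {
    cached_conn items[CACHE_ITEMS];
    int next_fd;            /* fake descriptor generator */
} conn_cache;

/* Return the cached connection for host:port, creating it if needed. */
static cached_conn *cache_get(conn_cache *cache, const char *host, int port,
                              time_t now) {
    char name[64] = "";
    cached_conn *free_item = NULL;

    assert(cache != NULL && host != NULL);
    snprintf(name, sizeof(name), "%s:%d", host, port);
    for (int i = 0; i < CACHE_ITEMS; i++) {
        cached_conn *cs = &cache->items[i];
        if (cs->name[0] == '\0') {
            if (free_item == NULL) free_item = cs;
        } else if (strcmp(cs->name, name) == 0) {
            cs->last_use_time = now;    /* cache hit: refresh and reuse */
            return cs;
        }
    }
    if (free_item == NULL) free_item = &cache->items[0]; /* full: evict one */
    memcpy(free_item->name, name, sizeof(name));
    free_item->fd = cache->next_fd++;   /* stands in for connect() */
    free_item->last_use_time = now;
    return free_item;
}

/* Drop every entry idle for more than CACHE_TTL; return how many. */
static int cache_expire(conn_cache *cache, time_t now) {
    int closed = 0;
    for (int i = 0; i < CACHE_ITEMS; i++) {
        cached_conn *cs = &cache->items[i];
        if (cs->name[0] != '\0' && now - cs->last_use_time > CACHE_TTL) {
            cs->name[0] = '\0';         /* stands in for close() + free */
            closed++;
        }
    }
    return closed;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the functions, keeps the model easy to test.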

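4.2. The MIGRATE command

The format of the MIGRATE command is "MIGRATE <target_host> <target_port> <key> <target_database> <timeout> [COPY] [REPLACE]". By default, the key is deleted from the current instance once it has been transferred successfully. With COPY, the local key is kept. With REPLACE, a key of the same name that already exists on the target is overwritten; without it, the target rejects the transfer if the key already exists.

The handler of the MIGRATE command is migrateCommand. Its code is: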

void migrateCommand(redisClient *c) {
    migrateCachedSocket *cs;
    int copy, replace, j;
    long timeout;
    long dbid;
    long long ttl, expireat;
    robj *o;
    rio cmd, payload;
    int retry_num = 0;

try_again:
    /* Initialization */
    copy = 0;
    replace = 0;
    ttl = 0;

    /* Parse additional options */
    for (j = 6; j < c->argc; j++) {
        if (!strcasecmp(c->argv[j]->ptr,"copy")) {
            copy = 1;
        } else if (!strcasecmp(c->argv[j]->ptr,"replace")) {
            replace = 1;
        } else {
            addReply(c,shared.syntaxerr);
            return;
        }
    }

    /* Sanity check */
    if (getLongFromObjectOrReply(c,c->argv[5],&timeout,NULL) != REDIS_OK)
        return;
    if (getLongFromObjectOrReply(c,c->argv[4],&dbid,NULL) != REDIS_OK)
        return;
    if (timeout <= 0) timeout = 1000;

    /* Check if the key is here. If not we reply with success as there is
     * nothing to migrate (for instance the key expired in the meantime), but
     * we include such information in the reply string. */
    if ((o = lookupKeyRead(c->db,c->argv[3])) == NULL) {
        addReplySds(c,sdsnew("+NOKEY\r\n"));
        return;
    }

    /* Connect */
    cs = migrateGetSocket(c,c->argv[1],c->argv[2],timeout);
    if (cs == NULL) return; /* error sent to the client by migrateGetSocket() */

    rioInitWithBuffer(&cmd,sdsempty());

    /* Send the SELECT command if the current DB is not already selected. */
    int select = cs->last_dbid != dbid; /* Should we emit SELECT? */
    if (select) {
        redisAssertWithInfo(c,NULL,rioWriteBulkCount(&cmd,'*',2));
        redisAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"SELECT",6));
        redisAssertWithInfo(c,NULL,rioWriteBulkLongLong(&cmd,dbid));
    }

    /* Create RESTORE payload and generate the protocol to call the command. */
    expireat = getExpire(c->db,c->argv[3]);
    if (expireat != -1) {
        ttl = expireat-mstime();
        if (ttl < 1) ttl = 1;
    }
    redisAssertWithInfo(c,NULL,rioWriteBulkCount(&cmd,'*',replace ? 5 : 4));
    if (server.cluster_enabled)
        redisAssertWithInfo(c,NULL,
            rioWriteBulkString(&cmd,"RESTORE-ASKING",14));
    else
        redisAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"RESTORE",7));
    redisAssertWithInfo(c,NULL,sdsEncodedObject(c->argv[3]));
    redisAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,c->argv[3]->ptr,
            sdslen(c->argv[3]->ptr)));
    redisAssertWithInfo(c,NULL,rioWriteBulkLongLong(&cmd,ttl));

    /* Emit the payload argument, that is the serialized object using
     * the DUMP format. */
    createDumpPayload(&payload,o);
    redisAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,payload.io.buffer.ptr,
                                sdslen(payload.io.buffer.ptr)));
    sdsfree(payload.io.buffer.ptr);

    /* Add the REPLACE option to the RESTORE command if it was specified
     * as a MIGRATE option. */
    if (replace)
        redisAssertWithInfo(c,NULL,rioWriteBulkString(&cmd,"REPLACE",7));

    /* Transfer the query to the other node in 64K chunks. */
    errno = 0;
    {
        sds buf = cmd.io.buffer.ptr;
        size_t pos = 0, towrite;
        int nwritten = 0;

        while ((towrite = sdslen(buf)-pos) > 0) {
            towrite = (towrite > (64*1024) ? (64*1024) : towrite);
            nwritten = syncWrite(cs->fd,buf+pos,towrite,timeout);
            if (nwritten != (signed)towrite) goto socket_wr_err;
            pos += nwritten;
        }
    }

    /* Read back the reply. */
    {
        char buf1[1024];
        char buf2[1024];

        /* Read the two replies */
        if (select && syncReadLine(cs->fd, buf1, sizeof(buf1), timeout) <= 0)
            goto socket_rd_err;
        if (syncReadLine(cs->fd, buf2, sizeof(buf2), timeout) <= 0)
            goto socket_rd_err;
        if ((select && buf1[0] == '-') || buf2[0] == '-') {
            /* On error assume that last_dbid is no longer valid. */
            cs->last_dbid = -1;
            addReplyErrorFormat(c,"Target instance replied with error: %s",
                (select && buf1[0] == '-') ? buf1+1 : buf2+1);
        } else {
            /* Update the last_dbid in migrateCachedSocket */
            cs->last_dbid = dbid;
            robj *aux;

            addReply(c,shared.ok);

            if (!copy) {
                /* No COPY option: remove the local key, signal the change. */
                dbDelete(c->db,c->argv[3]);
                signalModifiedKey(c->db,c->argv[3]);
                server.dirty++;

                /* Translate MIGRATE as DEL for replication/AOF. */
                aux = createStringObject("DEL",3);
                rewriteClientCommandVector(c,2,aux,c->argv[3]);
                decrRefCount(aux);
            }
        }
    }

    sdsfree(cmd.io.buffer.ptr);
    return;

socket_wr_err:
    sdsfree(cmd.io.buffer.ptr);
    migrateCloseSocket(c->argv[1],c->argv[2]);
    if (errno != ETIMEDOUT && retry_num++ == 0) goto try_again;
    addReplySds(c,
        sdsnew("-IOERR error or timeout writing to target instance\r\n"));
    return;

socket_rd_err:
    sdsfree(cmd.io.buffer.ptr);
    migrateCloseSocket(c->argv[1],c->argv[2]);
    if (errno != ETIMEDOUT && retry_num++ == 0) goto try_again;
    addReplySds(c,
        sdsnew("-IOERR error or timeout reading from target node\r\n"));
    return;
}

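First, the optional arguments after the timeout are checked: any argument that is neither COPY nor REPLACE causes a syntax error reply. Then timeout and dbid are parsed from the command; on a parse error, an error is sent to the client. If the parsed timeout is less than or equal to 0, it is set to 1000, that is, 1 second.

Next, the key is looked up in the database the client is currently connected to, giving its value object o. If the key is not found, "+NOKEY" is sent to the client. This is not an error, since the key may just have expired and been deleted.

Then migrateGetSocket is called with the host and port arguments to obtain a connection to the remote Redis. If a connection to that instance already exists, the cached one is returned; otherwise a TCP connection is started and the function waits up to timeout for it to complete. If connecting fails, migrateGetSocket has already sent the error to the client, so the function simply returns.

Next, the RESTORE command to send to the remote Redis is built. The rio structure cmd is initialized to hold the command. If the dbid in the arguments differs from the dbid used by the previous migration on this connection, a "SELECT <dbid>" command is written to cmd first. Then the expire time expireat of the key is read and converted to a relative ttl. In cluster mode, the command name "RESTORE-ASKING" is written to cmd; otherwise "RESTORE" is written. The key and the ttl follow. createDumpPayload is then called to serialize the value object o into payload in DUMP format, and payload is appended to cmd. If REPLACE was given, "REPLACE" is appended to cmd as well.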
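What ends up in cmd is ordinary Redis protocol text: an array header "*<n>" followed by one length-prefixed bulk string per argument. A minimal encoder sketch is shown below. The helper names resp_array_header and resp_bulk are my own stand-ins for the roles played by rioWriteBulkCount and rioWriteBulkString, and a fixed caller-supplied buffer replaces the growing sds buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "*<count>\r\n" at offset pos; return the new offset. */
static size_t resp_array_header(char *out, size_t cap, size_t pos, int count) {
    int n = snprintf(out + pos, cap - pos, "*%d\r\n", count);
    assert(n > 0 && (size_t)n < cap - pos);
    return pos + (size_t)n;
}

/* Append "$<len>\r\n<data>\r\n" at offset pos; return the new offset.
 * The data is copied by length, so it may contain '\0' bytes, as a
 * serialized DUMP payload generally does. */
static size_t resp_bulk(char *out, size_t cap, size_t pos,
                        const char *data, size_t len) {
    int n = snprintf(out + pos, cap - pos, "$%zu\r\n", len);
    assert(n > 0 && (size_t)n + len + 2 < cap - pos);
    pos += (size_t)n;
    memcpy(out + pos, data, len);
    pos += len;
    memcpy(out + pos, "\r\n", 2);
    out[pos + 2] = '\0';    /* keep the buffer printable for debugging */
    return pos + 2;
}
```

Encoding "SELECT 0" this way produces "*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n"; the RESTORE-ASKING command is built the same way with 4 or 5 bulk strings.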

Next, the command is sent to the remote Redis: syncWrite is called in a loop to send the contents of cmd synchronously, at most 64 KB per call.
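The chunking loop is independent of the transport, so it can be sketched with the writer passed in as a callback. In the sketch below the names send_in_chunks, write_fn and counting_writer are hypothetical; counting_writer only records what it is given and stands in for a blocking write with a timeout such as syncWrite.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE (64 * 1024)

/* A writer returns the number of bytes written, or a smaller or
 * negative value on error or timeout. */
typedef long (*write_fn)(void *ctx, const char *buf, size_t len);

/* Send buf in pieces of at most CHUNK_SIZE bytes.
 * Returns 0 on success, -1 as soon as one piece is not fully written. */
static int send_in_chunks(write_fn writer, void *ctx,
                          const char *buf, size_t len) {
    size_t pos = 0;

    assert(writer != NULL);
    while (pos < len) {
        size_t towrite = len - pos;
        if (towrite > CHUNK_SIZE) towrite = CHUNK_SIZE;
        long nwritten = writer(ctx, buf + pos, towrite);
        if (nwritten != (long)towrite) return -1;
        pos += (size_t)nwritten;
    }
    return 0;
}

/* Example writer for experiments: counts calls and bytes, sends nothing. */
typedef struct write_stats {
    int calls;
    size_t bytes;
} write_stats;

static long counting_writer(void *ctx, const char *buf, size_t len) {
    write_stats *st = ctx;
    (void)buf;
    st->calls++;
    st->bytes += len;
    return (long)len;
}
```

With a 150 KB payload the writer is called three times (64 KB, 64 KB, 22 KB). Since the timeout applies to each write call, splitting a large value into chunks keeps a single call from having to push the whole payload within one timeout period.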

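After sending, the replies of the remote Redis are read. If a "SELECT" command was sent, its reply is read into buf1 first; then the reply to "RESTORE" is read into buf2. The read timeout is timeout.

If buf1 or buf2 starts with '-', the remote Redis replied with an error. In that case cs->last_dbid is set to -1, which forces a "SELECT" on the next migration, and the error is forwarded to the client. Otherwise the migration succeeded: cs->last_dbid is set to dbid and "OK" is sent to the client.

If the command did not specify COPY, the key is deleted from the local database, and rewriteClientCommandVector rewrites the client's command to "DEL <key>", so that the propagate function later passes this DEL command to the AOF file and to the slaves.

If writing the command or reading the reply fails, migrateCloseSocket is called to close the connection to the remote Redis. If the failure was not a timeout, the whole operation is retried once; otherwise the corresponding error is sent to the client.

Note: to avoid a race condition (the same key existing on two nodes), connecting to the other Redis server, sending the command and waiting for the reply are all done synchronously in this function. If the network misbehaves and the timeout is set to a large value, this function can therefore block Redis from handling any other event, so that other clients cannot use the current Redis server at all (verified in my own testing)!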

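4.3. The RESTORE-ASKING (or RESTORE) command

When the destination node of a key migration receives the RESTORE-ASKING or RESTORE command from the source node, it stores the key and value carried by the command in its local database. The format is "RESTORE <key> <ttl> <serialized-value> [REPLACE]" or "RESTORE-ASKING <key> <ttl> <serialized-value> [REPLACE]".

The difference between the two is that RESTORE-ASKING is used for key migration between cluster nodes, while RESTORE is used between ordinary instances. The redisCommand entry of RESTORE-ASKING carries the 'k' flag, which makes the command behave as if ASKING had been sent first, so the importing node accepts the key during migration instead of replying with a redirection error.

Both commands are handled by restoreCommand. Its code is: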

void restoreCommand(redisClient *c) {
    long long ttl;
    rio payload;
    int j, type, replace = 0;
    robj *obj;

    /* Parse additional options */
    for (j = 4; j < c->argc; j++) {
        if (!strcasecmp(c->argv[j]->ptr,"replace")) {
            replace = 1;
        } else {
            addReply(c,shared.syntaxerr);
            return;
        }
    }

    /* Make sure this key does not already exist here... */
    if (!replace && lookupKeyWrite(c->db,c->argv[1]) != NULL) {
        addReply(c,shared.busykeyerr);
        return;
    }

    /* Check if the TTL value makes sense */
    if (getLongLongFromObjectOrReply(c,c->argv[2],&ttl,NULL) != REDIS_OK) {
        return;
    } else if (ttl < 0) {
        addReplyError(c,"Invalid TTL value, must be >= 0");
        return;
    }

    /* Verify RDB version and data checksum. */
    if (verifyDumpPayload(c->argv[3]->ptr,sdslen(c->argv[3]->ptr)) == REDIS_ERR)
    {
        addReplyError(c,"DUMP payload version or checksum are wrong");
        return;
    }

    rioInitWithBuffer(&payload,c->argv[3]->ptr);
    if (((type = rdbLoadObjectType(&payload)) == -1) ||
        ((obj = rdbLoadObject(type,&payload)) == NULL))
    {
        addReplyError(c,"Bad data format");
        return;
    }

    /* Remove the old key if needed. */
    if (replace) dbDelete(c->db,c->argv[1]);

    /* Create the key and set the TTL if any */
    dbAdd(c->db,c->argv[1],obj);
    if (ttl) setExpire(c->db,c->argv[1],mstime()+ttl);
    signalModifiedKey(c->db,c->argv[1]);
    addReply(c,shared.ok);
    server.dirty++;
}

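First, the optional arguments after the serialized value are parsed: if one is "REPLACE", replace is set to 1; anything else causes a syntax error reply.

If replace is 0 and the key already exists in the database, a "BUSYKEY" error is sent to the client.

Then the ttl argument is parsed; if parsing fails or ttl is less than 0, an error is sent to the client.

Then verifyDumpPayload is called to verify the RDB version and the checksum of the DUMP-format value sent by the remote Redis; if verification fails, an error is sent to the client.

Next, the type of the value object and the value object itself are decoded from the argument and stored in obj; if decoding fails, an error is sent to the client.

If replace is 1, the key is deleted from the database; then the key and obj are added to the database.

If ttl is not 0, the expire time of the key is set. Finally, "OK" is sent to the client.

This completes the migration of one key.

5: Send "CLUSTER SETSLOT <slot> NODE <nodeid>" to all nodes

Once all keys of the slot have been migrated, the command "CLUSTER SETSLOT <slot> NODE <nodeid>" must be sent to every node of the cluster, including the source and the destination of the migration, to tell each of them that the new owner of <slot> is <nodeid>.

In clusterCommand, the code that handles this command is: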

    else if (!strcasecmp(c->argv[1]->ptr,"setslot") && c->argc >= 4) {
        /* SETSLOT 10 MIGRATING <node ID> */
        /* SETSLOT 10 IMPORTING <node ID> */
        /* SETSLOT 10 STABLE */
        /* SETSLOT 10 NODE <node ID> */
        int slot;
        clusterNode *n;

        if ((slot = getSlotOrReply(c,c->argv[2])) == -1) return;

        if (!strcasecmp(c->argv[3]->ptr,"migrating") && c->argc == 5) {
            ...
        } else if (!strcasecmp(c->argv[3]->ptr,"importing") && c->argc == 5) {
            ...
        } else if (!strcasecmp(c->argv[3]->ptr,"stable") && c->argc == 4) {
            ...
        } else if (!strcasecmp(c->argv[3]->ptr,"node") && c->argc == 5) {
            /* CLUSTER SETSLOT <SLOT> NODE <NODE ID> */
            clusterNode *n = clusterLookupNode(c->argv[4]->ptr);

            if (!n) {
                addReplyErrorFormat(c,"Unknown node %s",
                    (char*)c->argv[4]->ptr);
                return;
            }
            /* If this hash slot was served by 'myself' before to switch
             * make sure there are no longer local keys for this hash slot. */
            if (server.cluster->slots[slot] == myself && n != myself) {
                if (countKeysInSlot(slot) != 0) {
                    addReplyErrorFormat(c,
                        "Can't assign hashslot %d to a different node "
                        "while I still hold keys for this hash slot.", slot);
                    return;
                }
            }
            /* If this slot is in migrating status but we have no keys
             * for it assigning the slot to another node will clear
             * the migratig status. */
            if (countKeysInSlot(slot) == 0 &&
                server.cluster->migrating_slots_to[slot])
                server.cluster->migrating_slots_to[slot] = NULL;

            /* If this node was importing this slot, assigning the slot to
             * itself also clears the importing status. */
            if (n == myself &&
                server.cluster->importing_slots_from[slot])
            {
                /* This slot was manually migrated, set this node configEpoch
                 * to a new epoch so that the new version can be propagated
                 * by the cluster.
                 *
                 * Note that if this ever results in a collision with another
                 * node getting the same configEpoch, for example because a
                 * failover happens at the same time we close the slot, the
                 * configEpoch collision resolution will fix it assigning
                 * a different epoch to each node. */
                if (clusterBumpConfigEpochWithoutConsensus() == REDIS_OK) {
                    redisLog(REDIS_WARNING,
                        "configEpoch updated after importing slot %d", slot);
                }
                server.cluster->importing_slots_from[slot] = NULL;
            }
            clusterDelSlot(slot);
            clusterAddSlot(n,slot);
        } else {
            addReplyError(c,
                "Invalid CLUSTER SETSLOT action or number of arguments");
            return;
        }
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|CLUSTER_TODO_UPDATE_STATE);
        addReply(c,shared.ok);
    }

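If the command is "CLUSTER SETSLOT <slot> NODE <node ID>", the owner of the slot has to be updated.

The code first looks up the new owner n in the dictionary server.cluster->nodes using the <node ID> argument; if the node is not found, an error is sent to the client and the function returns.

If the slot is currently served by the current node myself and myself is not n, the current node is handing the slot over to n. countKeysInSlot is called to count the keys still stored in the slot; if the result is not 0, some keys have not been migrated yet, so an error is sent to the client and the function returns.

If the current node is migrating the slot out and all its keys are gone, server.cluster->migrating_slots_to[slot] is set to NULL.

If the current node is importing the slot and n is myself, clusterBumpConfigEpochWithoutConsensus is called first to raise the node's configEpoch, so that the new configuration can spread through the cluster, and then server.cluster->importing_slots_from[slot] is set to NULL.

Finally, clusterDelSlot is called to clear the information of the slot, and clusterAddSlot is called to make node n its owner.

This completes one slot migration (resharding).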
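The five steps above can be summarized as the sequence of commands a resharding tool would issue. The sketch below only formats that sequence as text so the order is visible in one place; it does not talk to any server. The function name reshard_plan, the "dst>", "src>" and "all>" prefixes, and all parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the command sequence for moving 'slot' from the node with ID
 * src_id to the node with ID dst_id (reachable at dst_host:dst_port).
 * "dst>" lines go to the importing node, "src>" lines to the migrating
 * node, "all>" lines to every node. Returns the text length, or -1 if
 * the buffer is too small. */
static int reshard_plan(char *out, size_t cap, int slot,
                        const char *src_id, const char *dst_id,
                        const char *dst_host, int dst_port,
                        int batch, int timeout_ms) {
    assert(out != NULL && src_id != NULL && dst_id != NULL && dst_host != NULL);
    int n = snprintf(out, cap,
        "dst> CLUSTER SETSLOT %d IMPORTING %s\n"
        "src> CLUSTER SETSLOT %d MIGRATING %s\n"
        "src> CLUSTER GETKEYSINSLOT %d %d   (repeat with next line until empty)\n"
        "src> MIGRATE %s %d <key> 0 %d   (once per returned key)\n"
        "all> CLUSTER SETSLOT %d NODE %s\n",
        slot, src_id,
        slot, dst_id,
        slot, batch,
        dst_host, dst_port, timeout_ms,
        slot, dst_id);
    return (n > 0 && (size_t)n < cap) ? n : -1;
}
```

The order matters: the destination is put into the importing state before the source starts migrating, so that by the time the source begins redirecting clients the destination is already prepared to accept them.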

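IV: Executing Commands on Cluster Nodes

In cluster mode the keys of the database are spread over several cluster nodes, so a cluster node that receives a client command handles it slightly differently than in standalone mode. The main differences are:

a: If a command involves several keys and these keys belong to different slots, the command cannot be executed and an error is returned immediately.

b: When a cluster node receives a command, it checks whether the key in the command is served by this node. If it is, the command is processed directly. If not, the client receives a MOVED redirection error that names the node actually serving the key. On receiving MOVED, the client must send the command again to that node.

c: If node A is migrating a slot out and receives a command whose key has already been moved to node B, node A returns an ASK redirection error that names the destination node of the key. On receiving ASK, the client must first send the "ASKING" command to node B and then send the original command to node B.

Both ASK and MOVED make the client turn to another node. They differ as follows:

a: A MOVED error means that ownership of the slot has passed from one node to another. After receiving a MOVED error for slot i, the client updates its mapping from slot i to the serving node, so that the next command for slot i can be sent straight to the new owner.

b: An ASK error is only a temporary measure used while a slot is being migrated between two nodes. After receiving an ASK error for slot i, the client sends only the next command for slot i to the node named in the error. The redirection has no effect on later commands for slot i: the client keeps sending them to the node that currently serves the slot, unless another ASK error appears.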
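The difference between the two redirections shows up clearly in client code: MOVED updates the client's slot table, ASK does not. The sketch below parses an error line and applies it to a local slot table. It assumes the reply format "-MOVED <slot> <ip>:<port>" and "-ASK <slot> <ip>:<port>" from the Redis Cluster specification; the type and function names are my own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CLUSTER_SLOTS 16384
#define ADDR_LEN 64

typedef enum { REDIR_NONE, REDIR_MOVED, REDIR_ASK } redir_kind;

typedef struct redirection {
    redir_kind kind;
    int slot;
    char addr[ADDR_LEN];    /* "ip:port" of the node to contact */
} redirection;

/* Parse an error reply line; kind is REDIR_NONE for any other error. */
static redirection parse_redirection(const char *err) {
    redirection r = { REDIR_NONE, -1, "" };
    char kind[8];

    if (sscanf(err, "-%7s %d %63s", kind, &r.slot, r.addr) == 3) {
        if (strcmp(kind, "MOVED") == 0) r.kind = REDIR_MOVED;
        else if (strcmp(kind, "ASK") == 0) r.kind = REDIR_ASK;
    }
    return r;
}

/* Apply a redirection to the client's slot table.
 * MOVED: remember the new owner permanently, return 0.
 * ASK:   leave the table untouched and return 1, meaning "send ASKING
 *        and then the command to r->addr, for this one request only". */
static int apply_redirection(char (*slot_map)[ADDR_LEN], const redirection *r) {
    if (r->kind == REDIR_NONE) return 0;
    assert(r->slot >= 0 && r->slot < CLUSTER_SLOTS);
    if (r->kind == REDIR_MOVED) {
        memcpy(slot_map[r->slot], r->addr, ADDR_LEN);
        return 0;
    }
    return 1;
}
```

A client would keep one such table per cluster and consult it before sending each command, falling back to any known node when the entry for a slot is empty.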

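In processCommand, the function that processes client commands, if the Redis server runs in cluster mode it must decide, before actually calling the command handler, whether the current node can serve the keys in the command. If it cannot, it replies to the client with a redirection error, indicating that the command should be handled by another cluster node.

In the following cases no check is needed and the node handles the command directly:

a: The command was sent by this node's master (this node being a slave);

b: The command has no key arguments;

c: The command comes from the Lua client while the script's caller is the master.

The corresponding code in processCommand is: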

    /* If cluster is enabled perform the cluster redirection here.
     * However we don't perform the redirection if:
     * 1) The sender of this command is our master.
     * 2) The command has no key arguments. */
    if (server.cluster_enabled &&
        !(c->flags & REDIS_MASTER) &&
        !(c->flags & REDIS_LUA_CLIENT &&
          server.lua_caller->flags & REDIS_MASTER) &&
        !(c->cmd->getkeys_proc == NULL && c->cmd->firstkey == 0))
    {
        int hashslot;

        if (server.cluster->state != REDIS_CLUSTER_OK) {
            flagTransaction(c);
            clusterRedirectClient(c,NULL,0,REDIS_CLUSTER_REDIR_DOWN_STATE);
            return REDIS_OK;
        } else {
            int error_code;
            clusterNode *n = getNodeByQuery(c,c->cmd,c->argv,c->argc,&hashslot,&error_code);
            if (n == NULL || n != server.cluster->myself) {
                flagTransaction(c);
                clusterRedirectClient(c,n,hashslot,error_code);
                return REDIS_OK;
            }
        }
    }

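The steps for deciding whether this node can execute the command are:

If the cluster state is not REDIS_CLUSTER_OK, a REDIS_CLUSTER_REDIR_DOWN_STATE (cluster down) error is sent to the client and the function returns.

Otherwise getNodeByQuery is called to find the node n that can serve the command. If n is NULL or n is not the current node, the corresponding error is sent to the client and the function returns.

In all other cases this node can handle the command.

getNodeByQuery is the function that, in cluster mode, determines whether the current node can serve a client command; it also finds the node that can. Its code is: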

clusterNode *getNodeByQuery(redisClient *c, struct redisCommand *cmd, robj **argv, int argc, int *hashslot, int *error_code) {
    clusterNode *n = NULL;
    robj *firstkey = NULL;
    int multiple_keys = 0;
    multiState *ms, _ms;
    multiCmd mc;
    int i, slot = 0, migrating_slot = 0, importing_slot = 0, missing_keys = 0;

    /* Set error code optimistically for the base case. */
    if (error_code) *error_code = REDIS_CLUSTER_REDIR_NONE;

    /* We handle all the cases as if they were EXEC commands, so we have
     * a common code path for everything */
    if (cmd->proc == execCommand) {
        /* If REDIS_MULTI flag is not set EXEC is just going to return an
         * error. */
        if (!(c->flags & REDIS_MULTI)) return myself;
        ms = &c->mstate;
    } else {
        /* In order to have a single codepath create a fake Multi State
         * structure if the client is not in MULTI/EXEC state, this way
         * we have a single codepath below. */
        ms = &_ms;
        _ms.commands = &mc;
        _ms.count = 1;
        mc.argv = argv;
        mc.argc = argc;
        mc.cmd = cmd;
    }

    /* Check that all the keys are in the same hash slot, and obtain this
     * slot and the node associated. */
    for (i = 0; i < ms->count; i++) {
        struct redisCommand *mcmd;
        robj **margv;
        int margc, *keyindex, numkeys, j;

        mcmd = ms->commands[i].cmd;
        margc = ms->commands[i].argc;
        margv = ms->commands[i].argv;

        keyindex = getKeysFromCommand(mcmd,margv,margc,&numkeys);
        for (j = 0; j < numkeys; j++) {
            robj *thiskey = margv[keyindex[j]];
            int thisslot = keyHashSlot((char*)thiskey->ptr,
                                       sdslen(thiskey->ptr));

            if (firstkey == NULL) {
                /* This is the first key we see. Check what is the slot
                 * and node. */
                firstkey = thiskey;
                slot = thisslot;
                n = server.cluster->slots[slot];

                /* Error: If a slot is not served, we are in "cluster down"
                 * state. However the state is yet to be updated, so this was
                 * not trapped earlier in processCommand(). Report the same
                 * error to the client. */
                if (n == NULL) {
                    getKeysFreeResult(keyindex);
                    if (error_code)
                        *error_code = REDIS_CLUSTER_REDIR_DOWN_UNBOUND;
                    return NULL;
                }

                /* If we are migrating or importing this slot, we need to check
                 * if we have all the keys in the request (the only way we
                 * can safely serve the request, otherwise we return a TRYAGAIN
                 * error). To do so we set the importing/migrating state and
                 * increment a counter for every missing key. */
                if (n == myself &&
                    server.cluster->migrating_slots_to[slot] != NULL)
                {
                    migrating_slot = 1;
                } else if (server.cluster->importing_slots_from[slot] != NULL) {
                    importing_slot = 1;
                }
            } else {
                /* If it is not the first key, make sure it is exactly
                 * the same key as the first we saw. */
                if (!equalStringObjects(firstkey,thiskey)) {
                    if (slot != thisslot) {
                        /* Error: multiple keys from different slots. */
                        getKeysFreeResult(keyindex);
                        if (error_code)
                            *error_code = REDIS_CLUSTER_REDIR_CROSS_SLOT;
                        return NULL;
                    } else {
                        /* Flag this request as one with multiple different
                         * keys. */
                        multiple_keys = 1;
                    }
                }
            }

            /* Migrating / Importing slot? Count keys we don't have. */
            if ((migrating_slot || importing_slot) &&
                lookupKeyRead(&server.db[0],thiskey) == NULL)
            {
                missing_keys++;
            }
        }
        getKeysFreeResult(keyindex);
    }

    /* No key at all in command? then we can serve the request
     * without redirections or errors. */
    if (n == NULL) return myself;

    /* Return the hashslot by reference. */
    if (hashslot) *hashslot = slot;

    /* This request is about a slot we are migrating to another instance?
     * If we don't have all the keys, send an ASK redirection. */
    if (migrating_slot && missing_keys) {
        if (error_code) *error_code = REDIS_CLUSTER_REDIR_ASK;
        return server.cluster->migrating_slots_to[slot];
    }

    /* If we are receiving the slot, and the client correctly flagged the
     * request as "ASKING", we can serve the request. However if the request
     * involves multiple keys and we don't have them all, the only option is
     * to send a TRYAGAIN error. */
    if (importing_slot &&
        (c->flags & REDIS_ASKING || cmd->flags & REDIS_CMD_ASKING))
    {
        if (multiple_keys && missing_keys) {
            if (error_code) *error_code = REDIS_CLUSTER_REDIR_UNSTABLE;
            return NULL;
        } else {
            return myself;
        }
    }

    /* Handle the read-only client case reading from a slave: if this
     * node is a slave and the request is about a hash slot our master
     * is serving, we can reply without redirection. */
    if (c->flags & REDIS_READONLY &&
        cmd->flags & REDIS_CMD_READONLY &&
        nodeIsSlave(myself) &&
        myself->slaveof == n)
    {
        return myself;
    }

    /* Base case: just return the right node. However if this node is not
     * myself, set error_code to MOVED since we need to issue a redirection. */
    if (n != myself && error_code) *error_code = REDIS_CLUSTER_REDIR_MOVED;
    return n;
}

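The parameters c, cmd, argv and argc describe the client and the command it sent. hashslot is an output parameter that returns the slot number of the keys in the command. error_code is an output parameter: it is set to the corresponding error or redirection code when the command cannot be served locally, and to REDIS_CLUSTER_REDIR_NONE otherwise. The function returns the node that can serve the command; a NULL return value means the command cannot currently be executed in the cluster.

Note that in transaction mode, all keys of all commands in the transaction must be checked together. A command outside a transaction is handled the same way, as a transaction that contains just that one command;

First, if the command handler is execCommand, the client is in transaction mode and this command is the final "EXEC" of the transaction. In transaction mode c->mstate holds all the commands queued earlier in the transaction, so ms is pointed at c->mstate. If the client does not have the REDIS_MULTI flag set, myself is returned directly, meaning the current node can handle the command; in practice execCommand will then reply to the client with an "EXEC without MULTI" error;

If the command handler is not execCommand, a fake transaction structure _ms is built that contains only the current command, and ms is pointed at it;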
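
The "single code path" trick described above can be shown in isolation. The following is a minimal sketch under simplified assumptions: cmd_t and multi_t are hypothetical stand-ins for multiCmd and multiState, and total_args merely stands in for the loop that later walks every key. It only shows how a plain command is wrapped into a one-element transaction so that the same loop serves both cases.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for Redis' multiCmd / multiState. */
typedef struct { const char **argv; int argc; } cmd_t;
typedef struct { cmd_t *commands; int count; } multi_t;

/* The shared loop: it walks every command reachable through ms and does
 * not care whether ms is a real queued transaction or a wrapper. */
static int total_args(const multi_t *ms)
{
    int total = 0;
    for (int i = 0; i < ms->count; i++)
        total += ms->commands[i].argc;
    return total;
}

/* queued is the client's queued transaction when the command is EXEC,
 * or NULL for a plain command, in which case argv/argc are wrapped
 * into a fake transaction that lives on the stack. */
int args_to_check(const multi_t *queued, const char **argv, int argc)
{
    multi_t fake;
    cmd_t single;
    const multi_t *ms;

    if (queued != NULL) {
        ms = queued;
    } else {
        single.argv = argv;
        single.argc = argc;
        fake.commands = &single;
        fake.count = 1;
        ms = &fake;
    }
    return total_args(ms);
}
```

Because the fake structure lives on the stack and is never stored, nothing has to be allocated or freed for the common case of a single command.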

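Next, each command in ms is examined: getKeysFromCommand is called to obtain the indexes of all keys in the command, stored in the array keyindex, together with the number of keys, numkeys;

Then all keys of the command are processed in a loop:

First, keyHashSlot is called to compute thisslot, the slot number the key belongs to;

If this is the first key seen, firstkey records the key and slot records its slot number. The node n responsible for that slot is then read from server.cluster->slots. If n is NULL, no node serves the slot and the cluster is effectively down, so error_code is set to REDIS_CLUSTER_REDIR_DOWN_UNBOUND and NULL is returned. If n is the current node and the current node is migrating the slot out, migrating_slot is set to 1; otherwise, if the current node is importing the slot, importing_slot is set to 1;

If this is not the first key and its content differs from the first key, its slot is compared with the slot of the first key. If the slots differ, the error code is set to REDIS_CLUSTER_REDIR_CROSS_SLOT and NULL is returned; if they are the same, multiple_keys is set to 1;

If the current node is importing or migrating the slot and the key cannot be found in database 0, missing_keys is incremented;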
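
The slot computation and the same-slot check in the steps above can be sketched as follows. This is an illustration based on the Redis Cluster specification as I understand it, not a copy of the source: the key is hashed with CRC16 (XMODEM variant) and reduced to 16384 slots, and when the key contains a non-empty {hash tag} only the tag is hashed. Redis uses a table-driven CRC16, whereas this sketch computes it bit by bit. common_slot is a hypothetical helper that mirrors the cross-slot test.

```c
#include <stdint.h>
#include <string.h>

/* CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0. */
static uint16_t crc16(const char *buf, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Slot of a key. If the key contains "{...}" with at least one character
 * between the first '{' and the first '}' after it, only that part is
 * hashed; otherwise the whole key is hashed. */
unsigned int key_hash_slot(const char *key, size_t keylen)
{
    size_t s, e;

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;
    if (s == keylen) return crc16(key, keylen) & 0x3FFF;

    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;
    if (e == keylen || e == s + 1) return crc16(key, keylen) & 0x3FFF;

    return crc16(key + s + 1, e - s - 1) & 0x3FFF;
}

/* Hypothetical helper: returns the slot shared by all keys, or -1 when
 * there are no keys or the keys span different slots (the CROSSSLOT case). */
int common_slot(const char **keys, int nkeys)
{
    int slot = -1;
    for (int i = 0; i < nkeys; i++) {
        int thisslot = (int)key_hash_slot(keys[i], strlen(keys[i]));
        if (slot == -1) slot = thisslot;
        else if (slot != thisslot) return -1;
    }
    return slot;
}
```

With this sketch, "{user1000}.following" and "{user1000}.followers" land in the same slot because only "user1000" is hashed, so a command that touches both keys passes the same-slot check.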

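Once all keys of all commands have been traversed, every key is known to belong to the same slot, slot, which is served by node n. The checks then continue:

If n is NULL, none of the commands contains any key, so myself is returned, meaning the current node can handle the command;

slot is stored in the output parameter hashslot;

If the current node is migrating the slot out and some of the keys in the command are no longer on the current node, the error code is set to REDIS_CLUSTER_REDIR_ASK and the destination node of the migration is returned;

If the current node is importing the slot, and either the client has the ASKING flag (it sent an "ASKING" command beforehand) or the command itself carries the ASKING flag (the "RESTORE-ASKING" command), then the error code is set to REDIS_CLUSTER_REDIR_UNSTABLE and NULL is returned only when multiple keys are involved and some of them are not on the current node; otherwise the current node is returned;

Together, these two checks ensure that when a command involves a single key, a write that creates a new key must go straight to the importing node, and a read must be served by whichever node holds the key. When multiple keys are involved, a write that creates new keys can be executed on neither the migrating node nor the importing node, and a read must be served by a node that holds all the keys (verified by my own tests);

If the current node is a slave of node n, the client is a read-only client, and the command is a read-only command, the current node is returned;

Otherwise node n is returned; if n is not the current node, the error code is first set to REDIS_CLUSTER_REDIR_MOVED.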
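
The order of these final checks matters, so it may help to see them as one pure function. The sketch below is a condensed model of the tail of getNodeByQuery, not the real code: query_state, the two enums and decide are hypothetical names, and the flags are assumed to have been computed already by the key loop.

```c
/* Hypothetical, reduced set of redirection codes. */
enum redir { REDIR_NONE, REDIR_ASK, REDIR_UNSTABLE, REDIR_MOVED };

/* Where the request ends up. */
enum target { SERVE_LOCALLY, SEND_TO_MIGRATION_TARGET, SEND_TO_OWNER, REFUSE };

typedef struct {
    int owner_is_myself;   /* n == myself */
    int migrating_slot;    /* we own the slot and are moving it out */
    int importing_slot;    /* we are receiving the slot */
    int asking;            /* client or command carries the ASKING flag */
    int multiple_keys;     /* request names more than one distinct key */
    int missing_keys;      /* number of keys not found locally */
    int readonly_on_slave; /* READONLY client, read-only command,
                              and myself is a slave of n */
} query_state;

enum target decide(const query_state *q, enum redir *code)
{
    *code = REDIR_NONE;

    /* Migrating and at least one key already gone: ask the destination. */
    if (q->migrating_slot && q->missing_keys) {
        *code = REDIR_ASK;
        return SEND_TO_MIGRATION_TARGET;
    }

    /* Importing and the client said ASKING: serve, unless a multi-key
     * request would only find some of its keys here. */
    if (q->importing_slot && q->asking) {
        if (q->multiple_keys && q->missing_keys) {
            *code = REDIR_UNSTABLE;
            return REFUSE;
        }
        return SERVE_LOCALLY;
    }

    /* Read-only client reading from a slave of the slot owner. */
    if (q->readonly_on_slave) return SERVE_LOCALLY;

    /* Base case: the owner serves the slot; redirect if that is not us. */
    if (!q->owner_is_myself) {
        *code = REDIR_MOVED;
        return SEND_TO_OWNER;
    }
    return SERVE_LOCALLY;
}
```

One consequence that is easy to miss: a request that reaches the importing node without the ASKING flag falls through to the base case and gets a MOVED redirection back to the current owner of the slot.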

posted @ 2016-06-25 11:46 gqtc