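What Redis SWAPDB Does Behind the Scenes

0x00 Summary

Be cautious before adopting a feature you have not used before. Beyond extensive testing, it is worth reading the relevant source code, when you can, to understand how the feature works internally.

In this article we read the source to judge whether the Redis SWAPDB command is dependable.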

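0x01 SWAPDB basics

1.1 Command description

Available since: 4.0.0

The command swaps two databases on the same Redis server, so that clients connected to one database immediately see the data of the other.

After SWAPDB runs, a connected client sees the new data without issuing SELECT again.

1.2 Demo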

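redis> set mystring 0 # set the key to 0 in db 0
OK
redis> select 1 # switch to db 1
OK
redis[1]> set mystring 1 # set the key to 1 in db 1
OK
redis[1]> swapdb 0 1     # swap the data of db 0 and db 1
OK
redis[1]> get mystring   # the db 1 connection now reads what used to be db 0's data
"0"

Now let's read the source to see what Redis actually does behind the scenes, and whether this feature affects everyday workloads.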

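0x02 Pre-checks

The entry point for SWAPDB is swapdbCommand.

As the code shows, swapdbCommand validates its input first:

In cluster mode the swap is refused;
It parses the two DB indexes and returns without swapping if either is invalid;

Only then does it call dbSwapDatabases to perform the swap. On success it fires a module event, increments server.dirty, and replies OK.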

/* SWAPDB db1 db2 */
void swapdbCommand(client *c) {
    long id1, id2;

    /* Not allowed in cluster mode: we have just DB 0 there. */
    if (server.cluster_enabled) {
        addReplyError(c,"SWAPDB is not allowed in cluster mode");
        return;
    }

    /* Get the two DBs indexes. */
    if (getLongFromObjectOrReply(c, c->argv[1], &id1,
        "invalid first DB index") != C_OK)
        return;

    if (getLongFromObjectOrReply(c, c->argv[2], &id2,
        "invalid second DB index") != C_OK)
        return;

    /* Swap... */
    if (dbSwapDatabases(id1,id2) == C_ERR) {
        addReplyError(c,"DB index is out of range");
        return;
    } else {
        RedisModuleSwapDbInfo si = {REDISMODULE_SWAPDBINFO_VERSION,id1,id2};
        moduleFireServerEvent(REDISMODULE_EVENT_SWAPDB,0,&si);
        server.dirty++;
        addReply(c,shared.ok);
    }
}
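
The order of these checks can be tried out in isolation. The following is a minimal sketch, not Redis code: toy_parse_index, toy_swapdb_check and the toy_check enum are hypothetical names, and strtol stands in for the stricter parser Redis uses (strtol accepts leading whitespace and a sign, for instance).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef enum {
    TOY_OK,
    TOY_ERR_CLUSTER,       /* "SWAPDB is not allowed in cluster mode" */
    TOY_ERR_FIRST_INDEX,   /* "invalid first DB index" */
    TOY_ERR_SECOND_INDEX,  /* "invalid second DB index" */
    TOY_ERR_RANGE          /* "DB index is out of range" */
} toy_check;

/* Hypothetical helper (not a Redis function): parse a decimal DB index.
 * Returns 0 and stores the value on success, -1 on malformed input. */
int toy_parse_index(const char *s, long *out) {
    char *end;
    assert(s != NULL && out != NULL);
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') return -1;
    *out = v;
    return 0;
}

/* Same order as swapdbCommand: cluster mode, first index, second index,
 * then the range check that dbSwapDatabases performs. */
toy_check toy_swapdb_check(int cluster_enabled, long dbnum,
                           const char *arg1, const char *arg2) {
    long id1, id2;
    if (cluster_enabled) return TOY_ERR_CLUSTER;
    if (toy_parse_index(arg1, &id1) != 0) return TOY_ERR_FIRST_INDEX;
    if (toy_parse_index(arg2, &id2) != 0) return TOY_ERR_SECOND_INDEX;
    if (id1 < 0 || id1 >= dbnum || id2 < 0 || id2 >= dbnum)
        return TOY_ERR_RANGE;
    return TOY_OK;
}
```

With 16 databases, toy_swapdb_check(0, 16, "0", "16") reports the range error, because valid indexes run from 0 to 15.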

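0x03 The actual swap

dbSwapDatabases does the real work.

The first half of the function is surprisingly simple: it just exchanges a few fields of db1 and db2 (dict, expires, avg_ttl and expires_cursor).

The second half shows that the operation is a little more involved and does affect running clients. Specifically:

It signals ready keys for blocked clients. Some clients are blocked in B[LR]POP waiting for data, and after the swap the keys they wait on may already exist;
It invalidates clients that WATCH keys in either database, because the data behind those keys has changed and their transactions must fail.

The code: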

int dbSwapDatabases(long id1, long id2) {
    if (id1 < 0 || id1 >= server.dbnum ||
        id2 < 0 || id2 >= server.dbnum) return C_ERR;
    if (id1 == id2) return C_OK;
    redisDb aux = server.db[id1];
    redisDb *db1 = &server.db[id1], *db2 = &server.db[id2];

    /* Swap hash tables. Note that we don't swap blocking_keys,
     * ready_keys and watched_keys, since we want clients to
     * remain in the same DB they were. */
    db1->dict = db2->dict;
    db1->expires = db2->expires;
    db1->avg_ttl = db2->avg_ttl;
    db1->expires_cursor = db2->expires_cursor;

    db2->dict = aux.dict;
    db2->expires = aux.expires;
    db2->avg_ttl = aux.avg_ttl;
    db2->expires_cursor = aux.expires_cursor;

    /* Now we need to handle clients blocked on lists: as an effect
     * of swapping the two DBs, a client that was waiting for list
     * X in a given DB, may now actually be unblocked if X happens
     * to exist in the new version of the DB, after the swap.
     *
     * However normally we only do this check for efficiency reasons
     * in dbAdd() when a list is created. So here we need to rescan
     * the list of clients blocked on lists and signal lists as ready
     * if needed.
     *
     * Also the swapdb should make transaction fail if there is any
     * client watching keys */
    scanDatabaseForReadyLists(db1);
    touchAllWatchedKeysInDb(db1, db2);
    scanDatabaseForReadyLists(db2);
    touchAllWatchedKeysInDb(db2, db1);
    return C_OK;
}
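
To see why clients stay "in the same DB they were", it helps to shrink the structure to two fields. This is a toy sketch under that assumption: toy_db and toy_swap are hypothetical, with data standing in for dict/expires and blocked_clients standing in for the per-DB client bookkeeping that is deliberately not swapped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for redisDb. */
typedef struct toy_db {
    const char *data;     /* plays the role of dict/expires: swapped */
    int blocked_clients;  /* plays the role of blocking_keys etc.: not swapped */
} toy_db;

/* Swap only the data of two slots.
 * Returns 0 on success, -1 if an index is out of range. */
int toy_swap(toy_db *dbs, long dbnum, long id1, long id2) {
    assert(dbs != NULL);
    if (id1 < 0 || id1 >= dbnum || id2 < 0 || id2 >= dbnum) return -1;
    if (id1 == id2) return 0;
    const char *aux = dbs[id1].data;
    dbs[id1].data = dbs[id2].data;
    dbs[id2].data = aux;
    return 0;
}
```

After toy_swap(dbs, 2, 0, 1), slot 0 holds the data that used to be in slot 1, while its blocked_clients count is unchanged.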

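3.1 Signaling ready keys to blocked clients

Some clients are blocked in B[LR]POP or another blocking command, and after the swap the keys they are waiting on may already be available.

So the first step, for each of the two databases, is to walk the keys that clients are blocked on (blocking_keys) and look each one up; if the lookup returns a value, the key is signaled as ready.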

/* Helper function for dbSwapDatabases(): scans the list of keys that have
 * one or more blocked clients for B[LR]POP or other blocking commands
 * and signal the keys as ready if they are of the right type. See the comment
 * where the function is used for more info. */
void scanDatabaseForReadyLists(redisDb *db) {
    dictEntry *de;
    dictIterator *di = dictGetSafeIterator(db->blocking_keys);
    while((de = dictNext(di)) != NULL) {
        robj *key = dictGetKey(de);
        robj *value = lookupKey(db,key,LOOKUP_NOTOUCH);
        if (value) signalKeyAsReady(db, key, value->type);
    }
    dictReleaseIterator(di);
}
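
The rescan can be modeled with plain string arrays. The sketch below is an assumption-laden simplification: toy_has_key and toy_scan_ready are hypothetical names, the keyspace is a NULL-terminated array rather than a dict, and the value type that signalKeyAsReady receives is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if key is present in the NULL-terminated array keys. */
int toy_has_key(const char *const *keys, const char *key) {
    for (size_t i = 0; keys[i] != NULL; i++)
        if (strcmp(keys[i], key) == 0) return 1;
    return 0;
}

/* For every key some client is blocked on, set ready[i] to 1 if the key
 * exists in the (post-swap) keyspace, else 0.
 * Returns the number of keys marked ready. */
size_t toy_scan_ready(const char *const *keyspace,
                      const char *const *blocking_keys,
                      int *ready) {
    assert(keyspace != NULL && blocking_keys != NULL && ready != NULL);
    size_t n = 0;
    for (size_t i = 0; blocking_keys[i] != NULL; i++) {
        ready[i] = toy_has_key(keyspace, blocking_keys[i]);
        n += (size_t)ready[i];
    }
    return n;
}
```

A client blocked on "jobs" becomes ready as soon as the swapped-in keyspace contains "jobs"; a client blocked on a key that is still absent keeps waiting.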

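3.2 Notifying WATCH clients

This step tells clients that WATCH keys in the database that the data has changed, so their pending transactions must not go through.

The function walks watched_keys, finds the clients watching each key, and sets CLIENT_DIRTY_CAS on those clients if the key exists in either of the two databases.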

/* Set CLIENT_DIRTY_CAS to all clients of DB when DB is dirty.
 * It may happen in the following situations:
 * FLUSHDB, FLUSHALL, SWAPDB
 *
 * replaced_with: for SWAPDB, the WATCH should be invalidated if
 * the key exists in either of them, and skipped only if it
 * doesn't exist in both. */
void touchAllWatchedKeysInDb(redisDb *emptied, redisDb *replaced_with) {
    listIter li;
    listNode *ln;
    dictEntry *de;

    if (dictSize(emptied->watched_keys) == 0) return;

    dictIterator *di = dictGetSafeIterator(emptied->watched_keys);
    while((de = dictNext(di)) != NULL) {
        robj *key = dictGetKey(de);
        list *clients = dictGetVal(de);
        if (!clients) continue;
        listRewind(clients,&li);
        while((ln = listNext(&li))) {
            client *c = listNodeValue(ln);
            if (dictFind(emptied->dict, key->ptr)) {
                c->flags |= CLIENT_DIRTY_CAS;
            } else if (replaced_with && dictFind(replaced_with->dict, key->ptr)) {
                c->flags |= CLIENT_DIRTY_CAS;
            }
        }
    }
    dictReleaseIterator(di);
}
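
The decision rule in this function is small enough to isolate. A minimal sketch, with hypothetical names (toy_client, toy_touch_watchers, TOY_DIRTY_CAS) and the two dictFind lookups replaced by boolean parameters:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DIRTY_CAS (1u << 0)  /* stand-in for CLIENT_DIRTY_CAS */

typedef struct toy_client { unsigned flags; } toy_client;

/* Flag every watcher of one key if the key exists in either swapped DB.
 * Returns the number of clients flagged. */
size_t toy_touch_watchers(toy_client *watchers, size_t n,
                          bool in_emptied, bool in_replaced_with) {
    assert(watchers != NULL || n == 0);
    if (!in_emptied && !in_replaced_with) return 0;  /* absent in both: skip */
    for (size_t i = 0; i < n; i++) watchers[i].flags |= TOY_DIRTY_CAS;
    return n;
}
```

The only case that leaves a WATCH intact is a key that is missing both before and after the swap, since its observable value (nonexistent) did not change.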

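This calls for a look at how WATCH works.

0x04 The WATCH mechanism

4.1 The WATCH command

WATCH monitors one or more keys. If any of them is modified by another command before the transaction executes, the transaction is aborted.

Syntax
The basic syntax of WATCH is:
WATCH key [key …]

Verification:

Open two Redis clients, client 1 and client 2.

1. In client 1, set a value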

redis 127.0.0.1:6379> set number 10
OK

2. In client 1, WATCH the key.

redis 127.0.0.1:6379> watch number
OK

3. In client 1, start a transaction and modify the value

redis 127.0.0.1:6379> multi
OK
redis 127.0.0.1:6379> set number 100
QUEUED
redis 127.0.0.1:6379> get number
QUEUED
redis 127.0.0.1:6379>

Do not run EXEC yet.

4. In client 2, modify the value

redis 127.0.0.1:6379> set number 500
OK

5. In client 1, run EXEC

redis 127.0.0.1:6379> exec
(nil)
redis 127.0.0.1:6379> get number
"500"

EXEC returns nil: the transaction did not run, and client 1 reads the value written by client 2.

The flow is as follows:

Redis Client 1          Redis Server              Redis Client 2
      +                       +                        +
      |                       |                        |
      |                       |                        |
      |                       |                        |
      v                       |                        |
set number 10 +-------------> |                        |
      +                       v                        |
      |                  number = 10                   |
      |                       +                        |
      |                       |                        |
      v        start watch    |                        |
watch number +--------------> |                        |
      +                       |                        |
      |                       |                        |
      |                       |                        |
      v     begin transaction |                        |
    multi    ---------------> |                        |
      +                       |                        |
      |                       |                        |
      |                       |                        |
      v                       |                        |
set number 100                |                        |
      +                       |                        |
      |                       |                        |
      |                       |                        |
      v                       |                        v
  get number                  +<---------------+  set number 500
      +                       v                        +
      |                  number = 500                  |
      |                       +                        |
      v      exec will fail   |                        |
    exec +----------------->  |                        |
      +                       |                        |
      | nil                   |                        |
      |                       |                        |
      v                       |                        |
                              v                        |
  get number <---------+ number = 500                  |
      +                       +                        |
      |                       |                        |
      +                       v                        +

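4.2 How it works

4.2.1 Redis transactions

Redis guarantees that the commands in a transaction are either all executed or none are. If the client disconnects before sending EXEC, Redis discards the transaction queue and none of the commands run. Once the client has sent EXEC, all the commands are executed even if the client disconnects afterwards, because Redis has already recorded everything it needs to run.

In addition, a transaction guarantees that its commands run sequentially without commands from other clients interleaved. Suppose client A needs to run several commands while client B sends one: without a transaction, B's command may be executed between A's commands. A transaction prevents that.

4.2.2 No rollback needed

WATCH plus MULTI in Redis is effectively an optimistic lock.

If a command is rejected while being queued (for example an unknown command or a wrong number of arguments), EXEC aborts and none of the queued commands run. A command that fails at run time, during EXEC, is different: unlike a MySQL transaction, Redis does not roll back, so the commands that already ran keep their effects and the remaining ones still execute.

With the optimistic locking that WATCH provides, if a watched key has been modified by the time you call EXEC, none of the commands between MULTI and EXEC are executed, so there is nothing to roll back.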
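
The optimistic-lock idea can be illustrated without Redis at all. The sketch below swaps in a version counter for Redis's per-client dirty flag, which is a different mechanism with the same observable effect; toy_cell, toy_watch, toy_set, toy_watch_cell and toy_exec_set are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A value plus a counter that is bumped on every write. */
typedef struct toy_cell { long value; unsigned long version; } toy_cell;

/* What a watcher remembers: which cell, and the version it saw. */
typedef struct toy_watch { const toy_cell *cell; unsigned long seen; } toy_watch;

/* Unconditional write, like a plain SET from any client. */
void toy_set(toy_cell *c, long v) {
    c->value = v;
    c->version++;
}

/* Like WATCH: remember the current version of the cell. */
toy_watch toy_watch_cell(const toy_cell *c) {
    toy_watch w = { c, c->version };
    return w;
}

/* Like EXEC of a one-command transaction: write only if nobody wrote
 * since the watch was taken. Returns false when the write is skipped. */
bool toy_exec_set(toy_cell *c, toy_watch w, long v) {
    assert(c != NULL && w.cell == c);
    if (c->version != w.seen) return false;  /* nothing ran: no rollback */
    toy_set(c, v);
    return true;
}
```

Replaying the earlier session: the cell starts at 10, client 1 takes a watch, client 2 sets 500, and client 1's toy_exec_set with 100 returns false and leaves 500 in place. A caller would normally take a fresh watch and retry.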

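4.2.3 Reporting failure

Suppose clients A and B run the same WATCH/MULTI/EXEC sequence at the same time. Transactions execute serially, so assume A's EXEC runs first. When it runs, A is removed from the list of clients watching the key, and every client still on that list is flagged CLIENT_DIRTY_CAS. When B's EXEC arrives, the server sees that B is flagged CLIENT_DIRTY_CAS, aborts the transaction, and returns a failure (a nil reply).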
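
The A-then-B sequence can be traced with a fixed-size watcher list. This is a rough model only: toy_key, toy_client, toy_watch, toy_unwatch and toy_exec_write are hypothetical, a single key is tracked, and the transaction is assumed to write that key.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DIRTY_CAS 1u   /* stand-in for CLIENT_DIRTY_CAS */
#define TOY_MAX_WATCHERS 8

typedef struct toy_client { unsigned flags; } toy_client;
typedef struct toy_key {
    toy_client *watchers[TOY_MAX_WATCHERS];
    size_t n;
} toy_key;

/* Register c as a watcher of k. Returns 0, or -1 if the list is full. */
int toy_watch(toy_key *k, toy_client *c) {
    assert(k != NULL && c != NULL);
    if (k->n == TOY_MAX_WATCHERS) return -1;
    k->watchers[k->n++] = c;
    return 0;
}

/* Remove c from the watcher list of k, if present. */
void toy_unwatch(toy_key *k, toy_client *c) {
    for (size_t i = 0; i < k->n; i++) {
        if (k->watchers[i] != c) continue;
        k->n--;
        k->watchers[i] = k->watchers[k->n];  /* order is not preserved */
        return;
    }
}

/* EXEC of a transaction that writes k. Returns 1 if it ran, 0 if aborted. */
int toy_exec_write(toy_key *k, toy_client *c) {
    int dirty = (c->flags & TOY_DIRTY_CAS) != 0;
    c->flags &= ~TOY_DIRTY_CAS;
    toy_unwatch(k, c);               /* the executing client stops watching */
    if (dirty) return 0;             /* a watched key was touched: abort */
    for (size_t i = 0; i < k->n; i++)
        k->watchers[i]->flags |= TOY_DIRTY_CAS;  /* flag remaining watchers */
    return 1;
}
```

If A and B both watch the key, toy_exec_write for A returns 1 and flags B; the following toy_exec_write for B returns 0 and clears B's flag, so B can watch again and retry.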

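4.3 WATCH source code

4.3.1 Adding a watch

watchCommand adds watched keys to a client. For each key, watchForKey registers the client in the database's watched_keys dict and appends a watchedKey to the client's own watched_keys list.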

/* WATCH command */
void watchCommand(client *c) {
    int j;
 
    if (c->flags & CLIENT_MULTI) {
        addReplyError(c,"WATCH inside MULTI is not allowed");
        return;
    }
    for (j = 1; j < c->argc; j++)
        watchForKey(c,c->argv[j]);
    
    addReply(c,shared.ok);
}
 
typedef struct watchedKey {
    robj *key;
    redisDb *db;
} watchedKey;
 
/* Watch a single key */
void watchForKey(client *c, robj *key) {
    list *clients = NULL;
    listIter li;
    listNode *ln;
    watchedKey *wk;
 
    /* Check whether the key is already watched; if so, return at once. */
    // Initialize an iterator over the client's list
    listRewind(c->watched_keys,&li);
    // Walk the keys this client already watches
    while((ln = listNext(&li))) {
        wk = listNodeValue(ln);
        // The key is already watched by this client in this DB, so return
        if (wk->db == c->db && equalStringObjects(key,wk->key))
            return; /* Key already watched */
    }
    /* Not watched yet: continue below. */
    // Fetch the list of clients watching this key from the DB's hash table
    clients = dictFetchValue(c->db->watched_keys,key);
    // If there is none, create a list to hold them
    if (!clients) {
        clients = listCreate();
        dictAdd(c->db->watched_keys,key,clients);
        incrRefCount(key);
    }
    // Append the current client to the tail of the list
    listAddNodeTail(clients,c);
    /* Maintain the client's own watched_keys list */
    wk = zmalloc(sizeof(*wk));
    wk->key = key;
    wk->db = c->db;
    incrRefCount(key);
    listAddNodeTail(c->watched_keys,wk);
}

As the diagram shows, a client tracks the keys it watches through its watched_keys list:

+----------------------+
| client               |
|                      |       +------------+     +-------------+
|                      |       | wk         |     | wk          |
|      watched_keys +--------> |      key 1 | ... |       key n |
|                      |       |      db  1 |     |       db  n |
+----------------------+       +------------+     +-------------+
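
The client-side half of this bookkeeping, including the duplicate check on (db, key), can be sketched with a fixed array in place of Redis's linked list. toy_client, toy_watched_key and toy_watch_for_key are hypothetical names, and the database is reduced to an integer index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_WATCHED 16

/* Like watchedKey: a key name plus the DB it was watched in. */
typedef struct toy_watched_key { const char *key; int db; } toy_watched_key;

typedef struct toy_client {
    toy_watched_key watched[TOY_MAX_WATCHED];
    size_t count;
    int db;                 /* currently selected DB index */
} toy_client;

/* Returns 1 if the key was added, 0 if it was already watched in the
 * client's current DB, -1 if the array is full. */
int toy_watch_for_key(toy_client *c, const char *key) {
    assert(c != NULL && key != NULL);
    for (size_t i = 0; i < c->count; i++)
        if (c->watched[i].db == c->db && strcmp(c->watched[i].key, key) == 0)
            return 0;       /* same key in the same DB: nothing to do */
    if (c->count == TOY_MAX_WATCHED) return -1;
    c->watched[c->count].key = key;
    c->watched[c->count].db = c->db;
    c->count++;
    return 1;
}
```

Watching "number" twice in DB 0 adds one entry; watching it again after switching to DB 1 adds a second, because the DB is part of the identity of a watch.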

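4.3.2 Executing the transaction

In detail:

Before running anything, EXEC checks the client's flags. If CLIENT_DIRTY_CAS (or CLIENT_DIRTY_EXEC) is set, the transaction is aborted and none of the queued commands run;
Otherwise the queued commands run one by one. A command that is denied by the ACL check is skipped rather than stopping the loop. When the loop ends, discardTransaction resets the transaction state and clears CLIENT_MULTI, CLIENT_DIRTY_CAS and CLIENT_DIRTY_EXEC from the client's flags.

The code: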

/* EXEC command */
void execCommand(client *c) {
    int j;
    robj **orig_argv;
    int orig_argc;
    struct redisCommand *orig_cmd;
    int must_propagate = 0; /* Need to propagate MULTI/EXEC to AOF / slaves? */
    int was_master = server.masterhost == NULL;
	
    // Return an error if MULTI was not issued
    if (!(c->flags & CLIENT_MULTI)) {
        addReplyError(c,"EXEC without MULTI");
        return;
    }
	
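    /*
     * Key point:
     * check the client state. Either of these two flags aborts the
     * transaction without running the queued commands:
     * 1. CLIENT_DIRTY_CAS => a watched key was touched
     * 2. CLIENT_DIRTY_EXEC => an invalid command was queued (for example a nonexistent one)
     */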
    if (c->flags & (CLIENT_DIRTY_CAS|CLIENT_DIRTY_EXEC)) {
        addReply(c, c->flags & CLIENT_DIRTY_EXEC ? shared.execaborterr :
                                                  shared.nullmultibulk);
        discardTransaction(c);
        goto handle_monitor;
    }
 
    /* Execute the queued commands */
    // Drop the client's watched keys and remove it from the per-key client lists
    unwatchAllKeys(c); /* Unwatch ASAP otherwise we'll waste CPU cycles */
    orig_argv = c->argv;
    orig_argc = c->argc;
    orig_cmd = c->cmd;
    addReplyMultiBulkLen(c,c->mstate.count);
    // Run each queued command
    for (j = 0; j < c->mstate.count; j++) {
        c->argc = c->mstate.commands[j].argc;
        c->argv = c->mstate.commands[j].argv;
        c->cmd = c->mstate.commands[j].cmd;
 
        /* ACL permissions are also checked at the time of execution in case
         * they were changed after the commands were queued. */
        int acl_errpos;
        int acl_retval = ACLCheckCommandPerm(c,&acl_errpos);
        if (acl_retval == ACL_OK && c->cmd->proc == publishCommand)
            acl_retval = ACLCheckPubsubPerm(c,1,1,0,&acl_errpos);
        if (acl_retval != ACL_OK) {
            char *reason;
            switch (acl_retval) {
            case ACL_DENIED_CMD:
                reason = "no permission to execute the command or subcommand";
                break;
            case ACL_DENIED_KEY:
                reason = "no permission to touch the specified keys";
                break;
            case ACL_DENIED_CHANNEL:
                reason = "no permission to publish to the specified channel";
                break;
            default:
                reason = "no permission";
                break;
            }
        } else {
            // Invoke the command here.
            // A command that modifies a key flags every client watching that key with CLIENT_DIRTY_CAS, whether or not the value actually changed
            call(c,server.loading ? CMD_CALL_NONE : CMD_CALL_FULL);
            serverAssert((c->flags & CLIENT_BLOCKED) == 0);
        }

        /* Commands may alter argc/argv, restore mstate. */
        c->mstate.commands[j].argc = c->argc;
        c->mstate.commands[j].argv = c->argv;
        c->mstate.commands[j].cmd = c->cmd;
    }
    
    c->argv = orig_argv;
    c->argc = orig_argc;
    c->cmd = orig_cmd;
    discardTransaction(c);
 
handle_monitor:
    /* Send EXEC to clients waiting data from MONITOR. We do it here
     * since the natural order of commands execution is actually:
     * MULTI, EXEC, ... commands inside transaction ...
     * Instead EXEC is flagged as CMD_SKIP_MONITOR in the command
     * table, and we do it here with correct ordering. */
    if (listLength(server.monitors) && !server.loading)
        replicationFeedMonitors(c,server.monitors,c->db->id,c->argv,c->argc);
}
 
/* Reset the current transaction state */
void discardTransaction(client *c) {
    freeClientMultiState(c);
    initClientMultiState(c);
    c->flags &= ~(CLIENT_MULTI|CLIENT_DIRTY_CAS|CLIENT_DIRTY_EXEC);
    unwatchAllKeys(c);
}
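
The flag handling at the top of execCommand and inside discardTransaction is ordinary bitmask work. A minimal sketch with hypothetical names (the TOY_ constants and toy_exec are not Redis identifiers) that keeps only the decision and the final flag reset:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MULTI      (1u << 0)  /* stand-in for CLIENT_MULTI */
#define TOY_DIRTY_CAS  (1u << 1)  /* stand-in for CLIENT_DIRTY_CAS */
#define TOY_DIRTY_EXEC (1u << 2)  /* stand-in for CLIENT_DIRTY_EXEC */

typedef enum {
    TOY_NO_MULTI,   /* "EXEC without MULTI" */
    TOY_EXECABORT,  /* an invalid command was queued */
    TOY_NIL,        /* a watched key was touched */
    TOY_RUN         /* the queued commands would run */
} toy_exec_result;

/* Decide what EXEC does from the client flags, then clear the
 * transaction-related flags the way discardTransaction does. */
toy_exec_result toy_exec(unsigned *flags) {
    assert(flags != NULL);
    if (!(*flags & TOY_MULTI)) return TOY_NO_MULTI;
    toy_exec_result r = TOY_RUN;
    if (*flags & TOY_DIRTY_EXEC) r = TOY_EXECABORT;  /* checked first */
    else if (*flags & TOY_DIRTY_CAS) r = TOY_NIL;
    *flags &= ~(TOY_MULTI | TOY_DIRTY_CAS | TOY_DIRTY_EXEC);
    return r;
}
```

Whatever the outcome, the client leaves EXEC with all three flags cleared, which is why a failed transaction can simply be retried from WATCH.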

The overall logic is shown below:

The client watches a set of keys;
When a watched key is touched (by another client's write, or by SWAPDB as described above), the server sets CLIENT_DIRTY_CAS in the client's flags;
When the client sends EXEC, the server finds the flag set and does not run the queued commands.

+-------------------+
| client            |
|                   |       +-------------+     +--------------+
|                   |  1    | wk          |     | wk           |
|   watched_keys +--------> |      key 1  | ... |       key n  |
|                   |       |      db  1  |     |       db  n  |
|            ^      |       +-------------+     +--------------+
|            |      |
|            | 3    |                                      +----------------------+
|            |      |                                      | Redis DB             |
|            |      |                                      |                      |
|            +      |  2 set CLIENT_DIRTY_CAS on key touch |                      |
|          flags <--------------------------------------------+ write or SWAPDB   |
|                   |                                      |                      |
+-------------------+                                      +----------------------+