How the Redis SETBIT command works

/* SETBIT key offset bitvalue */
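A bitset uses individual bits, instead of conventional integers, to mark whether the value associated with a given number exists.
It is implemented on top of a byte[]. In most programming languages the byte is the smallest type above the bit: 1 byte = 8 bits.
Above it sit short, int and long (sizes as in Java):
1 short = 2 bytes = 16 bits
1 int = 4 bytes = 32 bits
1 long = 8 bytes = 64 bits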

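Written in binary as a single byte, the number 1 is
00000001. Reading from the left, the first bit is the sign bit (for a signed byte such as Java's), and the remaining 7 bits carry the value, which gives a signed byte the two's-complement range -128 to 127.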

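Building on this, we can treat a byte as 8 separate bits. If 0 means "absent" and 1 means "present", a single byte records whether 8 different things exist.
That is the basic idea of a bitmap. Since 8 flags cost 1 byte, 800 million flags cost only 100 million bytes; storing one int per flag instead would take 800 million ints, i.e. 3.2 billion bytes.
That is a 32x difference in storage, and with one long (8 bytes) per flag the gap doubles to 64x. Allocating an array large enough to cover every possible long value would be far too expensive to be practical, so the number of flags stored is usually kept within the int range.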
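To make the principle concrete, here is a minimal C sketch of a bitmap backed by a plain byte array. The names `bitmap_set` and `bitmap_test` are hypothetical helpers for illustration, not part of Redis; the bit order (most significant bit first within each byte) is chosen to match the SETBIT code discussed below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: a bitmap stored in a caller-provided byte array.
 * Flag n lives in byte n / 8; within a byte, flag 0 is the most significant bit. */
static void bitmap_set(uint8_t *bits, size_t n, int present) {
    uint8_t mask = (uint8_t)(1u << (7 - n % 8));
    if (present)
        bits[n / 8] |= mask;             /* mark n as present */
    else
        bits[n / 8] &= (uint8_t)~mask;   /* mark n as absent */
}

/* Returns 1 if flag n is present, 0 otherwise. */
static int bitmap_test(const uint8_t *bits, size_t n) {
    return (bits[n / 8] >> (7 - n % 8)) & 1;
}
```

With `uint8_t bits[2] = {0};`, calling `bitmap_set(bits, 3, 1)` leaves `bits[0]` equal to 0x10 (binary 00010000), and those 2 bytes hold 16 flags in total.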

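A practical use case:
Suppose there are 100 million users and we want to record whether each of them has logged in today: the user's bit is set to 1 if they have, and 0 otherwise.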

Storing this state in a byte[] takes 100,000,000 / 8 = 12,500,000 bytes = 12,500 KB = 12.5 MB (decimal units; about 11.9 MiB).
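The byte count is a ceiling division by 8, which also covers user counts that are not multiples of 8. A small sketch; `bitmap_bytes` and `int_array_bytes` are hypothetical names, and the int comparison assumes 4-byte ints as above.

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for a bitmap of nbits flags.
 * Ceiling division, so a partially used last byte is still counted. */
static size_t bitmap_bytes(size_t nbits) {
    return (nbits + 7) / 8;
}

/* For comparison: bytes needed when each flag is stored as a 4-byte int. */
static size_t int_array_bytes(size_t nflags) {
    return nflags * 4;
}
```

For 100,000,000 users, `bitmap_bytes` gives 12,500,000 bytes while `int_array_bytes` gives 400,000,000 bytes, the 32x gap described earlier.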

With the use case covered, let's see how Redis applies the idea by looking inside the setbitCommand function.

/* SETBIT key offset bitvalue */
void setbitCommand(client *c) {
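    robj *o;  //redis object; roughly, the value object stored under the key, backed by its own data structure (see the end of https://www.cnblogs.com/windliu/p/9183024.html)
    char *err = "bit is not an integer or out of range"; //error message sent to the client
    size_t bitoffset;  //size_t is defined by the C standard library: typically unsigned int on 32-bit systems and unsigned long on 64-bit ones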
    ssize_t byte, bit;
    int byteval, bitval;
    long on;

    if (getBitOffsetFromArgument(c,c->argv[2],&bitoffset,0,0) != C_OK)  //validate the offset: reject non-numbers, negatives, and offsets whose byte index (offset/8) would exceed 512MB; on success bitoffset holds the parsed offset
        return;

    if (getLongFromObjectOrReply(c,c->argv[3],&on,err) != C_OK)  //parse the bit value; fails unless it is an integer within long range
        return;

    /* Bits can only be set or cleared... */
    if (on & ~1) {  //nonzero whenever on is anything other than 0 or 1, so reply with an error
        addReplyError(c,err);
        return;
    }

    if ((o = lookupStringForBitCommand(c,bitoffset)) == NULL) return;

    /* Get current values */
    byte = bitoffset >> 3;
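    byteval = ((uint8_t*)o->ptr)[byte];  //bytes never written read as 0, because the string is zero-filled when created or grown (see lookupStringForBitCommand)
    bit = 7 - (bitoffset & 0x7);  //bits are counted from the left (most significant bit first), like array indices starting at 0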
    bitval = byteval & (1 << bit);

    /* Update byte with new bit value and return original value */
    byteval &= ~(1 << bit);
    byteval |= ((on & 0x1) << bit);
    ((uint8_t*)o->ptr)[byte] = byteval;
    signalModifiedKey(c->db,c->argv[1]);
    notifyKeyspaceEvent(NOTIFY_STRING,"setbit",c->argv[1],c->db->id);
    server.dirty++;
    addReply(c, bitval ? shared.cone : shared.czero); //reply with the bit's previous value
}
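The position lookup in setbitCommand is two cheap bit operations: `bitoffset >> 3` divides by 8 to pick the byte, and `7 - (bitoffset & 0x7)` turns the remainder into a shift counted from the most significant bit. A standalone sketch of just that step; `locate_bit` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the arithmetic in setbitCommand:
 * byte index = offset / 8, shift = 7 - offset % 8 (MSB-first order). */
static void locate_bit(size_t bitoffset, size_t *byte, int *bit) {
    *byte = bitoffset >> 3;             /* same as bitoffset / 8 */
    *bit = 7 - (int)(bitoffset & 0x7);  /* offset 0 maps to the 0x80 bit */
}
```

For offset 10, byte is 1 and bit is 5, so the mask is 1 << 5 = 0x20: the third bit from the left of the second byte.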

1 -> 00000001
~1 -> 11111110

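on & ~1 -> on = 1 gives 0; on = 0 gives 0; on = 3 gives 00000010 = 2

In other words, for any value other than 1 and 0, on & ~1 is nonzero, so the command returns an error right away.
The same holds when 1 is an int or a long: ~1 then simply has more leading 1 bits, and only the lowest bit is cleared.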
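The same check can be exercised in isolation. `is_valid_bit_value` is a hypothetical name, and the argument is a long as in setbitCommand. Negative values are rejected too, because their high bits are set (for example -1 & ~1 is -2, not 0).

```c
/* Returns 1 if on is an acceptable SETBIT value (0 or 1), else 0.
 * Mirrors the `on & ~1` test in setbitCommand: ~1L has every bit set
 * except the lowest, so the AND is nonzero for any other value. */
static int is_valid_bit_value(long on) {
    return (on & ~1L) == 0;
}
```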

The following code shows that the bitmap is actually stored as an sds string.

/* This is an helper function for commands implementations that need to write
 * bits to a string object. The command creates or pad with zeroes the string
 * so that the 'maxbit' bit can be addressed. The object is finally
 * returned. Otherwise if the key holds a wrong type NULL is returned and
 * an error is sent to the client. */
robj *lookupStringForBitCommand(client *c, size_t maxbit) {
    size_t byte = maxbit >> 3;
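    robj *o = lookupKeyWrite(c->db,c->argv[1]);  //look up the object stored under the key

    if (o == NULL) {  //key does not exist: create it
        o = createObject(OBJ_STRING,sdsnewlen(NULL, byte+1)); //new sds (simple dynamic string) of length byte+1: byte is the zero-based index of the byte holding maxbit, so byte+1 bytes are needed; a NULL init makes sdsnewlen zero-fill the buffer (the memory is still allocated)
        dbAdd(c->db,c->argv[1],o); //add it to the client's current db (Redis has 16 dbs by default; db 0 is used unless SELECT is issued)
    } else {
        if (checkType(c,o,OBJ_STRING)) return NULL; //key holds a non-string value: reply with a type error
        o = dbUnshareStringValue(c->db,c->argv[1],o); //make sure the value is a private raw string that can be modified in place
        o->ptr = sdsgrowzero(o->ptr,byte+1); //grow to at least byte+1 bytes, zero-filling any new bytes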
    }
    return o;
}
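The create-or-grow step boils down to "make the buffer at least byte+1 bytes long and zero any new bytes". A minimal sketch of that idea on a hypothetical stand-in for an sds string (`struct bitbuf` and `bitbuf_ensure` are illustrative names, not Redis code; the real sdsgrowzero may also reserve extra capacity, which this sketch does not):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sds string: a length and a heap buffer. */
struct bitbuf {
    size_t len;
    uint8_t *data;
};

/* Make sure the buffer can address bit maxbit, zero-filling new bytes,
 * in the spirit of sdsgrowzero(o->ptr, byte+1). Returns 0 on success. */
static int bitbuf_ensure(struct bitbuf *b, size_t maxbit) {
    size_t need = (maxbit >> 3) + 1;   /* byte index is zero-based, so +1 */
    if (b->len >= need)
        return 0;                      /* already large enough */
    uint8_t *p = realloc(b->data, need);   /* realloc(NULL, n) acts like malloc */
    if (p == NULL)
        return -1;
    memset(p + b->len, 0, need - b->len);  /* new bytes read as 0 bits */
    b->data = p;
    b->len = need;
    return 0;
}
```

Growing only when needed and never shrinking means existing bits survive, and unwritten bits always read as 0.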


typedef char *sds;
 
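struct sdshdr {  //header layout used before Redis 3.2; the sdsnewlen below uses the newer sdshdr5/8/16/32/64 variants
    int len;     //bytes used in buf, i.e. the current string length
    int free;    //unused bytes left in buf, consumed by appends
    char buf[];  //the actual string data
};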
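The key trick behind sds is that the pointer handed to callers points at buf, with the header sitting just before it. A minimal sketch of that layout, assuming the pre-3.2 header shown above; `demo_hdr`, `demo_new`, `demo_len` and `demo_free` are hypothetical names, not Redis functions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical re-creation of the pre-3.2 header, for illustration only. */
struct demo_hdr {
    int len;
    int free;
    char buf[];
};

/* Allocate header + data + terminating '\0', but return a pointer to buf,
 * so the result can be passed to ordinary C string functions. */
static char *demo_new(const char *init, int len) {
    struct demo_hdr *sh = malloc(sizeof(struct demo_hdr) + len + 1);
    if (sh == NULL)
        return NULL;
    sh->len = len;
    sh->free = 0;
    memcpy(sh->buf, init, len);
    sh->buf[len] = '\0';
    return sh->buf;
}

/* Step back from the char* to its header to read the length in O(1). */
static int demo_len(const char *s) {
    const struct demo_hdr *sh =
        (const struct demo_hdr *)(s - offsetof(struct demo_hdr, buf));
    return sh->len;
}

static void demo_free(char *s) {
    free(s - offsetof(struct demo_hdr, buf));
}
```

Because the length lives in the header, `demo_new("a\0b", 3)` reports a length of 3 even though strlen stops at the embedded '\0' after 1 byte, which is what makes the string binary safe.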


/* Create a new sds string with the content specified by the 'init' pointer
 * and 'initlen'.
 * If NULL is used for 'init' the string is initialized with zero bytes.
 *
 * The string is always null-termined (all the sds strings are, always) so
 * even if you create an sds string with:
 *
 * mystring = sdsnewlen("abc",3);
 *
 * You can print the string with printf() as there is an implicit \0 at the
 * end of the string. However the string is binary safe and can contain
 * \0 characters in the middle, as the length is stored in the sds header. */
sds sdsnewlen(const void *init, size_t initlen) {
    void *sh;
    sds s;
    char type = sdsReqType(initlen);
    /* Empty strings are usually created in order to append. Use type 8
     * since type 5 is not good at this. */
    if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
    int hdrlen = sdsHdrSize(type);
    unsigned char *fp; /* flags pointer. */

    sh = s_malloc(hdrlen+initlen+1);
    if (sh == NULL) return NULL;
    if (!init)
        memset(sh, 0, hdrlen+initlen+1);
    s = (char*)sh+hdrlen;
    fp = ((unsigned char*)s)-1;
    switch(type) {
        case SDS_TYPE_5: {
            *fp = type | (initlen << SDS_TYPE_BITS);
            break;
        }
        case SDS_TYPE_8: {
            SDS_HDR_VAR(8,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_16: {
            SDS_HDR_VAR(16,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_32: {
            SDS_HDR_VAR(32,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_64: {
            SDS_HDR_VAR(64,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
    }
    if (initlen && init)
        memcpy(s, init, initlen);
    s[initlen] = '\0';
    return s;
}
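In the SDS_TYPE_5 branch above, the whole header is the single flags byte that fp points to (just before s): the low 3 bits hold the type and the high 5 bits hold the length, so this type only fits strings shorter than 32 bytes. A sketch of that packing; the DEMO_ constants copy values we believe sds.h uses (SDS_TYPE_5 = 0, SDS_TYPE_BITS = 3, SDS_TYPE_MASK = 7), which is an assumption worth checking against your Redis version.

```c
/* Assumed to match sds.h; redefined locally so the sketch stands alone. */
#define DEMO_TYPE_5    0
#define DEMO_TYPE_BITS 3
#define DEMO_TYPE_MASK 7

/* Pack a short length (must be < 32) and the type into one flags byte,
 * as `*fp = type | (initlen << SDS_TYPE_BITS)` does in sdsnewlen. */
static unsigned char demo_pack5(unsigned len) {
    return (unsigned char)(DEMO_TYPE_5 | (len << DEMO_TYPE_BITS));
}

/* Recover the type from the low 3 bits. */
static unsigned demo_type(unsigned char flags) {
    return flags & DEMO_TYPE_MASK;
}

/* Recover the length from the high 5 bits. */
static unsigned demo_len5(unsigned char flags) {
    return flags >> DEMO_TYPE_BITS;
}
```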

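A word on sds strings: an sds consists of a header followed by a char buffer, and the sds value itself is a char* pointing just past the header. In C and C++ a char occupies exactly one byte, so bitwise operations on a char give the same result as on a byte, provided the value is read as unsigned (hence the uint8_t* casts in setbitCommand).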

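Why not use a byte[]?

When a program declares an array, it must fix the element type and the length up front, and sizeof(element type) * len bytes of memory are allocated immediately.
Once set, the array length cannot change, whereas an sds string can grow on demand, and the sds library is optimized for cases such as creating a zero-filled string when init is NULL.

The char array inside an sds string maps directly onto a byte[] array.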
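One caveat when treating char as a byte: whether plain char is signed is implementation-defined, so a byte such as 0x80 might read back as -128. That is why setbitCommand casts o->ptr to uint8_t* before indexing. A small sketch; `read_byte` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Read byte i of a char buffer as an unsigned value 0..255, the way
 * setbitCommand does with ((uint8_t*)o->ptr)[byte]. Reading plain char
 * directly could yield a negative number where char is signed. */
static int read_byte(const char *buf, size_t i) {
    return ((const uint8_t *)buf)[i];
}
```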

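sizeof and size_t deserve a closer look:
The number of bits a CPU can read in one go is its word size, which is where the terms 32-bit system (4-byte word) and 64-bit system (8-byte word) come from. 8-byte alignment means a variable's starting address is a multiple of 8. When the CPU reads an 8-byte-aligned long, for example, it needs only one bus cycle; a misaligned long may take two bus cycles to read. (Source: https://blog.csdn.net/guodongxiaren/article/details/44747719 ) Alignment itself is handled by the compiler, which pads struct fields as needed; sizeof reports a type's size including that padding, and size_t is simply the unsigned integer type in which sizeof returns it.
Without alignment, the layout could look like this:
the 8 bytes starting at 0x10000 hold int a (4 bytes), short b (2 bytes), and bytes 1 and 2 of int c;
the 8 bytes starting at 0x10008 hold bytes 3 and 4 of int c, and so on;
so reading int c may require two memory reads.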
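The compiler avoids that split by inserting padding. Using offsetof and alignof (standard C11), a sketch can show where c actually lands for the struct in the example; the struct name `sample` and both helpers are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>

/* The layout from the example above: int a, short b, int c. */
struct sample {
    int a;
    short b;
    int c;
};

/* Bytes of padding the compiler placed between b and c so that c
 * starts on an int-aligned offset. */
static size_t padding_before_c(void) {
    return offsetof(struct sample, c)
           - (offsetof(struct sample, b) + sizeof(short));
}

/* 1 if c's offset is a multiple of int's alignment requirement. */
static int c_is_aligned(void) {
    return offsetof(struct sample, c) % alignof(int) == 0;
}
```

On a typical x86-64 or ARM64 build, c lands at offset 8 after 2 bytes of padding and sizeof(struct sample) is 12, so c never straddles the word boundary.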

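[Figure: locating the byte index and bit position for an offset]

Update operation

Finally, the target bit is modified, the corresponding element of the char buf[] is written back, and the original bit value is returned, which completes the command.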
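The update half of setbitCommand can likewise be lifted onto a plain byte buffer. This sketch (`setbit_buf` is a hypothetical name, and it skips the key lookup, keyspace notification and reply) keeps the same clear-then-OR sequence and returns the old bit, as SETBIT does.

```c
#include <stddef.h>
#include <stdint.h>

/* Set bit `bitoffset` of buf to on (0 or 1) and return its previous value.
 * The caller must ensure buf holds at least (bitoffset >> 3) + 1 bytes. */
static int setbit_buf(uint8_t *buf, size_t bitoffset, int on) {
    size_t byte = bitoffset >> 3;
    int bit = 7 - (int)(bitoffset & 0x7);
    int byteval = buf[byte];
    int bitval = byteval & (1 << bit);   /* old bit, still in position */

    byteval &= ~(1 << bit);              /* clear the target bit */
    byteval |= ((on & 0x1) << bit);      /* then OR in the new value */
    buf[byte] = (uint8_t)byteval;
    return bitval ? 1 : 0;
}
```

Clearing before OR-ing lets one code path handle both setting and clearing, without branching on the new value.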

posted on 2019-01-11 17:21 j.liu windliu