自顶向下redis4.0（2）文件事件与客户端

redis4.0的文件事件与客户端

redis4.0的文件事件与客户端

简介

文件事件的流程大概如下：

在服务器初始化时生成aeEventLoop并赋值给server，接着创建监听TCP连接事件。
处理TCP连接时会创建client类型的对象，将其绑定在accept函数返回的文件描述符fd上，并对fd注册一个可读事件，当客户端数据来临时，readQueryFromClient会对数据进行处理。
redis处理完数据后，会调用write函数将数据返回给客户端（但不是在一个循环里）。如果函数返回的值小于写入的值，说明系统缓存区空间不够，或者文件描述符在中途被占用，那么redis会注册一个可写事件，当可写事件触发时，sendReplyToClient函数会写入剩余的数据。
当客户端断开连接，服务器会释放client相关的资源，随之删除对应的文件事件。

正文

准备阶段

在初始化服务器时，server函数创建clients，clients_pending_write等字段，并通过aeCreateEventLoop创建一个aeEventLoop对象。

aeEventLoop *aeCreateEventLoop(int setsize) {
    aeEventLoop *eventLoop;
    int i;

    eventLoop = zmalloc(sizeof(*eventLoop));
    eventLoop->events = zmalloc(sizeof(aeFileEvent)*setsize);
    eventLoop->fired = zmalloc(sizeof(aeFiredEvent)*setsize);
    eventLoop->setsize = setsize;
    if (aeApiCreate(eventLoop) == -1) goto err;
    /* Events with mask == AE_NONE are not set.*/
    for (i = 0; i < setsize; i++)
        eventLoop->events[i].mask = AE_NONE;
    return eventLoop;
}

此处传入参数setsize的值为maxClients+128，maxClients默认值为10,000。events用于存放注册的文件事件，而fired则在事件触发时，存放被触发的事件。两者的长度都为setsize大小。

紧接着便会注册第一个文件事件。

aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE, acceptTcpHandler,NULL)

这里我们更进一步看下aeCreateFileEvent的代码， fd文件描述符被用于偏移来获取对应的文件事件结构，因此fd的值必须小于之前注册的事件大小的值。第一个被用于注册文件事件的fd用于监听TCP连接，由于进程启动时会打开一些其他的文件，因此eventLoop->events的空间并没有并完全利用。此处还通过mask来注册对应的事件触发后的处理函数。如果是监听可读事件，那么rfileProc处理函数会被赋值。可写事件同理。此时并没有传入clientData，我们会在下文再回到这个函数。

int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask, aeFileProc *proc, void *clientData)
{
    if (fd >= eventLoop->setsize) {
        errno = ERANGE;
        return AE_ERR;
    }
    aeFileEvent *fe = &eventLoop->events[fd]; 

    if (aeApiAddEvent(eventLoop, fd, mask) == -1)
        return AE_ERR;
    fe->mask |= mask;
    if (mask & AE_READABLE) fe->rfileProc = proc;
    if (mask & AE_WRITABLE) fe->wfileProc = proc;
    fe->clientData = clientData;
    if (fd > eventLoop->maxfd)
        eventLoop->maxfd = fd;
    return AE_OK;
}

接受客户端连接

当接受来自客户端连接时，便会调用acceptTcpHandler函数，该函数会接受所有客户端的请求，但一次最多接受MAX_ACCEPTS_PER_CALL1000个客户端，并且如果在轮询中发现没有客户端请求，就会立刻返回。接受了一个客户端连接请求后，便会进入处理函数acceptCommonHandler，它会创建一个client的对象，如果连接的数量大于设置的值，则会断开连接。如果redis跑在保护模式，则可能返回错误信息。

If no pending connections are present on the queue, and the socket is
not marked as nonblocking, accept() blocks the caller until a
connection is present.  If the socket is marked nonblocking and no
pending connections are present on the queue, accept() fails with the
error EAGAIN or EWOULDBLOCK.

最主要的代码位于createClient，它会注册客户端可读事件，关联readQueryFromClient函数，并且初始化client的一些属性。

client *createClient(int fd) {
    client *c = zmalloc(sizeof(client));

    if (fd != -1) {
        anetNonBlock(NULL,fd);
        anetEnableTcpNoDelay(NULL,fd);
        if (server.tcpkeepalive)
             anetKeepAlive(NULL,fd,server.tcpkeepalive);
        if (aeCreateFileEvent(server.el,fd,AE_READABLE,
            readQueryFromClient, c) == AE_ERR)
        {
            close(fd);
            zfree(c);
            return NULL;
        }
    }

    selectDb(c,0);
    uint64_t client_id;

    client_id = server.next_client_id;
    server.next_client_id += 1;

    c->id = client_id;
    c->fd = fd;
    c->name = NULL;
    c->bufpos = 0; //下一个返回数据存入位置
    				// c->buf 数组存储返回给客户端的数据
    c->querybuf = sdsempty(); //查询缓存
    c->reqtype = 0; //查询类型 一般为multi
    c->argc = 0; //参数个数 由querybuf解析而得
    c->argv = NULL;//参数值 由querybuf解析而得
    c->cmd = c->lastcmd = NULL;
    c->multibulklen = 0; //查询数据的行数
    c->bulklen = -1;//一行查询数据的长度
    c->sentlen = 0;//已经发送的数据长度
    c->flags = 0; 
    c->ctime = c->lastinteraction = server.unixtime;
    c->reply = listCreate(); //如果buf 数组溢出，则使用reply链表
    c->reply_bytes = 0;		//reply链表中对象总共的字节数
    c->obuf_soft_limit_reached_time = 0;
    listSetFreeMethod(c->reply,freeClientReplyValue);
    listSetDupMethod(c->reply,dupClientReplyValue);
    if (fd != -1) linkClient(c);
    return c;
}

处理数据

redis通过aeProcessEvents函数处理各种事件，首先它会调用aeApiPoll函数通过多路复用函数来检查已经触发的事件，并将已经触发事件的文件描述符，事件类型赋值给eventLoop->fired。然后根据事件触发的类型，调用之前注册的函数。

int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;

    int j;
    struct timeval tv, *tvp;

    tvp = NULL; /* wait forever */

    numevents = aeApiPoll(eventLoop, tvp);

    for (j = 0; j < numevents; j++) {
        aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
        int mask = eventLoop->fired[j].mask;
        int fd = eventLoop->fired[j].fd;


        if (fe->mask & mask & AE_READABLE) {
            fe->rfileProc(eventLoop,fd,fe->clientData,mask);
        }

        if (fe->mask & mask & AE_WRITABLE) {
            if (!fired || fe->wfileProc != fe->rfileProc) {
                fe->wfileProc(eventLoop,fd,fe->clientData,mask);
            }
        }

        processed++;
    }

    return processed; /* return the number of processed file/time events */
}

如果此时有来自客户端的数据，那么将会触发AE_READABLE事件，调用readQueryFromClient函数。默认情况一次读取16KB，除非上次已经读取过数据，并且数据量较大，一行长度超过32KB。（超过32KB则会对其优化，避免了字符串的拷贝，代价是多了几次read调用）。

如果超过32KB，并且剩余长度小于16KB，那么一次读取剩余该行长度的值。这是因为TCP接受的数据不一定是完整的数据，如果是PROTO_REQ_MULTIBULK多行请求，并且数据量过大，在redis开始处理请求前需要接收全部的数据，等待的时间过长，并且解析完毕之后，执行命令的时间和下发数据的长度也会影响性能。建议一次请求不超过16KB，但这16KB中还包含着*/r/n等格式符号，因此请求的数据量还要再小一些，才能保证服务端尽可能在一次接收数据的过程中完成命令的解析。

void readQueryFromClient(aeEventLoop *el, int fd, void *privdata, int mask) {
    client *c = (client*) privdata;
    int nread, readlen;
    size_t qblen;

    readlen = PROTO_IOBUF_LEN;//1024*16 bytes

    if (c->reqtype == PROTO_REQ_MULTIBULK && c->multibulklen && c->bulklen != -1
        && c->bulklen >= PROTO_MBULK_BIG_ARG)
    {
        ssize_t remaining = (size_t)(c->bulklen+2)-sdslen(c->querybuf);

        //如果超过**32KB**，并且剩余长度小于**16KB**，那么一次读取剩余该行长度的值。
        // 如果触发，则在processMultibulkBuffer可以直接使用现有的字符串避免了字符串的复制，代价是多调用了几次 read(2)函数。
        if (remaining < readlen) readlen = remaining;
    }

    qblen = sdslen(c->querybuf);

    c->querybuf = sdsMakeRoomFor(c->querybuf, readlen);
    nread = read(fd, c->querybuf+qblen, readlen);
    if (nread == -1) {
        if (errno == EAGAIN) {
            return;
        } else {
            serverLog(LL_VERBOSE, "Reading from client: %s",strerror(errno));
            freeClient(c); 
            return;
        }
    } else if (nread == 0) {
        serverLog(LL_VERBOSE, "Client closed connection");
        freeClient(c); 
        return;
    } 

    sdsIncrLen(c->querybuf,nread);

    processInputBuffer(c);

}

接着就会进入processInputBuffer函数，此时数据可能全部抵达，也可能部分抵达。processInputBuffer函数的主要功能是将，client->querybuff里面的数据解析，并转化为client->argc 和client->argv的数据。如果数据全部抵达，那么接着会进入到processComand函数，查找命令表，执行命令并返回数据给客户的。如果数据部分抵达，但是一行的数据内容抵达，那么该行数据会被解析到client->argc和client->argv中去。

返回数据结果

在这里我们假设客户端输入的字符串是quit，processComand函数会调用addReply函数将当前的client加入到clients_pending_write链表中。

再将存储OK字符串的对象添加到缓冲区，服务端返回给客户端的编码类型只可能是字符型或者是INT型。首先redis会尝试将结果添加到缓冲区，缓冲区的大小默认16KB，并且不能通过配置更改。如果缓冲区会溢出，那么redis会将数据添加到client->reply链表中。

void addReply(client *client, robj *obj) {
    if (prepareClientToWrite(client) != C_OK) return; 

    if (sdsEncodedObject(obj)) {
        if (_addReplyToBuffer(client,obj->ptr,sdslen(obj->ptr)) != C_OK)
            _addReplyObjectToList(client,obj);
    } else if (obj->encoding == OBJ_ENCODING_INT) {
        ...
    } else {
        // serverPanic("Wrong obj->encoding in addReply()");
        serverLog(LL_WARNING, "Wron obj->encoding in addReply()");
    }
}

此时数据还没有返回给客户端，在redis进入下一次循环的时候，会调用beforeSleep函数将数据返回给客户端。

为什么redis不直接将数据返回给客户端呢？

源码的注释给出了答案：为了实现fsync=always的效果，将返回数据放在beforeSleep中，可以通过AOF持久后，再返回给客户端结果。

 /* For the fsync=always policy, we want that a given FD is never
  * served for reading and writing in the same event loop iteration,
  * so that in the middle of receiving the query, and serving it
  * to the client, we'll call beforeSleep() that will do the
  * actual fsync of AOF to disk. AE_BARRIER ensures that. */

beforeLoop会接着调用handleClientsWithPendingWrites函数来处理有缓存数据的client，writeToClient函数会将buf中和reply链表中的数据全部发送给客户端，如果实际发送的数据小于应当发送的数据，则表示系统缓存区空间不够，或者文件描述符在中途被占用，那么redis会创建一个事件，当监听到文件描述符可读时，再将剩余数据写入。

int handleClientsWithPendingWrites(void) {
    listIter li;
    listNode *ln;
    int processed = listLength(server.clients_pending_write);

    listRewind(server.clients_pending_write,&li);
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        c->flags &= ~CLIENT_PENDING_WRITE;
        listDelNode(server.clients_pending_write,ln);

        /* Try to write buffers to the client socket. */
        if (writeToClient(c->fd,c,0) == C_ERR) continue;

        /* If after the synchronous writes above we still have data to
         * output to the client, we need to install the writable handler. */
        if (clientHasPendingReplies(c)) {
             int ae_flags = AE_WRITABLE; 
            if (aeCreateFileEvent(server.el, c->fd, ae_flags,
                sendReplyToClient, c) == AE_ERR)
            {
                    freeClientAsync(c);
            }
        }
    }
    return processed;
}

在写完数据后，发现客户端有被标记CLIENT_CLOSE_AFTER_REPLY，那么将会释放客户端的资源。

if (c->flags & CLIENT_CLOSE_AFTER_REPLY) {
    freeClient(c);
    return C_ERR;
}

参考文献

accept函数

《Redis设计与实现》

posted @ 2020-12-09 17:15 dewxin 阅读(111) 评论(0) 编辑收藏举报

刷新页面返回顶部

Dewxin