A LAN Chat Room and Echo Server Based on the EPOLL Model

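I. Advantages of EPOLL

On Linux, select, poll and epoll are the three I/O multiplexing mechanisms. epoll is specific to Linux and, unlike select/poll, is designed to stay efficient with many fds. To show where its advantages come from, let's first look at the drawbacks of select and poll.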

select:

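(1) The sockets it can monitor are limited: an fd_set is a fixed-size bitmap that only covers fd values below FD_SETSIZE, which is 1024 by default on Linux (on 64-bit systems too).

(2) It works by scanning, so it is inefficient when the number of monitored sockets is large.

(3) As the number of monitored sockets grows, a data structure holding a large number of fds must be maintained and copied between user space and the kernel on every call, so the system overhead becomes too large.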

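poll:

It removes the limit on the number of monitored sockets (the caller passes an array of struct pollfd of any length, which the kernel keeps in a linked list of chunks), but its other drawbacks are the same as select's: with many concurrent connections it is still inefficient.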
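To make the scanning cost concrete, here is a minimal Linux sketch that is not part of the original project: eight Unix socketpairs stand in for client connections, and `poll_scan_demo` and `NFDS` are hypothetical names. Only one fd has data, but poll() returns nothing more than a count, so the caller still has to walk the pollfd array to find it.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define NFDS 8  /* hypothetical number of "clients" */

/*
 * Makes only the socketpair at index `ready` readable, then locates it with
 * poll(). poll() returns just a count, so the loop has to walk the pollfd
 * array and test each revents. Returns the ready index (-1 on failure) and
 * stores how many entries were examined in *scanned.
 */
int poll_scan_demo(int ready, int *scanned)
{
    int sv[NFDS][2];
    struct pollfd pfds[NFDS];
    int found = -1;

    *scanned = 0;
    for (int i = 0; i < NFDS; i++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv[i]) == -1)
            return -1;
        pfds[i].fd = sv[i][0];
        pfds[i].events = POLLIN;
    }

    if (write(sv[ready][1], "x", 1) == 1) {      /* data on one fd only */
        int n = poll(pfds, NFDS, 1000);          /* n == 1, but which fd? */
        for (int i = 0; i < NFDS && n > 0; i++) {
            (*scanned)++;                        /* linear scan over the fds */
            if (pfds[i].revents & POLLIN) {
                found = i;
                n--;
            }
        }
    }

    for (int i = 0; i < NFDS; i++) {
        close(sv[i][0]);
        close(sv[i][1]);
    }
    return found;
}
```

With `ready = 7`, all eight entries are examined; the cost grows with the number of monitored fds, not with the number of ready ones.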

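epoll:

Compared with select/poll, epoll's biggest advantage is that it tells us which fds had I/O events. select/poll only tell us that some I/O event happened, not on which fds, so we have to scan all of them from start to finish. That gets slower as the number of monitored fds grows, and the inefficiency is most obvious when only a few of the fds are active. In summary:

(1) There is no fixed limit on the number of monitored fds (only the limit on open files applies).

(2) Efficiency does not degrade as the number of monitored fds grows: the work done by epoll_wait depends on how many fds are ready, not on how many are registered.

(3) It is often said that epoll uses mmap to share memory with the kernel and speed up event delivery. In fact, Linux's epoll_wait copies the ready events into the array supplied by the caller; what keeps it cheap is that only ready events are copied, and the fds are registered once with epoll_ctl instead of being passed in on every call.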
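For comparison, a minimal sketch of the epoll side under the same kind of assumptions (Linux, Unix socketpairs standing in for connections, hypothetical names `epoll_ready_demo` and `NFDS`). The fds are registered once with epoll_ctl, and epoll_wait() hands back only the ready one, tagged with the index stored in `data.u32`, so no scan over the idle fds is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define NFDS 8  /* hypothetical number of "clients" */

/*
 * Registers NFDS socketpair read ends with one epoll instance (once, via
 * epoll_ctl), makes only index `ready` readable, and returns the index that
 * epoll_wait() reports, or -1 on failure. The event itself carries the
 * index stored in data.u32, so the idle fds are never examined.
 */
int epoll_ready_demo(int ready)
{
    int sv[NFDS][2];
    int found = -1;
    int efd = epoll_create1(0);
    if (efd == -1)
        return -1;

    for (int i = 0; i < NFDS; i++) {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv[i]) == -1)
            return -1;
        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = (uint32_t)i };
        epoll_ctl(efd, EPOLL_CTL_ADD, sv[i][0], &ev);
    }

    if (write(sv[ready][1], "x", 1) == 1) {
        struct epoll_event out[NFDS];
        int n = epoll_wait(efd, out, NFDS, 1000);  /* only ready fds come back */
        if (n == 1)
            found = (int)out[0].data.u32;
    }

    for (int i = 0; i < NFDS; i++) {
        close(sv[i][0]);
        close(sv[i][1]);
    }
    close(efd);
    return found;
}
```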

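II. EPOLL's ET mode and LT mode

LT mode is level-triggered (select and poll are both level-triggered). For writes, this means EPOLLOUT keeps firing as long as the kernel send buffer has free space; for reads, EPOLLIN keeps firing as long as the kernel receive buffer still holds unread data.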
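A small sketch of level triggering, assuming Linux and a Unix socketpair in place of a TCP connection (`unread_data_hits` is a hypothetical helper): data is left unread, and every epoll_wait() call keeps reporting the fd. Passing `EPOLLIN | EPOLLET` instead shows, for contrast, that edge triggering reports the same unread data only once.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Registers one end of a socketpair with the given event mask, sends 5 bytes
 * from the other end, then calls epoll_wait() `rounds` times WITHOUT reading.
 * Returns how many of those calls reported the fd (-1 on setup failure).
 */
int unread_data_hits(uint32_t events, int rounds)
{
    int sv[2], hits = 0;
    int efd = epoll_create1(0);
    if (efd == -1 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    struct epoll_event ev = { .events = events, .data.fd = sv[0] };
    epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev);

    if (write(sv[1], "hello", 5) == 5) {
        for (int i = 0; i < rounds; i++) {
            struct epoll_event out;
            if (epoll_wait(efd, &out, 1, 0) == 1)  /* timeout 0: don't block */
                hits++;
        }
    }
    close(sv[0]);
    close(sv[1]);
    close(efd);
    return hits;
}
```

`unread_data_hits(EPOLLIN, 3)` should give 3, while `unread_data_hits(EPOLLIN | EPOLLET, 3)` should give 1.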

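ET mode is edge-triggered, with "edge" in the same sense as an edge in digital electronics: events fire on changes of state. The exact conditions are somewhat subtle: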

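For reads:

(1) When the buffer changes from unreadable to readable, i.e. from empty to non-empty.

(2) When new data arrives, i.e. when the amount of data waiting to be read increases.

(3) When the buffer has data to read (i.e. it is not empty) and the user issues EPOLL_CTL_MOD with EPOLLIN on the fd.

For writes:

(1) When the buffer changes from unwritable to writable, i.e. from full to not full.

(2) When previously queued data has been sent, i.e. when the amount of data waiting to be sent decreases.

(3) When the buffer has free space (i.e. it is not full) and the user issues EPOLL_CTL_MOD with EPOLLOUT on the fd (see the next section for details).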
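The read-side conditions can be checked with a short sketch, again assuming Linux and a Unix socketpair rather than TCP (`et_read_steps` and `fired` are hypothetical helpers). Nothing is ever read, so the buffer stays non-empty; the fd is reported only when a new edge occurs: on EPOLL_CTL_ADD while already readable, when more data arrives (condition (2)), and on EPOLL_CTL_MOD with EPOLLIN (condition (3)).

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* 1 if a non-blocking epoll_wait() reports an event right now, else 0. */
static int fired(int efd)
{
    struct epoll_event out;
    return epoll_wait(efd, &out, 1, 0) == 1;
}

/*
 * Steps through the ET read conditions on a Unix socketpair and returns a
 * bitmask: bit k is set if step k reported EPOLLIN. Nothing is ever read,
 * so the receive buffer stays non-empty throughout.
 */
int et_read_steps(void)
{
    int sv[2], mask = 0;
    int efd = epoll_create1(0);
    if (efd == -1 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    if (write(sv[1], "a", 1) != 1)                 /* readable before ADD */
        return -1;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = sv[0] };
    epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev);

    mask |= fired(efd) << 0;   /* ADD while readable: fires once            */
    mask |= fired(efd) << 1;   /* no new edge, data still unread: silent    */
    if (write(sv[1], "b", 1) != 1)
        return -1;
    mask |= fired(efd) << 2;   /* condition (2): more data arrived: fires   */
    epoll_ctl(efd, EPOLL_CTL_MOD, sv[0], &ev);
    mask |= fired(efd) << 3;   /* condition (3): MOD EPOLLIN while readable */

    close(sv[0]);
    close(sv[1]);
    close(efd);
    return mask;
}
```

The expected mask is 0xD: steps 0, 2 and 3 fire, step 1 stays silent.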

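The figures illustrate these cases:

Figure 1: The two cases that trigger an ET read event: (1) readable, the buffer goes from empty to non-empty; (2) readable, the amount of readable data increases

Figure 2: The two cases that trigger an ET write event

(Note: these figures are from http://blog.chinaunix.net/uid-28541347-id-4285054.html)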

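III. When EPOLL events fire

(1) EPOLLIN

ET mode: on every EPOLL_CTL_ADD or EPOLL_CTL_MOD, if the fd is already readable at that moment, one event fires after it is added. Regardless of whether the socket's receive buffer has been fully read, EPOLLIN fires whenever the peer calls send, connect or close, or exits abruptly. (ET versus LT only concerns a single readiness episode of the socket fd, i.e. whether you are notified again when data from that episode is left unread or unwritten; new data arriving on the connection still triggers the event.)

LT mode: EPOLLIN keeps firing as long as the socket is readable.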

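(2) EPOLLOUT

ET mode: on every EPOLL_CTL_ADD or EPOLL_CTL_MOD, if the fd is already writable at that moment, one event fires after it is added. If EPOLLOUT is registered together with EPOLLIN, then as long as the send buffer is not full, EPOLLOUT is also reported every time EPOLLIN fires, whether or not the send buffer went from full to not full. In other words, you receive write events you did not expect, which is why ATM mode is used.

LT mode: EPOLLOUT keeps firing as long as the socket is writable.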
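A sketch of the EPOLLOUT behavior described above, assuming Linux and a Unix socketpair (`et_in_out_events` is a hypothetical name): right after EPOLL_CTL_ADD the idle socket is writable, so EPOLLOUT fires once; then, when the peer sends data, the event carries EPOLLOUT again together with EPOLLIN, even though the send buffer was never full.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Registers EPOLLIN | EPOLLOUT | EPOLLET on one end of a socketpair.
 * *on_add receives the events reported right after EPOLL_CTL_ADD,
 * *on_recv those reported after the peer sends one byte. Returns the number
 * of events seen by a wait in between (expected 0), or -1 on failure.
 */
int et_in_out_events(uint32_t *on_add, uint32_t *on_recv)
{
    int sv[2], quiet;
    struct epoll_event out;
    int efd = epoll_create1(0);
    if (efd == -1 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN | EPOLLOUT | EPOLLET, .data.fd = sv[0] };
    epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev);

    *on_add = epoll_wait(efd, &out, 1, 0) == 1 ? out.events : 0;  /* writable: EPOLLOUT */
    quiet = epoll_wait(efd, &out, 1, 0);                          /* no new edge */
    if (write(sv[1], "x", 1) != 1)
        return -1;
    *on_recv = epoll_wait(efd, &out, 1, 0) == 1 ? out.events : 0; /* EPOLLIN plus EPOLLOUT */

    close(sv[0]);
    close(sv[1]);
    close(efd);
    return quiet;
}
```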

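In short, in ET mode, once EPOLLIN and EPOLLOUT are registered, every action on the socket (including close, and an abrupt exit without close) triggers exactly one notification, regardless of the state of the read and write buffers.

Note: EPOLLERR and EPOLLHUP are always reported; they are added to the epoll events automatically, so there is no need to add them by hand.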
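A sketch showing both points, assuming Linux and a Unix socketpair (`peer_close_events` is a hypothetical name): only `EPOLLIN | EPOLLET` is registered, yet when the peer closes, the reported events include EPOLLIN and EPOLLHUP, and the following read() returns 0. The exact flag set depends on the socket type; for a TCP connection where only the peer has closed with a FIN, expect EPOLLIN (and EPOLLRDHUP if requested) rather than EPOLLHUP.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Registers only EPOLLIN | EPOLLET, closes the peer end, and returns the
 * events epoll reports; *nread receives the return value of the read()
 * that follows (0 means end of file).
 */
uint32_t peer_close_events(ssize_t *nread)
{
    int sv[2];
    char buf[16];
    struct epoll_event out;
    uint32_t got = 0;
    int efd = epoll_create1(0);
    if (efd == -1 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = sv[0] };
    epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev);

    close(sv[1]);                               /* the peer goes away */
    if (epoll_wait(efd, &out, 1, 1000) == 1)
        got = out.events;                       /* EPOLLHUP included unasked */
    *nread = read(sv[0], buf, sizeof buf);

    close(sv[0]);
    close(efd);
    return got;
}
```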

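IV. Reads, writes and accept in EPOLL ET mode

1. Why must fds used in EPOLL ET mode be non-blocking?

Answer: in ET mode you have to keep reading or writing until the call fails with EAGAIN (for reads, you can also stop once read returns fewer bytes than requested). If the file descriptor is blocking, that last read or write will inevitably block, so the thread can no longer return to epoll_wait, and the tasks on all the other file descriptors starve. Here is the code that makes an fd non-blocking: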

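#include <assert.h>
#include <fcntl.h>

//Add O_NONBLOCK to the fd's file status flags, keeping the existing flags
void set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    assert(fl != -1);
    int rc = fcntl(fd, F_SETFL, fl | O_NONBLOCK);
    assert(rc != -1);
}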

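A quick check of what set_nonblock() buys, assuming Linux and a Unix socketpair (`empty_nonblocking_read_errno` is a hypothetical name): on an empty non-blocking socket, read() fails at once with EAGAIN instead of sleeping, which is exactly the condition the ET read and write loops in this section use to stop.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * On an empty, non-blocking socket, read() fails immediately with EAGAIN
 * instead of sleeping. Returns the errno seen (0 if read did not fail).
 */
int empty_nonblocking_read_errno(void)
{
    int sv[2], err = 0;
    char buf[8];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int fl = fcntl(sv[0], F_GETFL);             /* same steps as set_nonblock() */
    fcntl(sv[0], F_SETFL, fl | O_NONBLOCK);

    if (read(sv[0], buf, sizeof buf) == -1)     /* nothing was sent */
        err = errno;

    close(sv[0]);
    close(sv[1]);
    return err;
}
```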

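2. Reads and writes in ET mode

For reads, keep reading until read fails with EAGAIN, returns 0 (the peer has closed), or returns fewer bytes than the buffer size (the amount requested in one read).

For writes, keep writing until all the data has been sent or errno is EAGAIN (the kernel send buffer is full; at that point you can either return or wait). Sample code:

Read: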

#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Read all data currently available on a non-blocking fd (ET mode).
 * Return Value: data len that have read (ptr is NUL-terminated)
 * Error: -1: read failed, -2: peer fd is closed, -3: no more space
 */
int sock_recv(int fd, char *ptr, int len)
{
    assert(len > 1 && fd >= 0);
    assert(ptr != NULL);
    int n = 0; //total bytes read so far
    while(1) {
        int want = len - 1 - n; //keep 1 byte for the trailing '\0'
        if(want == 0) {
            /*
             * Buffer is full but more data may still be pending.
             * For simple, we just return here (the buffer holds len-1 bytes),
             * A better way is to MOD EPOLLIN into epoll events again
             */
            ptr[n] = '\0';
            return -3; //no more space
        }
        ssize_t nread = read(fd, ptr + n, want);
        if(nread < 0) {
            if(errno == EINTR) {
                continue; //interrupted by signal, retry
            } else if(errno == EAGAIN || errno == EWOULDBLOCK) {
                break; //socket drained, read done
            } else {
                return -1; //ECONNRESET (client sent RST) or other failure
            }
        } else if(nread == 0) {
            return -2; //client is closed
        }
        n += nread;
        if(nread < want) {
            break; //short read: no more data right now
        }
    }
    ptr[n] = '\0';
    return n;
}

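Why the loop has to drain the socket: a sketch, assuming Linux and a Unix socketpair (`et_partial_read` is a hypothetical name), that handles an ET notification by reading only 10 of 100 queued bytes. The next epoll_wait() reports nothing, because no new edge occurred, so without a read-until-EAGAIN loop those 90 bytes would sit in the buffer until the peer happens to send more.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Sends 100 bytes, takes one ET notification, but reads only 10 bytes.
 * *second_wait receives the result of the next non-blocking epoll_wait()
 * (0: epoll stays silent although 90 bytes are queued). Returns how many
 * bytes a read-until-EAGAIN loop then recovers, or -1 on error.
 */
int et_partial_read(int *second_wait)
{
    int sv[2], total = 0;
    char buf[16], msg[100];
    struct epoll_event out;
    int efd = epoll_create1(0);
    if (efd == -1 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = sv[0] };
    epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev);

    memset(msg, 'm', sizeof msg);
    if (write(sv[1], msg, sizeof msg) != (ssize_t)sizeof msg)
        return -1;
    epoll_wait(efd, &out, 1, 0);                  /* the one ET notification */
    if (read(sv[0], buf, 10) != 10)               /* handle only part of it */
        return -1;
    *second_wait = epoll_wait(efd, &out, 1, 0);   /* no new edge */

    ssize_t n;
    while ((n = read(sv[0], buf, sizeof buf)) > 0) /* drain until EAGAIN */
        total += (int)n;
    if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
        total = -1;

    close(sv[0]);
    close(sv[1]);
    close(efd);
    return total;
}
```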

Write:

#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send all len bytes on a non-blocking fd.
 * Return Value: data len that can not send out
 * Normal Value: 0, Error Value: -1
 */
int sock_send(int fd, const char *ptr, int len)
{
    assert(fd >= 0);
    assert(ptr != NULL);
    assert(len > 0);
    int n = len; //bytes still to send
    while(n > 0) {
        //MSG_NOSIGNAL (Linux): fail with EPIPE instead of raising SIGPIPE
        ssize_t nsend = send(fd, ptr + len - n, n, MSG_NOSIGNAL);
        if(nsend < 0) {
            if(errno == EINTR) {
                continue; //interrupted by signal, retry
            } else if(errno == EAGAIN || errno == EWOULDBLOCK) {
                //Here, write buffer is full, for simple, just sleep,
                //A better way is to add EPOLLOUT and resume when it fires
                usleep(1);
                continue;
            } else {
                return -1; //send failed!
            }
        }
        n -= nsend;
    }

    return n; //0: all data sent
}

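The usleep() in sock_send() busy-waits. Here is a sketch of the event-driven alternative its comment hints at, assuming Linux and a Unix socketpair (`epollout_after_drain` is a hypothetical name): fill the send buffer until EAGAIN, register EPOLLOUT | EPOLLET, and observe that the event arrives only after the peer has consumed the data, i.e. on the full to not-full edge.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Fills a non-blocking socket until write fails with EAGAIN, registers
 * EPOLLOUT | EPOLLET, and checks that no event is pending while the buffer
 * is full but one arrives once the peer has drained it.
 * Returns 1 if that sequence was observed, 0 if not, -1 on setup failure.
 */
int epollout_after_drain(void)
{
    int sv[2];
    char chunk[4096] = { 0 };
    struct epoll_event out;
    int efd = epoll_create1(0);
    if (efd == -1 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    for (int i = 0; i < 2; i++)
        fcntl(sv[i], F_SETFL, fcntl(sv[i], F_GETFL) | O_NONBLOCK);

    while (write(sv[0], chunk, sizeof chunk) > 0)
        ;                                          /* fill the send buffer */
    int full = (errno == EAGAIN || errno == EWOULDBLOCK);

    struct epoll_event ev = { .events = EPOLLOUT | EPOLLET, .data.fd = sv[0] };
    epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev);
    int before = epoll_wait(efd, &out, 1, 0);      /* full: nothing to report */

    while (read(sv[1], chunk, sizeof chunk) > 0)
        ;                                          /* peer drains everything */
    int after = epoll_wait(efd, &out, 1, 1000);    /* not full any more */

    int ok = full && before == 0 && after == 1 && (out.events & EPOLLOUT);
    close(sv[0]);
    close(sv[1]);
    close(efd);
    return ok;
}
```

In a real server, sock_send() would return the unsent length on EAGAIN, the caller would keep the remainder and add EPOLLOUT, and the write would resume when this event fires.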

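3. The accept problem in ET mode

Consider this case: several connections arrive at once, and the server's TCP accept queue instantly holds several established connections. Because epoll is edge-triggered, it notifies only once; if accept handles only one connection, the rest of the queue never gets handled. The fix is to wrap accept in a loop and leave the loop only after every connection in the queue has been handled. How do you know they all have been? When accept returns -1 with errno set to EAGAIN, the queue is empty.

Putting this together with the non-blocking requirement from item 1, the listening socket must be non-blocking, and the correct way to use accept in ET mode is: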

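socklen_t addrlen;
for(;;)
{
    addrlen = sizeof(addr); //value-result argument: reset before every accept
    fd = accept(listenfd, (struct sockaddr *)&addr, &addrlen);
    if(fd >= 0)
    {
        handle_client(fd);
        continue;
    }
    if(errno == EINTR || errno == ECONNABORTED || errno == EPROTO)
        continue; //transient error: the queue may still hold connections
    if(errno != EAGAIN && errno != EWOULDBLOCK)
        perror("accept failed");
    break; //EAGAIN: all queued connections have been accepted
}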

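A self-contained check of the accept loop, assuming Linux. To avoid picking a TCP port, it uses an autobound abstract AF_UNIX listening socket (bind() with only the address family), which for this purpose behaves like the TCP accept queue: accept() on the non-blocking listener fails with EAGAIN once the queue is empty. `et_accept_all` and `NCLIENTS` are hypothetical names. Three clients connect before the server looks; epoll reports the listening fd once, and only the loop drains all three.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define NCLIENTS 3

/*
 * NCLIENTS clients connect before the server looks. With EPOLLET the
 * listening fd is reported once, so accept() must loop until EAGAIN.
 * Stores the epoll_wait() result in *nev and returns the number of
 * connections accepted after that single notification (-1 on failure).
 */
int et_accept_all(int *nev)
{
    int cfd[NCLIENTS], accepted = 0;
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    socklen_t alen = sizeof(sa_family_t);          /* autobind: kernel picks a name */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET }, out;
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    int efd = epoll_create1(0);

    if (lfd == -1 || efd == -1 || bind(lfd, (struct sockaddr *)&addr, alen) == -1 ||
        listen(lfd, 16) == -1)
        return -1;
    alen = sizeof addr;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) == -1)  /* learn the name */
        return -1;
    fcntl(lfd, F_SETFL, fcntl(lfd, F_GETFL) | O_NONBLOCK);
    ev.data.fd = lfd;
    epoll_ctl(efd, EPOLL_CTL_ADD, lfd, &ev);

    for (int i = 0; i < NCLIENTS; i++) {
        cfd[i] = socket(AF_UNIX, SOCK_STREAM, 0);
        connect(cfd[i], (struct sockaddr *)&addr, alen);  /* queued in the backlog */
    }

    *nev = epoll_wait(efd, &out, 1, 1000);         /* one event for all clients */
    for (;;) {
        int fd = accept(lfd, NULL, NULL);
        if (fd >= 0) {
            accepted++;
            close(fd);
        } else if (errno != EINTR && errno != ECONNABORTED) {
            break;                                 /* EAGAIN: queue drained */
        }
    }

    for (int i = 0; i < NCLIENTS; i++)
        close(cfd[i]);
    close(lfd);
    close(efd);
    return accepted;
}
```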

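V. A LAN chat room based on the EPOLL model

A LAN chat room simply forwards every message a client sends to all the other clients. The implementation is as follows:

1. Chat room server

(1) After accept, add the new connect_fd to the epoll events with EPOLL_CTL_ADD and EPOLLIN. Right after accept, the server can already send messages to the client (but it cannot receive the client's messages at this point; why?).

(2) Listen for EPOLLIN events, receive each message from a client, and forward it to the other clients (why can the server send to the clients directly here, instead of first adding EPOLLOUT with EPOLL_CTL_MOD and then waiting for the EPOLLOUT event?).

(3) Store the client information in a doubly linked list. Why a doubly linked list? First, you don't know in advance how many clients there will be; second, a doubly linked list makes it easy to add and delete client records dynamically (when a client quits, its record must be deleted).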
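A minimal sketch of such a list, assuming nothing about the author's actual structure beyond the type name `double_list` used in the code: the node fields and the helpers `list_add`, `list_del` and `broadcast_except` are hypothetical. Appending at the tail and unlinking a found node are O(1), which is what makes the doubly linked list convenient when clients come and go; `broadcast_except` is the forwarding step from point (2).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical client record; the article only names the type double_list. */
struct double_list {
    int fd;
    struct double_list *prev, *next;
};

/* Append a client at the tail (new connection after accept). */
int list_add(struct double_list **head, struct double_list **tail, int fd)
{
    struct double_list *node = calloc(1, sizeof *node);
    if (node == NULL)
        return -1;
    node->fd = fd;
    node->prev = *tail;
    if (*tail) (*tail)->next = node; else *head = node;
    *tail = node;
    return 0;
}

/* Unlink and free the client with this fd (client quit). */
void list_del(struct double_list **head, struct double_list **tail, int fd)
{
    for (struct double_list *p = *head; p != NULL; p = p->next) {
        if (p->fd != fd)
            continue;
        if (p->prev) p->prev->next = p->next; else *head = p->next;
        if (p->next) p->next->prev = p->prev; else *tail = p->prev;
        free(p);
        return;
    }
}

/* Forward msg to every client except the sender; returns how many got it. */
int broadcast_except(struct double_list *head, int sender, const char *msg, size_t len)
{
    int n = 0;
    for (struct double_list *p = head; p != NULL; p = p->next)
        if (p->fd != sender && send(p->fd, msg, len, MSG_NOSIGNAL) == (ssize_t)len)
            n++;
    return n;
}
```

With three clients registered, broadcasting from the first reaches the other two; after list_del() on the second, it reaches only the third.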

Here is the related pseudocode:

int nfds = epoll_wait(efd, p_events, MAX_EPOLL_NUM, -1);
int i, conn_fd;
for(i = 0; i < nfds; i++)
{
    if(p_events[i].data.fd == fd) //listening fd: new connections have come in, accept them all
    {
        for(;;)
        {
            client_addr_len = sizeof(client_addr); //value-result argument: reset before every accept
            conn_fd = accept(fd, (struct sockaddr *)&client_addr, &client_addr_len);
            if(conn_fd == -1)
            {
                if(errno == EINTR || errno == ECONNABORTED || errno == EPROTO)
                    continue; //transient error: keep draining the accept queue
                if(errno != EAGAIN && errno != EWOULDBLOCK)
                {
                    perror("accept");
                    return -1;
                }
                break; //queue drained; don't return, other events may still be pending
            }

            set_nonblock(conn_fd); //sock_recv()/sock_send() loop until EAGAIN, so the fd must not block
            ev.data.fd = conn_fd;
            ev.events = EPOLLIN;
            rc = epoll_ctl(efd, EPOLL_CTL_ADD, conn_fd, &ev);

            bzero(message, MAX_BUF_SIZE);
            snprintf(message, MAX_BUF_SIZE, STR_WELCOME, conn_fd);
            rc = send(conn_fd, message, strlen(message), 0);

            //insert into our double list to store client information
        }
    }
    else if(p_events[i].events & (EPOLLERR | EPOLLHUP))
    {
        //error or hang-up: stop watching the fd, close it, and remove the client from the list
        rc = epoll_ctl(efd, EPOLL_CTL_DEL, p_events[i].data.fd, &ev);
        assert(rc != -1);
        close(p_events[i].data.fd);
        p_events[i].data.fd = -1;
    }
    else
    {
        //After accept, we can receive msg from the client and resend it to the other clients.
        handle_message(&head, &tail, p_events[i].data.fd);
    }
}

int handle_message(struct double_list **head, struct double_list **tail, int fd)
{
    //receive msg from fd (for example with sock_recv)

    //send msg to every other client in the list except fd (for example with sock_send)
    return 0;
}


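2. Chat room client

The client is also built on the EPOLL model: it waits for user input, sends it to the server, and receives and displays messages from the server. It consists of a parent process and a child process connected by a pipe. The child waits for user input and passes each message to the parent through the pipe; the parent sends the messages it gets from the child to the server, and receives and displays the messages from the server. Here is the pseudocode: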

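#define CHK(eval) do { if((eval) < 0) { perror(#eval); exit(-1); } } while(0)
#define CHK2(res, eval) do { if(((res) = (eval)) < 0) { perror(#eval); exit(-1); } } while(0)

int pipe_fd[2]; //pipe_fd[0]: read, pipe_fd[1]: write
CHK(pipe(pipe_fd));
ev.events = EPOLLIN;
ev.data.fd = fd; //socket connected to the chat server
CHK2(rc, epoll_ctl(efd, EPOLL_CTL_ADD, fd, &ev));
ev.data.fd = pipe_fd[0];
CHK2(rc, epoll_ctl(efd, EPOLL_CTL_ADD, pipe_fd[0], &ev));

int exit_flag = 0;
CHK2(rc, fork()); //exits on failure, so rc >= 0 from here on

if(rc == 0)
{
    //child: read user input and pass it to the parent, which sends it to the server
    close(pipe_fd[0]); //close read fd
    while(exit_flag == 0)
    {
        printf("Enter 'exit' to exit\n");
        if(fgets(message, sizeof(message), stdin) == NULL)
            break; //EOF on stdin
        message[strcspn(message, "\n")] = '\0'; //strip the newline, if there is one

        if(strcmp(message, "exit") == 0)
            exit_flag = 1; //closing the pipe below tells the parent to stop
        else
            CHK(write(pipe_fd[1], message, strlen(message))); //pass it to parent
    }
}
else
{
    //parent: print messages from the server, forward messages from the child
    close(pipe_fd[1]); //close write fd
    int i, nread, nfds = 0;
    while(exit_flag == 0)
    {
        CHK2(nfds, epoll_wait(efd, events, MAX_EPOLL_NUM, -1));

        for(i = 0; i < nfds; i++)
        {
            if(events[i].data.fd == fd)
            {
                //msg from chat server, receive it and print it to stdout
                CHK2(nread, recv(fd, message, sizeof(message) - 1, 0));
                if(nread == 0) { exit_flag = 1; break; } //server closed the connection
                message[nread] = '\0';
                printf("%s\n", message);
            }
            else if(events[i].data.fd == pipe_fd[0])
            {
                //msg from child process, receive it and resend it to chat server
                CHK2(nread, read(pipe_fd[0], message, sizeof(message)));
                if(nread == 0) { exit_flag = 1; break; } //child closed the pipe: user typed 'exit'
                CHK(send(fd, message, nread, 0));
            }
        }
    }
}

if(rc == 0)
{
    //child
    close(pipe_fd[1]); //close write fd; the parent then reads EOF from the pipe
}
else
{
    //parent
    close(pipe_fd[0]); //close read fd
    close(fd);
}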

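The parent/child plumbing can be tried in isolation. A minimal sketch, assuming Linux (`pipe_relay` is a hypothetical name): the child writes one line into the pipe and exits, the parent waits on the read end with epoll, and it stops when read() returns 0, which is how a parent can notice that the child has quit.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The child writes `line` into the pipe and exits (which closes its write
 * end); the parent waits on the read end with epoll and collects the bytes
 * until read() returns 0. Returns the byte count received (-1 on failure);
 * `out` is NUL-terminated.
 */
int pipe_relay(const char *line, char *out, size_t outlen)
{
    int pipe_fd[2], total = 0;                    /* [0]: read, [1]: write */
    struct epoll_event got;
    int efd = epoll_create1(0);
    if (efd == -1 || pipe(pipe_fd) == -1)
        return -1;
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipe_fd[0] };
    epoll_ctl(efd, EPOLL_CTL_ADD, pipe_fd[0], &ev);

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                               /* child: "user input" side */
        close(pipe_fd[0]);
        size_t len = strlen(line);
        _exit(write(pipe_fd[1], line, len) == (ssize_t)len ? 0 : 1);
    }

    close(pipe_fd[1]);                            /* parent only reads */
    ssize_t n = 1;
    while (n > 0 && (size_t)total < outlen - 1 && epoll_wait(efd, &got, 1, -1) == 1) {
        n = read(pipe_fd[0], out + total, outlen - 1 - (size_t)total);
        if (n > 0)
            total += (int)n;
    }                                             /* n == 0: child closed the pipe */
    out[total] = '\0';

    waitpid(pid, NULL, 0);
    close(pipe_fd[0]);
    close(efd);
    return total;
}
```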

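VI. An echo server based on EPOLL

An echo service simply echoes input back: the echo client sends what the user types to the echo server, the echo server sends the message back to the client, and the client displays what it received.

This is very similar to the LAN chat room above and needs only small changes to the code, so it is not analyzed in detail here.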
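As an illustration of how small the change is, here is a hypothetical per-connection handler (`handle_echo` is not from the original code): instead of forwarding to the other clients, it sends what it received back on the same fd. It is written for a blocking or level-triggered fd with one recv() per EPOLLIN event; with EPOLLET the recv() would need the read-until-EAGAIN loop from section IV.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical echo handler: receive what the client sent and send the same
 * bytes back on the same fd. Returns the bytes echoed, 0 if the peer closed,
 * -1 on error.
 */
int handle_echo(int fd)
{
    char buf[1024];
    ssize_t n = recv(fd, buf, sizeof buf, 0);
    if (n <= 0)
        return (int)n;                        /* 0: peer closed, -1: error */

    for (ssize_t off = 0; off < n; ) {        /* send() may be partial */
        ssize_t w = send(fd, buf + off, (size_t)(n - off), MSG_NOSIGNAL);
        if (w < 0)
            return -1;
        off += w;
    }
    return (int)n;
}
```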

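VII. EPOLL concurrency test

I wrote a client that opens 1000 connections (MAX_CLIENT_NUM in the code) to the chat server above, one after another, and keeps them all open until the end; EPOLL handles this concurrency quite efficiently (TBD: compare against select/poll). Here is part of the code: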

struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0); //wall-clock time; clock() would only count CPU time
for(i = 0; i < MAX_CLIENT_NUM; i++)
{
    fd = socket(AF_INET, SOCK_STREAM, 0);
    assert(fd != -1);

    rc = connect(fd, (struct sockaddr *)&serv_addr, serv_addr_len);
    assert(rc != -1);
    fds[i] = fd; //keep every connection open until the end

    bzero(message, MAX_BUF_SIZE);
    rc = recv(fd, message, MAX_BUF_SIZE - 1, 0); //welcome message; leave room for '\0'
    printf("%s\n", message);
}

for(i = 0; i < MAX_CLIENT_NUM; i++)
{
    close(fds[i]);
}
clock_gettime(CLOCK_MONOTONIC, &t1);
printf("Total connections: %d, Test passed in: %.2f seconds\n", MAX_CLIENT_NUM,
       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

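A note on the timing: clock() returns the processor time used by the program, not elapsed time, so a test that spends most of its time blocked in connect() and recv() reports far less than its wall-clock duration. The sketch below (hypothetical names, POSIX clock_gettime with CLOCK_MONOTONIC) makes the difference visible by measuring a 200 ms sleep both ways.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds between two CLOCK_MONOTONIC readings. */
static double wall_seconds(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Sleeps 200 ms and measures the interval both ways. clock() reports CPU
 * time used by the process, so time spent blocked barely registers;
 * clock_gettime(CLOCK_MONOTONIC) reports elapsed time.
 */
void measure_blocked_interval(double *cpu_sec, double *wall_sec)
{
    struct timespec t0, t1;
    struct timespec nap = { .tv_sec = 0, .tv_nsec = 200000000L };   /* 200 ms */

    clock_t c0 = clock();
    clock_gettime(CLOCK_MONOTONIC, &t0);
    nanosleep(&nap, NULL);                 /* blocked, like connect()/recv() */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *cpu_sec = (double)(clock() - c0) / CLOCKS_PER_SEC;
    *wall_sec = wall_seconds(&t0, &t1);
}
```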

VIII. Final notes

All of the code above is available on my GitHub: https://github.com/wolf623/chat_epoll

Reference: http://blog.chinaunix.net/uid-28541347-id-4296180.html


posted on 2017-11-09 10:54 by 我是修行者