epoll
http://www.cnblogs.com/bigwangdi/p/3182958.html
epoll
is a Linux kernel system call, a scalable I/O event notification mechanism, first introduced in Linux kernel 2.5.44.[1] It is meant to replace the older POSIX select(2) and poll(2) system calls, to achieve better performance in more demanding applications, where the number of watched file descriptors is large (unlike the older system calls, which operate in O(n) time, epoll operates in O(1) time[2]). epoll is similar to FreeBSD's kqueue, in that it operates on a configurable kernel object, exposed to user space as a file descriptor of its own.
中文版 推荐看下
epoll 使用详解
EPOLL事件有两种模型 Level Triggered (LT) 和 Edge Triggered (ET):
LT(level triggered,水平触发模式)是缺省的工作方式,并且同时支持 block 和 non-block socket。在这种做法中,内核告诉你一个文件描述符是否就绪了,然后你可以对这个就绪的fd进行IO操作。如果你不作任何操作,内核还是会继续通知你的,所以,这种模式编程出错误可能性要小一点。
ET(edge-triggered,边缘触发模式)是高速工作方式,只支持no-block socket。在这种模式下,当描述符从未就绪变为就绪时,内核通过epoll告诉你。然后它会假设你知道文件描述符已经就绪,并且不会再为那个文件描述符发送更多的就绪通知,等到下次有新的数据进来的时候才会再次出发就绪事件。
epoll的优点主要是一下几个方面:
-
监视的描述符数量不受限制,它所支持的FD上限是最大可以打开文件的数目,这个数字一般远大于2048,举个例子,在1GB内存的机器上大约是10万左 右,具体数目可以cat /proc/sys/fs/file-max察看,一般来说这个数目和系统内存关系很大。select的最大缺点就是进程打开的fd是有数量限制的。这对 于连接数量比较大的服务器来说根本不能满足。虽然也可以选择多进程的解决方案( Apache就是这样实现的),不过虽然linux上面创建进程的代价比较小,但仍旧是不可忽视的,加上进程间数据同步远比不上线程间同步的高效,所以也 不是一种完美的方案。
-
IO的效率不会随着监视fd的数量的增长而下降。epoll不同于select和poll轮询的方式,而是通过每个fd定义的回调函数来实现的。只有就绪的fd才会执行回调函数。
3.支持电平触发和边沿触发(只告诉进程哪些文件描述符刚刚变为就绪状态,它只说一遍,如果我们没有采取行动,那么它将不会再次告知,这种方式称为边缘触发)两种方式,理论上边缘触发的性能要更高一些,但是代码实现相当复杂。
4.mmap加速内核与用户空间的信息传递。epoll是通过内核于用户空间mmap同一块内存,避免了无畏的内存拷贝。
#include <sys/epoll.h>
//epoll_create, epoll_create1 - open an epoll file descriptor
int epoll_create(int size);
int epoll_create1(int flags);
//control interface for an epoll descriptor
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
EPOLL_CTL_ADD//注册新的fd到epfd中
Register the target file descriptor fd on the epoll instance referred to by the file descriptor epfd and associate the event event with the internal file linked to fd.
EPOLL_CTL_MOD//修改已经注册的fd的监听事件
Change the event event associated with the target file descriptor fd.
EPOLL_CTL_DEL//从epfd中删除一个fd
Remove (deregister) the target file descriptor fd from the epoll instance referred to by epfd.
The event is ignored and can be NULL (but see BUGS below).
第三个参数是需要监听的fd,第四个参数是告诉内核需要监听什么事,struct epoll_event 结构如下:
The event argument describes the object linked to the file descriptor fd. The struct epoll_event is defined as:
typedef union epoll_data {
void *ptr;
int fd;
uint32_t u32;
uint64_t u64;
} epoll_data_t;
struct epoll_event {
uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
};
//epoll_wait, epoll_pwait - wait for an I/O event on an epoll file
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
int epoll_pwait(int epfd, struct epoll_event *events, int maxevents, int timeout, const sigset_t *sigmask);
//events 可以是以下几个宏的集合:
EPOLLIN //表示对应的文件描述符可以读(包括对端SOCKET正常关闭);
EPOLLOUT //表示对应的文件描述符可以写;
EPOLLPRI //表示对应的文件描述符有紧急的数据可读(这里应该表示有带外数据到来);
EPOLLERR //表示对应的文件描述符发生错误;
EPOLLHUP //表示对应的文件描述符被挂断;
EPOLLET //将EPOLL设为边缘触发(Edge Triggered)模式,这是相对于水平触发(Level Triggered)来说的。
EPOLLONESHOT //只监听一次事件,当监听完这次事件之后,如果还需要继续监听这个socket的话,需要再次把这个socket加入到EPOLL队列里。
//当对方关闭连接(FIN), EPOLLERR,都可以认为是一种EPOLLIN事件,在read的时候分别有0,-1两个返回值。
参数events用来从内核得到事件的集合,maxevents 告之内核这个events有多大,这个 maxevents 的值不能大于创建 epoll_create() 时的size,参数 timeout 是超时时间(毫秒,0会立即返回,-1将不确定,也有说法说是永久阻塞)。该函数返回需要处理的事件数目,如返回0表示已超时。
#include <unistd.h>
int close(int fd);//close a file descriptor
close() closes a file descriptor, so that it no longer refers to any
file and may be reused. Any record locks (see fcntl(2)) held on the
file it was associated with, and owned by the process, are removed
(regardless of the file descriptor that was used to obtain the lock).
If fd is the last file descriptor referring to the underlying open
file description (see open(2)), the resources associated with the
open file description are freed; if the descriptor was the last
reference to a file which has been removed using unlink(2), the file
is deleted.
example 2
#define MAX_EVENTS 10
struct epoll_event ev, events[MAX_EVENTS];
int listen_sock, conn_sock, nfds, epollfd;
/* Code to set up listening socket, 'listen_sock',
(socket(), bind(), listen()) omitted */
epollfd = epoll_create1(0);// returns a file descriptor referring to the new epoll instance
if (epollfd == -1) {
perror("epoll_create1");
exit(EXIT_FAILURE);
}
ev.events = EPOLLIN;
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) { //添加listen_sock 的fd
perror("epoll_ctl: listen_sock");
exit(EXIT_FAILURE);
}
for (;;) {
nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
if (nfds == -1) {
perror("epoll_wait");
exit(EXIT_FAILURE);
}
for (n = 0; n < nfds; ++n) {
if (events[n].data.fd == listen_sock) {
conn_sock = accept(listen_sock,
(struct sockaddr *) &local, &addrlen);
if (conn_sock == -1) {
perror("accept");
exit(EXIT_FAILURE);
}
setnonblocking(conn_sock);
ev.events = EPOLLIN | EPOLLET;
ev.data.fd = conn_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
&ev) == -1) {
perror("epoll_ctl: conn_sock");
exit(EXIT_FAILURE);
}
} else {
do_use_fd(events[n].data.fd);
}
}
}