       epoll - I/O event notification facility

       #include <sys/epoll.h>

       The epoll API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them.  
       The epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file descriptors.

       The central concept of the epoll API is the epoll instance, an in-kernel data structure which, from a user-space perspective,
        can be considered as a container for two lists:

       *   The interest list (sometimes also called the epoll set): the set of file descriptors that the process has registered an interest in monitoring.

       *   The ready list: the set of file descriptors that are "ready" for I/O.  
       The ready list is a subset of (or, more precisely, a set of references to) the file descriptors in the interest list that is dynamically populated 
       by the kernel as a result of I/O activity on those file descriptors.

       The following system calls are provided to create and manage an epoll instance:

       *  epoll_create(2) creates a new epoll instance and returns a file descriptor referring to that instance.  
       (The more recent epoll_create1(2) extends the functionality of epoll_create(2).)

       *  Interest in particular file descriptors is then registered via epoll_ctl(2), which adds items to the interest list of the epoll instance.

       *  epoll_wait(2) waits for I/O events, blocking the calling thread if no events are currently available.  
       (This system call can be thought of as fetching items from the ready list of the epoll instance.)
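
       The three calls can be strung together as in the following minimal sketch. It assumes a pipe as the
       watched descriptor purely for illustration; count_ready_after_write() is a hypothetical helper, and
       error handling is reduced to returning -1.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance, register the read end
   of a pipe, make it readable and return how many descriptors
   epoll_wait() reports (-1 on error). */
int count_ready_after_write(void)
{
    int p[2], epfd, nfds = -1;
    struct epoll_event ev = {0}, out[4];

    if (pipe(p) == -1)
        return -1;

    epfd = epoll_create1(0);                       /* create the instance     */
    if (epfd != -1) {
        ev.events = EPOLLIN;                       /* level-triggered default */
        ev.data.fd = p[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&  /* interest list */
            write(p[1], "x", 1) == 1)
            nfds = epoll_wait(epfd, out, 4, 0);    /* fetch from ready list;
                                                      0 ms: do not block     */
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return nfds;
}
```

       With a timeout of -1 instead of 0, the final call would block until the pipe became readable.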

   Level-triggered and edge-triggered
       The epoll event distribution interface is able to behave both as edge-triggered (ET) and as level-triggered (LT).  
       The difference between the two mechanisms can be described as follows.  Suppose that this scenario happens:

       1. The file descriptor that represents the read side of a pipe (rfd) is registered on the epoll instance.

       2. A pipe writer writes 2 kB of data on the write side of the pipe.

       3. A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.

       4. The pipe reader reads 1 kB of data from rfd.

       5. A call to epoll_wait(2) is done.

       If the rfd file descriptor has been added to the epoll interface using the EPOLLET (edge-triggered) flag, 
       the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer; 
       meanwhile the remote peer might be expecting a response based on the data it already sent.  
       The reason for this is that edge-triggered mode delivers events only when changes occur on the monitored file descriptor.  
       So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer.  
       In the above example, an event on rfd will be generated because of the write done in 2 and the event is consumed in 3.  
       Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely.
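
       The scenario can be replayed with a short program. This sketch is illustrative only: step5_ready()
       is a hypothetical name, and step 5 uses a zero timeout instead of blocking, so that a would-be hang
       shows up as a return value of 0.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Replays steps 1-5 on a pipe. Step 5 uses a zero timeout instead of
   blocking: 0 = nothing reported (a blocking call would hang here),
   1 = rfd reported again, -1 = error. */
int step5_ready(int edge_triggered)
{
    int p[2], epfd, n = -1;
    char buf[2048];
    struct epoll_event ev = {0}, out;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.events = EPOLLIN | (edge_triggered ? EPOLLET : 0);
        ev.data.fd = p[0];
        memset(buf, 'a', sizeof buf);
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&  /* step 1 */
            write(p[1], buf, 2048) == 2048 &&                  /* step 2 */
            epoll_wait(epfd, &out, 1, 0) == 1 &&               /* step 3 */
            read(p[0], buf, 1024) == 1024)                     /* step 4 */
            n = epoll_wait(epfd, &out, 1, 0);                  /* step 5 */
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return n;
}
```

       Under EPOLLET the second wait is expected to report nothing even though 1 kB is still buffered; in
       the default level-triggered mode rfd is reported again.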





       An application that employs the EPOLLET flag should use nonblocking file descriptors to avoid having a blocking read or write starve a task that is 
       handling multiple file descriptors. 
       The suggested way to use epoll as an edge-triggered (EPOLLET) interface is as follows:

              i   with nonblocking file descriptors; and

              ii  by waiting for an event only after read(2) or write(2) return EAGAIN.
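
       A minimal sketch of both rules, assuming a byte stream such as a pipe or stream socket;
       set_nonblocking() and drain() are hypothetical helper names, and drain() discards what it reads
       where a real application would process it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Rule i: put fd into nonblocking mode. Returns 0, or -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Rule ii: read until read() fails with EAGAIN (or hits end of file);
   only then is it safe to call epoll_wait() again. Data is discarded
   here. Returns the number of bytes consumed, or -1 on a real error. */
ssize_t drain(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        if (n > 0)
            total += n;                 /* process buf[0..n) here */
        else if (n == 0)
            return total;               /* end of file */
        else if (errno == EINTR)
            continue;
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;               /* drained: wait for the next edge */
        else
            return -1;
    }
}
```

       After drain() returns, the caller goes back to epoll_wait(2); with an edge-triggered registration,
       going back any earlier risks the stall described above.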

       By contrast, when used as a level-triggered interface (the default, when EPOLLET is not specified), epoll is simply a faster poll(2), 
       and can be used wherever the latter is used since it shares the same semantics.

       Since even with edge-triggered epoll, multiple events can be generated upon receipt of multiple chunks of data, 
       the caller has the option to specify the EPOLLONESHOT flag,
        to tell epoll to disable the associated file descriptor after the receipt of an event with epoll_wait(2).  
        When the EPOLLONESHOT flag is specified, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.
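
       The disable-and-rearm cycle can be observed with a pipe. The sketch below is hypothetical
       (oneshot_demo() is an illustrative name) and uses zero timeouts so that each epoll_wait(2) returns
       at once.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers a pipe's read end with EPOLLIN | EPOLLONESHOT, makes it
   readable and records three zero-timeout epoll_wait() results in r[]:
   the first event, a second wait before rearming, and a wait after
   rearming with EPOLL_CTL_MOD. Returns 0, or -1 on error. */
int oneshot_demo(int r[3])
{
    int p[2], epfd, rc = -1;
    struct epoll_event ev = {0}, out;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.events = EPOLLIN | EPOLLONESHOT;
        ev.data.fd = p[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1) {
            r[0] = epoll_wait(epfd, &out, 1, 0);  /* the one event         */
            r[1] = epoll_wait(epfd, &out, 1, 0);  /* disabled: data unread,
                                                     yet nothing reported  */
            /* Rearming is the caller's job: same mask via EPOLL_CTL_MOD. */
            if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0) {
                r[2] = epoll_wait(epfd, &out, 1, 0);  /* reported again    */
                rc = 0;
            }
        }
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return rc;
}
```

       The expected results are 1, 0, 1: the second call reports nothing although the byte is still
       unread, because the registration was disabled after the first event.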


       If multiple threads (or processes, if child processes have inherited the epoll file descriptor across fork(2)) are blocked in epoll_wait(2) 
       waiting on the same epoll file descriptor and a file descriptor in the interest list that is marked for edge-triggered (EPOLLET) notification becomes ready, 
       just one of the threads (or processes) is awoken from epoll_wait(2).  
       This provides a useful optimization for avoiding "thundering herd" wake-ups in some scenarios.

   Interaction with autosleep
       If the system is in autosleep mode via /sys/power/autosleep and an event happens which wakes the device from sleep, 
       the device driver will keep the device awake only until that event is queued.  
       To keep the device awake until the event has been processed, 
       it is necessary to use the epoll_ctl(2) EPOLLWAKEUP flag.


       When the EPOLLWAKEUP flag is set in the events field for a struct epoll_event, the system will be kept awake from the moment the event is queued, 
       through the epoll_wait(2) call which returns the event until the subsequent epoll_wait(2) call.  
       If the event should keep the system awake beyond that time, then a separate wake_lock should be taken before the second epoll_wait(2) call.

   /proc interfaces
       The following interfaces can be used to limit the amount of kernel memory consumed by epoll:

       /proc/sys/fs/epoll/max_user_watches (since Linux 2.6.28)
              This specifies a limit on the total number of file descriptors that a user can register across all epoll instances on the system.  
              The limit is per real user ID.  Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes on a 64-bit kernel.  
              Currently, the default value for max_user_watches is 1/25 (4%) of the available low memory, divided by the registration cost in bytes.
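
       The current limit can be read like any other sysctl file. A minimal sketch; read_max_user_watches()
       is a hypothetical helper that returns -1 if the file is absent (for example on kernels before
       2.6.28).

```c
#include <stdio.h>

/* Returns the current max_user_watches limit, or -1 if it cannot be read. */
long read_max_user_watches(void)
{
    FILE *f = fopen("/proc/sys/fs/epoll/max_user_watches", "r");
    long v = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

       Multiplying the value by the per-registration cost above (roughly 160 bytes on a 64-bit kernel)
       gives, with the default setting, approximately 4% of the available low memory.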

   Example for suggested usage
       While the usage of epoll when employed as a level-triggered interface does have the same semantics as poll(2), 
       the edge-triggered usage requires more clarification to avoid stalls in the application event loop.  
       In this example, listener is a nonblocking socket on which listen(2) has been called.  
       The function do_use_fd() uses the new ready file descriptor until EAGAIN is returned by either read(2) or write(2). 
        An event-driven state machine application should, after having received EAGAIN, 
        record its current state so that at the next call to do_use_fd() it will continue to read(2) or write(2) from where it stopped before.

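           #include <stdio.h>
           #include <stdlib.h>
           #include <sys/epoll.h>
           #include <sys/socket.h>

           #define MAX_EVENTS 10
           struct epoll_event ev, events[MAX_EVENTS];
           int listen_sock, conn_sock, nfds, epollfd, n;
           struct sockaddr_storage addr;
           socklen_t addrlen;

           /* Code to set up listening socket, 'listen_sock',
              (socket(), bind(), listen()) omitted */

           epollfd = epoll_create1(0);
           if (epollfd == -1) {
               perror("epoll_create1");
               exit(EXIT_FAILURE);
           }

           ev.events = EPOLLIN;        /* listening socket: level-triggered */
           ev.data.fd = listen_sock;
           if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
               perror("epoll_ctl: listen_sock");
               exit(EXIT_FAILURE);
           }

           for (;;) {
               nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
               if (nfds == -1) {
                   perror("epoll_wait");
                   exit(EXIT_FAILURE);
               }

               for (n = 0; n < nfds; ++n) {
                   if (events[n].data.fd == listen_sock) {
                       addrlen = sizeof(addr);
                       conn_sock = accept(listen_sock,
                                          (struct sockaddr *) &addr, &addrlen);
                       if (conn_sock == -1) {
                           perror("accept");
                           exit(EXIT_FAILURE);
                       }
                       /* Edge-triggered fds must be nonblocking; setnonblocking()
                          is left to the application (e.g. fcntl(2) O_NONBLOCK). */
                       setnonblocking(conn_sock);
                       ev.events = EPOLLIN | EPOLLET;
                       ev.data.fd = conn_sock;
                       if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                                   &ev) == -1) {
                           perror("epoll_ctl: conn_sock");
                           exit(EXIT_FAILURE);
                       }
                   } else {
                       /* Application-defined: read/write until EAGAIN. */
                       do_use_fd(events[n].data.fd);
                   }
               }
           }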
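
       The example leaves do_use_fd() to the application. One way to keep the state the preceding
       paragraph asks for is to remember, per connection, how far a pending write got. This is a
       hypothetical sketch (struct pending_write and flush_pending() are illustrative names) covering only
       the write direction, for a nonblocking descriptor.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-connection state kept between calls to do_use_fd(). */
struct pending_write {
    int fd;              /* nonblocking descriptor                 */
    const char *out;     /* bytes queued for sending               */
    size_t len;          /* total length of out                    */
    size_t off;          /* bytes already written: the saved state */
};

/* Writes from off onward. Returns 1 once everything has been sent,
   0 if write() reported EAGAIN (off records where to resume), -1 on error. */
int flush_pending(struct pending_write *w)
{
    while (w->off < w->len) {
        ssize_t n = write(w->fd, w->out + w->off, w->len - w->off);

        if (n > 0)
            w->off += (size_t) n;
        else if (n == -1 && errno == EINTR)
            continue;
        else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        else
            return -1;
    }
    return 1;
}
```

       When flush_pending() returns 0, the structure keeps its offset, and the next EPOLLOUT event for that
       descriptor (assuming it is registered for EPOLLOUT) resumes the write where it stopped.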

       When used as an edge-triggered interface, for performance reasons, 
       it is possible to add the file descriptor inside the epoll interface (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT).  
       This allows you to avoid continuously switching between EPOLLIN and EPOLLOUT calling epoll_ctl(2) with EPOLL_CTL_MOD.
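
       A sketch of such a one-time registration, using a UNIX domain stream socket pair for illustration
       (both_directions_mask() is a hypothetical name): a single EPOLL_CTL_ADD with
       EPOLLIN | EPOLLOUT | EPOLLET, after which one epoll_wait(2) can report both directions at once.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Registers sv[0] once for both directions, lets the peer send a byte,
   and returns the reported event mask (0 if none, -1 on error). */
int both_directions_mask(void)
{
    int sv[2], epfd, mask = -1;
    struct epoll_event ev = {0}, out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.events = EPOLLIN | EPOLLOUT | EPOLLET;  /* added once, never MODed */
        ev.data.fd = sv[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
            write(sv[1], "x", 1) == 1) {
            int n = epoll_wait(epfd, &out, 1, 0);
            mask = n == 1 ? (int) out.events : n;
        }
        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```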

   Questions and answers
       0.  What is the key used to distinguish the file descriptors registered in an interest list?

           The key is the combination of the file descriptor number and the open file description
            (also known as an "open file handle", the kernel's internal representation of an open file).

       1.  What happens if you register the same file descriptor on an epoll instance twice?

           You will probably get EEXIST.  However, it is possible to add a duplicate
            (dup(2), dup2(2), fcntl(2) F_DUPFD) file descriptor to the same epoll instance.  
            This can be a useful technique for filtering events, if the duplicate file descriptors are registered with different events masks.
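
           Both halves of this answer can be checked with a pipe. A hypothetical sketch
           (duplicate_registration_demo() is an illustrative name); the duplicate is registered with
           EPOLLIN | EPOLLET purely to show that its mask can differ from the original's.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if re-adding the same fd fails with EEXIST while adding a
   dup(2) of it succeeds; -1 otherwise. */
int duplicate_registration_demo(void)
{
    int p[2], epfd, dupfd = -1, ok = 0;
    struct epoll_event first = {0}, second = {0};

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        first.events = EPOLLIN;
        first.data.fd = p[0];
        ok = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &first) == 0;

        /* Same (file, fd) key a second time: EEXIST. */
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &first) == -1 &&
             errno == EEXIST;

        /* A dup shares the open file description but has a new fd number,
           hence a new key; it may carry a different mask. */
        dupfd = dup(p[0]);
        second.events = EPOLLIN | EPOLLET;
        second.data.fd = dupfd;
        ok = ok && dupfd != -1 &&
             epoll_ctl(epfd, EPOLL_CTL_ADD, dupfd, &second) == 0;
        close(epfd);
    }
    if (dupfd != -1)
        close(dupfd);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```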

       2.  Can two epoll instances wait for the same file descriptor?  If so, are events reported to both epoll file descriptors?

           Yes, and events would be reported to both.  However, careful programming may be needed to do this correctly.
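
           A hypothetical check (instances_reporting() is an illustrative name) that registers one pipe
           read end in two epoll instances and counts how many of them report it.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns how many of two epoll instances report the same readable
   pipe (expected 2), or -1 on error. */
int instances_reporting(void)
{
    int p[2], ep[2], i, ok = 1, count = 0;
    struct epoll_event ev = {0}, out;

    if (pipe(p) == -1)
        return -1;
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    for (i = 0; i < 2; i++) {
        ep[i] = epoll_create1(0);
        ok = ok && ep[i] != -1 &&
             epoll_ctl(ep[i], EPOLL_CTL_ADD, p[0], &ev) == 0;
    }
    ok = ok && write(p[1], "x", 1) == 1;
    for (i = 0; i < 2; i++) {
        if (ok && epoll_wait(ep[i], &out, 1, 0) == 1)
            count++;
        if (ep[i] != -1)
            close(ep[i]);
    }
    close(p[0]);
    close(p[1]);
    return ok ? count : -1;
}
```

           Because both instances are notified, two threads waiting on them may both try to read the same
           data; coordinating that is part of the careful programming the answer mentions.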

       3.  Is the epoll file descriptor itself poll/epoll/selectable?

           Yes.  If an epoll file descriptor has events waiting, then it will indicate as being readable.

       4.  What happens if one attempts to put an epoll file descriptor into its own file descriptor set?

           The epoll_ctl(2) call fails (EINVAL).  However, you can add an epoll file descriptor inside another epoll file descriptor set.
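
           Questions 3 and 4 can be exercised together. In this hypothetical sketch (nested_epoll_demo()
           is an illustrative name), an inner epoll instance watches a pipe and an outer instance watches
           the inner one; poll(2) shows that the inner epoll descriptor itself becomes readable.

```c
#include <errno.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if self-registration fails with EINVAL, nesting works, and
   the inner epoll fd polls readable once it has a pending event. */
int nested_epoll_demo(void)
{
    int p[2], inner, outer, ok = 0;
    struct epoll_event ev = {0}, out;
    struct pollfd pfd;

    if (pipe(p) == -1)
        return -1;
    inner = epoll_create1(0);
    outer = epoll_create1(0);
    if (inner != -1 && outer != -1) {
        ev.events = EPOLLIN;
        ev.data.fd = p[0];
        ok = epoll_ctl(inner, EPOLL_CTL_ADD, p[0], &ev) == 0;

        /* Q4: an epoll fd may not watch itself (EINVAL)... */
        ev.data.fd = inner;
        ok = ok && epoll_ctl(inner, EPOLL_CTL_ADD, inner, &ev) == -1 &&
             errno == EINVAL;
        /* ...but another epoll instance may watch it. */
        ok = ok && epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == 0;

        ok = ok && write(p[1], "x", 1) == 1;

        /* Q3: with an event pending, the inner epoll fd polls readable... */
        pfd.fd = inner;
        pfd.events = POLLIN;
        ok = ok && poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
        /* ...and the outer instance reports it. */
        ok = ok && epoll_wait(outer, &out, 1, 0) == 1 && out.data.fd == inner;
    }
    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```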

       5.  Can I send an epoll file descriptor over a UNIX domain socket to another process?

           Yes, but it does not make sense to do this, since the receiving process would not have copies of the file descriptors in the interest list.

       6.  Will closing a file descriptor cause it to be removed from all epoll interest lists?

           Yes, but be aware of the following point.  A file descriptor is a reference to an open file description (see open(2)). 
            Whenever a file descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2),
             a new file descriptor referring to the same open file description is created. 
              An open file description continues to exist until all file descriptors referring to it have been closed.

           A file descriptor is removed from an interest list only after all the file descriptors referring to the underlying open file description have been closed.  
           This means that even after a file descriptor that is part of an interest list has been closed, 
           events may be reported for that file descriptor if other file descriptors referring to the same underlying file description remain open.  
           To prevent this happening, the file descriptor must be explicitly removed from the interest list (using epoll_ctl(2) EPOLL_CTL_DEL) before it is duplicated.  
           Alternatively, the application must ensure that all file descriptors are closed 
           (which may be difficult if file descriptors were duplicated behind the scenes by library functions that used dup(2) or fork(2)).
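
           The surprise can be reproduced with a pipe. In this hypothetical sketch (event_after_close() is
           an illustrative name) the registered descriptor is duplicated and then closed, yet
           epoll_wait(2) still delivers an event carrying the closed descriptor's number.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Stores the number of the fd that gets closed in *closed_fd and returns
   the data.fd of the event reported afterwards (-1 if none or on error). */
int event_after_close(int *closed_fd)
{
    int p[2], epfd, keep = -1, reported = -1;
    struct epoll_event ev = {0}, out;

    if (pipe(p) == -1)
        return -1;
    *closed_fd = p[0];
    epfd = epoll_create1(0);
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        (keep = dup(p[0])) != -1) {
        close(p[0]);             /* 'keep' still refers to the same file */
        p[0] = -1;
        if (write(p[1], "x", 1) == 1 && epoll_wait(epfd, &out, 1, 0) == 1)
            reported = out.data.fd;   /* number of the fd already closed */
    }
    if (keep != -1)
        close(keep);
    if (p[0] != -1)
        close(p[0]);
    if (epfd != -1)
        close(epfd);
    close(p[1]);
    return reported;
}
```

           Calling epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL) before the dup() and close() removes the
           registration, so the later epoll_wait(2) reports nothing.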

       7.  If more than one event occurs between epoll_wait(2) calls, are they combined or reported separately?

           They will be combined.
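
           A quick check with two separate writes before a single epoll_wait(2); entries_after_two_writes()
           is a hypothetical name. One registration yields at most one entry per epoll_wait(2) call, so the
           expected result is 1.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Two writes, one wait: returns the number of entries reported. */
int entries_after_two_writes(void)
{
    int p[2], epfd, n = -1;
    struct epoll_event ev = {0}, out[8];

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.events = EPOLLIN;
        ev.data.fd = p[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "a", 1) == 1 && write(p[1], "b", 1) == 1)
            n = epoll_wait(epfd, out, 8, 0);
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return n;
}
```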

       8.  Does an operation on a file descriptor affect the already collected but not yet reported events?

           You can do two operations on an existing file descriptor.  Remove would be meaningless for this case.  Modify will reread available I/O.

       9.  Do I need to continuously read/write a file descriptor until EAGAIN when using the EPOLLET flag (edge-triggered behavior)?

           Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation.  
           You must consider it ready until the next (nonblocking) read/write yields EAGAIN. 
            When and how you will use the file descriptor is entirely up to you.

           For packet/token-oriented files (e.g., datagram socket, terminal in canonical mode), 
           the only way to detect the end of the read/write I/O space is to continue to read/write until EAGAIN.

           For stream-oriented files (e.g., pipe, FIFO, stream socket), 
           the condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from / written to the target file descriptor.  
           For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes,
            you can be sure of having exhausted the read I/O space for the file descriptor.  The same is true when writing using write(2).  
            (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)
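
           A sketch of the short-read technique for stream-oriented descriptors only (read_until_short()
           and CHUNK are hypothetical names). A full buffer forces another read(2), which may then fail
           with EAGAIN; a short read ends the loop without that extra call.

```c
#include <errno.h>
#include <unistd.h>

#define CHUNK 64   /* illustrative read size */

/* Reads a stream-oriented fd in CHUNK-sized pieces. A short read
   (including 0 at end of file) shows the data is exhausted; only after
   a completely full read is another read() needed, and an EAGAIN from
   that one also means "exhausted". *calls counts read() calls.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_until_short(int fd, size_t *calls)
{
    char buf[CHUNK];
    ssize_t total = 0;

    *calls = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);

        ++*calls;
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? total : -1;
        }
        total += n;                     /* process buf[0..n) here */
        if ((size_t) n < sizeof buf)
            return total;               /* short read: nothing left */
    }
}
```

           With 100 bytes queued and a 64-byte CHUNK, the loop is expected to stop after two read(2) calls.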

   Possible pitfalls and ways to avoid them 
       o Starvation (edge-triggered)

       If there is a large amount of I/O space, it is possible that by trying to drain it the other files will not get processed causing starvation. 
        (This problem is not specific to epoll.)

       The solution is to maintain a ready list and mark the file descriptor as ready in its associated data structure, 
       thereby allowing the application to remember which files need to be processed but still round robin amongst all the ready files.  
       This also supports ignoring subsequent events you receive for file descriptors that are already ready.
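
       A sketch of such an application-level ready list, under stated assumptions: struct rr_list,
       rr_add(), rr_mark_ready(), rr_round() and the 4096-byte BUDGET are all hypothetical, and reading
       stands in for whatever work the application does per descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_FDS 16
#define BUDGET  4096          /* illustrative per-turn byte budget */

/* Hypothetical application-side ready list; not part of the epoll API. */
struct rr_list {
    int  fd[MAX_FDS];
    int  ready[MAX_FDS];      /* 1 = may still have unread data */
    long consumed[MAX_FDS];
    int  n;
};

/* Track a new fd (made nonblocking, as edge-triggered use requires).
   Returns its index, or -1. */
int rr_add(struct rr_list *l, int fd)
{
    int fl = fcntl(fd, F_GETFL);

    if (l->n == MAX_FDS || fl == -1 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1)
        return -1;
    l->fd[l->n] = fd;
    l->ready[l->n] = 0;
    l->consumed[l->n] = 0;
    return l->n++;
}

/* Call for each fd epoll_wait() reports; marking an already-ready fd
   again changes nothing, which is how duplicate events are ignored. */
void rr_mark_ready(struct rr_list *l, int idx)
{
    l->ready[idx] = 1;
}

/* One round: each ready fd gets at most BUDGET bytes, so a busy fd
   cannot starve the others. Returns how many fds remain ready. */
int rr_round(struct rr_list *l)
{
    char buf[1024];
    int i, still = 0;

    for (i = 0; i < l->n; i++) {
        long left = BUDGET;

        while (l->ready[i] && left > 0) {
            size_t want = left < (long) sizeof buf ? (size_t) left : sizeof buf;
            ssize_t got = read(l->fd[i], buf, want);

            if (got > 0) {
                l->consumed[i] += got;
                left -= got;
            } else if (got == -1 && errno == EINTR) {
                continue;
            } else {
                l->ready[i] = 0;    /* EAGAIN, EOF or error: wait for epoll */
            }
        }
        still += l->ready[i];
    }
    return still;
}
```

       After epoll_wait(2), the application calls rr_mark_ready() for each reported descriptor (for
       instance via an index stored in epoll_event.data.u32) and keeps calling rr_round(), polling
       epoll_wait(2) with a zero timeout between rounds, until it returns 0.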




       o If using an event cache...

       If you use an event cache or store all the file descriptors returned from epoll_wait(2), 
       then make sure to provide a way to mark its closure dynamically (i.e., caused by a previous event's processing).  
       Suppose you receive 100 events from epoll_wait(2), and in event #47 a condition causes event #13 to be closed.  
       If you remove the structure and close(2) the file descriptor for event #13, 
       then your event cache might still say there are events waiting for that file descriptor causing confusion.

       One solution for this is to call, 
       during the processing of event 47, epoll_ctl(EPOLL_CTL_DEL) to delete file descriptor 13 and close(2), 
       then mark its associated data structure as removed and link it to a cleanup list.  
       If you find another event for file descriptor 13 in your batch processing, 
       you will discover the file descriptor had been previously removed and there will be no confusion.
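
       The scheme can be sketched with a per-descriptor record reached through epoll_event.data.ptr.
       Everything here is hypothetical: struct conn, conn_kill(), process_batch(), and the closes field,
       which stands for whatever condition makes one event close another descriptor.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor record, stored in epoll_event.data.ptr. */
struct conn {
    int fd;
    int removed;              /* set once the fd has been closed        */
    int handled;              /* events processed, for illustration     */
    struct conn *closes;      /* record this one's event will shut down */
    struct conn *next_dead;   /* link in the cleanup list               */
};

/* Records queued here are freed by the application after the batch. */
static struct conn *cleanup_list;

/* Deregister and close, but keep the record (marked) until the batch ends. */
static void conn_kill(int epfd, struct conn *c)
{
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);  /* NULL needs Linux >= 2.6.9 */
    close(c->fd);
    c->removed = 1;
    c->next_dead = cleanup_list;
    cleanup_list = c;
}

/* Handle one batch from epoll_wait(); events for a record that an earlier
   event in the same batch already closed are skipped. */
void process_batch(int epfd, struct epoll_event *evs, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        struct conn *c = evs[i].data.ptr;

        if (c->removed)
            continue;                     /* closed earlier in this batch */
        c->handled++;
        if (c->closes != NULL && !c->closes->removed)
            conn_kill(epfd, c->closes);   /* e.g. event #47 closing #13   */
    }
}
```

       The records on cleanup_list are freed only after the whole batch has been processed, which keeps a
       pointer in a later event of the same batch valid.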




       The epoll API was introduced in Linux kernel 2.5.44.  Support was added to glibc in version 2.3.2.

       The epoll API is Linux-specific.  Some other systems provide similar mechanisms, for example, FreeBSD has kqueue, and Solaris has /dev/poll.

       The set of file descriptors that is being monitored via an epoll file descriptor 
       can be viewed via the entry for the epoll file descriptor in the process's /proc/[pid]/fdinfo directory.  
       See proc(5) for further details.

       The kcmp(2) KCMP_EPOLL_TFD operation can be used to test whether a file descriptor is present in an epoll instance.

       epoll_create(2), epoll_create1(2), epoll_ctl(2), epoll_wait(2), poll(2), select(2)

 2.epoll_ctl - control interface for an epoll file descriptor

       #include <sys/epoll.h>

       int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

       This system call is used to add, modify, or remove entries in the interest list of the epoll(7) instance referred to by the file descriptor epfd.
       It requests that the operation op be performed for the target file descriptor, fd.

       Valid values for the op argument are:

       EPOLL_CTL_ADD
              Add fd to the interest list and associate the settings specified in event with the internal file linked to fd.

       EPOLL_CTL_MOD
              Change the settings associated with fd in the interest list to the new settings specified in event.

       EPOLL_CTL_DEL
              Remove (deregister) the target file descriptor fd from the interest list.  The event argument is ignored and can be NULL (but see the note on kernels before 2.6.9 below).
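
       The three operations in sequence, as a hypothetical sketch on a pipe read end (ctl_ops_demo() is an
       illustrative name). A final EPOLL_CTL_MOD after the delete is expected to fail with ENOENT, matching
       the error list below.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* ADD, MOD, DEL, then MOD again. Returns 0 if each step behaves as
   documented, -1 otherwise. */
int ctl_ops_demo(void)
{
    int p[2], epfd, ok = 0;
    struct epoll_event ev = {0};

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.events = EPOLLIN;
        ev.data.fd = p[0];
        ok = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0;

        ev.events = EPOLLIN | EPOLLET;      /* replace the settings */
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0;

        /* event is ignored for DEL; passing &ev rather than NULL also
           works on kernels before 2.6.9 */
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], &ev) == 0;

        /* no longer registered */
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == -1 &&
             errno == ENOENT;
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```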

       The event argument describes the object linked to the file descriptor fd.  The struct epoll_event is defined as:

           typedef union epoll_data {
               void        *ptr;
               int          fd;
               uint32_t     u32;
               uint64_t     u64;
           } epoll_data_t;

           struct epoll_event {
               uint32_t     events;      /* Epoll events */
               epoll_data_t data;        /* User data variable */
           };

       The events member is a bit mask composed by ORing together zero or more of the following available event types:

       EPOLLIN
              The associated file is available for read(2) operations.

       EPOLLOUT
              The associated file is available for write(2) operations.

       EPOLLRDHUP (since Linux 2.6.17)
              Stream socket peer closed connection, or shut down writing half of connection.
              (This flag is especially useful for writing simple code to detect peer shutdown when using edge-triggered monitoring.)
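
              A hypothetical sketch of detecting peer shutdown with a UNIX domain stream socket pair
              (rdhup_events() is an illustrative name): after the peer calls shutdown(SHUT_WR), the watched
              end is expected to report EPOLLRDHUP together with EPOLLIN.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the events reported for sv[0] after the peer shuts down its
   writing half (0 if none, -1 on error). */
int rdhup_events(void)
{
    int sv[2], epfd, mask = -1;
    struct epoll_event ev = {0}, out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.events = EPOLLIN | EPOLLRDHUP | EPOLLET;
        ev.data.fd = sv[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
            shutdown(sv[1], SHUT_WR) == 0) {
            int n = epoll_wait(epfd, &out, 1, 0);
            mask = n == 1 ? (int) out.events : n;
        }
        close(epfd);
    }
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```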

       EPOLLPRI
              There is an exceptional condition on the file descriptor.  See the discussion of POLLPRI in poll(2).

       EPOLLERR
              Error condition happened on the associated file descriptor.
              This event is also reported for the write end of a pipe when the read end has been closed.
              epoll_wait(2) will always report for this event; it is not necessary to set it in events.

       EPOLLHUP
              Hang up happened on the associated file descriptor.  epoll_wait(2) will always wait for this event; it is not necessary to set it in events.

              Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel.
              Subsequent reads from the channel will return 0 (end of file) only after all outstanding data in the channel has been consumed.
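
              A hypothetical sketch of both pipe cases (events_for() and pipe_hangup_demo() are illustrative
              names). Neither EPOLLERR nor EPOLLHUP is requested in the masks, yet each is expected to be
              reported, and the read end still yields its buffered bytes before end of file.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd with mask and return the events a zero-timeout
   epoll_wait() reports for it (0 if none, -1 on error). */
static int events_for(int fd, unsigned int mask)
{
    int epfd = epoll_create1(0), r = -1;
    struct epoll_event ev = {0}, out;

    if (epfd == -1)
        return -1;
    ev.events = mask;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        r = epoll_wait(epfd, &out, 1, 0);
        if (r == 1)
            r = (int) out.events;
    }
    close(epfd);
    return r;
}

int pipe_hangup_demo(int *rd_ev, int *wr_ev, ssize_t reads[2])
{
    int p[2];
    char buf[16];

    /* Read end: 3 bytes left unread, then the writer closes. */
    if (pipe(p) == -1)
        return -1;
    if (write(p[1], "abc", 3) != 3) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);
    *rd_ev = events_for(p[0], EPOLLIN);     /* EPOLLHUP arrives unrequested */
    reads[0] = read(p[0], buf, sizeof buf); /* outstanding data first: 3   */
    reads[1] = read(p[0], buf, sizeof buf); /* then end of file: 0         */
    close(p[0]);

    /* Write end: the reader closes. */
    if (pipe(p) == -1)
        return -1;
    close(p[0]);
    *wr_ev = events_for(p[1], EPOLLOUT);    /* EPOLLERR arrives unrequested */
    close(p[1]);
    return 0;
}
```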


       EPOLLET
              Sets the Edge Triggered behavior for the associated file descriptor.
              The default behavior for epoll is Level Triggered.
              See epoll(7) for more detailed information about Edge and Level Triggered event distribution architectures.

       EPOLLONESHOT (since Linux 2.6.2)
              Sets the one-shot behavior for the associated file descriptor.  
              This means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled
              and no other events will be reported by the epoll interface.  
              The user must call epoll_ctl() with EPOLL_CTL_MOD to rearm the file descriptor with a new event mask.

       EPOLLWAKEUP (since Linux 3.5)
              If EPOLLONESHOT and EPOLLET are clear and the process has the CAP_BLOCK_SUSPEND capability,
              ensure that the system does not enter "suspend" or "hibernate" while this event is pending or being processed.
              The event is considered as being "processed" from the time when it is returned by a call to epoll_wait(2) until the next call to
              epoll_wait(2) on the same epoll(7) file descriptor, the closure of that file descriptor,
              the removal of the event file descriptor with EPOLL_CTL_DEL, or the clearing of EPOLLWAKEUP for the event file descriptor with EPOLL_CTL_MOD.
              See also the note on EPOLLWAKEUP and CAP_BLOCK_SUSPEND below.


       EPOLLEXCLUSIVE (since Linux 4.5)
              Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd.
              When a wakeup event occurs and multiple epoll file descriptors are attached to the same target file using EPOLLEXCLUSIVE,
              one or more of the epoll file descriptors will receive an event with epoll_wait(2).
              The default in this scenario (when EPOLLEXCLUSIVE is not set) is for all epoll file descriptors to receive an event.
              EPOLLEXCLUSIVE is thus useful for avoiding thundering herd problems in certain scenarios.

              If the same file descriptor is in multiple epoll instances, some with the EPOLLEXCLUSIVE flag and others without,
              then events will be provided to all epoll instances that did not specify EPOLLEXCLUSIVE,
              and at least one of the epoll instances that did specify EPOLLEXCLUSIVE.

              The following values may be specified in conjunction with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and EPOLLET.
              EPOLLHUP and EPOLLERR can also be specified,
              but this is not required: as usual, these events are always reported if they occur,
              regardless of whether they are specified in events.
              Attempts to specify other values in events yield the error EINVAL.

              EPOLLEXCLUSIVE may be used only in an EPOLL_CTL_ADD operation;
              attempts to employ it with EPOLL_CTL_MOD yield an error.
              If EPOLLEXCLUSIVE has been set using epoll_ctl(), then a subsequent EPOLL_CTL_MOD on the same epfd, fd pair yields an error.
              A call to epoll_ctl() that specifies EPOLLEXCLUSIVE in events and specifies the target file descriptor fd as an epoll
              instance will likewise fail.
              The error in all of these cases is EINVAL.
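
              A hypothetical sketch checking three of these rules on a pipe read end (exclusive_rules_demo()
              is an illustrative name). It assumes Linux 4.5 or later and a glibc recent enough to define
              EPOLLEXCLUSIVE.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if: EPOLLEXCLUSIVE | EPOLLONESHOT is rejected, EPOLLIN |
   EPOLLEXCLUSIVE is accepted by ADD, and a later MOD fails; -1 otherwise. */
int exclusive_rules_demo(void)
{
    int p[2], epfd, ok = 0;
    struct epoll_event ev = {0};

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd != -1) {
        ev.data.fd = p[0];

        /* EPOLLONESHOT is not a permitted companion: EINVAL. */
        ev.events = EPOLLIN | EPOLLEXCLUSIVE | EPOLLONESHOT;
        ok = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1 && errno == EINVAL;

        /* A permitted combination with EPOLL_CTL_ADD succeeds... */
        ev.events = EPOLLIN | EPOLLEXCLUSIVE;
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0;

        /* ...but the pair can no longer be modified, even without the flag. */
        ev.events = EPOLLIN;
        ok = ok && epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == -1 &&
             errno == EINVAL;
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```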

       When successful, epoll_ctl() returns zero.  When an error occurs, epoll_ctl() returns -1 and errno is set appropriately.

       EBADF  epfd or fd is not a valid file descriptor.

       EEXIST op was EPOLL_CTL_ADD, and the supplied file descriptor fd is already registered with this epoll instance.

       EINVAL epfd is not an epoll file descriptor, or fd is the same as epfd, or the requested operation op is not supported by this interface.

       EINVAL An invalid event type was specified along with EPOLLEXCLUSIVE in events.

       EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.

       EINVAL op was EPOLL_CTL_MOD and the EPOLLEXCLUSIVE flag has previously been applied to this epfd, fd pair.

       EINVAL EPOLLEXCLUSIVE was specified in event and fd refers to an epoll instance.

       ELOOP  fd refers to an epoll instance and this EPOLL_CTL_ADD operation would result in a circular loop of epoll instances monitoring one another.

       ENOENT op was EPOLL_CTL_MOD or EPOLL_CTL_DEL, and fd is not registered with this epoll instance.

       ENOMEM There was insufficient memory to handle the requested op control operation.

       ENOSPC The limit imposed by /proc/sys/fs/epoll/max_user_watches was encountered while trying to register (EPOLL_CTL_ADD) a new file descriptor on an epoll instance.  
       See epoll(7) for further details.

       EPERM  The target file fd does not support epoll.  This error can occur if fd refers to, for example, a regular file or a directory.
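
       Two of these errors are easy to provoke. In this hypothetical sketch (ctl_errors_demo() is an
       illustrative name), the root directory stands in for a descriptor without poll support, and two
       epoll instances are made to watch each other.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if adding a directory fails with EPERM and closing an
   a -> b -> a cycle fails with ELOOP; -1 otherwise. */
int ctl_errors_demo(void)
{
    int dir = open("/", O_RDONLY);          /* a directory: no poll support */
    int a = epoll_create1(0), b = epoll_create1(0), ok = 0;
    struct epoll_event ev = {0};

    if (dir != -1 && a != -1 && b != -1) {
        ev.events = EPOLLIN;
        ev.data.fd = dir;
        ok = epoll_ctl(a, EPOLL_CTL_ADD, dir, &ev) == -1 && errno == EPERM;

        ev.data.fd = b;                     /* a watches b: allowed */
        ok = ok && epoll_ctl(a, EPOLL_CTL_ADD, b, &ev) == 0;

        ev.data.fd = a;                     /* b watching a closes a cycle */
        ok = ok && epoll_ctl(b, EPOLL_CTL_ADD, a, &ev) == -1 && errno == ELOOP;
    }
    if (dir != -1)
        close(dir);
    if (a != -1)
        close(a);
    if (b != -1)
        close(b);
    return ok ? 0 : -1;
}
```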

       epoll_ctl() was added to the kernel in version 2.6.

       epoll_ctl() is Linux-specific.  Library support is provided in glibc starting with version 2.3.2.

       The epoll interface supports all file descriptors that support poll(2).

       In kernel versions before 2.6.9, the EPOLL_CTL_DEL operation required a non-null pointer in event, even though this argument is ignored.  
       Since Linux 2.6.9, event can be specified as NULL when using EPOLL_CTL_DEL.
       Applications that need to be portable to kernels before 2.6.9 should specify a non-null pointer in event.

       If EPOLLWAKEUP is specified in events, but the caller does not have the CAP_BLOCK_SUSPEND capability,
       then the EPOLLWAKEUP flag is silently ignored.

       This unfortunate behavior is necessary because no validity checks were performed on the events field in the original implementation,
       and the addition of EPOLLWAKEUP with a check that caused the call to fail if the caller
       did not have the CAP_BLOCK_SUSPEND capability caused a breakage in at least one existing user-space application that
       happened to randomly (and uselessly) specify this bit.
       A robust application should therefore double check that it has the CAP_BLOCK_SUSPEND capability if attempting to use the EPOLLWAKEUP flag.
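
       One way to perform that check without extra libraries is the raw capget(2) system call. This is a
       sketch under the assumption that the kernel headers provide <linux/capability.h> with
       CAP_BLOCK_SUSPEND; has_block_suspend() is a hypothetical name, and libcap's cap_get_proc() would be
       an alternative.

```c
#define _GNU_SOURCE            /* for syscall() */
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if CAP_BLOCK_SUSPEND is in the calling thread's effective
   set, 0 if not, -1 if capget(2) fails. */
int has_block_suspend(void)
{
    struct __user_cap_header_struct hdr = { _LINUX_CAPABILITY_VERSION_3, 0 };
    struct __user_cap_data_struct data[2] = {{0}};

    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;
    /* Version 3 splits the 64 capability bits across two 32-bit words. */
    return (int) ((data[CAP_BLOCK_SUSPEND / 32].effective
                   >> (CAP_BLOCK_SUSPEND % 32)) & 1u);
}
```

       A caller could then set EPOLLWAKEUP in events only when has_block_suspend() returns 1, rather than
       relying on the flag being silently dropped.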

       epoll_create(2), epoll_wait(2), poll(2), epoll(7)

 3. epoll_create, epoll_create1 - open an epoll file descriptor

       #include <sys/epoll.h>

       int epoll_create(int size);
       int epoll_create1(int flags);

       epoll_create() creates a new epoll(7) instance.  Since Linux 2.6.8, the size argument is ignored, but must be greater than zero; see NOTES below.

       epoll_create() returns a file descriptor referring to the new epoll instance.  
       This file descriptor is used for all the subsequent calls to the epoll interface.  
       When no longer required, the file descriptor returned by epoll_create() should be closed by using close(2).  
       When all file descriptors referring to an epoll instance have been closed, the kernel destroys the instance and releases the associated resources for reuse.

       If flags is 0, then, other than the fact that the obsolete size argument is dropped, epoll_create1() is the same as epoll_create(). 
        The following value can be included in flags to obtain different behavior:

       EPOLL_CLOEXEC
              Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
              See the description of the O_CLOEXEC flag in open(2) for reasons why this may be useful.


       O_CLOEXEC (open(2) flag, since Linux 2.6.23)
              Enable the close-on-exec flag for the new file descriptor.
              Specifying this flag permits a program to avoid additional fcntl(2) F_SETFD operations to set the FD_CLOEXEC flag.

              Note that the use of this flag is essential in some multithreaded programs, because using a separate fcntl(2) F_SETFD operation to set the
              FD_CLOEXEC flag does not suffice to avoid race conditions where one thread opens a file descriptor and attempts to set its close-on-exec
              flag using fcntl(2) at the same time as another thread does a fork(2) plus execve(2).
              Depending on the order of execution, the race may lead to the file descriptor returned by
              open() being unintentionally leaked to the program executed by the child process created by fork(2).
              (This kind of race is in principle possible for any system call that creates a file descriptor whose close-on-exec flag should
              be set, and various other Linux system calls provide an equivalent of the O_CLOEXEC flag to deal with this problem.)
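
       A hypothetical check of the flag and of the legacy size rule (create_flags_demo() is an illustrative
       name): the descriptor from epoll_create1(EPOLL_CLOEXEC) should already carry FD_CLOEXEC, with no
       separate (and racy) fcntl(2) call, and epoll_create(0) should fail with EINVAL.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if both behaviours match the description above, -1 otherwise. */
int create_flags_demo(void)
{
    int fd = epoll_create1(EPOLL_CLOEXEC), ok;
    int flags = fd == -1 ? -1 : fcntl(fd, F_GETFD);

    ok = flags != -1 && (flags & FD_CLOEXEC);
    if (fd != -1)
        close(fd);

    /* size is ignored since 2.6.8 but must still be greater than zero */
    ok = ok && epoll_create(0) == -1 && errno == EINVAL;
    return ok ? 0 : -1;
}
```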

       On success, these system calls return a nonnegative file descriptor.  On error, -1 is returned, and errno is set to indicate the error.
       EINVAL size is not positive.

       EINVAL (epoll_create1()) Invalid value specified in flags.

       EMFILE The per-user limit on the number of epoll instances imposed by /proc/sys/fs/epoll/max_user_instances was encountered.  See epoll(7) for further details.

       EMFILE The per-process limit on the number of open file descriptors has been reached.

       ENFILE The system-wide limit on the total number of open files has been reached.

       ENOMEM There was insufficient memory to create the kernel object.

       epoll_create() was added to the kernel in version 2.6.  Library support is provided in glibc starting with version 2.3.2.

       epoll_create1() was added to the kernel in version 2.6.27.  Library support is provided in glibc starting with version 2.9.

       epoll_create() is Linux-specific.

       In the initial epoll_create() implementation, the size argument informed the kernel of the number of file descriptors
       that the caller expected to add to the epoll instance.  The kernel used this information as a hint for the amount of
       space to initially allocate in internal data structures describing events.  (If necessary, the kernel would allocate
       more space if the caller's usage exceeded the hint given in size.)  Nowadays, this hint is no longer required (the
       kernel dynamically sizes the required data structures without needing the hint), but size must still be greater than
       zero, in order to ensure backward compatibility when new epoll applications are run on older kernels.
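One way to apply this in practice is a creation helper that prefers epoll_create1() and falls back to epoll_create() with a positive size. This is a sketch under the assumption that the program is built against glibc 2.9 or later but may run on a kernel older than 2.6.27, where epoll_create1() fails with ENOSYS; open_epoll() is a hypothetical name:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Open an epoll instance with close-on-exec set.  Falls back to
 * epoll_create() if epoll_create1() is unavailable (ENOSYS). */
int open_epoll(void)
{
    int fd = epoll_create1(EPOLL_CLOEXEC);
    if (fd != -1 || errno != ENOSYS)
        return fd;

    fd = epoll_create(1);   /* size is ignored by modern kernels but must be > 0 */
    if (fd == -1)
        return -1;

    /* Not atomic: this fallback is exposed to the fork()+execve() race. */
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Passing 1 rather than a guessed capacity reflects that the value only has to be positive.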

       close(2), epoll_ctl(2), epoll_wait(2), epoll(7)

 4. epoll_wait-man

       epoll_wait,  epoll_pwait  -  wait for an I/O event on an epoll file descriptor

       #include <sys/epoll.h>

       int epoll_wait(int epfd, struct epoll_event *events,
                      int maxevents, int timeout);
       int epoll_pwait(int epfd, struct epoll_event *events,
                      int maxevents, int timeout,
                      const sigset_t *sigmask);

       The epoll_wait() system call waits for events on the epoll(7) instance
       referred to by the file descriptor epfd.  The memory area pointed to by
       events will contain the events that will be available for the caller.
       Up to maxevents are returned by epoll_wait().  The maxevents argument
       must be greater than zero.

       The timeout argument specifies the number of milliseconds that
       epoll_wait() will block.  Time is measured against the CLOCK_MONOTONIC
       clock.  The call will block until either:

       *  a file descriptor delivers an event;

       *  the call is interrupted by a signal handler; or

       *  the timeout expires.

       Note that the timeout interval will be rounded up to the system clock granularity,
       and kernel scheduling delays mean that the blocking interval may overrun by a small amount.
       Specifying a timeout of -1 causes epoll_wait() to block indefinitely,
       while specifying a timeout equal to zero causes epoll_wait() to return immediately,
       even if no events are available.

       The struct epoll_event is defined as:

           typedef union epoll_data {
               void    *ptr;
               int      fd;
               uint32_t u32;
               uint64_t u64;
           } epoll_data_t;

           struct epoll_event {
               uint32_t     events;    /* Epoll events */
               epoll_data_t data;      /* User data variable */
           };

       The data field of each returned structure contains the same data as was specified in the most recent call to epoll_ctl(2)
       (EPOLL_CTL_ADD, EPOLL_CTL_MOD) for the corresponding open file description.
       The events field contains the returned event bit field.
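Because the kernel never interprets data, it can carry any caller-chosen tag. The sketch below (returned_tag() is a hypothetical helper) stores a tag with EPOLL_CTL_ADD, overwrites it with EPOLL_CTL_MOD, and checks that epoll_wait() reports the value from the most recent call:

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register a pipe's read end with data.u64 = tag, replace it with
 * tag + 1 via EPOLL_CTL_MOD, make the pipe readable, and return the
 * data.u64 reported by epoll_wait().  Returns UINT64_MAX on failure. */
uint64_t returned_tag(uint64_t tag)
{
    uint64_t result = UINT64_MAX;
    int p[2];
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return UINT64_MAX;
    if (pipe(p) == -1) {
        close(epfd);
        return UINT64_MAX;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.u64 = tag };
    struct epoll_event out;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        ev.data.u64 = tag + 1;          /* overwrite the stored data */
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 1000) == 1 &&
            (out.events & EPOLLIN))
            result = out.data.u64;
    }

    close(p[0]);
    close(p[1]);
    close(epfd);
    return result;
}
```

In real servers data.ptr usually points at a per-connection structure instead of a plain number.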

       The relationship between epoll_wait() and epoll_pwait() is analogous to
       the relationship between select(2) and pselect(2): like pselect(2),
       epoll_pwait() allows an application to safely wait until either a file
       descriptor becomes ready or until a signal is caught.

       The following epoll_pwait() call:

           ready = epoll_pwait(epfd, events, maxevents, timeout, &sigmask);

       is equivalent to atomically executing the following calls:

           sigset_t origmask;

           pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
           ready = epoll_wait(epfd, events, maxevents, timeout);
           // restore the original mask of blocked signals
           pthread_sigmask(SIG_SETMASK, &origmask, NULL);

       The sigmask argument may be specified as NULL,
       in which case epoll_pwait() is equivalent to epoll_wait().

       When successful, epoll_wait() returns the number of file descriptors ready for the requested I/O,
       or zero if no file descriptor became ready during the requested timeout milliseconds.
       When an error occurs, epoll_wait() returns -1 and errno is set appropriately.

       EBADF  epfd is not a valid file descriptor.

       EFAULT The memory area pointed to by events is not accessible with write permissions.

       EINTR  The call was interrupted by a signal handler before either (1) any of the requested events occurred or (2) the timeout expired;
              see signal(7).

       EINVAL epfd is not an epoll file descriptor, or maxevents is less than or equal to zero.
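The EBADF and EINVAL cases can be checked without ever blocking by using a timeout of 0. A sketch; epoll_wait_errno() and check_documented_errors() are hypothetical helpers, and a pipe serves as a valid descriptor that is not an epoll instance:

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Return errno from epoll_wait(epfd, ..., maxevents, 0), or 0 if the call
 * succeeded.  maxevents must not exceed the local array size (4). */
int epoll_wait_errno(int epfd, int maxevents)
{
    struct epoll_event evs[4];
    return epoll_wait(epfd, evs, maxevents, 0) == -1 ? errno : 0;
}

/* Returns 0 if every call below fails (or succeeds) as documented. */
int check_documented_errors(void)
{
    int p[2], ok;
    int ep = epoll_create1(0);
    if (ep == -1)
        return -1;
    if (pipe(p) == -1) {
        close(ep);
        return -1;
    }

    ok = epoll_wait_errno(-1, 4) == EBADF        /* not a valid fd */
      && epoll_wait_errno(p[0], 4) == EINVAL     /* not an epoll fd */
      && epoll_wait_errno(ep, 0) == EINVAL       /* maxevents <= 0 */
      && epoll_wait_errno(ep, 4) == 0;           /* valid, no events */

    close(p[0]);
    close(p[1]);
    close(ep);
    return ok ? 0 : -1;
}
```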

       epoll_wait() was added to the kernel in version 2.6.  Library support
       is provided in glibc starting with version 2.3.2.

       epoll_pwait() was added to Linux in kernel 2.6.19.  Library support is
       provided in glibc starting with version 2.6.

       epoll_wait() is Linux-specific.

       While one thread is blocked in a call to epoll_wait(), it is possible for another thread to add a file descriptor to the waited-upon epoll instance.
       If the new file descriptor becomes ready, it will cause the epoll_wait() call to unblock.

       If more than maxevents file descriptors are ready when epoll_wait() is called,
       then successive epoll_wait() calls will round robin through the set of ready file descriptors.
       This behavior helps avoid starvation scenarios,
       where a process fails to notice that additional file descriptors are ready because it focuses on a set of file descriptors that are already known to be ready.
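The round-robin behavior can be observed with several level-triggered descriptors that all stay ready. In this sketch (round_robin_demo() and NPIPES are hypothetical), three pipes are made readable and never drained, and epoll_wait() is called three times with maxevents equal to 1; with round-robin delivery each call should report a different descriptor:

```c
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 3

/* Fill readfds[] with the read ends of NPIPES readable pipes and seen[]
 * with the descriptor reported by each of NPIPES epoll_wait() calls
 * (maxevents == 1).  Returns 0 on success, -1 on failure. */
int round_robin_demo(int seen[NPIPES], int readfds[NPIPES])
{
    int writefds[NPIPES];
    int opened = 0, rc = -1;
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    for (; opened < NPIPES; opened++) {
        int p[2];
        if (pipe(p) == -1)
            goto out;
        readfds[opened] = p[0];
        writefds[opened] = p[1];
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1 ||
            write(p[1], "x", 1) != 1) {
            opened++;               /* this pipe must be closed too */
            goto out;
        }
    }

    for (int i = 0; i < NPIPES; i++) {
        struct epoll_event out;
        if (epoll_wait(epfd, &out, 1, 0) != 1)
            goto out;
        seen[i] = out.data.fd;      /* nothing is read: fd stays ready */
    }
    rc = 0;

out:
    for (int i = 0; i < opened; i++) {
        close(readfds[i]);
        close(writefds[i]);
    }
    close(epfd);
    return rc;
}
```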

       Note that it is possible to call epoll_wait() on an epoll instance whose interest list is currently empty
       (or whose interest list becomes empty because file descriptors are closed or removed from the interest list in another thread).
       The call will block until some file descriptor is later added to the interest list (in another thread) and that file descriptor becomes ready.
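Both multithreaded cases (adding a descriptor to an instance another thread is waiting on, and waiting on an empty interest list) can be shown together. A sketch, assuming POSIX threads (compile with -pthread on older glibc); struct late_add, add_later() and wait_for_late_fd() are hypothetical names:

```c
#include <pthread.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

struct late_add {
    int epfd;
    int fd;                 /* already-readable descriptor to register */
};

/* Thread body: sleep about 100 ms, then add fd to the interest list. */
static void *add_later(void *arg)
{
    struct late_add *la = arg;
    struct timespec delay = { 0, 100L * 1000 * 1000 };
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = la->fd };

    nanosleep(&delay, NULL);
    epoll_ctl(la->epfd, EPOLL_CTL_ADD, la->fd, &ev);
    return NULL;
}

/* Wait on an instance with an empty interest list while another thread
 * adds a readable pipe.  Returns 0 if epoll_wait() woke up and reported
 * that pipe, -1 otherwise (including the 5 s safety timeout). */
int wait_for_late_fd(void)
{
    int p[2], rc = -1;
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    if (pipe(p) == -1) {
        close(epfd);
        return -1;
    }

    if (write(p[1], "x", 1) == 1) {
        struct late_add la = { epfd, p[0] };
        pthread_t t;
        if (pthread_create(&t, NULL, add_later, &la) == 0) {
            struct epoll_event out;
            if (epoll_wait(epfd, &out, 1, 5000) == 1 && out.data.fd == p[0])
                rc = 0;
            pthread_join(t, NULL);
        }
    }

    close(p[0]);
    close(p[1]);
    close(epfd);
    return rc;
}
```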

       In kernels before 2.6.37, a timeout value larger than approximately LONG_MAX / HZ milliseconds is treated as -1 (i.e., infinity).
       Thus, for example, on a system where sizeof(long) is 4 and the kernel HZ value is 1000,
       this means that timeouts greater than 35.79 minutes are treated as infinity.
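The 35.79-minute figure follows directly from LONG_MAX for a 4-byte long:

```latex
\frac{\texttt{LONG\_MAX}}{HZ}
  = \frac{2^{31}-1}{1000}\ \text{ms}
  \approx 2\,147\,483\ \text{ms}
  \approx 2147.48\ \text{s}
  \approx 35.79\ \text{min}
```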

   C library/kernel differences
       The raw epoll_pwait() system call has a sixth argument, size_t sigsetsize, which specifies the size in bytes of the sigmask argument.
       The glibc epoll_pwait() wrapper function specifies this argument as a fixed value: the size of the kernel's signal set
       (_NSIG / 8 bytes), which is smaller than the userspace sizeof(sigset_t) defined by glibc.
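When calling the system call directly through syscall(2), the sixth argument must match the kernel's own signal-set size; the kernel rejects any other value with EINVAL when sigmask is non-NULL. A sketch, assuming an architecture where the kernel's sigset_t is a single 64-bit word (true on x86-64 and arm64, not necessarily elsewhere); kernel_sigset and raw_epoll_pwait() are hypothetical names:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: the kernel's signal set is one 64-bit word (8 bytes). */
typedef uint64_t kernel_sigset;

/* glibc's userspace sigset_t is much larger (128 bytes), so it cannot be
 * passed as sigsetsize to the raw system call. */
_Static_assert(sizeof(sigset_t) != sizeof(kernel_sigset),
               "userspace and kernel signal sets differ in size");

/* Invoke the raw epoll_pwait system call with an explicit sigsetsize.
 * Returns the syscall's result on success, or -errno on failure. */
long raw_epoll_pwait(int epfd, struct epoll_event *events, int maxevents,
                     int timeout, const kernel_sigset *mask,
                     size_t sigsetsize)
{
    long r = syscall(SYS_epoll_pwait, epfd, events, maxevents, timeout,
                     mask, sigsetsize);
    return r == -1 ? -errno : r;
}
```

Passing sizeof(kernel_sigset) should succeed, while passing glibc's sizeof(sigset_t) should fail with EINVAL.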

       epoll_create(2), epoll_ctl(2), epoll(7)

Linux                             2019-03-06                     EPOLL_WAIT(2)


