epoll vs. select/poll: performance, CPU, and memory overhead compared

epoll is a feature introduced in Linux kernel 2.5.44 as a replacement for the older select and poll system calls. It was a major step forward for I/O on Linux. For a detailed introduction to epoll, see Wikipedia.

epoll outperforms select and poll in the vast majority of cases. Speed aside, though, how do the three compare in CPU overhead and memory consumption?

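This post is drawn from an excellent question and answer on Stack Overflow. Besides comparing poll, select, and epoll on performance and system resource consumption, the answer points out a way in which epoll handles regular files less gracefully than select/poll. (None of the three actually supports regular files; the answer's author simply considers epoll's handling of the situation a poor one. That is a matter of opinion, and it does not mean the author's view is correct.) I hope anyone interested in this topic reads through to the end; I believe it will give you a deeper understanding of epoll.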

Question:

Everything I’ve read and experienced (Tornado-based apps) leads me to believe that epoll is a natural replacement for select- and poll-based networking, especially with Twisted. Which makes me paranoid: it’s pretty rare for a better technique or methodology not to come with a price.

Reading a couple dozen comparisons between epoll and alternatives shows that epoll is clearly the champion for speed and scalability, specifically that it scales in a linear fashion which is fantastic. That said, what about processor and memory utilization, is epoll still the champ?

Answer:

For very small numbers of sockets (varies depending on your hardware, of course, but we’re talking about something on the order of 10 or fewer), select can beat epoll in memory usage and runtime speed. Of course, for such small numbers of sockets, both mechanisms are so fast that you don’t really care about this difference in the vast majority of cases.

One clarification, though. Both select and epoll scale linearly. A big difference, though, is that the userspace-facing APIs have complexities that are based on different things. The cost of a select call goes roughly with the value of the highest numbered file descriptor you pass it. If you select on a single fd, 100, then that’s roughly twice as expensive as selecting on a single fd, 50. Adding more fds below the highest isn’t quite free, so it’s a little more complicated than this in practice, but this is a good first approximation for most implementations.
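The first approximation above can be made concrete with a toy model. The function below is hypothetical (the name `select_scan_steps` is an assumption made for illustration) and it is not kernel code; it only mimics the idea that a select-style scan walks every descriptor slot below the highest watched descriptor, whether or not that slot is in use.

```python
def select_scan_steps(watched_fds):
    """Toy model: count the descriptor slots a select-style scan examines."""
    nfds = max(watched_fds) + 1      # select's first argument: highest fd + 1
    bitmap = 0
    for fd in watched_fds:
        bitmap |= 1 << fd            # one bit per descriptor, like an fd_set

    steps = 0
    found = []
    for fd in range(nfds):           # every slot below nfds is examined
        steps += 1
        if bitmap & (1 << fd):
            found.append(fd)
    return steps, found


print(select_scan_steps({50}))       # (51, [50])
print(select_scan_steps({100}))      # (101, [100])
```

In this model, watching the single descriptor 100 costs about twice as many steps as watching the single descriptor 50, which is the ratio the answer describes. Real implementations process the bitmap a machine word at a time, so the constant factors differ.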

The cost of epoll is closer to the number of file descriptors that actually have events on them. If you’re monitoring 200 file descriptors, but only 100 of them have events on them, then you’re (very roughly) only paying for those 100 active file descriptors. This is where epoll tends to offer one of its major advantages over select. If you have a thousand clients that are mostly idle, then when you use select you’re still paying for all one thousand of them. However, with epoll, it’s like you’ve only got a few – you’re only paying for the ones that are active at any given time.
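As a rough illustration of the 200-versus-100 case, here is a minimal sketch using Python's `select.epoll` wrapper, assuming a Linux system (the wrapper does not exist elsewhere). The socket pairs stand in for client connections; the variable names are illustrative only.

```python
import select
import socket

# Linux only: select.epoll is not available on other platforms.
pairs = [socket.socketpair() for _ in range(200)]    # 200 monitored "clients"

ep = select.epoll()
for reader, _writer in pairs:
    ep.register(reader.fileno(), select.EPOLLIN)

# Make only 100 of the 200 monitored descriptors readable.
active = pairs[:100]
expected = sorted(reader.fileno() for reader, _writer in active)
for _reader, writer in active:
    writer.send(b"x")

# poll() hands back only the descriptors that have pending events.
events = ep.poll(timeout=0)
ready_fds = sorted(fd for fd, _mask in events)
print(len(ready_fds))                                # 100

ep.close()
for reader, writer in pairs:
    reader.close()
    writer.close()
```

The list returned by `poll()` contains only the 100 active descriptors; the 100 idle ones never appear, so the caller does no per-descriptor work for them.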

All this means that epoll will lead to less CPU usage for most workloads. As far as memory usage goes, it’s a bit of a toss up. select does manage to represent all the necessary information in a highly compact way (one bit per file descriptor). And the FD_SETSIZE (typically 1024) limitation on how many file descriptors you can use with select means that you’ll never spend more than 128 bytes for each of the three fd sets you can use with select (read, write, exception). Compared to those 384 bytes max, epoll is sort of a pig. Each file descriptor is represented by a multi-byte structure. However, in absolute terms, it’s still not going to use much memory. You can represent a huge number of file descriptors in a few dozen kilobytes (roughly 20k per 1000 file descriptors, I think). And you can also throw in the fact that you have to spend all 384 of those bytes with select even if you only want to monitor one file descriptor but its value happens to be 1023 (the highest value select accepts when FD_SETSIZE is 1024), whereas with epoll you’d only spend 20 bytes. Still, all these numbers are pretty small, so it doesn’t make much difference.
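The figures in this paragraph follow from simple arithmetic, sketched below. The value 1024 for FD_SETSIZE is the typical one quoted above, and the 20 bytes per descriptor for epoll is the answer's own rough guess, not a measured number; the constant and function names are made up for this example.

```python
FD_SETSIZE = 1024            # typical value, as quoted in the answer
BITS_PER_BYTE = 8

# select: one bit per possible descriptor, in each of three sets.
bytes_per_fd_set = FD_SETSIZE // BITS_PER_BYTE
select_total = 3 * bytes_per_fd_set          # read, write, exception

# epoll: the answer's rough guess, not a measured value.
EPOLL_BYTES_PER_FD = 20


def epoll_estimate(n_fds):
    """Estimated epoll bookkeeping for n_fds descriptors, under that guess."""
    return n_fds * EPOLL_BYTES_PER_FD


print(bytes_per_fd_set)       # 128
print(select_total)           # 384
print(epoll_estimate(1000))   # 20000
print(epoll_estimate(1))      # 20
```

The select cost is fixed at 384 bytes regardless of how many descriptors are watched, while the epoll estimate grows with the number of registered descriptors.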

And there’s also that other benefit of epoll, which perhaps you’re already aware of, that it is not limited to FD_SETSIZE file descriptors. You can use it to monitor as many file descriptors as you have. And if you only have one file descriptor, but its value is greater than FD_SETSIZE, epoll works with that too, but select does not.
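The select side of this limit is easy to observe from Python. The sketch below assumes CPython, whose `select.select` wrapper rejects any descriptor number outside the fd_set range with a ValueError before making the system call, and assumes FD_SETSIZE is 1024 (Python does not expose the constant). The helper name `select_accepts` is hypothetical.

```python
import select

FD_SETSIZE = 1024   # assumed typical value; Python does not expose the constant


def select_accepts(fd_number):
    """Return True if select.select is willing to take this descriptor number."""
    try:
        select.select([fd_number], [], [], 0)
    except ValueError:
        # CPython refuses numbers that do not fit in an fd_set.
        return False
    except OSError:
        # The number fits, but no such descriptor is open (EBADF).
        return True
    return True


print(select_accepts(FD_SETSIZE + 1))   # False
```

The descriptor number does not even need to refer to an open file for the check to fail, because the value alone cannot be represented in the bitmap.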

Randomly, I’ve also recently discovered one slight drawback to epoll as compared to select or poll. While none of these three APIs supports normal files (i.e., files on a file system), select and poll present this lack of support by reporting such descriptors as always readable and always writeable. This makes them unsuitable for any meaningful kind of non-blocking filesystem I/O, but a program which uses select or poll and happens to encounter a file descriptor from the filesystem will at least continue to operate (or if it fails, it won’t be because of select or poll), albeit perhaps not with the best performance.

On the other hand, epoll will fail fast with an error (EPERM, apparently) when asked to monitor such a file descriptor. Strictly speaking, this is hardly incorrect. It’s merely signalling its lack of support in an explicit way. Normally I would applaud explicit failure conditions, but this one is undocumented (as far as I can tell) and results in a completely broken application, rather than one which merely operates with potentially degraded performance.
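The difference between the two behaviours can be reproduced with a temporary file. This is a minimal sketch assuming Linux and Python's `select` module; the temporary file stands in for any regular file on a file system.

```python
import errno
import select
import tempfile

with tempfile.TemporaryFile() as f:
    fd = f.fileno()

    # select: a regular file is reported as ready right away.
    readable, writable, _ = select.select([fd], [fd], [], 0)
    select_ready = (fd in readable, fd in writable)

    # epoll: registering the same descriptor is refused outright.
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)
        epoll_errno = None
    except OSError as exc:
        epoll_errno = exc.errno
    finally:
        ep.close()

print(select_ready)                    # (True, True)
print(epoll_errno == errno.EPERM)      # True
```

In Python the failure surfaces as a PermissionError, the OSError subclass that corresponds to EPERM.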

In practice, the only place I’ve seen this come up is when interacting with stdio. A user might redirect stdin or stdout from/to a normal file. Whereas previously stdin and stdout would have been a pipe — supported by epoll just fine — it then becomes a normal file and epoll fails loudly, breaking the application.
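One way a program might notice the redirected-stdio case before handing a descriptor to epoll is to check its file type first. This is only a sketch of one possible guard, not something the answer proposes; the helper name `is_regular_file` is hypothetical.

```python
import os
import stat
import tempfile


def is_regular_file(fd):
    """True if fd refers to a regular file, which epoll refuses to monitor."""
    return stat.S_ISREG(os.fstat(fd).st_mode)


# A pipe, like stdin fed from another process.
read_end, write_end = os.pipe()
pipe_is_regular = is_regular_file(read_end)

# A regular file, like stdin redirected with "< somefile".
with tempfile.TemporaryFile() as f:
    file_is_regular = is_regular_file(f.fileno())

os.close(read_end)
os.close(write_end)

print(pipe_is_regular)    # False
print(file_is_regular)    # True
```

A program could call `is_regular_file(sys.stdin.fileno())` at startup and treat such a descriptor as always ready instead of registering it with epoll.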


posted @ 2018-02-11 16:57  dion至君