A problem encountered with RabbitMQ (queue empty, but memory spikes) and the email exchange with its developers
We ran into a rather nasty problem while using RabbitMQ (RabbitMQ version 2.8.1, Erlang version 5.7.4).
Our usage scenario is as follows:
1. Cluster mode (assume a cluster of 3 machines);
2. One exchange with multiple queues bound to it;
3. Many producers (the producer count cannot be reduced; it is tied to the throughput of the upstream pipeline);
4. A producer may publish to multiple queues; message arrival is uneven, so several producers may end up publishing continuously to the same queue at the same time;
5. Messages are around 100 KB each;
6. Each client connects to a randomly chosen server;
7. To increase the publish rate, each connection carries 10 channels;
8. Consumers have ample capacity.
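As a concrete illustration of items 5-7, here is a minimal publisher sketch in Python using pika (which we also tested with); the host names and the exchange/queue names are hypothetical:

```python
import random

BROKER_HOSTS = ["node1", "node2", "node3"]  # hypothetical cluster node names
CHANNELS_PER_CONNECTION = 10                # item 7: 10 channels per connection
MESSAGE_SIZE = 100 * 1024                   # item 5: messages of ~100 KB

def publish_batch(exchange, routing_key, n_messages):
    """Open one connection to a random node (item 6) and spread
    publishes across 10 channels on that connection (item 7)."""
    import pika  # imported lazily; only needed when actually publishing
    host = random.choice(BROKER_HOSTS)
    conn = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channels = [conn.channel() for _ in range(CHANNELS_PER_CONNECTION)]
    body = b"x" * MESSAGE_SIZE
    for i in range(n_messages):
        # Round-robin over the 10 channels of the single connection.
        channels[i % CHANNELS_PER_CONNECTION].basic_publish(
            exchange=exchange, routing_key=routing_key, body=body)
    conn.close()
```

Note that with a random node, the target queue usually lives on a different machine, so most publishes have to be forwarded between cluster nodes; that forwarding path is exactly where the problem below shows up.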
The problem we found: if many producers (say 50) publish to the same queue via an intermediary node (that is, a node on which the queue does not live, so messages must be forwarded), memory spikes, even when the queue is empty, as shown in the figure below:
The monitoring shows that the memory is consumed mainly by Erlang binaries.
Our own client is rabbitmq-c (see the RabbitMQ user guide (RabbitMQ-C)). To rule out a client-side cause, we also tested with the Python client pika 0.9.5; the behavior was the same.
Testing showed two measures that avoid the memory spike:
1. Reduce the number of producers (e.g., from 50 down to 5; this does not noticeably lower the publish rate);
2. When choosing a connection for publishing, use only a connection to the node that hosts the queue (which means the client must know the queue's server address).
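Measure 2 requires the publisher to know which node hosts each queue. A hypothetical sketch of that client-side routing (the queue-to-node map here is illustrative; in practice it could be filled from configuration or from the management plugin's HTTP API):

```python
# Hypothetical client-side routing: publish only over a connection to the
# node that hosts the target queue, so no inter-node forwarding happens.
QUEUE_NODE_MAP = {
    "mqclient_test_queue_1": "cnbj-cuc-tst01-crl0015",  # node name from the thread below
    "mqclient_test_queue_2": "node2",                   # hypothetical entry
}
DEFAULT_NODE = "node1"  # hypothetical fallback for unmapped queues

def node_for_queue(queue_name):
    """Return the cluster node to connect to for a given queue."""
    return QUEUE_NODE_MAP.get(queue_name, DEFAULT_NODE)
```

The obvious cost of this scheme is the one described next: the client is now pinned to specific nodes, so anything that relies on the broker forwarding messages between nodes stops being usable.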
The accompanying drawbacks:
1. RabbitMQ's built-in HA mechanism becomes unusable;
2. The topic exchange type becomes unusable (both involve forwarding messages between nodes).
Admittedly, our usage scenario is rather extreme. If your message volume is modest or you have no demanding throughput requirements, you are unlikely to run into this pitfall.
Appendix: the email exchange with the developers (Liu Hao is me and Matthias is a RabbitMQ developer):
1.Hi, all,
I am a RabbitMQ user in China. We have found a problem when using RabbitMQ in cluster mode. If we use many producers (such as fifty) to send messages ceaselessly to the same queue, and the connection to the cluster is to a server other than the one the queue is on, memory increases very quickly. If we use fewer producers (such as five), or connect to the server the queue is on, memory stays normal. The consumers have enough power to consume the messages. The RabbitMQ version is 2.8.1.
Has anyone met the same problem?
The attachment is a screenshot of the monitoring interface.
Thank you all very much.
--
-------------------------------------------------------------
Liu Hao
State Key Laboratory of Networking & Switching Technology
Beijing University of Posts & Telecommunications
Email & Gtalk: liuhaobupt@gmail.com
-------------------------------------------------------------
2. Matthias Radestock
Aug 13 (1 day ago)
To: Discussions, me
On 13/08/12 02:53, Liu Hao wrote:
> I am a RabbitMQ user in China. We have found a problem when using
> RabbitMQ in cluster mode. If we use many producers (such as fifty) to
> send messages ceaselessly to the same queue, and the connection to the
> cluster is to a server other than the one the queue is on, memory
> increases very quickly. If we use fewer producers (such as five), or
> connect to the server the queue is on, memory stays normal. The
> consumers have enough power to consume the messages. The RabbitMQ
> version is 2.8.1.
Please post the output of 'rabbitmqctl report' for all three machines in your cluster at the time the memory threshold has been exceeded on the queue node.
Matthias.
3. Liu Hao
10:40 (7 hours ago)
To: Matthias, Discussions
Hi, Matthias,
The destination queue's name is "mqclient_test_queue_1" and it is on the cnbj-cuc-tst01-crl0015 node.
The rabbitmqctl report output for the cnbj-cuc-tst01-crl0015 node is in the "cnbj-cuc-tst01-crl0015" attachment; the other two nodes are similar.
The picture shows the monitoring interface.
Thank you very much.
4. Matthias Radestock
12:12 (5 hours ago)
To: me, Discussions
On 14/08/12 03:40, Liu Hao wrote:
> The destination queue's name is "mqclient_test_queue_1" and it is
> on the cnbj-cuc-tst01-crl0015 node.
>
> The rabbitmqctl report output of cnbj-cuc-tst01-crl0015 node is
> "cnbj-cuc-tst01-crl0015" attachment, the other two are similar.
Ah, I completely forgot that 'report' reports on all nodes. Sorry.
There are about 1100 connections and 8400 channels. Are those the
numbers you expect to see?
How big are the messages?
Please run the following on crl0015 when it is using lots of memory:
rabbitmqctl eval 'begin {L, Pid} = lists:last(lists:sort([{length(element(2, process_info(P, binary))), P} || P <- processes()])), {L, Pid, process_info(Pid)} end.'
(all on one line) and post the output.
Regards,
Matthias.
5. Liu Hao
14:14 (3 hours ago)
To: Matthias, Discussions
Hi, Matthias,
There were indeed too many connections and channels. I decreased them. Now I have 40 consumer connections (one channel per connection) and 50 producer connections (10 channels per connection). The memory behaves the same; it still grows a lot.
But I found an interesting fact: if I use 50 producer connections with only one channel each, memory stays under 2 GB, but most connections are flow-controlled and the publish rate is too slow.
This is just a test demo, and each message is 10 KB.
The report output is very big (35 MB), so I attached the beginning and the end of it.
Thank you very much.
6. Matthias Radestock
16:38 (1 hour ago)
To: me, Discussions
On 14/08/12 07:14, Liu Hao wrote:
> There were indeed too many connections and channels. I decreased them.
> Now I have 40 consumer connections (one channel per connection) and 50
> producer connections (10 channels per connection). The memory behaves
> the same; it still grows a lot.
> But I found an interesting fact: if I use 50 producer connections with
> only one channel each, memory stays under 2 GB, but most connections
> are flow-controlled and the publish rate is too slow.
> This is just a test demo, and each message is 10 KB.
> The report output is very big (35 MB), so I attached the beginning and
> the end of it.
I think you are simply pushing rabbit beyond the limit of its capability. Internal flow control happens on a per-process-link basis, so when you increase the number of publishing channels that corresponds to a linear increase in the amount of internal buffer space that is potentially required. To the point where all memory is taken up by messages sitting in these buffers.
Publishing across nodes carries an extra cost, so the buffers will fill up at lower publishing rates.
If with 50 producer connections x 1 channel you see most connections flow-controlled, that is an indication that rabbit is already operating at capacity but is still able to keep overall memory use under control. Adding more producer connections/channels will not increase the sustainable sending rate but will degrade rabbit's ability to control memory use.
Btw, I suggest you upgrade to the latest rabbit version - the flow control code has changed somewhat and there have been some performance improvements.
Regards,
Matthias.
7. Matthias Radestock
16:46 (1 hour ago)
To: Discussions, me
On 14/08/12 09:38, Matthias Radestock wrote:
> Btw, I suggest you upgrade to the latest rabbit version - the flow
> control code has changed somewhat and there have been some performance
> improvements.
You may also want to enable hipe compilation - see http://www.rabbitmq.com/configure.html.
I doubt any of this will make much difference though, since the bottleneck in your system is the queues, and hipe compilation and most of the performance improvements have little impact on queue performance.
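The practical takeaway from Matthias's diagnosis is to cap the number of concurrent publishing channels at the broker's capacity rather than adding more. A broker-independent sketch of such a client-side cap, using a semaphore to bound in-flight publishes (the cap value is an assumption to be tuned against your own flow-control observations):

```python
import threading

MAX_INFLIGHT = 5  # hypothetical cap; tune to where connections stop being flow-controlled

_slots = threading.BoundedSemaphore(MAX_INFLIGHT)

def throttled_publish(publish_fn, message):
    """Block until a slot is free, then publish.
    Keeps at most MAX_INFLIGHT publishes in flight across all threads."""
    with _slots:
        publish_fn(message)

# Example with a stand-in publish function:
sent = []
for i in range(10):
    throttled_publish(sent.append, i)
```

The point is that once the broker is at capacity, extra channels only add internal buffer space that fills with unconsumed messages; bounding concurrency on the client side keeps the sustainable rate while letting the broker control its memory.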