buguge - Keep it simple,stupid


Troubleshooting an incident where MQ could not produce messages

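On Monday morning, shortly after I got to work, the operations team reported that service fees had not been synced to the agent system. The production log of the draft_server system showed an exception when pushing data to RabbitMQ: No route to host.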

2019-07-29 01:30:00,136 INFO  [pool-13-thread-30] 201154611 (AgentProfitProducer.java:32) - 代理商服务费入队
2019-07-29 01:31:01,713 INFO  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2021] 201216188 (SimpleMessageListenerContainer.java:1453) - Restarting Consumer: tags=[{}], channel=null, acknowledgeMode=AUTO local queue size=0
2019-07-29 01:31:02,150 INFO  [pool-13-thread-30] 201216625 (AgentProfitServiceImpl.java:182) - [代理商服务费推送]-异常
org.springframework.amqp.AmqpIOException: java.net.NoRouteToHostException: No route to host (Host unreachable)
        at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1374) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1367) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:699) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
--
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_191]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_191]
        at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_191]
        at com.rabbitmq.client.impl.FrameHandlerFactory.create(FrameHandlerFactory.java:32) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:811) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:725) ~[amqp-client-3.6.3.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:296) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        ... 21 more
2019-07-29 01:31:02,150 INFO  [pool-13-thread-30] 201216625 (AgentProfitServiceImpl.java:184) - 代理商服务费推送结束2019-07-29T01:31:02.150+0800

 

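I connected to the production environment over VPN and used a local test program to push a message to the production MQ queue. That worked. Next, I invoked the production service-fee push service via RPC and checked the production log again. MQ still failed, but this time with a SocketTimeoutException.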

2019-07-29 13:57:23,514 INFO  [pool-13-thread-38] 245997989 (AgentProfitProducer.java:32) - 代理商服务费入队
2019-07-29 13:57:47,563 WARN  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2621] 246022038 (SimpleMessageListenerContainer.java:1462) - Consumer raised exception, processing can restart if the connection factory supports it
2019-07-29 13:57:47,564 INFO  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2621] 246022039 (SimpleMessageListenerContainer.java:1453) - Restarting Consumer: tags=[{}], channel=null, acknowledgeMode=AUTO local queue size=0
2019-07-29 14:00:23,636 INFO  [pool-13-thread-38] 246178111 (AgentProfitServiceImpl.java:182) - [代理商服务费推送]-异常
org.springframework.amqp.AmqpIOException: java.net.SocketTimeoutException: connect timed out
        at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:1374) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:1367) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.core.RabbitTemplate.send(RabbitTemplate.java:699) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
--
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_191]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_191]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_191]
        at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_191]
        at com.rabbitmq.client.impl.FrameHandlerFactory.create(FrameHandlerFactory.java:32) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:811) ~[amqp-client-3.6.3.jar:?]
        at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:725) ~[amqp-client-3.6.3.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:296) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        ... 21 more
2019-07-29 14:00:23,636 INFO  [pool-13-thread-38] 246178111 (AgentProfitServiceImpl.java:184) - 代理商服务费推送结束2019-07-29T14:00:23.636+0800
2019-07-29 14:00:47,648 WARN  [org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer#0-2622] 246202123 (SimpleMessageListenerContainer.java:1462) - Consumer raised exception, processing can restart if the connection factory supports it
org.springframework.amqp.AmqpIOException: java.net.SocketTimeoutException: connect timed out
        at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:309) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:547) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createConnection(ConnectionFactoryUtils.java:90) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:140) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:76) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:472) ~[spring-rabbit-1.6.1.RELEASE.jar:?]
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1280) [spring-rabbit-1.6.1.RELEASE.jar:?]
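Both stack traces fail inside `Socket.connect`, before any AMQP traffic is exchanged, so the same check can be reproduced without RabbitMQ at all. The following is a minimal sketch of a TCP probe that could be run on the draft_server host to see how a connection attempt to the broker fails. The class name `BrokerProbe`, the default host `localhost`, and the 3-second timeout are assumptions for illustration; 5672 is RabbitMQ's default AMQP port, and the real broker address is not given in this post.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of draft_server.
final class BrokerProbe {

    /** Possible outcomes of a plain TCP connect attempt. */
    enum Result { CONNECTED, NO_ROUTE_TO_HOST, CONNECT_TIMED_OUT, CONNECT_FAILED, OTHER_IO_ERROR }

    /** Tries to open a TCP connection to host:port within timeoutMillis. */
    static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Result.CONNECTED;
        } catch (NoRouteToHostException e) {
            // Same root cause as the 01:31 failure in the first log.
            return Result.NO_ROUTE_TO_HOST;
        } catch (SocketTimeoutException e) {
            // Same root cause as the 14:00 failure in the second log.
            return Result.CONNECT_TIMED_OUT;
        } catch (ConnectException e) {
            // Typically "Connection refused": host reachable, nothing listening.
            return Result.CONNECT_FAILED;
        } catch (IOException e) {
            // For example an unresolvable host name.
            return Result.OTHER_IO_ERROR;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost"; // placeholder host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5672;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 3000));
    }
}
```

Running it from both the local machine and the production host would show whether the two environments reach the broker differently, which is what the local test versus the RPC call suggested.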

 

Analyzing the logs further, I noticed something odd: shortly before each of the two failed pushes to MQ, there was a Restarting Consumer entry.
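Spotting this pattern by eye works for two failures, but the same check can be scripted for a longer log. Below is a small sketch that reports which failure lines have a `Restarting Consumer` entry within the preceding few lines. The class name `RestartCorrelation`, the window size, and the abbreviated sample lines are assumptions made for illustration; only the two marker strings come from the logs above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical log-analysis helper.
final class RestartCorrelation {

    /**
     * Returns the indexes of lines containing failureMarker that have a line
     * containing restartMarker among the preceding `window` lines.
     */
    static List<Integer> failuresPrecededByRestart(List<String> lines, String failureMarker,
                                                   String restartMarker, int window) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!lines.get(i).contains(failureMarker)) {
                continue;
            }
            int from = Math.max(0, i - window);
            for (int j = from; j < i; j++) {
                if (lines.get(j).contains(restartMarker)) {
                    hits.add(i);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Abbreviated, made-up lines shaped like the production log.
        List<String> sample = Arrays.asList(
            "01:30:00 INFO producer - enqueue",
            "01:31:01 INFO listener - Restarting Consumer: tags=[{}], channel=null",
            "01:31:02 INFO service - push failed",
            "org.springframework.amqp.AmqpIOException: java.net.NoRouteToHostException");
        System.out.println(
            failuresPrecededByRestart(sample, "AmqpIOException", "Restarting Consumer", 5));
    }
}
```

A match only shows that the two events occur close together in the log; it does not by itself say which one causes the other.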

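draft_server is not only an MQ producer but also an MQ consumer. When I logged in to the RabbitMQ management console, the queue surprisingly showed "... no consumers ...". So the problem might lie here. The application configures the exchange, the queue, and the binding, and under normal circumstances these are registered automatically once the service starts. Since it is unlikely that someone manually deleted the consumer or created the queue by hand in the management console, it looked as though last Friday's release had not been deployed properly.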
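For readers unfamiliar with how Spring AMQP registers these objects, here is a minimal sketch of such a configuration in Java config style. It is not the draft_server configuration, which this post does not show and which may well use XML instead. The class name, broker host, credentials, exchange name, queue name, and routing key are all placeholders.

```java
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitAdmin;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical configuration; every name and address below is a placeholder.
@Configuration
public class AgentProfitMqConfig {

    @Bean
    public ConnectionFactory connectionFactory() {
        CachingConnectionFactory factory =
                new CachingConnectionFactory("mq.example.internal", 5672);
        factory.setUsername("guest");
        factory.setPassword("guest");
        return factory;
    }

    // RabbitAdmin declares the Exchange, Queue, and Binding beans on the broker
    // when a connection is opened.
    @Bean
    public RabbitAdmin rabbitAdmin(ConnectionFactory connectionFactory) {
        return new RabbitAdmin(connectionFactory);
    }

    @Bean
    public DirectExchange agentProfitExchange() {
        return new DirectExchange("agent.profit.exchange");
    }

    @Bean
    public Queue agentProfitQueue() {
        return new Queue("agent.profit.queue", true); // durable queue
    }

    @Bean
    public Binding agentProfitBinding(Queue agentProfitQueue, DirectExchange agentProfitExchange) {
        return BindingBuilder.bind(agentProfitQueue).to(agentProfitExchange).with("agent.profit");
    }
}
```

Note that these declarations only create the exchange, queue, and binding. The consumer count shown in the management console comes from the listener container, so a queue with no consumers means the container in the running service never managed to connect and subscribe.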

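So I asked our ops colleague to redeploy. Once the Jenkins build finished and the service restarted, the queue had consumers again.
Then I invoked the service on the server via RPC from my local machine once more. Everything worked, and MQ could produce messages normally again.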

posted on 2019-08-02 16:07  buguge