TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor@75bc9f01[Running, pool size

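When we ran our performance tests, we ramped up the load in steps and held each stage for 10 to 20 minutes until the server resources were exhausted. None of the module services showed any problems. We then started a 72-hour stability test. After just one night, roughly ten hours in, one service hit an OOM and could no longer process data.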

TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor@75bc9f01[Running, pool size

The thread pool is configured in the code as follows:

    // Pool sizing comes from configuration. Values injected by @Value replace
    // the field initializers, which only document the intended defaults.
    @Value("${threadPool.maximumPoolSize}")
    private int maxThreadCount = 20;
    @Value("${threadPool.queueSize}")
    private int maxQueueSize = 2000;
    @Value("${threadPool.corePoolSize}")
    private int corePoolSize = 5;
    @Value("${threadPool.keepAliveTime}")
    private int keepAliveTime = 60; // seconds

    @Value("${alarmSending.timeout}")
    private int alarmSendTimeOut_ms = 5 * 1000;

    /**
     * The thread pool used to send alarms
     */
    private ThreadPoolTaskExecutor threadPool;

    @PostConstruct
    private void init() {
        threadPool = new ThreadPoolTaskExecutor();
        threadPool.setCorePoolSize(corePoolSize);
        threadPool.setMaxPoolSize(maxThreadCount);
        threadPool.setThreadGroupName("EventManager");
        threadPool.setThreadNamePrefix("alarmSend");
        threadPool.setQueueCapacity(maxQueueSize);
        threadPool.setKeepAliveSeconds(keepAliveTime);
        // Reject when all threads are busy and the queue is full; Spring
        // surfaces this to the caller as a TaskRejectedException.
        threadPool.setRejectedExecutionHandler((r, executor) -> {
            throw new RejectedExecutionException();
        });
        threadPool.initialize();
    }

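ThreadPoolTaskExecutor wraps a java.util.concurrent.ThreadPoolExecutor, whose constructor parameters are:

int corePoolSize: the number of threads the pool keeps alive even when they are idle.

int maximumPoolSize: the maximum number of threads the pool may create.

long keepAliveTime: how long a thread beyond the core size may stay idle before it is terminated.

TimeUnit unit: the time unit of keepAliveTime; the enum ranges from NANOSECONDS through MICROSECONDS, MILLISECONDS and SECONDS up to DAYS.

BlockingQueue<Runnable> workQueue: the queue that holds tasks waiting to be executed.

RejectedExecutionHandler handler: called to reject a task. This happens in two situations:

  1. In the execute method, addIfUnderMaximumPoolSize(command) returns false, meaning the pool is saturated: the queue is full and the thread count has reached the maximum.
  2. In the execute method, runState != RUNNING || poolSize == 0 is found to be true, meaning the pool has been shut down. execute then calls ensureQueuedTaskHandled(Runnable command), which may call reject.

The method names above come from the older ThreadPoolExecutor source (before the JDK 7 rewrite). Later versions restructure execute, but a task is still rejected in the same two situations.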
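
The order in which a pool uses its core threads, its queue and its extra threads before rejecting is easier to see with a tiny pool. The following is a minimal sketch that uses a plain java.util.concurrent.ThreadPoolExecutor rather than the Spring wrapper; the class name SaturationDemo, the helper submitBlockingTasks and the sizes (1 core thread, 2 max threads, queue capacity 1) are illustrative assumptions, not values from the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SaturationDemo {

    /** Submits tasks that block until released and records what the pool did with each one. */
    static List<String> submitBlockingTasks(int core, int max, int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // throws RejectedExecutionException
        List<String> outcomes = new ArrayList<>();
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // keep the worker busy so the pool fills up
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    outcomes.add("threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    outcomes.add("rejected");
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Prints, in order:
        // threads=1 queued=0   (core thread created)
        // threads=1 queued=1   (core busy, task queued)
        // threads=2 queued=1   (queue full, extra thread created)
        // rejected             (queue full and max threads reached)
        submitBlockingTasks(1, 2, 1, 4).forEach(System.out::println);
    }
}
```

The same ordering explains why a larger queue or a larger maximum only delays the rejection when tasks arrive faster than they complete.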

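When I first started on this problem, I oversimplified it. Was the thread pool too small, or was the workQueue too small? But after enlarging both maxPool and the workQueue, the service only held out a few more hours before hitting OOM again. Time to calm down and think it through 💭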

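Since the service had withstood the load during the stress test yet failed during the stability test, the thread pool itself was probably not the cause. More likely, over a long run, some objects created with new were not being reclaimed by the GC, or threads were not being released, or a large number of unprocessable tasks were piling up in the queue... Guesses are only guesses, though; I still had to find the root cause...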
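
One way to test guesses like these during a long run is to log the pool's counters and the heap usage periodically. The following is a minimal sketch, assuming a plain ThreadPoolExecutor; PoolStats, snapshot and usedHeapBytes are hypothetical names introduced here. With Spring's ThreadPoolTaskExecutor, the underlying pool can be obtained through getThreadPoolExecutor() after initialization.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolStats {

    /** Formats the counters that show whether tasks are piling up or threads are stuck. */
    static String snapshot(ThreadPoolExecutor pool) {
        return String.format("poolSize=%d active=%d queued=%d completed=%d",
                pool.getPoolSize(), pool.getActiveCount(),
                pool.getQueue().size(), pool.getCompletedTaskCount());
    }

    /** Approximate heap currently in use, in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sizes mirror the defaults in the configuration shown earlier.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 20, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(2000));
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> { });
        }
        System.out.println(snapshot(pool) + " usedHeap=" + usedHeapBytes());
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(snapshot(pool) + " usedHeap=" + usedHeapBytes());
    }
}
```

A queue length that keeps growing points to tasks arriving faster than they finish. A steady queue with a steadily growing heap points instead to what the tasks themselves allocate.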

    @Override
    public void messageArrived(String topic, MqttMessage mqttMessage) throws Exception {
        threadPool.execute(() -> handleMessage(topic, mqttMessage));
    }
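
At first I suspected this spot too, but after going back and forth over it several times I ruled it out; it looked fine... Then I stepped into the handleMessage method and found that there really was "something" in there...
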
    private void handleMessage(String topic, MqttMessage mqttMessage) {
        // uplinkRequest is built from mqttMessage; that step is not shown here.
        String message = gson.toJson(uplinkRequest);

        //TODO There is a risk of OOM and frequent GC in the case of high concurrency
        /*RestTemplate restTemplate = new RestTemplate();
        HttpHeaders headers = new HttpHeaders();
        headers.add("Connection", "Keep-Alive");
        headers.add("Authorization", http_header);
        headers.setContentType(MediaType.APPLICATION_JSON);*/

        // With the block above commented out, headers has to come from a
        // shared field that is set up once (its declaration is not shown here).
        HttpEntity<String> request = new HttpEntity<>(message, headers);
    }

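As you can see at the spot I marked with TODO, every call to handleMessage, that is, every task submitted to the pool, created a new RestTemplate and a new HttpHeaders. With a load of 600 to 1000 messages per second sustained for more than ten hours, no wonder it ended in OOM. This probably went unnoticed for so long because of my habit of not touching other people's code. After checking with my lead, I changed the code and ran the test again, and the problem never came back.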
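
The shape of the fix is to build the expensive object once and reuse it across all tasks. The following is a minimal sketch of that pattern, not the project's actual code: ExpensiveClient is a hypothetical stand-in for RestTemplate so that the example runs without Spring, and SharedClientDemo and clientsCreatedFor are illustrative names. In Spring code the equivalent would be a single RestTemplate held in a field or injected as a bean, which is safe because RestTemplate is thread-safe once constructed; a shared HttpHeaders is fine only as long as it is not modified after setup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedClientDemo {

    /** Hypothetical stand-in for an expensive, thread-safe client such as RestTemplate. */
    static final class ExpensiveClient {
        static final AtomicInteger INSTANCES = new AtomicInteger();

        ExpensiveClient() {
            INSTANCES.incrementAndGet(); // count constructions to make reuse visible
        }

        String send(String body) {
            return "sent:" + body;
        }
    }

    // Created once when the handler is constructed and reused by every task.
    private final ExpensiveClient client = new ExpensiveClient();

    String handleMessage(String message) {
        // Before the fix, the equivalent of `new ExpensiveClient()` ran here on every call.
        return client.send(message);
    }

    /** Pushes the given number of messages through a small pool and returns how many clients were built. */
    static int clientsCreatedFor(int messages) {
        int before = ExpensiveClient.INSTANCES.get();
        SharedClientDemo handler = new SharedClientDemo();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < messages; i++) {
            String message = "msg-" + i;
            pool.execute(() -> handler.handleMessage(message));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ExpensiveClient.INSTANCES.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("clients created: " + clientsCreatedFor(1000));
    }
}
```

However many messages are processed, only one client is constructed, so the per-message allocation is limited to the message itself.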

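Looking back, I am glad we ran the stability test. Had we only run the stress test and gone live with the attitude that "it handled 2000 per second, so how could 600 per second be a problem", we would have been caught completely off guard in production. A lesson to remember!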

posted @ 2022-01-27 18:35 zhangdaopin
