java 服务cpu 100%解决

online环境发现有个task的微服务白天使用高峰期一直cpu非常高98%-99%

用  如下命令在cpu非常高的时候生成了多个dump文件

jstack <PID> > thread_dump1.txt

jstack <PID> > thread_dump2.txt

jstack <PID> > thread_dump3.txt

jstack <PID> > thread_dump4.txt

jstack <PID> > thread_dump5.txt

jstack <PID> > thread_dump6.txt

 

 

在thread_dump文件中发现很多如下的BLOCKED 的地方大概有100多个

"Thread-509207" #544421 prio=5 os_prio=0 cpu=0.10ms elapsed=712.35s tid=0x00007f2fc85419a0 nid=0xf0335 waiting for monitor entry [0x00007f2f2c1c0000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)

"Thread-509214" #544428 prio=5 os_prio=0 cpu=0.12ms elapsed=706.60s tid=0x00007f2fc853e190 nid=0xf033d waiting for monitor entry [0x00007f2f2eae9000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)

"Thread-509221" #544435 prio=5 os_prio=0 cpu=0.11ms elapsed=704.43s tid=0x00007f2fc85385f0 nid=0xf0344 waiting for monitor entry [0x00007f2f2d6d5000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)

"Thread-509230" #544445 prio=5 os_prio=0 cpu=0.10ms elapsed=701.86s tid=0x00007f2fe4416810 nid=0xf034e waiting for monitor entry [0x00007f2f2acab000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)

"Thread-509252" #544467 prio=5 os_prio=0 cpu=0.10ms elapsed=678.16s tid=0x00007f2fc85396e0 nid=0xf0364 waiting for monitor entry [0x00007f2f2c7c6000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.hp.nova.service.impl.DataFromRedis.DataFromRedisImpl$ThreadTask3ReportInstanceFromRedis.run(DataFromRedisImpl.java:1239)
- waiting to lock <0x0000000550e06cb0> (a java.lang.String)

 

 

然后发现DataFromRedisImpl.java:1239

这里出现了锁竞争

 

private class ThreadTask3ReportInstanceFromRedis extends Thread {
private String taskcode;
private Integer taskversion;
private String rediskey;
public ThreadTask3ReportInstanceFromRedis(String taskcode ,Integer taskversion)
{
this.taskcode = taskcode;
this.taskversion=taskversion;
this.rediskey=reportinstanceredispre+taskcode+"."+taskversion;
}
public void run() {
synchronized(reportinstanceredispre){
try {
getTask3ReportInstanceFromRedis(taskcode, taskversion);
} catch (Exception e) {
// TODO Auto-generated catch block
logger.error("error:{}", ExceptionUtil.stack(e));
}
}
}
}

 

 

 

然后改成如下方式,cpu明显下降很多,root cause是之前的代码可能是误写锁住了一个常量字符串,相当于整个method都锁了,改成下面的方式根据taskcode 和version减少了锁的粒度

synchronized(reportinstanceredispre)改为 synchronized(lock)






 

private static final Map<String, Object> locks = new ConcurrentHashMap<>();

private class ThreadTask3ReportInstanceFromRedis extends Thread {
private String taskcode;
private Integer taskversion;
private String rediskey;
public ThreadTask3ReportInstanceFromRedis(String taskcode ,Integer taskversion)
{
this.taskcode = taskcode;
this.taskversion=taskversion;
this.rediskey=reportinstanceredispre+taskcode+"."+taskversion;
}
public void run() {

Object lock = locks.computeIfAbsent(rediskey, key -> new Object());

synchronized(lock){
try {
logger.info("rediskey======{}",rediskey);
getTask3ReportInstanceFromRedis(taskcode, taskversion);
} catch (Exception e) {
// TODO Auto-generated catch block
logger.error("error:{}", ExceptionUtil.stack(e));
}
finally {
// 清理锁,避免内存泄漏
locks.remove(rediskey); // Clean up lock if no longer needed
}
}
}
}
posted @ 2025-01-08 11:20  kuroniko  阅读(15)  评论(0编辑  收藏  举报