liunx服务器某个Java运行服务CPU占用率过高问题排查及修复

进入服务器 用 top 命令查看

 top
top - 09:57:55 up 40 days, 22:05,  9 users,  load average: 4.44, 4.03, 3.85
Tasks: 741 total,   3 running, 738 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.8 us,  5.9 sy,  0.0 ni, 88.1 id,  0.0 wa,  1.0 hi,  1.1 si,  0.0 st
MiB Mem :  75612.2 total,   8957.1 free,  35217.4 used,  31437.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  39580.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                   
2664227 museum    20   0   14.4g 867896  43196 S  78.6   1.1 798:35.65 java                                                                                                                                                      
  86521 redis     20   0  333628  56776   2820 S  14.1   0.1   4467:55 redis-server                                                                                                                                              
3370157 museum    20   0   14.4g 840884  41688 S   4.6   1.1   3:16.61 java                                                                                                                                                      
   2797 mysql     20   0   16.2g   6.6g  18564 S   4.3   8.9   6379:50 mysqld                                                                                                                                                    
1203554 museum    20   0   14.3g 795724  41332 S   3.9   1.0  52:52.93 java                                                                                                                                                      
1144857 root      20   0  245420   6992   4236 S   3.3   0.0  39:53.81 sshd                                                                                                                                                      
3258067 root      20   0  244884   6676   4452 S   3.3   0.0  30:24.07 sshd                                                                                                                                                      
 392412 museum    20   0   16.3g 933680  63456 S   2.6   1.2   2388:37 java                                                                                                                                                      
  89655 museum    20   0   15.5g   1.3g  27228 S   2.0   1.7   1896:47 java                                                                                                                                                      
  25420 1000      20   0  753840  70152  27396 S   1.6   0.1 344:25.97 kiali                                                                  

找出占用CPU最高的进程的pid:2664227 的进程 命令 JPS

[museum@museum-test txn]$ jps
2262626 cztech-museum-pac-mss-service.jar
1317781 cztech-service-gateway.jar
4009151 cztech-museum-ticket-check-service.jar
1402193 cztech-manage-gateway.jar
966342 cztech-platform-cashier-service.jar
867906 cztech-platform-common-service.jar
885471 cztech-museum-outer-portal-service.jar
482323 cztech-museum-pac-boss-service.jar
3469280 cztech-museum-manage-portal-service.jar
1468736 cztech-platform-form-service.jar
392412 BrokerStartup
108961 NamesrvStartup
1461618 cztech-museum-official-manage-service.jar
2037945 Jps
667244 cztech-platform-bdm-service.jar
3800095 cztech-common-human-service.jar
2172873 cztech-platform-sso-service.jar
198761 xxl-job-admin.jar
1404089 cztech-visitor-gateway.jar
119278 cztech-platform-bpm-service.jar
**2664227 cztech-platform-transaction-service.jar**
1203554 cztech-museum-guide-service.jar
114485 cztech-common-material-service.jar
3901069 cztech-museum-ticket-service.jar
89655 nacos-server.jar
1284840 cztech-museum-openapi-service.jar
946997 cztech-museum-collection-service.jar

确定该程序的pid是这个: ps -ef|grep

[museum@museum-test txn]$ ps -ef|grep cztech-platform-transaction-service.jar
museum   2664227       1 85 7月15 ?       13:18:14 java -Dnacos.server-addr=nacos.cqcztech.cn:8848 -Dreactor.netty.http.server.accessLogEnabled=true -Dspring.cloud.nacos.config.username=nacos -Dspring.cloud.nacos.config.password=%tgb6YHN -Dspring.cloud.nacos.discovery.username=nacos -Dspring.cloud.nacos.discovery.password=%tgb6YHN -Dproject=museum -server -Dlogs.home=/home/museum/logs -Dspring.profiles.active=test -Dfile.encoding=UTF-8 -Djava.awt.headless=true -Djava.io.tmpdir=/home/museum/tmp/ -DsqlQueryMaxLength=50000 -server -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -Xms256m -Xmx512m -Xmn64m -XX:PermSize=64m -Xss256k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -Dloader.path=/home/museum/program/lib,/home/museum/program/config -jar /home/museum/program/platform/txn/cztech-platform-transaction-service.jar

通过Arthas分析该Java运行的服务的进程

[museum@museum-test arthas]$ java -jar arthas-boot.jar 
[INFO] JAVA_HOME: /usr/lib/java/jdk1.8.0_144/jre
[INFO] arthas-boot version: 3.7.2
[INFO] Process 2664227 already using port 3658
[INFO] Process 2664227 already using port 8563
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 2664227 /home/museum/program/platform/txn/cztech-platform-transaction-service.jar
  [2]: 2262626 /home/museum/program/pac/mss/cztech-museum-pac-mss-service.jar
  [3]: 1317781 /home/museum/program/gateway/service/cztech-service-gateway.jar
  [4]: 1402193 /home/museum/program/gateway/manage/cztech-manage-gateway.jar
  [5]: 2290469 /home/museum/program/vistor/video/cztech-museum-videolive-service.jar
  [6]: 966342 /home/museum/program/platform/cashier/cztech-platform-cashier-service.jar
  [7]: 867906 /home/museum/program/platform/common/cztech-platform-common-service.jar
  [8]: 885471 /home/museum/program/vistor/portal/cztech-museum-outer-portal-service.jar
  [9]: 482323 /home/museum/program/pac/boss/cztech-museum-pac-boss-service.jar
  [10]: 3469280 /home/museum/program/manage/portal/cztech-museum-manage-portal-service.jar
  [11]: 1468736 /home/museum/program/platform/form/cztech-platform-form-service.jar
  [12]: 392412 org.apache.rocketmq.broker.BrokerStartup
  [13]: 3370157 /home/museum/program/vistor/ticket/cztech-museum-ticket-service.jar
  [14]: 108961 org.apache.rocketmq.namesrv.NamesrvStartup
  [15]: 1461618 /home/museum/program/manage/official/cztech-museum-official-manage-service.jar
  [16]: 667244 /home/museum/program/platform/bdm/cztech-platform-bdm-service.jar
  [17]: 3800095 /home/museum/program/manage/human/cztech-common-human-service.jar
  [18]: 2172873 /home/museum/program/platform/sso/cztech-platform-sso-service.jar
  [19]: 198761 /home/museum/middleware/xxl-job-admin/bin/xxl-job-admin.jar
  [20]: 1404089 /home/museum/program/gateway/visitor/cztech-visitor-gateway.jar
  [21]: 119278 /home/museum/program/platform/bpm/cztech-platform-bpm-service.jar
  [22]: 1203554 /home/museum/program/vistor/guide/cztech-museum-guide-service.jar
  [23]: 114485 /home/museum/program/manage/material/cztech-common-material-service.jar
  [24]: 89655 /home/museum/middleware/nacos/target/nacos-server.jar
  [25]: 2088486 /home/museum/program/manage/ticket-check/cztech-museum-ticket-check-service.jar
  [26]: 1284840 /home/museum/program/manage/openapi/cztech-museum-openapi-service.jar
  [27]: 946997 /home/museum/program/collection/collection/cztech-museum-collection-service.jar
  [28]: 1257580 /home/museum/program/vistor/member/cztech-museum-member-service.jar

通过Arthas看资源的耗用情况看板 : dashboard

ID       NAME                                                     GROUP                       PRIORITY           STATE             %CPU               DELTA_TIME         TIME               INTERRUPTED       DAEMON             
95       lettuce-epollEventLoop-6-1                               main                        5                  RUNNABLE          41.84              2.091              358:35.676         false             true               
166      pool-4-thread-1                                          main                        5                  TIMED_WAITING     33.61              1.680              273:19.935         false             false              
103      lettuce-eventExecutorLoop-1-4                            main                        5                  WAITING           1.02               0.051              5:30.768           false             true               
173      lettuce-eventExecutorLoop-1-18                           main                        5                  WAITING           0.98               0.048              5:27.252           false             true               
207      Thread-43                                                main                        10                 TIMED_WAITING     0.97               0.048              4:50.993           false             false              
175      lettuce-eventExecutorLoop-1-20                           main                        5                  WAITING           0.87               0.043              5:28.749           false             true               
163      lettuce-eventExecutorLoop-1-15                           main                        5                  WAITING           0.83               0.041              5:32.722           false             true               
162      lettuce-eventExecutorLoop-1-14                           main                        5                  WAITING           0.78               0.038              5:31.926           false             true               
105      lettuce-eventExecutorLoop-1-5                            main                        5                  WAITING           0.66               0.033              5:35.152           false             true               
108      lettuce-eventExecutorLoop-1-6                            main                        5                  WAITING           0.65               0.032              5:27.778           false             true               
58       lettuce-eventExecutorLoop-1-1                            main                        5                  TIMED_WAITING     0.63               0.031              5:42.399           false             true               
109      lettuce-eventExecutorLoop-1-7                            main                        5                  WAITING           0.63               0.031              5:29.607           false             true               
Memory                                          used             total           max             usage           GC                                                                                                              
heap                                            138M             249M            505M            27.37%          gc.parnew.count                                         8149                                                    
par_eden_space                                  9M               51M             51M             18.66%          gc.parnew.time(ms)                                      109122                                                  
par_survivor_space                              1M               6M              6M              17.58%          gc.concurrentmarksweep.count                            11                                                      
cms_old_gen                                     127M             192M            448M            28.50%          gc.concurrentmarksweep.time(ms)                         1416                                                    
nonheap                                         221M             231M            -1              95.78%                                                                                                                          
code_cache                                      77M              77M             240M            32.12%                                                                                                                          
metaspace                                       129M             137M            -1              94.27%                                                                                                                          
compressed_class_space                          15M              16M             1024M           1.50%                                                                                                                           
direct                                          1M               1M              -               100.00%                                                                                                                         
mapped                                          0K               0K              -               0.00%                                                                                                                           
Runtime                                                                                                                                                                                

分析是哪一个线程导致的CPU占用率较高 : thread -n 3

命令:查看占用CPU率最高的前三个线程-->  thread -n 3
"C1 CompilerThread0" [Internal] cpuUsage=1.63% deltaTime=3ms time=1170ms

"arthas-command-execute" Id=23 cpuUsage=0.11% deltaTime=0ms time=401ms RUNNABLE
    at java.management@11.0.7/sun.management.ThreadImpl.dumpThreads0(Native Method)
    at java.management@11.0.7/sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:466)
    at com.taobao.arthas.core.command.monitor200.ThreadCommand.processTopBusyThreads(ThreadCommand.java:199)
    at com.taobao.arthas.core.command.monitor200.ThreadCommand.process(ThreadCommand.java:122)
    at com.taobao.arthas.core.shell.command.impl.AnnotatedCommandImpl.process(AnnotatedCommandImpl.java:82)
    at com.taobao.arthas.core.shell.command.impl.AnnotatedCommandImpl.access$100(AnnotatedCommandImpl.java:18)
    at com.taobao.arthas.core.shell.command.impl.AnnotatedCommandImpl$ProcessHandler.handle(AnnotatedCommandImpl.java:111)
    at com.taobao.arthas.core.shell.command.impl.AnnotatedCommandImpl$ProcessHandler.handle(AnnotatedCommandImpl.java:108)
    at com.taobao.arthas.core.shell.system.impl.ProcessImpl$CommandProcessTask.run(ProcessImpl.java:385)
    at java.base@11.0.7/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base@11.0.7/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base@11.0.7/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base@11.0.7/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.7/java.lang.Thread.run(Thread.java:834)


"VM Periodic Task Thread" [Internal] cpuUsage=0.07% deltaTime=0ms time=584ms

定位到线程使用的代码行

   private ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();

 public void asyncCreateOrderFromQueue() {
        Runnable task = () -> {
            try {
                Set<String> keys = redisTemplate.keys(STREAM_PREFIX + "*");
                //检查队列是否有消息
                if (keys == null || keys.isEmpty()) {
                    return;
                }
                for (String key : keys) {
                    //检查队列是否有数据
                    MapRecord<String, String, JSONObject> orderRecord = redisQueue.getOrderBodyFromRedis(key);
                    if (orderRecord == null) { //没有消息则不消费
                        continue;
                    }
                    Map orderRecordValue
                            = orderRecord.getValue();
                    PreOrderBusinessOrderRequest request = JSON.parseObject((String) orderRecordValue.get("payload"),
                            PreOrderBusinessOrderRequest.class);
                    // 调用下单接口
                    MerchantGoodsMapping mapping = merchantGoodsMappingRepository.getByGoodsType(request.getMerchantGoodsType());
                    //消息确认及删除消息
                    redisQueue.ackAndDelete(request.getGoodsNo(), orderRecord.getId().getValue());
                    Result<Object> createOrderResult = null;
                    try {
                        createOrderResult = doCreateOrder(request, mapping);
                    } catch (Exception e) {
                        log.error("订单创建异常:doCreateOrder", e);
                    } finally {
                        redisTemplate.opsForZSet().remove(RedisQueue.generateZSortName(request.getGoodsNo()), request.getUserId().toString());
                    }
                    //从zsort 中删除 ,删除该用户的请求
                    redisTemplate.opsForZSet().remove(RedisQueue.generateZSortName(request.getGoodsNo()), request.getUserId().toString());
                    //保存订单数据到缓存中
                    redisQueue.putCreateOrderResult(createOrderResult, request.getGoodsNo(), orderRecord.getId().getValue());
                    //发送小程序通知
                }
            } catch (Exception e) {
                log.error("消费订单队列异常:asyncCreateOrderFromQueue--->" + e.getMessage(), e);
            }
        };
        service.scheduleAtFixedRate(task, 0, redisQueue.getHandleInterval(), TimeUnit.MILLISECONDS);
    }

分析原因:ScheduledExecutorService.scheduleAtFixedRate 方法导致的


这不就是一个定时调度线程吗:作用  100ms调度一次 定时一直调度使用    会导致CPU占用很大吗?  百度一下原因  


ScheduledExecutorService.scheduleAtFixedRate 方法本身在正常使用情况下不应该导致CPU占用过高。然而,如果你遇到了CPU占用过高的问题,可能有以下几个原因和解决方法:

任务执行时间过长:

如果你传递给 scheduleAtFixedRate 的任务执行时间过长,或者任务中包含了密集的计算或IO操作,这可能导致调度器无法及时启动下一个任务。这样会导致任务重叠,从而占用更多的CPU资源。
解决方法:确保每个任务在预定的执行周期内能够及时完成。如果任务执行时间不确定或者非常长,考虑使用 scheduleWithFixedDelay 方法,并在任务结束后再次调度。
异常或错误处理不当:

如果任务中发生了异常而没有得到适当处理,可能会导致任务执行线程一直处于忙碌状态,从而导致CPU占用过高。
解决方法:在任务中添加适当的异常处理机制,确保异常情况下能够优雅地处理任务并释放资源。
任务数量过多:

如果同时有大量任务被调度,并且这些任务的执行时间有重叠,可能会导致线程池中的线程数增多,从而增加CPU的负载。
解决方法:评估任务的执行频率和资源消耗,根据实际情况调整线程池的大小或者优化任务的调度策略。
线程池配置不当:

如果线程池的大小配置不当,可能会导致任务排队时间过长,或者线程频繁被创建和销毁,进而影响CPU的使用效率。
解决方法:根据任务的性质和数量,适当地配置线程池的核心线程数、最大线程数以及队列类型(如有界队列或无界队列)。
JVM和操作系统调优:

如果以上方法都无法解决问题,可能需要对JVM和操作系统进行调优,以确保能够更好地处理高负载情况。
解决方法:调整JVM的堆内存大小(通过 -Xmx 和 -Xms 参数),以及操作系统的资源限制(如文件句柄、线程数等),以满足应用程序的需求。
综上所述,调用 ScheduledExecutorService.scheduleAtFixedRate 方法本身不应该导致性能瓶颈或CPU占用过高。问题很可能源于任务本身的执行逻辑或者线程池配置,需要综合考虑以上几个方面来进行排查和优化。

替换成别的方式:还是老实new线程或者使用自定义线程池,然后通过睡眠来做到定期执行策略吧

posted on 2024-07-16 15:02  白嫖老郭  阅读(143)  评论(0编辑  收藏  举报

导航