系统缓慢情况收集

内存溢出

top发现内存占用过高,进程id 3788;
top -bcp 3788 可单独监控进程;
pwdx 3788 可查看服务所在路径;
top -Hp 3788 查看发现线程32258占用CPU过高;

jstack -l 3788|grep -A 20 `printf "%x" 32258`

发现以"VM Thread"开头,说明是垃圾回收线程。
查看gcutil发现

jstat -gcutil 3788 1000 10

发现老年代增长过快,FGC频率过快导致线程STOP。
DUMP内存分析原因(jmap dump:format=b,file= ),未见明显大对象,查看代码或三方的显式System.gc()调用:

[Full GC (System.gc()) [Tenured: 262546K->262546K(349568K), 0.0014879 secs] 262546K->262546K(506816K), [Metaspace: 3109K->3109K(1056768K)], 0.0015151 secs] [Times: user=0.00 sys=0.00, real=0.01 secs] 
[GC (Allocation Failure) [DefNew: 2795K->0K(157248K), 0.0001504 secs][Tenured: 262546K->402K(349568K), 0.0012949 secs] 265342K->402K(506816K), [Metaspace: 3109K->3109K(1056768K)], 0.0014699 secs] [Times: user=0.00

可见第一次是System.gc(),第二次是JVM发起的。
通过配置JVM参数-XX:+DisableExplicitGC,关于显式GC,再次查看情况。

CPU太高

有时是因GC引起,有时因计算太多引起,如下图:

可见,在请求UserController时,因为Controller进行了一次比较耗时的调用,导致该线程CPU一直100%,从而定位至UserController第34行。

随机出现的接口耗时

有时我们无法通过jstack发现哪个接口比较耗时,这时我们可以通过压测工具,不断加压,这样多个线程均会报出这个耗时的代码所在的位置,如下图:

"http-nio-8080-exec-2" #29 daemon prio=5 os_prio=31 tid=0x00007fd08cb26000 nid=0x9603 waiting on condition [0x00007000031d5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at java.lang.Thread.sleep(Thread.java:240)
	at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:381)
	at com.feifei.user.controller.UserController.detail(UserController.java:28)

"http-nio-8080-exec-3" #30 daemon prio=5 os_prio=31 tid=0x00007fd08cb27000 nid=0x6203 waiting on condition [0x00007000032d8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at java.lang.Thread.sleep(Thread.java:240)
	at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:381)
	at com.feifei.user.controller.UserController.detail(UserController.java:28)

"http-nio-8080-exec-4" #31 daemon prio=5 os_prio=31 tid=0x00007fd08d0fa000 nid=0x6403 waiting on condition [0x00007000033db000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at java.lang.Thread.sleep(Thread.java:240)
	at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:381)
	at com.feifei.user.controller.UserController.detail(UserController.java:28)

发现UserController的第28行阻塞了。

线程WAITING

这种情况比较难复现,比如一个线程因连接被服务器挂起,导致该线程持续等待服务器的响应,从而引起主线程和其余线程均处于WAITING。

  1. 首先jstack日志中grep出所有处于TIMED_WAITING状态的线程,然后导出到一个文件。
"Attach Listener" #13 daemon prio=9 os_prio=31 tid=0x00007fe690064000 nid=0xd07 waiting on condition [0x0000000000000000]
"DestroyJavaVM" #12 prio=5 os_prio=31 tid=0x00007fe690066000 nid=0x2603 waiting on condition [0x0000000000000000]
"Thread-0" #11 prio=5 os_prio=31 tid=0x00007fe690065000 nid=0x5a03 waiting on condition [0x0000700003ad4000]
"C1 CompilerThread3" #9 daemon prio=9 os_prio=31 tid=0x00007fe68c00a000 nid=0xa903 waiting on condition [0x0000000000000000]

等待一段时间,如10秒,再次导出结果至文件。

"DestroyJavaVM" #12 prio=5 os_prio=31 tid=0x00007fe690066000 nid=0x2603 waiting on condition [0x0000000000000000]
"Thread-0" #11 prio=5 os_prio=31 tid=0x00007fe690065000 nid=0x5a03 waiting on condition [0x0000700003ad4000]
"VM Periodic Task Thread" os_prio=31 tid=0x00007fe68d114000 nid=0xa803 waiting on condition

重复以上步骤,有3,4个文件后,我们进行文件对比,找出这些文件中始终存在的用户线程,基本就可确认这个线程包含了处于等待状态的问题线程。因为等待一般不会超过30s还在等待。
从而我们再次此线程的堆栈进行二次排查,如果该线程是不包含用户自定义类的线程池中的空闲状态线程则排除,剩下的应该就是问题线程了。这样就可以找到代码行了。

"Thread-0" #11 prio=5 os_prio=31 tid=0x00007f9de08c7000 nid=0x5603 waiting on condition [0x0000700001f89000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
	at com.aibaobei.chapter2.eg4.SyncTask.lambda$main$0(SyncTask.java:8)
	at com.aibaobei.chapter2.eg4.SyncTask$$Lambda$1/1791741888.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:748)

于是我们发现了这个Thread-0,因为SyncTask.java的第8行导致。

死锁

这是常见问题,jstack也会给出如下结果:

可见,底部直接分析出了日志中存在哪些死锁,以及每个死锁的线程堆栈信息。上图中有两个用户线程分别在等待对方释放锁,位置为ConnectTask第5行。

工具

安装:
wget --no-check-certificate https://github.com/oldratlee/useful-scripts/archive/release.zip
文档:
https://github.com/oldratlee/useful-scripts/blob/master/docs/java.md#-show-busy-java-threads

测试代码CPU100%

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLoad {
    public static void main(String[] args) {
        String[] patternMatch = {"([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)",
                "([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)"};
        List<String> patternList = new ArrayList<String>();

        patternList.add("Avg Volume Units product A + Volume Units product A");
        patternList.add("Avg Volume Units /  Volume Units product A");
        patternList.add("Avg retailer On Hand / Volume Units Plan / Store Count");
        patternList.add("Avg Hand Volume Units Plan Store Count");
        patternList.add("1 - Avg merchant Volume Units");
        patternList.add("Total retailer shipment Count");

        for (String s :patternList ){

            for(int i=0;i<patternMatch.length;i++){
                Pattern pattern = Pattern.compile(patternMatch[i]);

                Matcher matcher = pattern.matcher(s);
                System.out.println(s);
                if (matcher.matches()) {

                    System.out.println("Passed");
                }else
                    System.out.println("Failed;");
            }
        }
    }
}

测试代码死锁

import java.util.*;
public class SimpleDeadLock extends Thread {
    public static Object l1 = new Object();
    public static Object l2 = new Object();
    private int index;
    public static void main(String[] a) {
        Thread t1 = new Thread1();
        Thread t2 = new Thread2();
        t1.start();
        t2.start();
    }
    private static class Thread1 extends Thread {
        public void run() {
            synchronized (l1) {
                System.out.println("Thread 1: Holding lock 1...");
                try { Thread.sleep(10); }
                catch (InterruptedException e) {}
                System.out.println("Thread 1: Waiting for lock 2...");
                synchronized (l2) {
                    System.out.println("Thread 2: Holding lock 1 & 2...");
                }
            }
        }
    }
    private static class Thread2 extends Thread {
        public void run() {
            synchronized (l2) {
                System.out.println("Thread 2: Holding lock 2...");
                try { Thread.sleep(10); }
                catch (InterruptedException e) {}
                System.out.println("Thread 2: Waiting for lock 1...");
                synchronized (l1) {
                    System.out.println("Thread 2: Holding lock 2 & 1...");
                }
            }
        }
    }
}

其它工具

show-duplicate-java-classes:JAVA冲突检查
find-in-jars:类或资源查找
housemd pid [java_home]:BTrace封装,Refer:https://github.com/CSUG/HouseMD/wiki/UserGuideCN
jvmDebug:jvm pid
greys [@IP:PORT]:java进程异常诊断工具,同HouseMD.
sjk sjk --commands sjk --help :java性能排查优化工具

Refer:

https://my.oschina.net/zhangxufeng/blog/3017521
https://my.oschina.net/leejun2005/blog/1524687

[1] oldratlee/useful-scripts
https://github.com/oldratlee/useful-scripts

[2] awesome-scripts
https://github.com/superhj1987/awesome-scripts

[3] JDK自带工具之问题排查场景示例
http://bit.ly/2xtukcb

[4] Java调优经验谈
http://bit.ly/2xCIj2L

[5] jvm排查工具箱jvm-tools
https://segmentfault.com/a/1190000012658814

[6] alibaba/arthas
https://github.com/alibaba/arthas/blob/7f236219ddbd040764dd821cbcbd44899dd57c90/README.md
posted on 2019-06-24 17:22  liehen2046  阅读(284)  评论(0编辑  收藏  举报