JDK tools之jps和jstack诊断Java程序
大部分Java开发者可能知道有这么个工具,但是没怎么用过,每次还得百度一下。我也是之一 -_-!!。
每次遇到👇下面的问题,头是不是有点大:
- OutOfMemoryError
- 程序突然卡顿死锁
- 内存泄露
个人感觉jps、jstack、jmap、visualVM、jConsole、MAT都是解决这些问题的利器,今天先介绍jps、jstack。
1、jps,先man一下看看jps是干什么的
NAME jps - Java Virtual Machine Process Status Tool DESCRIPTION The jps tool lists the instrumented HotSpot Java Virtual Machines (JVMs) on the target system. The tool is limited to reporting information on JVMs for which it has the access permissions.
从名字就大概知道jps是查看Java进程信息的工具,跟LInux/Unix下的ps(process status)类似,只是jps只针对Java程序。
使用语法如下:
SYNOPSIS jps [options] [hostid]
主要参数:
-q 不输出类名、Jar名和传入main方法的参数 -m 输出传入main方法的参数 -l 输出main类或Jar的全限名 -v 输出传入JVM的参数
例如:
➜ ~ jps -m -l 2281 /Users/xxxxxx/Applications/data-integration-7.0/launcher/pentaho-application-launcher-7.0.0.0-25.jar -lib ./../libswt/osx64/ 2312 sun.tools.jps.Jps -m -l ➜ ~
包括jps在内一共有两个Java进程,第一列是pid,也是后面我们要用到的信息。
当然上面的信息还可以用ps命令来查询ps -ef或ps aux,前者Linux下的标准语法,后者是BSD风格的语法。一样可以查到第二列的pid。
➜ ~ ps -ef | grep java 501 2281 2261 0 11:48PM ttys000 0:52.81 /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java -XstartOnFirstThread -Xms1024m -Xmx2048m -XX:MaxPermSize=256m -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -Djava.library.path=./../libswt/osx64/ -DKETTLE_HOME= -DKETTLE_REPOSITORY= -DKETTLE_USER= -DKETTLE_PASSWORD= -DKETTLE_PLUGIN_PACKAGES= -DKETTLE_LOG_SIZE_LIMIT= -DKETTLE_JNDI_ROOT= -jar /Users/xxxxxx/Applications/data-integration-7.0/launcher/pentaho-application-launcher-7.0.0.0-25.jar -lib ./../libswt/osx64/ 501 2361 2285 0 11:51PM ttys001 0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn java ➜ ~
2、jstack
NAME jstack - stack trace DESCRIPTION jstack prints Java stack traces of Java threads for a given Java process or core file or a remote debug server.
For each Java frame, the full class name, method name, 'bci' (byte code index) and line number, if available, are printed.
With the -m option,jstack prints both Java and native frames of all threads along with the 'pc' (program counter).
For each native frame, the closest native symbol to 'pc', if available, is printed. C++ mangled names are not demangled.
To demangle C++ names, the output of this command may be piped to c++filt.
说白了jstack就是查看当前Java程序内线程详细堆栈信息的工具。
使用语法如下:
SYNOPSIS jstack [ option ] pid jstack [ option ] executable core jstack [ option ] [server-id@]remote-hostname-or-IP
主要参数:
Options: -F to force a thread dump. Use when jstack <pid> does not respond (process is hung)(强制线程转储,此时线程会被挂起) -m to print both java and native frames (mixed mode)(打印Java和native接口的堆栈信息) -l long listing. Prints additional information about locks(打印关于锁的详细信息,如果有线程死锁可以使用jstack -l pid查看锁或资源的持有信息) -h or -help to print this help message
为了配合实战,写了一个会发生死锁的多线程程序代码如下:
public class MyThread extends Thread { private Object object1; private Object object2; public MyThread(Object object1, Object object2, String threadName) { this.object1 = object1; this.object2 = object2; setName(threadName); } @Override public void run() { String name = Thread.currentThread().getName(); System.out.println(name + " acquiring lock on " + object1); synchronized (object1) { System.out.println(name + " acquired lock on " + object1); sleep(); System.out.println(name + " acquiring lock on " + object2); synchronized (object2) { System.out.println(name + " acquired lock on " + object2); sleep(); } System.out.println(name + " released lock on " + object2); } System.out.println(name + " released lock on " + object2); System.out.println(name + " finished execution."); } private void sleep() { try { Thread.sleep(1000); } catch (InterruptedException e) { e.printStackTrace(); } } }
main方法如下:
public class MainTest { public static void main(String[] args) { Object object1 = new Object(); Object object2 = new Object(); Object object3 = new Object(); MyThread myThread1 = new MyThread(object1, object2, "myThread1"); MyThread myThread2 = new MyThread(object2, object3, "myThread2"); MyThread myThread3 = new MyThread(object3, object1, "myThread3"); myThread1.start(); myThread2.start(); myThread3.start(); } }
运行程序后输出如下:
[root@localhost MutipleThreadTest]# java MainTest
myThread1 acquiring lock on java.lang.Object@796523ab
myThread3 acquiring lock on java.lang.Object@4335236e
myThread2 acquiring lock on java.lang.Object@5f08edd0
myThread3 acquired lock on java.lang.Object@4335236e
myThread1 acquired lock on java.lang.Object@796523ab
myThread2 acquired lock on java.lang.Object@5f08edd0
myThread3 acquiring lock on java.lang.Object@796523ab
myThread2 acquiring lock on java.lang.Object@4335236e
myThread1 acquiring lock on java.lang.Object@5f08edd0
从上面信息可知myThread1、myThread2、myThread3三个线程分别拿到一把锁,然后等待其他线程持有的锁。
由于没有任何一个线程释放锁于是整个程序进入死锁的状态。
现场的生产情况往往会更复杂,需要借助工具定位分析发生死锁的位置和原因,下面上jstack分析一下当前的堆栈信息。
上面的介绍已经知道如果有线程死锁可以使用`jstack -l pid`查看锁或资源的持有信息
[root@localhost ~]# jstack -l 3661 2018-01-09 17:24:15 Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode): "Attach Listener" daemon prio=10 tid=0x00007f03a0001000 nid=0xeab waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "DestroyJavaVM" prio=10 tid=0x00007f03d4008800 nid=0xe4e waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "myThread3" prio=10 tid=0x00007f03d40b5800 nid=0xe5e waiting for monitor entry [0x00007f03d05f4000] java.lang.Thread.State: BLOCKED (on object monitor) at MyThread.run(MyThread.java:26) - waiting to lock <0x00000000d784b978> (a java.lang.Object) - locked <0x00000000d784b998> (a java.lang.Object) Locked ownable synchronizers: - None "myThread2" prio=10 tid=0x00007f03d40b4000 nid=0xe5d waiting for monitor entry [0x00007f03d06f5000] java.lang.Thread.State: BLOCKED (on object monitor) at MyThread.run(MyThread.java:26) - waiting to lock <0x00000000d784b998> (a java.lang.Object) - locked <0x00000000d784b988> (a java.lang.Object) Locked ownable synchronizers: - None "myThread1" prio=10 tid=0x00007f03d40b1800 nid=0xe5c waiting for monitor entry [0x00007f03d07f6000] java.lang.Thread.State: BLOCKED (on object monitor) at MyThread.run(MyThread.java:26) - waiting to lock <0x00000000d784b988> (a java.lang.Object) - locked <0x00000000d784b978> (a java.lang.Object) Locked ownable synchronizers: - None "Service Thread" daemon prio=10 tid=0x00007f03d4097800 nid=0xe5a runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "C2 CompilerThread1" daemon prio=10 tid=0x00007f03d4095000 nid=0xe59 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "C2 CompilerThread0" daemon prio=10 tid=0x00007f03d4092000 nid=0xe58 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Signal Dispatcher" daemon prio=10 tid=0x00007f03d408f800 nid=0xe57 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "Finalizer" daemon prio=10 tid=0x00007f03d406f000 nid=0xe55 in Object.wait() [0x00007f03d0dfc000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000d7804858> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) - locked <0x00000000d7804858> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209) Locked ownable synchronizers: - None "Reference Handler" daemon prio=10 tid=0x00007f03d406d000 nid=0xe54 in Object.wait() [0x00007f03d0efd000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000000d7804470> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:503) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) - locked <0x00000000d7804470> (a java.lang.ref.Reference$Lock) Locked ownable synchronizers: - None "VM Thread" prio=10 tid=0x00007f03d4068800 nid=0xe53 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007f03d401e000 nid=0xe4f runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007f03d4020000 nid=0xe50 runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x00007f03d4022000 nid=0xe51 runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x00007f03d4024000 nid=0xe52 runnable "VM Periodic Task Thread" prio=10 tid=0x00007f03d40a2000 nid=0xe5b waiting on condition JNI global references: 114 Found one Java-level deadlock: ============================= "myThread3": waiting to lock monitor 0x00007f03ac0062c8 (object 0x00000000d784b978, a java.lang.Object), which is held by "myThread1" "myThread1": waiting to lock monitor 0x00007f0388002258 (object 0x00000000d784b988, a java.lang.Object), which is held by "myThread2" "myThread2": waiting to lock monitor 0x00007f03ac004e28 (object 0x00000000d784b998, a java.lang.Object), which is held by "myThread3" Java stack information for the threads listed above: =================================================== "myThread3": at MyThread.run(MyThread.java:26) - waiting to lock <0x00000000d784b978> (a java.lang.Object) - locked <0x00000000d784b998> (a java.lang.Object) "myThread1": at MyThread.run(MyThread.java:26) - waiting to lock <0x00000000d784b988> (a java.lang.Object) - locked <0x00000000d784b978> (a java.lang.Object) "myThread2": at MyThread.run(MyThread.java:26) - waiting to lock <0x00000000d784b998> (a java.lang.Object) - locked <0x00000000d784b988> (a java.lang.Object) Found 1 deadlock.
简单分析如下:
- 以上结果末尾可知jstack已经检测到了死锁。
- "myThread3"在等待"myThread1"释放资源、"myThread1"在等待"myThread2"释放资源、"myThread2"在等待"myThread3"释放资源。简单说就是资源循环依赖了导致了死锁。
- 现实的情况往往不会发生结构如此简单的死锁,很多情况都是等待网络IO、数据读写等耗时操作,导致线程占用的资源不能释放,从而导致程序假死、卡顿等情况。
- 比如之前遇到过的一个场景saiku每次刷新缓存的时候都要卡很久,最后通过jstack发现每次刷缓存都会去拿olap数据库连接,并且加锁了。必须等刷缓存的相关操作执行完才能恢复,这其中就包含执行一系列的select COUNT(*)操作,而恰好存在这比样的大表4千万多万行数据,一个select COUNT(*)要执行90多秒,这就悲剧了,全部刷完需要8分钟左右系统才能恢复
堆栈中一些常见值解析
线程状态(java.lang.Thread.State)
- NEW:至今尚未启动的线程的状态。
- RUNNABLE:可运行线程的线程状态。处于可运行状态的某一线程正在 Java 虚拟机中运行,但它可能正在等待操作系统中的其他资源,比如处理器。
- BLOCKED:受阻塞并且正在等待monitor lock的某一线程的线程状态。处于受阻塞状态的某一线程正在等待monitor lock,以便进入一个同步的块/方法,或者在调用 Object.wait 之后再次进入同步的块/方法。 比如等待执行synchronized代码块或者使用synchronized标记的方法。
- WAITING:某一等待线程的线程状态。某一线程因为调用下列方法之一而处于等待状态。
- 不带超时值的 Object.wait
- 不带超时值的 Thread.join
- LockSupport.park 处于等待状态的线程正等待另一个线程,以执行特定操作。 例如,已经在某一对象上调用了 Object.wait() 的线程正等待另一个线程,以便在该对象上调用 Object.notify() 或 Object.notifyAll()。已经调用了 Thread.join() 的线程正在等待指定线程终止。
- TIMED_WAITING:具有指定等待时间的某一等待线程的线程状态。某一线程因为调用以下带有指定正等待时间的方法之一而处于定时等待状态:
- Thread.sleep
- 带有超时值的 Object.wait
- 带有超时值的 Thread.join
- LockSupport.parkNanos
- LockSupport.parkUntil
- TERMINATED:已终止线程的线程状态。线程已经结束执行。
各种线程ID
- prio:线程的优先级;
- tid:java Thread id;
- nid:native线程的id, 非常关键,后面再使用jstack时补充;
使用top -Hp pid查看进程的线程状态时线程的id就对应这里的nid,在堆栈中nid是使用16进制表示的因此可以使用如下工具做简单的转换
printf "%x\n" 21742
实战一下,找当前个人博客系统中最运行时间最长的线程
首先找到博客进程id
$ ps -ef | grep java
结果为:
zhangsan 27952 1 4 2017 ? 1-07:19:12 java -Xms128m -Xmx128m -Dfile.encoding=UTF-8 -jar tale-least.jar
查看当前线程信息
$ top -Hp 27952 top - 23:32:06 up 59 days, 10:06, 2 users, load average: 0.06, 0.04, 0.01 Tasks: 16 total, 1 running, 15 sleeping, 0 stopped, 0 zombie Cpu(s): 5.7%us, 3.0%sy, 0.0%ni, 91.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1019988k total, 907552k used, 112436k free, 151028k buffers Swap: 0k total, 0k used, 0k free, 387324k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27965 zhangsan 20 0 2116m 213m 13m R 7.3 21.4 1727:05 java 27954 zhangsan 20 0 2116m 213m 13m S 0.7 21.4 133:27.59 java 27952 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.00 java 27953 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.22 java 27955 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.06 java 27956 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.55 java 27957 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.00 java 27958 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:28.53 java 27959 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:09.86 java 27960 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.00 java 27961 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 13:00.98 java 27962 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.57 java 27969 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 2:04.10 java 27970 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:49.48 java 27971 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 2:16.69 java 18940 zhangsan 20 0 2116m 213m 13m S 0.0 21.4 0:00.00 java
发现线程pid 27954运行时间最长,转换成16进制
$ printf "%x\n" 27954 6d32
使用jstack该线程的堆栈信息
$ jstack 27952 | grep 6d32 | grep -v grep "VM Thread" os_prio=0 tid=0x00007fc6ec094000 nid=0x6d32 runnable
发现是Java VM相关的线程怪不得运行时间最长,这里根据实际情况鉴别VM系统线程和自己的业务线程