thread dump分析及如何利用Thread Dump解决问题(转载)

一、thread dump信息获取
1、发送信号
* In Unix, use "kill -3 <pid>" where pid is the Process ID of the JVM.(kill 信号列表)
* In Windows, press CTRL+BREAK on the window where the JVM is running.
2、通过命令导出文本文件
jps -l 找出服务器上运行的Java进程
jstack -l pid  直接显示在屏幕上
jstack pid | tee -a jstack.log    导出堆栈文件
3、通过jdk工具jvisualvm,在界面上点击生成thread dump文件
二、分析thread dump信息
1、最上面的是时间和jvm的信息
2、各个线程thread信息
3、最后面是heap堆的信息

三、thread线程的原始状态和thread dump文件中线程状态的
1、thread线程的原始状态:
NEW:还没有调用start()方法开始执行,在thread dump中不会出现
RUNNABLE:正常在执行的线程
BLOCKED:阻塞状态,等待获取锁进入同步方法或者同步代码块
WAITING:调用wait(),join(),LockSupport.park()方法
TIMED_WAITING:调用wait(long),join(long),LockSupport.parkNanos,LockSupport.parkUntil方法,在指定时间内等待并且没有超时
TERMINATED:正常执行完的线程,在thread dump中不会出现
在thread dump中可能出现的thread线程的原始状态有:RUNNABLE,BLOCKED,WAITING,TIMED_WAITING

2、thread dump文件中线程状态的种类:
Runnable
sleeping
Waiting for monitor entry (重点关注)
Wait on condition(重点关注)
in Object.wait() (重点关注)
Deadlock(重点关注)
其中
thread的RUNNABLE对应---> Runnable
thread的BLOCKED对应---> Waiting for monitor entry 
thread的WAITING和TIMED_WAITING对应---> in Object.wait()和Wait on condition

四、Dump文件中的线程状态含义:
Deadlock:死锁线程,一般指多个线程调用间,进入相互资源占用,导致一直等待无法释放的情况。
Runnable:一般指该线程正在执行状态中,该线程占用了资源,正在处理某个请求,有可能正在传递SQL到数据库执行,有可能在对某个文件操作,有可能进行数据类型等转换。
Waiting on condition:等待资源,或等待某个条件的发生。具体原因需结合 stacktrace来分析。如果堆栈信息明确是应用代码,则证明该线程正在等待资源。一般是大量读取某资源,且该资源采用了资源锁的情况下,线程进入等待状态,等待资源的读取。又或者,正在等待其他线程的执行等。
如果发现有大量的线程都在处在 Wait on condition,从线程 stack看,正等待网络读写,这可能是一个网络瓶颈的征兆。因为网络阻塞导致线程无法执行。一种情况是网络非常忙,几乎消耗了所有的带宽,仍然有大量数据等待网络读写;
另一种情况也可能是网络空闲,但由于路由等问题,导致包无法正常的到达。
另外一种出现 Wait on condition的常见情况是该线程在 sleep,等待 sleep的时间到了时候,将被唤醒。
Blocked:线程阻塞,是指当前线程执行过程中,所需要的资源长时间等待却一直未能获取到,被容器的线程管理器标识为阻塞状态,可以理解为等待资源超时的线程。
Waiting for monitor entry 和 in Object.wait():Monitor是 Java中用以实现线程之间的互斥与协作的主要手段,它可以看成是对象或者 Class的锁。
每一个对象都有,也仅有一个 monitor。每个 Monitor在某个时刻,只能被一个线程拥有,该线程就是 “Active Thread”,
而其它线程都是 “Waiting Thread”,分别在两个队列 “ Entry Set”和 “Wait Set”里面等候。
在 “Entry Set”中等待的线程状态是 “Waiting for monitor entry”,而在 “Wait Set”中等待的线程状态是 “in Object.wait()”。
在 “Entry Set”中的是试图获取锁而阻塞的线程;而“Wait Set”则是,获取锁后调用wait()而进入阻塞的线程,当其他线程调用notify()或者notifyall()唤醒等待该锁上的其他线程时,会把“Wait Set”队列中的
一个或者全部线程放入“Entry Set”中,然后在其中随机选择一个重新获取锁,然后执行。

 

一个阻塞状态线程的分析:

1. "Timer-0" daemon prio=10 tid=0xac190c00 nid=0xaef in Object.wait() [0xae77d000] 
2. java.lang.Thread.State: TIMED_WAITING (on object monitor) 
3. atjava.lang.Object.wait(Native Method) 
4. -waiting on <0xb3885f60> (a java.util.TaskQueue) ###继续wait 
5. atjava.util.TimerThread.mainLoop(Timer.java:509) 
6. -locked <0xb3885f60> (a java.util.TaskQueue) ###已经locked 
7. atjava.util.TimerThread.run(Timer.java:462)
* 线程名称:Timer-0 
* 线程类型:daemon 
* 优先级: 10,默认是5 
* jvm线程id:tid=0xac190c00,jvm内部线程的唯一标识(通过java.lang.Thread.getId()获取,通常用自增方式实现。) 
* 对应系统线程id(NativeThread ID):nid=0xaef,和top命令查看的线程pid对应,不过一个是10进制,一个是16进制。(通过命令:top -H -p pid,可以查看该进程的所有线程信息)
* 线程状态:in Object.wait(). 
* 起始栈地址:[0xae77d000] 
* Java thread statck trace:是上面2-7行的信息。到目前为止这是最重要的数据,Java stack trace提供了大部分信息来精确定位问题根源。(倒着看,入口在最下面一行)
JNI global references表示的意义?

五、案例分析
综合案例一:BLOCKED---waiting for monitor entry(deadlock)
综合案例二:BLOCKED---waiting for monitor entry(热点同步代码块造成的阻塞)
综合案例三:WAITING---(in Object.wait())(调用 object.wait()造成的)
综合案例四:TIMED_WAITING--- in Object.wait()(object.wait(1 * 60 * 1000)造成)
综合案例五:WAITING (parking)---waiting on condition(通过条件变量condition.await()造成)
综合案例六:TIMED_WAITING (parking)---waiting on condition(通过条件变量condition.await(5, TimeUnit.MINUTES)造成)
综合案例七:TIMED_WAITING---waiting on condition(Thread.sleep(1*60*1000)造成的)

综合案例一:BLOCKED---waiting for monitor entry(deadlock)

public class Test {
public static void main(String[] args) throws Exception{
System.out.println("主线程开始"+Thread.currentThread().getName());
final LeftRightDeadLock lock = new LeftRightDeadLock();
Thread t1 = new Thread(){
@Override
public void run() {
try {
lock.leftRight();
} catch (Exception e) {
e.printStackTrace();
}
}
};
Thread t2 = new Thread(){
@Override
public void run() {
try {
lock.rightLeft();
} catch (Exception e) {
e.printStackTrace();
}
}
};
t1.start();
t2.start();

t1.join();
t2.join();
System.out.println("主线程结束"+Thread.currentThread().getName());
}
}
public class LeftRightDeadLock {
private final Object left = new Object();
private final Object right = new Object();

public void leftRight() throws Exception {
synchronized (left) {
System.out.println(Thread.currentThread().getName()+"获得left锁");
Thread.sleep(1000);
synchronized (right)
{
System.out.println("leftRight end!");
}
}
}

public void rightLeft() throws Exception {
synchronized (right) { System.out.println(Thread.currentThread().getName()+"获得right锁");
Thread.sleep(1000);
synchronized (left)
{
System.out.println("rightLeft end!");
}
}
}
}
thread dump信息:
"Thread-1" prio=6 tid=0x000000000c019000 nid=0x2f38 waiting for monitor entry [0x000000000d14f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at ls.lock.LeftRightDeadLock.rightLeft(LeftRightDeadLock.java:32)
- waiting to lock <0x00000000d7d53bb8> (a java.lang.Object)
- locked <0x00000000d7d53bc8> (a java.lang.Object)
at ls.lock.Test$2.run(Test.java:26)

"Thread-0" prio=6 tid=0x000000000c018000 nid=0x3de0 waiting for monitor entry [0x000000000cddf000]
java.lang.Thread.State: BLOCKED (on object monitor)
at ls.lock.LeftRightDeadLock.leftRight(LeftRightDeadLock.java:20)
- waiting to lock <0x00000000d7d53bc8> (a java.lang.Object)
- locked <0x00000000d7d53bb8> (a java.lang.Object)
at ls.lock.Test$1.run(Test.java:16)

Found one Java-level deadlock:
=============================
"Thread-1":
waiting to lock monitor 0x000000000aa87f58 (object 0x00000000d7d53bb8, a java.lang.Object),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x000000000c01ad08 (object 0x00000000d7d53bc8, a java.lang.Object),
which is held by "Thread-1"

综合案例二:BLOCKED---waiting for monitor entry(热点同步代码块造成的阻塞)

public class BlockedState {
private static Object object = new Object();

public static void main(String[] args) {
Runnable task = new Runnable() {
@Override
public void run() {
synchronized (object) {
System.out.println("线程:" + Thread.currentThread().getName()+"获取锁");
long begin = System.currentTimeMillis();
long end = System.currentTimeMillis();
// 让线程运行1分钟,会一直持有object的监视器
while ((end - begin) <= 1 * 60 * 1000) {
end = System.currentTimeMillis();
}
}
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
thread dump信息:
"t2" prio=6 tid=0x000000000c389800 nid=0x3ee4 waiting for monitor entry [0x000000000cc6f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at ls.lock.BlockedState$1.run(BlockedState.java:19)
- waiting to lock <0x00000000d7d4c460> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:744)

"t1" prio=6 tid=0x000000000c388800 nid=0x3d08 runnable [0x000000000cf0e000]
java.lang.Thread.State: RUNNABLE
at ls.lock.BlockedState$1.run(BlockedState.java:24)
- locked <0x00000000d7d4c460> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:744)

综合案例三:WAITING---(in Object.wait())(调用 object.wait()造成的)

public class WaitingState {
private static Object object = new Object();
public static void main(String[] args) {
Runnable task = new Runnable() {
@Override
public void run() {
synchronized (object) {
System.out.println("线程:" + Thread.currentThread().getName()+"获取锁");
try {
object.wait();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
thread dump信息:
"t2" prio=6 tid=0x000000000c379800 nid=0x3e28 in Object.wait() [0x000000000d06f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000d7d4c4a0> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at ls.lock.WaitingState$1.run(WaitingState.java:29)
- locked <0x00000000d7d4c4a0> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:744)

"t1" prio=6 tid=0x000000000c375000 nid=0x3f54 in Object.wait() [0x000000000d18f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000d7d4c4a0> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at ls.lock.WaitingState$1.run(WaitingState.java:29)
- locked <0x00000000d7d4c4a0> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:744)

综合案例四:TIMED_WAITING--- in Object.wait()(object.wait(1 * 60 * 1000)造成)

public class TimedWaitingState1 {
private static Object object = new Object();
public static void main(String[] args) {
Runnable task = new Runnable() {
@Override
public void run() {
synchronized (object) {
System.out.println("线程:" + Thread.currentThread().getName()+"获取锁");
try {
// 进入等待的同时,会进入释放监视器
object.wait(1 * 60 * 1000);
} catch (InterruptedException e) {
e.printStackTrace();
}

}
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
thread dump信息:
"t2" prio=6 tid=0x000000000c0e5000 nid=0x3788 in Object.wait() [0x000000000cf9f000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000d7d4c978> (a java.lang.Object)
at ls.lock.TimedWaitingState1$1.run(TimedWaitingState1.java:26)
- locked <0x00000000d7d4c978> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:744)

"t1" prio=6 tid=0x000000000c0b9000 nid=0x31a8 in Object.wait() [0x000000000cddf000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000d7d4c978> (a java.lang.Object)
at ls.lock.TimedWaitingState1$1.run(TimedWaitingState1.java:26)
- locked <0x00000000d7d4c978> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:744)

综合案例五:WAITING (parking)---waiting on condition(通过条件变量condition.await()造成)

public class TimedWaitingState3 {
// java的显示锁,类似java对象内置的监视器
private static Lock lock = new ReentrantLock();
// 锁关联的条件队列(类似于object.wait)
private static Condition condition = lock.newCondition();
public static void main(String[] args) {
Runnable task = new Runnable() {

@Override
public void run() {
// 加锁,进入临界区
lock.lock();
System.out.println("线程:" + Thread.currentThread().getName()+"获取锁");
try {
condition.await();
} catch (InterruptedException e) {
e.printStackTrace();
}
// 解锁,退出临界区
lock.unlock();
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
thread dump信息:
"t2" prio=6 tid=0x000000000c3ae800 nid=0x3f1c waiting on condition [0x000000000d03f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000d7d4ef90> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at ls.lock.TimedWaitingState3$1.run(TimedWaitingState3.java:51)
at java.lang.Thread.run(Thread.java:744)

"t1" prio=6 tid=0x000000000c3ad800 nid=0x36a4 waiting on condition [0x000000000cdbe000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000d7d4ef90> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at ls.lock.TimedWaitingState3$1.run(TimedWaitingState3.java:51)
at java.lang.Thread.run(Thread.java:744)

综合案例六:TIMED_WAITING (parking)---waiting on condition(通过条件变量condition.await(5, TimeUnit.MINUTES)造成)

public class TimedWaitingState {
// java的显示锁,类似java对象内置的监视器
private static Lock lock = new ReentrantLock();
// 锁关联的条件队列(类似于object.wait)
private static Condition condition = lock.newCondition();
public static void main(String[] args) {
Runnable task = new Runnable() {
@Override
public void run() {
// 加锁,进入临界区
lock.lock();
System.out.println("线程:" + Thread.currentThread().getName()+"获取锁");
try {
condition.await(5, TimeUnit.MINUTES);
} catch (InterruptedException e) {
e.printStackTrace();
}
// 解锁,退出临界区
lock.unlock();
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
thread dump信息:
"t2" prio=6 tid=0x000000000c321000 nid=0x3bb4 waiting on condition [0x000000000d28f000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000d7d4ed18> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2176)
at ls.lock.TimedWaitingState$1.run(TimedWaitingState.java:31)
at java.lang.Thread.run(Thread.java:744)

"t1" prio=6 tid=0x000000000c320800 nid=0x3d64 waiting on condition [0x000000000d05e000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000d7d4ed18> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2176)
at ls.lock.TimedWaitingState$1.run(TimedWaitingState.java:31)
at java.lang.Thread.run(Thread.java:744)

综合案例七:TIMED_WAITING---waiting on condition(Thread.sleep(1*60*1000)造成的)

public class TimedWaitingState2 {
public static void main(String[] args) {
Runnable task = new Runnable() {
@Override
public void run() {
System.out.println("线程:" + Thread.currentThread().getName()+"获取锁");
try {
Thread.sleep(1*60*1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
};
new Thread(task, "t1").start();
new Thread(task, "t2").start();
}
}
thread dump信息:
"t2" prio=6 tid=0x000000000c632000 nid=0x3f14 waiting on condition [0x000000000d20f000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at ls.lock.WaitOnCondition$1.run(WaitOnCondition.java:17)
at java.lang.Thread.run(Thread.java:744)

"t1" prio=6 tid=0x000000000c551000 nid=0xae8 waiting on condition [0x000000000d07e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at ls.lock.WaitOnCondition$1.run(WaitOnCondition.java:17)
at java.lang.Thread.run(Thread.java:744)

附录:

死锁测试代码

public class LeftRightDeadLock
{
private final Object left = new Object();
private final Object right = new Object();

public void leftRight() throws Exception
{
synchronized (left)
{
System.out.println(Thread.currentThread().getName()+"获得left锁");
Thread.sleep(1000);
synchronized (right)
{
System.out.println("leftRight end!");
}
}
}

public void rightLeft() throws Exception
{
synchronized (right)
{ System.out.println(Thread.currentThread().getName()+"获得right锁");
Thread.sleep(1000);
synchronized (left)
{
System.out.println("rightLeft end!");
}
}
}
}
public class Test {
public static void main(String[] args) throws Exception{
System.out.println("主线程开始"+Thread.currentThread().getName());
final LeftRightDeadLock lock = new LeftRightDeadLock();
Thread t1 = new Thread(){
@Override
public void run() {
try {
lock.leftRight();
} catch (Exception e) {
e.printStackTrace();
}
}
};
Thread t2 = new Thread(){
@Override
public void run() {
try {
lock.rightLeft();
} catch (Exception e) {
e.printStackTrace();
}
}
};
t1.start();
t2.start();

t1.join();
t2.join();
System.out.println("主线程结束"+Thread.currentThread().getName());

}
}
程序运行输出:

主线程开始main
Thread-1获得right锁
Thread-0获得left锁

 

thread dump文件-2018-07-30 16:24:06

Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.51-b03 mixed mode):

"Thread-1" prio=6 tid=0x000000000c4ee000 nid=0x2ffc waiting for monitor entry [0x000000000d0ef000]
java.lang.Thread.State: BLOCKED (on object monitor)
at ls.lock.LeftRightDeadLock.rightLeft(LeftRightDeadLock.java:32)
- waiting to lock <0x00000000d7d53ca8> (a java.lang.Object)
- locked <0x00000000d7d53cb8> (a java.lang.Object)
at ls.lock.Test$2.run(Test.java:25)

"Thread-0" prio=6 tid=0x000000000c4ed800 nid=0x46f0 waiting for monitor entry [0x000000000d21f000]
java.lang.Thread.State: BLOCKED (on object monitor)
at ls.lock.LeftRightDeadLock.leftRight(LeftRightDeadLock.java:20)
- waiting to lock <0x00000000d7d53cb8> (a java.lang.Object)
- locked <0x00000000d7d53ca8> (a java.lang.Object)
at ls.lock.Test$1.run(Test.java:15)

"Service Thread" daemon prio=6 tid=0x000000000c3c9800 nid=0x31a0 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x000000000c3c8800 nid=0x430c waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x000000000c3c7800 nid=0x2a48 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Monitor Ctrl-Break" daemon prio=6 tid=0x000000000c3bf800 nid=0x48ac runnable [0x000000000c95f000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
- locked <0x00000000d7e12a50> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
- locked <0x00000000d7e12a50> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at com.intellij.rt.execution.application.AppMainV2$1.run(AppMainV2.java:64)

"Attach Listener" daemon prio=10 tid=0x000000000ac3f800 nid=0x3e48 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x000000000abda000 nid=0x3a20 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x000000000abc2800 nid=0x31f0 in Object.wait() [0x000000000bb8f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000d7c05568> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000000d7c05568> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

"Reference Handler" daemon prio=10 tid=0x000000000abbf000 nid=0x2354 in Object.wait() [0x000000000beaf000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000d7c050f0> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000000d7c050f0> (a java.lang.ref.Reference$Lock)

"main" prio=6 tid=0x00000000024e9000 nid=0x48dc in Object.wait() [0x00000000029cf000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000d7d53cc8> (a ls.lock.Test$1)
at java.lang.Thread.join(Thread.java:1280)
- locked <0x00000000d7d53cc8> (a ls.lock.Test$1)
at java.lang.Thread.join(Thread.java:1354)
at ls.lock.Test.main(Test.java:34)

"VM Thread" prio=10 tid=0x000000000abbd800 nid=0x40e8 runnable

"GC task thread#0 (ParallelGC)" prio=6 tid=0x00000000024ff000 nid=0x40c0 runnable

"GC task thread#1 (ParallelGC)" prio=6 tid=0x0000000002501000 nid=0x4064 runnable

"GC task thread#2 (ParallelGC)" prio=6 tid=0x0000000002502800 nid=0x437c runnable

"GC task thread#3 (ParallelGC)" prio=6 tid=0x0000000002504800 nid=0x1270 runnable

"VM Periodic Task Thread" prio=10 tid=0x000000000c3cd800 nid=0x34d4 waiting on condition

JNI global references: 127


Found one Java-level deadlock:
=============================
"Thread-1":
waiting to lock monitor 0x000000000c4eff28 (object 0x00000000d7d53ca8, a java.lang.Object),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x000000000abc96e8 (object 0x00000000d7d53cb8, a java.lang.Object),
which is held by "Thread-1"

Java stack information for the threads listed above:
===================================================
"Thread-1":
at ls.lock.LeftRightDeadLock.rightLeft(LeftRightDeadLock.java:32)
- waiting to lock <0x00000000d7d53ca8> (a java.lang.Object)
- locked <0x00000000d7d53cb8> (a java.lang.Object)
at ls.lock.Test$2.run(Test.java:25)
"Thread-0":
at ls.lock.LeftRightDeadLock.leftRight(LeftRightDeadLock.java:20)
- waiting to lock <0x00000000d7d53cb8> (a java.lang.Object)
- locked <0x00000000d7d53ca8> (a java.lang.Object)
at ls.lock.Test$1.run(Test.java:15)

Found 1 deadlock.

Heap
PSYoungGen total 36352K, used 3751K [0x00000000d7c00000, 0x00000000da480000, 0x0000000100000000)
eden space 31232K, 12% used [0x00000000d7c00000,0x00000000d7fa9ce0,0x00000000d9a80000)
from space 5120K, 0% used [0x00000000d9f80000,0x00000000d9f80000,0x00000000da480000)
to space 5120K, 0% used [0x00000000d9a80000,0x00000000d9a80000,0x00000000d9f80000)
ParOldGen total 82432K, used 0K [0x0000000087400000, 0x000000008c480000, 0x00000000d7c00000)
object space 82432K, 0% used [0x0000000087400000,0x0000000087400000,0x000000008c480000)
PSPermGen total 21504K, used 3201K [0x0000000082200000, 0x0000000083700000, 0x0000000087400000)
object space 21504K, 14% used [0x0000000082200000,0x0000000082520540,0x0000000083700000)
 

分析工具

1、在线分析的:http://spotify.github.io/threaddump-analyzer/

2、TDA 可以作为visualvm插件。

3、IBM工具 BM Thread and Monitor Dump Analyzer for Java。下载 后可以获得名字类似jca457.jar的文件。

使用方法:

jca***.jar是一个可执行的jar包。可以输入以下命令来运行。

-jar是执行jar包的参数;-Xmx用来为执行jca457的进程分配最大堆内存。

<Java Runtime Environment path>java -Xmx500m -jar jca457.jar

 <Java Runtime Environment path>为%java%jre/bin目录下,同时将jca457.jar文件也拷贝到此目录下,再执行上述命令。

 

如何利用Thread Dump解决问题

使用方案

现象1:cpu高,load高,响应很慢

方案:
* 一个请求过程中多次dump

* 对比多次dump文件的runnable线程,如果执行的方法有比较大变化,说明比较正常。如果在执行同一个方法,就有一些问题了。

 

现象2:查找占用cpu最多的线程信息

方案:
* 使用命令: top -H -p pid(pid为被测系统的进程号),找到导致cpu高的线程id。

上述Top命令找到的线程id,对应着dump thread信息中线程的nid,只不过一个是十进制,一个是十六进制。

* 在thread dump中,根据top命令查找的线程id,查找对应的线程堆栈信息。

* 多次dump,比较方法调用

 

现象3:cpu使用率不高但是响应时间很长

方案:
* 进行dump,查看是否有很多thread struck在了i/o、数据库等地方,定位瓶颈原因。

1)Waiting on condition:等待某个资源或条件发生来唤醒自己。比如线程正在sleep,网络读写繁忙而等待。

2)Blocked:阻塞,waiting for monitor entry的线程。

 

现象4:请求无法响应

方案:
* 多次dump,对比是否所有的runnable线程都一直在执行相同的方法,如果是的,锁住了!

 

热锁

热锁,也往往是导致系统性能瓶颈的主要因素。其表现特征为:由于多个线程对临界区,或者锁的竞争,可能出现:

  • 频繁的线程的上下文切换:从操作系统对线程的调度来看,当线程在等待资源而阻塞的时候,操作系统会将之切换出来,放到等待的队列,当线程获得资源之后,调度算法会将这个线程切换进去,放到执行队列中。
  • 大量的系统调用:因为线程的上下文切换,以及热锁的竞争,或者临界区的频繁的进出,都可能导致大量的系统调用。
  • 大部分CPU开销用在“系统态 ”:线程上下文切换,和系统调用,都会导致 CPU在 “系统态 ”运行,换而言之,虽然系统很忙碌,但是 CPU用在 “用户态 ”的比例较小,应用程序得不到充分的 CPU资源。 
  • 随着 CPU数目的增多,系统的性能反而下降。因为CPU数目多,同时运行的线程就越多,可能就会造成更频繁的线程上下文切换和系统态的CPU开销,从而导致更糟糕的性能。 

    上面的描述,都是一个 scalability(可扩展性)很差的系统的表现。从整体的性能指标看,由于线程热锁的存在,程序的响应时间会变长,吞吐量会降低。
    那么,怎么去了解 “热锁 ”出现在什么地方呢?一个重要的方法还是结合操作系统的各种工具观察系统资源使用状况,以及收集Java线程的DUMP信息,看线程都阻塞在什么方法上,了解原因,才能找到对应的解决方法。
    我们曾经遇到过这样的例子,程序运行时,出现了以上指出的各种现象,通过观察操作系统的资源使用统计信息,以及线程 DUMP信息,确定了程序中热锁的存在,并发现大多数的线程状态都是 Waitingfor monitor entry或者 Wait on monitor,且是阻塞在压缩和解压缩的方法上。后来采用第三方的压缩包 javalib替代 JDK自带的压缩包后,系统的性能提高了几倍。

 

 

例子1:当cpu使用率不正常的偏高,查找占用CPU最多的线程信息
   
  1 )提取最高cpu使用率的线程
$ ps -mo pid.lwp.stime.time.cpu -C java  
PID LWP STIME TIME %CPU
10029 Dec07 00:02:02 99.5 -
10039 Dec07 00:00:00 0.1 -
10040 Dec07 00:00:00 95.5 -

     从应用中找出cpu使用率的线程,获得轻量级锁使用的cpu率,并将LWP对应的数字(10039) 转换成十六进制(0x2737) 

    2 )获得thread dump后,检测线程的状态
    提取出应用中thread dump中pid为10029 ,再找出nid为0x2737的线程
复制代码
"NioProcessor-2" prio=10 tid=0x0a8d2800 nid=0x2737 runnable [0x49aa5000] 
java.lang.Thread.State: RUNNABLE 
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) 
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) 
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) 
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) 
- locked <0x74c52678> (a sun.nio.ch.Util$1) 
- locked <0x74c52668> (a java.util.Collections$UnmodifiableSet) 
- locked <0x74c501b0> (a sun.nio.ch.EPollSelectorImpl) 
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) 
at external.org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65) 
at external.org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:708) 
at external.org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51) 
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
at java.lang.Thread.run(Thread.java:662)
复制代码

    每隔一小时提取几次,然后分析线程的状态以确定问题。

 
例2:当性能不正常的下降
    在获得thread dump后,下面一组线程的状态是BLOCKED 
复制代码
" DB-Processor-13" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f000] 
java.lang.Thread.State: BLOCKED (on object monitor) 
at beans.ConnectionPool.getConnection(ConnectionPool.java:102) 
- waiting to lock <0xe0375410> (a beans.ConnectionPool) 
at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111) 
at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43) 

"DB-Processor-14" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f020] 
java.lang.Thread.State: BLOCKED (on object monitor) 
at beans.ConnectionPool.getConnection(ConnectionPool.java:102) 
- waiting to lock <0xe0375410> (a beans.ConnectionPool) 
at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111) 
at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43) 

" DB-Processor-3" daemon prio=5 tid=0x00928248 nid=0x8b waiting for monitor entry [0x000000000825d080] 
java.lang.Thread.State: RUNNABLE 
at oracle.jdbc.driver.OracleConnection.isClosed(OracleConnection.java:570) 
- waiting to lock <0xe03ba2e0> (a oracle.jdbc.driver.OracleConnection) 
at beans.ConnectionPool.getConnection(ConnectionPool.java:112) 
- locked <0xe0386580> (a java.util.Vector) - locked <0xe0375410> (a beans.ConnectionPool) 
at beans.cus.Cue_1700c.GetNationList(Cue_1700c.java:66) 
at org.apache.jsp.cue_1700c_jsp._jspService(cue_1700c_jsp.java:120)
复制代码

If the threads are BLOCKED, extract the threads related to the lock that the threads are trying to obtain.

 
通过thread dump,可以确认处在BLOCKED状态的线程,因为<0xe0375410>锁无法被获得,这种问题可以目前持有锁的堆栈跟踪中解决问题。
在使用dbms时,有两个原因会导致上面描述的问题频繁发生。
第一个原因是:
不适当的配置:尽管这些线程仍然在工作,因为数据库连接池的不适当配置,使得无法显示出最好的性能。通过多次收集分析thread dumps,将会发现一些之前处于BLOCKED状态的线程会处于不同的状态。
第二个原因是:
不正当的连接。如果和数据库的连接保持异常,那么线程会一直等待超时。在这个例子中,通过多次收集分析thread dumps,将会看到和数据库有关的线程会一直保持BLOCKED状态。通过适当的参数,比如超时时间,可以缩短问题发生的时间
 
 
 
简单的Thread Dump编码:
线程的命名
使用java.lang.Thread 类来创建一个线程对象,将被命名为Thread-(Number)。当使用java.util.concurrent.DefaultThreadFactory对象创建线程,将被命名为pool-(Number)-thread-(Number)。在分析应用中几十到几百的线程时,如果所有的线程都使用默认的名字,分析会变得非常困难,很难区别出线程。
因此,可以养成在创建线程时为线程命名的好习惯
通过java.lang.Thread创建线程,可以给线程构造参数中传入自定义的名
public Thread(Runnable target, String name); 
public Thread(ThreadGroup group, String name);
public Thread(ThreadGroup group, Runnable target, String name);
public Thread(ThreadGroup group, Runnable target, String name, long stackSize); 
通过java.util.concurrent.ThreadFactory创建线程,可以取自己ThredFactory的名。如果不需要特殊的功能,可以使用下面的MyThreadFactory
import java.util.concurrent.ConcurrentHashMap; 
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;
public class MyThreadFactory implements ThreadFactory { private static final ConcurrentHashMap<String, AtomicInteger> POOL_NUMBER = new ConcurrentHashMap<String, AtomicInteger>();
private final ThreadGroup group;
private final AtomicInteger threadNumber = new AtomicInteger(1);
private final String namePrefix;
public MyThreadFactory(String threadPoolName) {
if (threadPoolName == null) { throw new NullPointerException("threadPoolName"); }
POOL_NUMBER.putIfAbsent(threadPoolName, new AtomicInteger());
SecurityManager securityManager = System.getSecurityManager();
group = (securityManager != null) ? securityManager.getThreadGroup() : Thread.currentThread().getThreadGroup();
AtomicInteger poolCount = POOL_NUMBER.get(threadPoolName);
if (poolCount == null) { namePrefix = threadPoolName + " pool-00-thread-"; }
else { namePrefix = threadPoolName + " pool-" + poolCount.getAndIncrement() + "-thread-";
}
}
public Thread newThread(Runnable runnable) {
Thread thread = new Thread(group, runnable, namePrefix + threadNumber.getAndIncrement(), 0);
if (thread.isDaemon()) { thread.setDaemon(false); }
if (thread.getPriority() != Thread.NORM_PRIORITY)
{ thread.setPriority(Thread.NORM_PRIORITY); }
return thread; }
}
 
通过MBean来获得更多的信息
可以通过MBean来获得更多的ThreadInfo的信息。同样使用ThreadInfo也可以获得更多的在thread dumps中难以获取的信息。
ThreadMXBean mxBean = ManagementFactory.getThreadMXBean(); 
long[] threadIds = mxBean.getAllThreadIds();
ThreadInfo[] threadInfos = mxBean.getThreadInfo(threadIds);
for (ThreadInfo threadInfo : threadInfos) {
System.out.println( threadInfo.getThreadName());
System.out.println( threadInfo.getBlockedCount());
System.out.println( threadInfo.getBlockedTime());
System.out.println( threadInfo.getWaitedCount());
System.out.println( threadInfo.getWaitedTime());
}
使用ThreaInfo中的信息可以获得等待线程或者阻塞线程所用的时间,也可以获得已经被停用了的异常线程的列表


--------------------------------------------------------------------------------------------------------------
参考资料:

https://blog.csdn.net/l1394049664/java/article/details/81290910

https://www.cnblogs.com/kongzhongqijing/articles/4152072.html

posted @ 2020-05-11 21:37  寒冰宇若  阅读(2790)  评论(0编辑  收藏  举报