ANR Deadlock Case Study
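A Monkey run produced a framework crash. The investigation showed that a deadlock had caused an ANR: when the Watchdog (WD) checked the monitored locks and found them stuck, it killed the system_server process.
Below, we use the thread traces captured for the ANR to work out the cause of the deadlock.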
The main thread's call stack shows that it is Blocked waiting for lock <0x3fd06119>, which is held by thread 80:
DALVIK THREADS (89):
"main" prio=5 tid=1 Blocked
  | group="main" sCount=2 dsCount=0 obj=0x73d2c050 self=0xb8ab27f8
  | sysTid=556 nice=-2 cgrp=default sched=1/32 handle=0xb6f5bbec
  | state=S schedstat=( 0 0 0 ) utm=16794 stm=14151 core=1 HZ=100
  | stack=0xbe4e0000-0xbe4e2000 stackSize=8MB
  | held mutexes=
  at android.view.inputmethod.InputMethodManager.windowDismissed(InputMethodManager.java:1296)
  - waiting to lock <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H) held by thread 80
  at android.view.WindowManagerGlobal.removeViewLocked(WindowManagerGlobal.java:366)
  at android.view.WindowManagerGlobal.removeView(WindowManagerGlobal.java:324)
  - locked <0x2231a2ab> (a java.lang.Object)
  at android.view.WindowManagerImpl.removeViewImmediate(WindowManagerImpl.java:116)
  at android.app.Dialog.dismissDialog(Dialog.java:341)
  at android.app.Dialog.dismiss(Dialog.java:324)
Thread 80's call stack shows that it, in turn, is waiting for lock <0x1ba525de>:
"Binder_D" prio=5 tid=80 Blocked
  | group="main" sCount=2 dsCount=0 obj=0x13375760 self=0xb8df9018
  | sysTid=2574 nice=0 cgrp=default sched=0/0 handle=0xb8de15d0
  | state=S schedstat=( 0 0 0 ) utm=11936 stm=14541 core=1 HZ=100
  | stack=0xa0b28000-0xa0b2a000 stackSize=1012KB
  | held mutexes=
  at com.android.server.InputMethodManagerService.getCurrentInputMethodSubtype(InputMethodManagerService.java:3238)
  - waiting to lock <0x1ba525de> (a java.util.HashMap) held by thread 1
  at android.view.inputmethod.InputMethodManager.getCurrentInputMethodSubtype(InputMethodManager.java:1948)
  - locked <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H)
  at com.android.server.TextServicesManagerService.getCurrentSpellCheckerSubtype(TextServicesManagerService.java:413)
  - locked <0x1aaf2990> (a java.util.HashMap)
  at com.android.internal.textservice.ITextServicesManager$Stub.onTransact(ITextServicesManager.java:72)
  at android.os.Binder.execTransact(Binder.java:469)
According to its call stack, thread tid=1 needs lock <0x3fd06119>, i.e. the monitor of mH, the InputMethodManager$H Handler:
public void windowDismissed(IBinder appWindowToken) {
    checkFocus();
    synchronized (mH) {
        if (mServedView != null &&
                mServedView.getWindowToken() == appWindowToken) {
            finishInputLocked();
        }
    }
}
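From the rest of its call chain (the lower frames are not part of the excerpt above), thread 1 already holds lock <0x1ba525de>, i.e. mMethodMap, acquired in InputMethodManagerService.hideInputMethodMenu(). In AOSP, hideInputMethodMenuLocked() dismisses the IME switcher dialog, which is how the main thread reaches Dialog.dismiss() while still holding mMethodMap: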
void hideInputMethodMenu() {
    synchronized (mMethodMap) {
        hideInputMethodMenuLocked();
    }
}
Thread tid=80, meanwhile, needs lock 0x1ba525de, i.e. mMethodMap, in InputMethodManagerService.getCurrentInputMethodSubtype():
public InputMethodSubtype getCurrentInputMethodSubtype() {
    // TODO: Make this work even for non-current users?
    if (!calledFromValidUser()) {
        return null;
    }
    synchronized (mMethodMap) {
        return getCurrentInputMethodSubtypeLocked();
    }
}
but that lock is already held by thread 1, which has not released it.
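At the same time, thread tid=80 has already locked <0x3fd06119>, the mH monitor that thread tid=1 needs, in InputMethodManager.getCurrentInputMethodSubtype(). As the adjacent frames in its stack show, the call into InputMethodManagerService stays on the same thread (both run inside system_server), so mH is still held when it blocks on mMethodMap: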
public InputMethodSubtype getCurrentInputMethodSubtype() {
    synchronized (mH) {
        try {
            return mService.getCurrentInputMethodSubtype();
        } catch (RemoteException e) {
            Log.w(TAG, "IME died: " + mCurId, e);
            return null;
        }
    }
}
............
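This closes the cycle: thread 1 holds mMethodMap and waits for mH, while thread 80 holds mH and waits for mMethodMap, so neither can proceed. Any other thread that then arrives through a binder call and needs mMethodMap, which thread 1 still holds, blocks as well, and this is what produces the ANR.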
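The lock-order inversion can be reproduced outside Android with two plain monitors. The sketch below is a hypothetical stand-alone Java program, not framework code: the objects named methodMap and mH only play the roles of the two locks from the trace, and a CountDownLatch makes sure each thread holds its first lock before reaching for the second, so the deadlock happens on every run. It then asks ThreadMXBean.findMonitorDeadlockedThreads() to confirm the cycle. java.lang.management is a Java SE API, so this is meant for a desktop JVM; on the device, the system_server Watchdog only notices that its monitored locks stay blocked for too long.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockDemo {
    // Hypothetical stand-ins for the two monitors in the trace.
    static final Object methodMap = new Object(); // role of mMethodMap <0x1ba525de>
    static final Object mH = new Object();        // role of mH <0x3fd06119>

    /** Starts two threads that take the locks in opposite order; returns the deadlocked thread IDs, or null. */
    static long[] reproduce() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Role of thread 1: hideInputMethodMenu() holds mMethodMap, then windowDismissed() wants mH.
        Thread uiThread = new Thread(() -> {
            synchronized (methodMap) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (mH) {
                    // never reached: the other thread holds mH
                }
            }
        }, "main-like");

        // Role of thread 80: getCurrentInputMethodSubtype() holds mH, then the service side wants mMethodMap.
        Thread binderThread = new Thread(() -> {
            synchronized (mH) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (methodMap) {
                    // never reached: the other thread holds methodMap
                }
            }
        }, "binder-like");

        // Daemon threads, so the JVM can still exit while they stay blocked forever.
        uiThread.setDaemon(true);
        binderThread.setDaemon(true);
        uiThread.start();
        binderThread.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 100 && ids == null; i++) { // poll for up to about 5 seconds
            Thread.sleep(50);
            ids = mx.findMonitorDeadlockedThreads();
        }
        return ids;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long[] ids = reproduce();
        if (ids == null) {
            System.out.println("no deadlock detected");
            return;
        }
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().getThreadInfo(ids)) {
            System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```

Each reported thread waits for the lock that the other one owns, which is the same two-node cycle as thread 1 and thread 80.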
Deadlock summary:
Thread | Holds | Waiting for |
Thread 1 (main) | 0x1ba525de (mMethodMap) | 0x3fd06119 (mH) |
Thread 80 (Binder_D) | 0x3fd06119 (mH) | 0x1ba525de (mMethodMap) |
Other threads | - | 0x1ba525de (mMethodMap) |
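This summary is a wait-for graph, and a cycle in it is the deadlock. As a rough sketch (a hypothetical helper, assuming the ART trace line formats seen in the excerpts above), the same graph can be extracted from an ANR thread dump by matching each thread header's tid= and its "waiting to lock <...> held by thread N" line, then following the edges until a thread repeats. It ignores other wait states such as waiting on a condition or native mutexes, so finding no cycle does not prove there is no deadlock.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TraceDeadlockFinder {
    // Thread header, e.g.: "main" prio=5 tid=1 Blocked
    private static final Pattern HEADER = Pattern.compile("^\"[^\"]*\".*\\btid=(\\d+)\\b");
    // Monitor wait, e.g.: - waiting to lock <0x3fd06119> (a ...) held by thread 80
    private static final Pattern WAIT =
            Pattern.compile("waiting to lock <(0x[0-9a-f]+)>.*held by thread (\\d+)");

    /** Wait-for graph: tid -> tid of the thread that owns the monitor it is blocked on. */
    static Map<Integer, Integer> waitForGraph(String trace) {
        Map<Integer, Integer> edges = new LinkedHashMap<>(); // keeps trace order
        Integer current = null;
        for (String raw : trace.split("\n")) {
            String line = raw.trim();
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                current = Integer.valueOf(header.group(1));
                continue;
            }
            Matcher wait = WAIT.matcher(line);
            if (current != null && wait.find()) {
                edges.put(current, Integer.valueOf(wait.group(2)));
            }
        }
        return edges;
    }

    /** Follows edges from each thread; returns the first cycle found (the deadlock), or an empty list. */
    static List<Integer> findCycle(Map<Integer, Integer> edges) {
        for (Integer start : edges.keySet()) {
            Set<Integer> path = new LinkedHashSet<>();
            Integer node = start;
            while (node != null && path.add(node)) {
                node = edges.get(node);
            }
            if (node != null) { // node was already on the path, so the tail of the path is a cycle
                List<Integer> cycle = new ArrayList<>();
                boolean inCycle = false;
                for (Integer n : path) {
                    inCycle |= n.equals(node);
                    if (inCycle) {
                        cycle.add(n);
                    }
                }
                return cycle;
            }
        }
        return new ArrayList<>();
    }

    /** Abbreviated version of the two thread dumps shown in this article. */
    static String sampleTrace() {
        return String.join("\n",
                "\"main\" prio=5 tid=1 Blocked",
                "  at android.view.inputmethod.InputMethodManager.windowDismissed(InputMethodManager.java:1296)",
                "  - waiting to lock <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H) held by thread 80",
                "\"Binder_D\" prio=5 tid=80 Blocked",
                "  at com.android.server.InputMethodManagerService.getCurrentInputMethodSubtype(InputMethodManagerService.java:3238)",
                "  - waiting to lock <0x1ba525de> (a java.util.HashMap) held by thread 1",
                "  - locked <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H)");
    }

    public static void main(String[] args) {
        System.out.println("deadlock cycle (tids): " + findCycle(waitForGraph(sampleTrace())));
    }
}
```

With the abbreviated sample it reports the cycle [1, 80], matching the first two rows of the table.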
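Solution: because this problem was triggered under Monkey testing, in a complex environment and with a very low probability, we chose to add a flag in InputMethodManager that prevents the mMethodMap lock from being accessed concurrently, thereby avoiding the deadlock.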
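The flag-based change itself is not shown here. For comparison only, the sketch below illustrates a different, commonly used technique: make the outgoing call without holding your own lock (an "open call"), so the mH to mMethodMap edge can never form. The classes Service and Client are hypothetical stand-ins, not the real InputMethodManagerService or InputMethodManager, and whether dropping synchronized (mH) is safe in the real client depends on what that lock protects there.

```java
import java.util.concurrent.TimeUnit;

public class OpenCallFixSketch {
    // Hypothetical stand-in for the service side; its lock plays the role of mMethodMap.
    static class Service {
        final Object methodMap = new Object();
        String subtype = "en_US";

        String getCurrentSubtype() {
            synchronized (methodMap) {
                return subtype;
            }
        }

        // Like hideInputMethodMenu(): holds methodMap, then (via the dialog) calls into the client.
        void hideMenu(Client client) {
            synchronized (methodMap) {
                client.windowDismissed();
            }
        }
    }

    // Hypothetical stand-in for the client side; its lock plays the role of mH.
    static class Client {
        final Object mH = new Object();
        final Service service;
        boolean focused = true;

        Client(Service service) {
            this.service = service;
        }

        void windowDismissed() {
            synchronized (mH) {
                focused = false;
            }
        }

        // The change: call the service WITHOUT holding mH, so no thread ever
        // waits for methodMap while holding mH and the cycle cannot close.
        String getCurrentSubtype() {
            return service.getCurrentSubtype();
        }
    }

    /** Runs both call paths concurrently; returns true if both threads finish within the timeout. */
    static boolean runConcurrently(int iterations) throws InterruptedException {
        Service service = new Service();
        Client client = new Client(service);
        Thread ui = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                service.hideMenu(client);
            }
        });
        Thread binder = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                client.getCurrentSubtype();
            }
        });
        ui.setDaemon(true);
        binder.setDaemon(true);
        ui.start();
        binder.start();
        ui.join(TimeUnit.SECONDS.toMillis(10));
        binder.join(TimeUnit.SECONDS.toMillis(10));
        return !ui.isAlive() && !binder.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed without deadlock: " + runConcurrently(100_000));
    }
}
```

Enforcing a single lock order (every path takes mMethodMap before mH) would also break the cycle; which option fits depends on the real code paths.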