ANR Deadlock Case Analysis

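A Monkey run produced a framework crash. It was eventually traced to a deadlock behind an ANR: when the Watchdog (WD) checked its locks, it killed the System Server process.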

Below, the thread traces from the ANR are used to work out the cause of the deadlock.

The main thread's call stack shows that the main thread is blocked, and that it is blocked waiting for lock 0x3fd06119, which is currently held by thread 80:

DALVIK THREADS (89):
"main" prio=5 tid=1 Blocked
  | group="main" sCount=2 dsCount=0 obj=0x73d2c050 self=0xb8ab27f8
  | sysTid=556 nice=-2 cgrp=default sched=1/32 handle=0xb6f5bbec
  | state=S schedstat=( 0 0 0 ) utm=16794 stm=14151 core=1 HZ=100
  | stack=0xbe4e0000-0xbe4e2000 stackSize=8MB
  | held mutexes=
  at android.view.inputmethod.InputMethodManager.windowDismissed(InputMethodManager.java:1296)
>> - waiting to lock <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H) held by thread 80
  at android.view.WindowManagerGlobal.removeViewLocked(WindowManagerGlobal.java:366)
  at android.view.WindowManagerGlobal.removeView(WindowManagerGlobal.java:324)
  - locked <0x2231a2ab> (a java.lang.Object)
  at android.view.WindowManagerImpl.removeViewImmediate(WindowManagerImpl.java:116)
  at android.app.Dialog.dismissDialog(Dialog.java:341)
  at android.app.Dialog.dismiss(Dialog.java:324)

The call stack of thread 80 shows that it, in turn, is waiting for lock 0x1ba525de:

"Binder_D" prio=5 tid=80 Blocked
  | group="main" sCount=2 dsCount=0 obj=0x13375760 self=0xb8df9018
  | sysTid=2574 nice=0 cgrp=default sched=0/0 handle=0xb8de15d0
  | state=S schedstat=( 0 0 0 ) utm=11936 stm=14541 core=1 HZ=100
  | stack=0xa0b28000-0xa0b2a000 stackSize=1012KB
  | held mutexes=
  at com.android.server.InputMethodManagerService.getCurrentInputMethodSubtype(InputMethodManagerService.java:3238)
>> - waiting to lock <0x1ba525de> (a java.util.HashMap) held by thread 1
  at android.view.inputmethod.InputMethodManager.getCurrentInputMethodSubtype(InputMethodManager.java:1948)
  - locked <0x3fd06119> (a android.view.inputmethod.InputMethodManager$H)
  at com.android.server.TextServicesManagerService.getCurrentSpellCheckerSubtype(TextServicesManagerService.java:413)
  - locked <0x1aaf2990> (a java.util.HashMap)
  at com.android.internal.textservice.ITextServicesManager$Stub.onTransact(ITextServicesManager.java:72)
  at android.os.Binder.execTransact(Binder.java:469)

According to the call stack of the tid=1 thread, it needs lock <0x3fd06119>, which is the lock on the mH Handler:

    public void windowDismissed(IBinder appWindowToken) {
        checkFocus();
        synchronized (mH) {
            if (mServedView != null &&
                    mServedView.getWindowToken() == appWindowToken) {
                finishInputLocked();
            }
        }
    }

From the other methods on its call path, we can tell that at this point it already holds lock <0x1ba525de>, which is mMethodMap:

    void hideInputMethodMenu() {
        synchronized (mMethodMap) {
            hideInputMethodMenuLocked();
        }
    }

The tid=80 thread, for its part, needs lock 0x1ba525de, which is mMethodMap:

    public InputMethodSubtype getCurrentInputMethodSubtype() {
        // TODO: Make this work even for non-current users?
        if (!calledFromValidUser()) {
            return null;
        }
        synchronized (mMethodMap) {
            return getCurrentInputMethodSubtypeLocked();
        }
    }

That lock is already held by thread 1 and has not been released.

In addition, the tid=80 thread has already acquired <0x3fd06119>, the mH Handler lock that the tid=1 thread needs:

    public InputMethodSubtype getCurrentInputMethodSubtype() {
        synchronized (mH) {
            try {
                return mService.getCurrentInputMethodSubtype();
            } catch (RemoteException e) {
                Log.w(TAG, "IME died:" + mCurId, e);
                return null;
            }
        }
    }

............

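This is how the deadlock forms. Other threads that arrive through binder calls also need the mMethodMap lock held by thread 1, so they block as well, and that is what produces the ANR.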
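The lock-order inversion described above can be reproduced outside Android in a few lines. The following is only a minimal sketch under stated assumptions: the class name `LockOrderInversionDemo` and the two plain `Object` monitors standing in for mMethodMap and mH are hypothetical, and it runs on a desktop JVM rather than inside System Server. It uses the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call to confirm that the two threads end up in a cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order,
     * then polls the JVM for monitor deadlocks.
     * Returns the number of deadlocked threads reported, or 0 if none were
     * found within roughly five seconds.
     */
    public static int reproduce() throws InterruptedException {
        // Hypothetical stand-ins for the two monitors seen in the trace.
        final Object methodMap = new Object(); // plays the role of mMethodMap
        final Object handler = new Object();   // plays the role of mH

        // Makes sure each thread holds its first lock before asking for the second.
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(
                () -> lockInOrder(methodMap, handler, bothHoldFirst), "thread-1");
        Thread t80 = new Thread(
                () -> lockInOrder(handler, methodMap, bothHoldFirst), "thread-80");
        // Daemon threads, so the stuck pair does not keep the JVM alive.
        t1.setDaemon(true);
        t80.setDaemon(true);
        t1.start();
        t80.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread owns 'second' and wants 'first'.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads reported: " + reproduce());
    }
}
```

The latch is there only to make the interleaving deterministic; in the real trace the same interleaving happened by chance under Monkey.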

Deadlock state:

Thread          Locked                      Need
Thread 1        0x1ba525de (mMethodMap)     0x3fd06119 (mH)
Thread 80       0x3fd06119 (mH)             0x1ba525de (mMethodMap)
Other thread    -                           0x1ba525de (mMethodMap)
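The Locked/Need relationships listed here amount to a wait-for graph, and the deadlock is a cycle in that graph. As a rough illustration, the sketch below records who holds and who needs each lock and then follows the chain thread, needed lock, owner. The class `WaitForGraph` and its method names are hypothetical helpers written for this example; they are not part of Android or of any trace-analysis tool.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WaitForGraph {
    private final Map<String, String> ownerOf = new HashMap<>();  // lock -> thread holding it
    private final Map<String, String> neededBy = new HashMap<>(); // thread -> lock it waits for

    public void holds(String thread, String lock) {
        ownerOf.put(lock, thread);
    }

    public void needs(String thread, String lock) {
        neededBy.put(thread, lock);
    }

    /**
     * Follows thread -> needed lock -> owner until a thread repeats or the
     * chain ends. Returns the threads forming the cycle that 'start' is in
     * or blocked behind, or an empty list if the chain ends without a cycle.
     */
    public List<String> cycleFrom(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            path.add(current);
            String lock = neededBy.get(current);
            current = (lock == null) ? null : ownerOf.get(lock);
        }
        if (current == null) {
            return new ArrayList<>(); // chain ended: no cycle reachable from 'start'
        }
        // 'current' is the first repeated thread; the cycle starts at its first visit.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    public static void main(String[] args) {
        WaitForGraph g = new WaitForGraph();
        g.holds("Thread 1", "0x1ba525de");   // mMethodMap
        g.needs("Thread 1", "0x3fd06119");   // mH
        g.holds("Thread 80", "0x3fd06119");  // mH
        g.needs("Thread 80", "0x1ba525de");  // mMethodMap
        g.needs("Other thread", "0x1ba525de");
        System.out.println(g.cycleFrom("Thread 1"));
        System.out.println(g.cycleFrom("Other thread"));
    }
}
```

Starting from "Other thread" reports the same two-thread cycle, which matches the observation that the other binder threads are not part of the deadlock themselves but are stuck behind it.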

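Solution: this problem was triggered in a Monkey test environment, which is fairly complex, and the probability of hitting it is extremely low. The proposal is therefore to add a flag in InputMethodManager that prevents simultaneous access to the mMethodMap lock, so that the deadlock cannot occur.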
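For comparison, a common general way to break this kind of cycle is to avoid holding one lock while calling into code that takes the other. The sketch below is not the flag-based change proposed here and is not the actual framework code; `OpenCallSketch`, `Service`, and `Client` are hypothetical stand-ins that model only the two locks. The client copies the service reference while holding its own lock, releases the lock, and only then calls the service, so no thread ever waits for the mMethodMap stand-in while holding the mH stand-in.

```java
public class OpenCallSketch {

    /** Hypothetical stand-in for the service side, guarded by a methodMap lock. */
    static class Service {
        private final Object methodMap = new Object(); // plays the role of mMethodMap
        Client client;

        String currentSubtype() {
            synchronized (methodMap) {
                return "subtype";
            }
        }

        // Models the path in the trace: holds methodMap, then calls into the client.
        void hideMenu() {
            synchronized (methodMap) {
                client.windowDismissed();
            }
        }
    }

    /** Hypothetical stand-in for the client side, guarded by a handler lock. */
    static class Client {
        private final Object h = new Object(); // plays the role of mH
        Service service;

        String getCurrentSubtype() {
            Service s;
            synchronized (h) {
                s = service;           // read shared state under the lock
            }
            return s.currentSubtype(); // call the service with h already released
        }

        void windowDismissed() {
            synchronized (h) {
                // the finishInputLocked() work would happen here
            }
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish in time. */
    public static boolean runConcurrently(int iterations) throws InterruptedException {
        Service service = new Service();
        Client client = new Client();
        service.client = client;
        client.service = service;

        Thread ui = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                service.hideMenu();
            }
        }, "ui");
        Thread binder = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                client.getCurrentSubtype();
            }
        }, "binder");
        ui.setDaemon(true);
        binder.setDaemon(true);
        ui.start();
        binder.start();
        ui.join(5000);
        binder.join(5000);
        return !ui.isAlive() && !binder.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished without deadlock: " + runConcurrently(10000));
    }
}
```

The trade-off is that the value returned by the service may be stale by the time the client uses it, since the client lock is no longer held across the call; whether that is acceptable depends on the caller.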

posted @ 2018-04-26 16:47  mail181