Android Framework-Layer WatchDog Study Notes (2)
I. Overview
1. Understanding how WatchDog works makes it easier to understand how the system services run.
II. WatchDog Implementation
1. Where the code lives
// frameworks/base/services/core/java/com/android/server/Watchdog.java
public class Watchdog extends Thread {
    ...
}
So Watchdog is a thread.
2. WatchDog is started in SystemServer.java
run()                                   // SystemServer.java
    startBootstrapServices()            // SystemServer.java
        traceBeginAndSlog("StartWatchdog");
        final Watchdog watchdog = Watchdog.getInstance();
        watchdog.start();
        traceEnd();
        ...
        traceBeginAndSlog("InitWatchdog");
        watchdog.init(mSystemContext, mActivityManagerService);
        traceEnd();
So Watchdog runs as an auxiliary thread inside SystemServer. Because it is a thread, calling start() is all it takes to launch it.
3. The WatchDog constructor
private Watchdog() {
    super("watchdog");
    // Not checking the background thread; the shared foreground thread is the
    // main checker. Thread name: "android.fg"
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread. Only do a quick check since there can be UI
    // running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread. Thread name: "android.ui"
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread. Thread name: "android.io"
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread. Thread name: "android.display"
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));
    // And the animation thread. Thread name: "android.anim"
    mHandlerCheckers.add(new HandlerChecker(AnimationThread.getHandler(),
            "animation thread", DEFAULT_TIMEOUT));
    // And the surface animation thread. Thread name: "android.anim.lf"
    mHandlerCheckers.add(new HandlerChecker(SurfaceAnimationThread.getHandler(),
            "surface animation thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());

    mOpenFdMonitor = OpenFdMonitor.create();

    // The "workThread" thread inside system_server
    HandlerThread handlerThread = new HandlerThread("workThread");
    handlerThread.start();
    mWorkHandler = new Handler(handlerThread.getLooper()) {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case MESSAGE_AFE_CHECK_ERROR:
                    checkAfeStatus(false);
                    break;
                case MESSAGE_AFE_CHECK_OVER:
                    Slog.i(TAG, "release observer");
                    mFileObserver.stopWatching();
                    mFileObserver = null;
                    checkAfeStatus(true);
                    getLooper().quitSafely();
                    mWorkHandler = null;
                    break;
            }
        }
    };

    // See the notes on DEFAULT_TIMEOUT.
    assert DB || DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}
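Two objects deserve the most attention: mMonitorChecker and mHandlerCheckers.
Where the elements of the mHandlerCheckers list come from:
(1) Added in the constructor: FgThread, the main thread, UiThread, IoThread, DisplayThread and the two animation threads.
(2) Added from outside: Watchdog.getInstance().addThread(handler);
Where the monitors held by mMonitorChecker come from:
(1) Added from outside: Watchdog.getInstance().addMonitor(monitor);
(2) A special case added by the constructor itself: addMonitor(new BinderThreadMonitor());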
4. WatchDog's run() method
public void run() {
    while (true) {
        ...
        synchronized (this) {
            for (int i = 0; i < mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
        }
        ...
    }
    ...
    // Trigger the kernel to dump all blocked threads, and backtraces
    // on all CPUs to the kernel log
    doSysRq('w');
    doSysRq('l');
    ...
    Thread dropboxThread = new Thread("watchdogWriteToDropbox") { ... };
    dropboxThread.start();
    ...
}
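Every element of mHandlerCheckers is checked. If one of them turns out to be stuck, two SysRq commands are triggered, show-blocked-tasks (w) and show-backtrace-all-active-cpus (l), to capture the stack backtraces of threads in D state and of the active CPUs.
5. HandlerChecker's scheduleCheckLocked()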
public void scheduleCheckLocked() {
    if (mCompleted) {
        // Safe to update monitors in queue, Handler is not in the middle of work
        mMonitors.addAll(mMonitorQueue);
        mMonitorQueue.clear();
    }
    if ((mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling())
            || (mPauseCount > 0)) {
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }

    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
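The mMonitors.size() == 0 case is about checking whether the thread behind an element of mHandlerCheckers has timed out. The tool used is mHandler.getLooper().getQueue().isPolling(): if the queue is polling, the looper is idle and waiting for messages, so the checker is marked completed without posting anything.
The monitor list of mMonitorChecker is never empty (the constructor already adds BinderThreadMonitor), so for it the call that matters is mHandler.postAtFrontOfQueue(this), which queues the checker's own run() on the monitored thread.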
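The "post a heartbeat and see whether it completes" idea can be tried outside Android. The following is a minimal sketch, not the AOSP code: it assumes a single-thread ExecutorService as a stand-in for the Handler/Looper pair, and HeartbeatChecker, scheduleCheck(), pendingMillis() and demo() are hypothetical names. An executor can only append to its queue, so the front-of-queue behaviour of postAtFrontOfQueue is not reproduced.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for HandlerChecker; not the AOSP class. */
final class HeartbeatChecker implements Runnable {
    private final ExecutorService worker; // plays the role of the monitored Handler
    private boolean completed = true;
    private long startNanos;

    HeartbeatChecker(ExecutorService worker) {
        this.worker = worker;
    }

    /** Posts one heartbeat unless an earlier one is still waiting in the queue. */
    synchronized void scheduleCheck() {
        if (!completed) {
            return; // a check is already in flight
        }
        completed = false;
        startNanos = System.nanoTime();
        worker.execute(this); // appended to the queue (a Handler could post at the front)
    }

    /** Runs on the monitored thread, proving that it still drains its queue. */
    @Override
    public void run() {
        synchronized (this) {
            completed = true;
        }
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    /** Milliseconds the current heartbeat has been pending, or 0 if none is pending. */
    synchronized long pendingMillis() {
        return completed ? 0 : (System.nanoTime() - startNanos) / 1_000_000;
    }

    /** Returns {completed while the worker is stuck, completed after it resumes}. */
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Simulate a message handler that is stuck.
            worker.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            HeartbeatChecker checker = new HeartbeatChecker(worker);
            checker.scheduleCheck();
            boolean whileStuck = checker.isCompleted();
            release.countDown();
            worker.submit(() -> { }).get(); // FIFO queue: the heartbeat has run by now
            return new boolean[] { whileStuck, checker.isCompleted() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }
}
```

Calling HeartbeatChecker.demo() yields false for the stuck phase and true once the worker resumes. A watchdog loop would compare pendingMillis() against its timeout instead of looking at the flag alone.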
6. HandlerChecker's run()
public final class HandlerChecker implements Runnable {
    ...
    @Override
    public void run() {
        final int size = mMonitors.size();
        for (int i = 0; i < size; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
        }

        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }
    ...
}
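The technique here is to call each monitor's monitor() method.
(1) The loop iterates over mMonitors, and the only checker whose list is populated is mMonitorChecker. For example, the various services add themselves to it via addMonitor: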
Watchdog.getInstance().addMonitor(this); // ActivityManagerService.java
Watchdog.getInstance().addMonitor(this); // InputManagerService.java
Watchdog.getInstance().addMonitor(this); // PowerManagerService.java
Watchdog.getInstance().addMonitor(this); // WindowManagerService.java
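The monitor() method that gets executed is very simple. Here is the one in ActivityManagerService:
public void monitor() {
    synchronized (this) { }
}
All it does is check whether the system service's lock has been held for a long time: if another thread holds the lock, monitor() cannot return.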
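Why an empty synchronized block works as a probe can be shown in plain Java. This is a sketch under assumptions rather than framework code: LockMonitorSketch, FakeService, finishesWithin() and demo() are hypothetical names, and a helper thread with a latch stands in for the watchdog's timeout bookkeeping.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of the empty-synchronized-block probe; not AOSP code. */
final class LockMonitorSketch {
    interface Monitor {
        void monitor();
    }

    static final class FakeService implements Monitor {
        @Override
        public void monitor() {
            synchronized (this) {
                // Nothing to do: acquiring the lock is the whole test.
            }
        }
    }

    /** Runs monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean finishesWithin(Monitor m, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            m.monitor();
            done.countDown();
        }, "sketch.checker");
        checker.setDaemon(true);
        checker.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {result with the lock free, result while the caller holds the lock}. */
    static boolean[] demo() {
        FakeService service = new FakeService();
        boolean whenFree = finishesWithin(service, 2000);
        boolean whenHeld;
        synchronized (service) { // simulate a service thread sitting on its lock
            whenHeld = finishesWithin(service, 200);
        }
        return new boolean[] { whenFree, whenHeld };
    }
}
```

With the lock free the probe returns almost immediately; while the caller holds the lock, the helper thread stays blocked inside monitor() until the lock is released, which is the situation the watchdog eventually reports as a timeout.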
(2) A special case: what BinderThreadMonitor checks
private static final class BinderThreadMonitor implements Watchdog.Monitor {
    @Override
    public void monitor() {
        Binder.blockUntilThreadAvailable();
    }
}

// frameworks/base/core/java/android/os/Binder.java
public static final native void blockUntilThreadAvailable();

// frameworks/native/libs/binder/IPCThreadState.cpp
void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                static_cast<unsigned long>(mProcess->mMaxThreads));
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}
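This only checks that the number of Binder threads currently executing in the process stays below mMaxThreads. Once the count reaches the maximum (31 for system_server), the caller has to wait. By default each process gets at most 15 binder threads, but system_server raises its own limit to 31: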
// frameworks/native/libs/binder/ProcessState.cpp
#define DEFAULT_MAX_BINDER_THREADS 15

// frameworks/base/services/java/com/android/server/SystemServer.java
public final class SystemServer {
    private static final int sMaxBinderThreads = 31;

    private void run() {
        BinderInternal.setMaxThreads(sMaxBinderThreads); // set before any service is started
        ...
        startBootstrapServices();
    }
}
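The wait loop in blockUntilThreadAvailable() is a classic mutex plus condition variable pattern. Below is a rough Java analogue using ReentrantLock and Condition. It is only a sketch: ThreadPoolGate and its methods are hypothetical names, and a timeout has been added so that the example terminates, whereas the native code above waits without one.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical Java analogue of the native wait loop; not the Binder implementation. */
final class ThreadPoolGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition threadFreed = lock.newCondition();
    private final int maxThreads;
    private int executing;

    ThreadPoolGate(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** A pool thread starts handling a call. */
    void enter() {
        lock.lock();
        try {
            executing++;
        } finally {
            lock.unlock();
        }
    }

    /** A pool thread finished a call; wake up anyone waiting for a free thread. */
    void exit() {
        lock.lock();
        try {
            executing--;
            threadFreed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits until executing < maxThreads; returns false if timeoutMs elapses first. */
    boolean awaitThreadAvailable(long timeoutMs) throws InterruptedException {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (executing >= maxThreads) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = threadFreed.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns {result while the pool is saturated, result after one thread exits}. */
    static boolean[] demo() {
        ThreadPoolGate gate = new ThreadPoolGate(2);
        gate.enter();
        gate.enter();
        try {
            boolean whenFull = gate.awaitThreadAvailable(100);
            gate.exit();
            boolean afterExit = gate.awaitThreadAvailable(100);
            return new boolean[] { whenFull, afterExit };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The while loop re-checks the condition after every wake-up, just like the native version, so a spurious wake-up cannot let the caller through while the pool is still saturated.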
7. What WatchDog does after a timeout
public void run() { // Watchdog.java
    ...
    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
    WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
    Slog.w(TAG, "*** GOODBYE!");
    Process.killProcess(Process.myPid());
    System.exit(10);
}
It kills the process it runs in (system_server) and exits.
III. What WatchDog Logs
1. process stack traces
The output location is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir and is normally /data/anr. The traces are written by calling ActivityManagerService.dumpStackTraces().
public class Watchdog extends Thread { // Watchdog.java
    public void run() {
        while (true) {
            ...
            if (!fdLimitTriggered) {
                ...
                if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        Slog.i(TAG, "WAITED_HALF");
                        // We've waited half the deadlock-detection interval. Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(pids, null, null,
                                getInterestingNativePids());
                    }
                    ...
                }
            }
            ...
            // After the full timeout the stacks are dumped once more.
            final File stack = ActivityManagerService.dumpStackTraces(pids, null, null,
                    getInterestingNativePids());
        }
    }
}
Note that process stack traces are also dumped when a check has been blocked for half the timeout, that is, in the WAITED_HALF state.
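How a checker's elapsed time maps to a state such as WAITED_HALF can be sketched as follows. The excerpt above only shows WAITED_HALF, so the other state names, the class WatchdogStateSketch, and the methods stateOf() and worstOf() are assumptions made for illustration. The 60-second timeout in the usage note is likewise just an example value.

```java
/** Hypothetical model of the watchdog's wait states; names are assumptions. */
final class WatchdogStateSketch {
    static final int COMPLETED = 0;   // the heartbeat came back
    static final int WAITING = 1;     // pending for less than half the timeout
    static final int WAITED_HALF = 2; // pending for at least half the timeout
    static final int OVERDUE = 3;     // pending for the full timeout or longer

    /** State of one checker, given how long its heartbeat has been pending. */
    static int stateOf(boolean completed, long elapsedMs, long timeoutMs) {
        if (completed) {
            return COMPLETED;
        }
        if (elapsedMs < timeoutMs / 2) {
            return WAITING;
        }
        if (elapsedMs < timeoutMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** The overall state is the worst state among all checkers. */
    static int worstOf(int... states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }
}
```

For instance, with an assumed timeout of 60000 ms, a checker pending for 30000 ms is in WAITED_HALF (the point at which the first stack dump happens) and one pending for 60000 ms is OVERDUE. A single slow checker is enough to move the overall state, because worstOf() takes the maximum.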
2. slog
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
Slog.w(TAG, "*** GOODBYE!");
3. event log
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
4. kernel stack traces
// Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
doSysRq('w');
doSysRq('l');
This triggers two SysRq commands, show-blocked-tasks (w) and show-backtrace-all-active-cpus (l), which write the stack backtraces of D-state threads and of the active CPUs to the kernel log.
5. dropbox
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
    public void run() {
        // If a watched thread hangs before init() is called, we don't have a
        // valid mActivity. So we can't log the error to dropbox.
        if (mActivity != null) {
            mActivity.addErrorToDropBox("watchdog", null, "system_server", null, null, null,
                    subject, null, stack, null);
        }
        StatsLog.write(StatsLog.SYSTEM_SERVER_WATCHDOG_OCCURRED, subject);
    }
};
dropboxThread.start();
Note that dropbox entries normally live under /data/system/dropbox. The directory is set here:
// frameworks/base/services/core/java/com/android/server/DropBoxManagerService.java
public DropBoxManagerService(final Context context) {
    this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
}
IV. Why UiThread, IoThread, DisplayThread and FgThread Are Monitored
1. These four classes extend ServiceThread and are singletons. Take UiThread.java as an example:
// frameworks/base/services/core/java/com/android/server/UiThread.java
public final class UiThread extends ServiceThread {
    private UiThread() {
        super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
    }

    @Override
    public void run() {
        // Make sure UiThread is in the fg stune boost group
        Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
        super.run();
    }

    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new UiThread();
            sInstance.start();
            final Looper looper = sInstance.getLooper();
            looper.setTraceTag(Trace.TRACE_TAG_SYSTEM_SERVER);
            looper.setSlowLogThresholdMs(
                    SLOW_DISPATCH_THRESHOLD_MS, SLOW_DELIVERY_THRESHOLD_MS);
            sHandler = new Handler(sInstance.getLooper());
        }
    }

    public static UiThread get() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    public static Handler getHandler() {
        synchronized (UiThread.class) {
            ensureThreadLocked();
            return sHandler;
        }
    }
}
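(1) get() returns the singleton instance.
(2) getHandler() returns the Handler that belongs to that thread.
(3) Note that ensureThreadLocked() calls start() right where it creates the instance. In other words, the thread is already running by the time the object is handed out.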
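The "create and start on first use" pattern from the three points above can be reduced to a few lines of plain Java. This is only a sketch: LazyWorkerThread is a hypothetical class, and a long sleep stands in for the message loop that a HandlerThread would run.

```java
/** Hypothetical lazily started singleton thread; not the ServiceThread class. */
final class LazyWorkerThread extends Thread {
    private static LazyWorkerThread sInstance;

    private LazyWorkerThread() {
        super("sketch.worker");
        setDaemon(true); // do not keep the JVM alive in this demo
    }

    @Override
    public void run() {
        // Stand-in for the message loop: park until interrupted.
        try {
            Thread.sleep(Long.MAX_VALUE);
        } catch (InterruptedException ignored) {
            // Exit quietly when interrupted.
        }
    }

    /** Creates and starts the thread on first use; later calls return the same one. */
    static LazyWorkerThread get() {
        synchronized (LazyWorkerThread.class) {
            if (sInstance == null) {
                sInstance = new LazyWorkerThread();
                sInstance.start();
            }
            return sInstance;
        }
    }
}
```

Two calls to LazyWorkerThread.get() return the same object, and the thread is alive as soon as the first call returns. Locking on the class object mirrors the synchronized (UiThread.class) blocks in the original.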
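Furthermore, all four classes extend ServiceThread, which in turn extends HandlerThread. The Handler inside each thread is what matters most, because system services such as AMS, WMS and PMS all post work to these threads.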
// frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
final class UiHandler extends Handler {
    public UiHandler() {
        super(com.android.server.UiThread.get().getLooper(), null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case SHOW_ERROR_UI_MSG:
            case SHOW_NOT_RESPONDING_UI_MSG:
            case SHOW_STRICT_MODE_VIOLATION_UI_MSG:
            case WAIT_FOR_DEBUGGER_UI_MSG:
            case DISPATCH_PROCESSES_CHANGED_UI_MSG:
            case DISPATCH_PROCESS_DIED_UI_MSG:
            case DISPATCH_UIDS_CHANGED_UI_MSG:
            case DISPATCH_OOM_ADJ_OBSERVER_MSG:
        }
    }
}
UiHandler takes its Looper directly from UiThread. As we know, a thread has one Looper and one MessageQueue, but it can have any number of Handlers.
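The "one queue, many handlers" relationship can be imitated with a single-thread executor: any number of callers may submit work, yet everything runs on the same thread in queue order. This is a sketch with hypothetical names (SharedLooperSketch, sketch.ui), and an executor is only an approximation of a Looper.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo: two "handlers" sharing one worker thread and one queue. */
final class SharedLooperSketch {
    /** Returns the name of the thread that ran each of the two submitted tasks. */
    static String[] demo() {
        ExecutorService looperThread = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "sketch.ui");
            t.setDaemon(true);
            return t;
        });
        Callable<String> whoAmI = () -> Thread.currentThread().getName();
        try {
            // Two independent senders, comparable to two Handlers on one Looper.
            Future<String> fromFirstHandler = looperThread.submit(whoAmI);
            Future<String> fromSecondHandler = looperThread.submit(whoAmI);
            return new String[] { fromFirstHandler.get(), fromSecondHandler.get() };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            looperThread.shutdown();
        }
    }
}
```

Both entries of the returned array name the same thread. This is also why one slow message from any sender delays every other user of the thread, and why the watchdog needs only one checker per thread.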
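The messages handled in handleMessage (error dialogs, not-responding dialogs and so on) show that UI work does not have to run on the process's main thread; here it runs on android.ui. (Android's usual guidance is that the UI must be updated from the main thread. More precisely, a view hierarchy may only be touched by the thread that created it.)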
2. Which services use which thread
UiThread      --> ActivityManagerService
DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService
IoThread      --> PackageInstallerService, StorageManagerService, BluetoothManagerService
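V. Summary
1. The core objects of Watchdog are mHandlerCheckers and mMonitorChecker.
mHandlerCheckers: monitors whether a thread's message queue is blocked.
mMonitorChecker: monitors whether a core system service has been holding its lock for too long.
The checkers in mHandlerCheckers rely on mHandler.getLooper().getQueue().isPolling() plus a posted heartbeat to detect a timeout. mMonitorChecker relies on synchronized (this) inside each service's monitor(). BinderThreadMonitor is a special case: it blocks while the number of executing Binder threads has reached the process maximum, and that blocking is what shows up as a timeout.
2. After a timeout the system prints a series of logs, and these outputs can be used to analyse the problem.
3. After a timeout Watchdog kills its own process, so once system_server has been restarted its pid is different.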
Reference:
Android internals analysis blog, "Android WatchDog原理分析" (Android WatchDog principle analysis): https://blog.csdn.net/weixin_28543661/article/details/117344345
posted on 2023-09-27 14:26 Hello-World3