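How ANRs Are Triggered
ANR (Application Not Responding) means an application has stopped responding. Android requires certain events to complete within a fixed time window; if an event is not handled within that window, the system raises an ANR.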
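Scenarios that cause an ANR:
- Service Timeout: for example, a foreground service does not finish executing within 20s;
- BroadcastQueue Timeout: for example, a foreground broadcast is not handled within 10s;
- ContentProvider Timeout: a content provider fails to publish within 10s;
- InputDispatching Timeout: input event dispatch times out after 5s; this covers key and touch events.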
Triggering an ANR can be split into three steps: planting the bomb, defusing the bomb, and detonating the bomb.
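The plant / defuse / detonate pattern amounts to a delayed message that is removed if the work finishes in time. The sketch below is a minimal plain-Java imitation driven by a manual clock. `TimeoutQueue` and its methods are hypothetical names chosen for illustration; they are not Android's Handler API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a Handler's delayed-message queue, driven by a manual clock. */
class TimeoutQueue {
    /** One planted bomb: the token it belongs to and the time it goes off. */
    private static final class Pending {
        final Object token;
        final long deadlineMs;

        Pending(Object token, long deadlineMs) {
            this.token = token;
            this.deadlineMs = deadlineMs;
        }
    }

    private final List<Pending> pending = new ArrayList<>();

    /** Plant: schedule a timeout for the token at an absolute time. */
    void plant(Object token, long deadlineMs) {
        pending.add(new Pending(token, deadlineMs));
    }

    /** Defuse: the work finished in time, so drop its pending timeout. */
    void defuse(Object token) {
        pending.removeIf(p -> p.token.equals(token));
    }

    /** Detonate: move the clock to nowMs and return every token whose timeout is due. */
    List<Object> advanceTo(long nowMs) {
        List<Object> fired = new ArrayList<>();
        Iterator<Pending> it = pending.iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (p.deadlineMs <= nowMs) {
                fired.add(p.token);
                it.remove();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        TimeoutQueue queue = new TimeoutQueue();
        queue.plant("fastService", 20_000L);
        queue.plant("slowService", 20_000L);
        queue.defuse("fastService");                  // finished in time
        System.out.println(queue.advanceTo(20_000L)); // only slowService is reported
    }
}
```

A manual clock is used instead of real timers so that the behaviour can be checked without waiting for wall-clock time.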
<1> Service Timeout is triggered when AMS.MainHandler, running on the "ActivityManager" thread, receives the SERVICE_TIMEOUT_MSG message.
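There are two kinds of Service:
- for a foreground service, the timeout is SERVICE_TIMEOUT = 20s;
- for a background service, the timeout is SERVICE_BACKGROUND_TIMEOUT = 200s.
The field ProcessRecord.execServicesFg determines whether the service is executing in the foreground.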
Planting the bomb: while the Service process attaches to the system_server process, realStartServiceLocked() is called, and the bomb is planted inside it (more precisely, in scheduleServiceTimeoutLocked()).
    private final void realStartServiceLocked(ServiceRecord r,
            ProcessRecord app, boolean execInFg) throws RemoteException {
        ...
        // Send the delayed message (SERVICE_TIMEOUT_MSG)
        bumpServiceExecutingLocked(r, execInFg, "create");
        try {
            ...
            // Eventually runs the service's onCreate() method
            app.thread.scheduleCreateService(r, r.serviceInfo,
                    mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                    app.repProcState);
        } catch (DeadObjectException e) {
            mAm.appDiedLocked(app);
            throw e;
        } finally {
            ...
        }
    }
    private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
        ...
        scheduleServiceTimeoutLocked(r.app);
    }

    void scheduleServiceTimeoutLocked(ProcessRecord proc) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        long now = SystemClock.uptimeMillis();
        Message msg = mAm.mHandler.obtainMessage(
                ActivityManagerService.SERVICE_TIMEOUT_MSG);
        msg.obj = proc;
        // If SERVICE_TIMEOUT_MSG has not been removed by the deadline,
        // the service timeout flow runs
        mAm.mHandler.sendMessageAtTime(msg,
                proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
    }
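The deadline arithmetic in scheduleServiceTimeoutLocked can be isolated as a small function. This is a sketch under stated assumptions: the class name `ServiceDeadline` is hypothetical, and the two constants are written in milliseconds to match the 20s and 200s values quoted in this article.

```java
/** Hypothetical helper that mirrors how the service deadline depends on the foreground flag. */
class ServiceDeadline {
    // Values taken from the article, expressed in milliseconds (assumed unit).
    static final long SERVICE_TIMEOUT_MS = 20_000L;             // foreground service
    static final long SERVICE_BACKGROUND_TIMEOUT_MS = 200_000L; // background service

    /** Absolute time at which the timeout message should be delivered. */
    static long deadline(long nowMs, boolean execServicesFg) {
        return nowMs + (execServicesFg ? SERVICE_TIMEOUT_MS : SERVICE_BACKGROUND_TIMEOUT_MS);
    }

    public static void main(String[] args) {
        long now = 1_000L;
        System.out.println("foreground deadline: " + deadline(now, true));
        System.out.println("background deadline: " + deadline(now, false));
    }
}
```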
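Defusing the bomb: after a chain of calls through Binder, execution reaches ActivityThread.handleCreateService() on the main thread of the target process. This step creates the target service object and calls its onCreate(), then goes back through several more calls to system_server to run serviceDoneExecuting. Finally, serviceDoneExecutingLocked removes the service timeout message SERVICE_TIMEOUT_MSG.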
    private void handleCreateService(CreateServiceData data) {
        ...
        java.lang.ClassLoader cl = packageInfo.getClassLoader();
        Service service = (Service) cl.loadClass(data.info.name).newInstance();
        ...
        try {
            // Create the ContextImpl object
            ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
            context.setOuterContext(service);
            // Create the Application object
            Application app = packageInfo.makeApplication(false, mInstrumentation);
            service.attach(context, this, data.info.name, data.token, app,
                    ActivityManagerNative.getDefault());
            // Call the service's onCreate() method
            service.onCreate();
            // Cut the bomb's fuse
            ActivityManagerNative.getDefault().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (Exception e) {
            ...
        }
    }
    private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
            boolean finishing) {
        ...
        if (r.executeNesting <= 0) {
            if (r.app != null) {
                r.app.execServicesFg = false;
                r.app.executingServices.remove(r);
                if (r.app.executingServices.size() == 0) {
                    // No service is executing in this service's process any more
                    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
                    ...
                }
                ...
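Note that the timeout message is removed only when the process has no executing services left, not whenever a single service finishes. The following sketch illustrates that rule in isolation. `ExecutingServices` is a hypothetical class; it ignores the executeNesting counter and the real ProcessRecord bookkeeping.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-process bookkeeping: one shared timeout for all executing services. */
class ExecutingServices {
    private final Set<String> executing = new HashSet<>();
    private boolean timeoutPending = false;

    /** A service starts executing: remember it and make sure a timeout is pending. */
    void begin(String serviceName) {
        executing.add(serviceName);
        timeoutPending = true;
    }

    /** A service finished: the timeout is cancelled only when nothing else is executing. */
    void done(String serviceName) {
        executing.remove(serviceName);
        if (executing.isEmpty()) {
            timeoutPending = false;
        }
    }

    boolean isTimeoutPending() {
        return timeoutPending;
    }

    public static void main(String[] args) {
        ExecutingServices proc = new ExecutingServices();
        proc.begin("ServiceA");
        proc.begin("ServiceB");
        proc.done("ServiceA");
        System.out.println("pending after A: " + proc.isTimeoutPending());
        proc.done("ServiceB");
        System.out.println("pending after B: " + proc.isTimeoutPending());
    }
}
```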
Detonating the bomb: the system_server process has a Handler thread named "ActivityManager". When the countdown ends, the SERVICE_TIMEOUT_MSG message is delivered to that Handler thread:
    final class MainHandler extends Handler {
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case SERVICE_TIMEOUT_MSG: {
                    ...
                    mServices.serviceTimeout((ProcessRecord)msg.obj);
                } break;
                ...
            }
            ...
        }
    }
    void serviceTimeout(ProcessRecord proc) {
        String anrMessage = null;
        synchronized(mAm) {
            if (proc.executingServices.size() == 0 || proc.thread == null) {
                return;
            }
            final long now = SystemClock.uptimeMillis();
            final long maxTime = now -
                    (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
            ServiceRecord timeout = null;
            long nextTime = 0;
            for (int i=proc.executingServices.size()-1; i>=0; i--) {
                ServiceRecord sr = proc.executingServices.valueAt(i);
                if (sr.executingStart < maxTime) {
                    timeout = sr;
                    break;
                }
                if (sr.executingStart > nextTime) {
                    nextTime = sr.executingStart;
                }
            }
            if (timeout != null && mAm.mLruProcesses.contains(proc)) {
                Slog.w(TAG, "Timeout executing service: " + timeout);
                StringWriter sw = new StringWriter();
                PrintWriter pw = new FastPrintWriter(sw, false, 1024);
                pw.println(timeout);
                timeout.dump(pw, " ");
                pw.close();
                mLastAnrDump = sw.toString();
                mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
                mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
                anrMessage = "executing service " + timeout.shortName;
            }
        }
        if (anrMessage != null) {
            // A timed-out service exists, so run appNotResponding
            mAm.appNotResponding(proc, null, null, false, anrMessage);
        }
    }
Here anrMessage reads "executing service [info of the timed-out ServiceRecord]".
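The core of serviceTimeout is a scan for any service whose start time is older than `now - timeout`. A minimal sketch of that scan over plain start timestamps follows. `ServiceTimeoutScan` and `findTimedOut` are hypothetical names, and the nextTime tracking from the snippet is left out.

```java
/** Hypothetical extraction of the "which service timed out" scan. */
class ServiceTimeoutScan {
    /**
     * Scans from the last entry to the first, as the loop in serviceTimeout does.
     * Returns the index of the first entry that started before (nowMs - timeoutMs),
     * or -1 when no entry has exceeded the timeout.
     */
    static int findTimedOut(long[] executingStartMs, long nowMs, long timeoutMs) {
        long maxTime = nowMs - timeoutMs;
        for (int i = executingStartMs.length - 1; i >= 0; i--) {
            if (executingStartMs[i] < maxTime) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] starts = {1_000L, 15_000L};
        // At t=25s with a 20s timeout, anything started before t=5s has timed out.
        System.out.println(findTimedOut(starts, 25_000L, 20_000L));
    }
}
```

The comparison is strict, so a service that started exactly `timeout` milliseconds ago is not yet treated as timed out.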
<2> BroadcastReceiver Timeout is triggered when BroadcastQueue.BroadcastHandler, running on the "ActivityManager" thread, receives the BROADCAST_TIMEOUT_MSG message.
There are two broadcast queues, foreground and background:
- for a foreground broadcast, the timeout is BROADCAST_FG_TIMEOUT = 10s;
- for a background broadcast, the timeout is BROADCAST_BG_TIMEOUT = 60s.
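Planting the bomb:
Broadcasts are handled by BroadcastQueue.processNextBroadcast(). It first handles parallel broadcasts, then the current ordered broadcast, and finally fetches and handles the next ordered broadcast.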
    final void processNextBroadcast(boolean fromMsg) {
        synchronized(mService) {
            ...
            // part 2: handle the current ordered broadcast
            do {
                r = mOrderedBroadcasts.get(0);
                // Get all receivers of this broadcast
                int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
                if (mService.mProcessesReady && r.dispatchTime > 0) {
                    long now = SystemClock.uptimeMillis();
                    if ((numReceivers > 0) &&
                            (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                        // The broadcast has exceeded its total time limit, so force it to finish
                        broadcastTimeoutLocked(false);
                        ...
                    }
                }
                if (r.receivers == null || r.nextReceiver >= numReceivers
                        || r.resultAbort || forceReceive) {
                    if (r.resultTo != null) {
                        // Deliver the broadcast result
                        performReceiveLocked(r.callerApp, r.resultTo,
                                new Intent(r.intent), r.resultCode,
                                r.resultData, r.resultExtras, false, false, r.userId);
                        r.resultTo = null;
                    }
                    // Defuse the bomb
                    cancelBroadcastTimeoutLocked();
                }
            } while (r == null);
            ...
            // part 3: fetch the next ordered broadcast
            r.receiverTime = SystemClock.uptimeMillis();
            if (!mPendingBroadcastTimeoutMessage) {
                long timeoutTime = r.receiverTime + mTimeoutPeriod;
                // Plant the bomb
                setBroadcastTimeoutLocked(timeoutTime);
            }
            ...
        }
    }
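When broadcast timeouts are handled:
- First, part 3 calls setBroadcastTimeoutLocked(timeoutTime) to schedule the broadcast timeout message;
- Then part 2 acts on how the broadcast is progressing:
  - if the broadcast has been waiting on its receivers for too long, it calls broadcastTimeoutLocked(false);
  - once the broadcast has finished, it calls cancelBroadcastTimeoutLocked().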
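The two time limits that appear in processNextBroadcast can be written as plain arithmetic. The sketch below is illustrative only: `BroadcastLimits` and its method names are hypothetical, and times are assumed to be in milliseconds.

```java
/** Hypothetical helpers for the two broadcast time limits. */
class BroadcastLimits {
    /**
     * Whole-broadcast limit checked in part 2: the broadcast is overdue when
     * now > dispatchTime + 2 * timeoutPeriod * numReceivers.
     */
    static boolean wholeBroadcastOverdue(long nowMs, long dispatchTimeMs,
                                         long timeoutPeriodMs, int numReceivers) {
        return dispatchTimeMs > 0
                && numReceivers > 0
                && nowMs > dispatchTimeMs + 2 * timeoutPeriodMs * numReceivers;
    }

    /** Per-receiver deadline set in part 3: receiverTime + timeoutPeriod. */
    static long receiverDeadline(long receiverTimeMs, long timeoutPeriodMs) {
        return receiverTimeMs + timeoutPeriodMs;
    }

    public static void main(String[] args) {
        // Foreground broadcast (10s) with 3 receivers, dispatched at t=1s:
        // the whole-broadcast limit is 1s + 2 * 10s * 3 = 61s.
        System.out.println(wholeBroadcastOverdue(61_001L, 1_000L, 10_000L, 3));
        System.out.println(receiverDeadline(5_000L, 10_000L));
    }
}
```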
    final void setBroadcastTimeoutLocked(long timeoutTime) {
        if (! mPendingBroadcastTimeoutMessage) {
            Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
            mHandler.sendMessageAtTime(msg, timeoutTime);
            mPendingBroadcastTimeoutMessage = true;
        }
    }
This schedules the delayed message BROADCAST_TIMEOUT_MSG: if the broadcast is still not finished mTimeoutPeriod from now, the broadcast timeout flow begins.
Defusing the bomb:
In processNextBroadcast(), the bomb is defused once performReceiveLocked has run.
    final void cancelBroadcastTimeoutLocked() {
        if (mPendingBroadcastTimeoutMessage) {
            mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
            mPendingBroadcastTimeoutMessage = false;
        }
    }
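Both setBroadcastTimeoutLocked and cancelBroadcastTimeoutLocked are guarded by the same pending flag, so at most one timeout message is outstanding per queue. A small sketch of that guard, with the hypothetical class `BroadcastTimeoutFlag` standing in for the queue and a plain field standing in for the Handler message:

```java
/** Hypothetical model of the single pending broadcast timeout. */
class BroadcastTimeoutFlag {
    private boolean pending = false;
    private long scheduledAtMs = -1L; // -1 means nothing is scheduled

    /** Schedules the timeout unless one is already pending. Returns true if it was scheduled. */
    boolean set(long timeoutTimeMs) {
        if (pending) {
            return false; // an earlier timeout is still outstanding; keep it
        }
        scheduledAtMs = timeoutTimeMs;
        pending = true;
        return true;
    }

    /** Cancels the pending timeout, if any. */
    void cancel() {
        if (pending) {
            scheduledAtMs = -1L;
            pending = false;
        }
    }

    long scheduledAtMs() {
        return scheduledAtMs;
    }

    public static void main(String[] args) {
        BroadcastTimeoutFlag flag = new BroadcastTimeoutFlag();
        System.out.println(flag.set(10_000L)); // scheduled
        System.out.println(flag.set(20_000L)); // ignored, one is already pending
        flag.cancel();
        System.out.println(flag.set(30_000L)); // scheduled again after the cancel
    }
}
```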
Detonating the bomb:
BroadcastHandler.handleMessage
    private final class BroadcastHandler extends Handler {
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case BROADCAST_TIMEOUT_MSG: {
                    synchronized (mService) {
                        broadcastTimeoutLocked(true);
                    }
                } break;
                ...
            }
            ...
        }
    }
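Four cases in which the bomb does not go off:
- mOrderedBroadcasts has already been fully processed: no ANR;
- dexopt is in progress: no ANR;
- the system is not yet ready (mProcessesReady=false): no ANR;
- the receiver currently executing has not timed out: the broadcast timeout is rescheduled, no ANR.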
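The four non-detonating cases can be expressed as one decision function. This is a sketch rather than the framework's code: `BroadcastTimeoutDecision`, `Outcome`, and the parameter names are hypothetical, and the order in which the checks run is an assumption.

```java
/** Hypothetical decision table for a delivered broadcast timeout message. */
class BroadcastTimeoutDecision {
    enum Outcome { NOTHING_PENDING, DEXOPT_IN_PROGRESS, SYSTEM_NOT_READY, RESCHEDULED, ANR }

    static Outcome decide(int orderedBroadcastCount, boolean dexoptInProgress,
                          boolean processesReady, long receiverTimeMs,
                          long timeoutPeriodMs, long nowMs) {
        if (orderedBroadcastCount == 0) {
            return Outcome.NOTHING_PENDING;    // ordered broadcasts are all done
        }
        if (dexoptInProgress) {
            return Outcome.DEXOPT_IN_PROGRESS; // dexopt is running
        }
        if (!processesReady) {
            return Outcome.SYSTEM_NOT_READY;   // system has not reached the ready state
        }
        if (receiverTimeMs + timeoutPeriodMs > nowMs) {
            return Outcome.RESCHEDULED;        // current receiver still has time left
        }
        return Outcome.ANR;                    // current receiver really timed out
    }

    public static void main(String[] args) {
        System.out.println(decide(1, false, true, 0L, 10_000L, 5_000L));
        System.out.println(decide(1, false, true, 0L, 10_000L, 10_000L));
    }
}
```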
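<3> ContentProvider Timeout is triggered when AMS.MainHandler, running on the "ActivityManager" thread, receives the CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message.
The ContentProvider timeout is CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10s. Unlike the Service and BroadcastQueue cases above, it is tied to the startup of the provider's process.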
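Planting the bomb:
The bomb is planted during process creation: once the process has been created, it calls into system_server via attachApplicationLocked(). If the bomb is not defused, it goes off 10s later.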
    private final boolean attachApplicationLocked(IApplicationThread thread, int pid) {
        ProcessRecord app;
        if (pid != MY_PID && pid >= 0) {
            synchronized (mPidsSelfLocked) {
                app = mPidsSelfLocked.get(pid); // Look up the ProcessRecord by pid
            }
        }
        ...
        // True when the system is ready or the app is a FLAG_PERSISTENT process
        boolean normalMode = mProcessesReady || isAllowedWhileBooting(app.info);
        List<ProviderInfo> providers = normalMode ? generateApplicationProvidersLocked(app) : null;
        // If the app process has providers that are still launching, deliver
        // CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG after the 10s timeout
        if (providers != null && checkAppInLaunchingProvidersLocked(app)) {
            Message msg = mHandler.obtainMessage(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG);
            msg.obj = app;
            mHandler.sendMessageDelayed(msg, CONTENT_PROVIDER_PUBLISH_TIMEOUT);
        }
        thread.bindApplication(...);
        ...
    }
Defusing the bomb:
Once the provider has been published successfully, the bomb is defused.
    public final void publishContentProviders(IApplicationThread caller,
            List<ContentProviderHolder> providers) {
        ...
        synchronized (this) {
            final ProcessRecord r = getRecordForAppLocked(caller);
            final int N = providers.size();
            for (int i = 0; i < N; i++) {
                ContentProviderHolder src = providers.get(i);
                ...
                ContentProviderRecord dst = r.pubProviders.get(src.info.name);
                if (dst != null) {
                    ComponentName comp = new ComponentName(dst.info.packageName, dst.info.name);
                    mProviderMap.putProviderByClass(comp, dst); // Add the provider to mProviderMap
                    String names[] = dst.info.authority.split(";");
                    for (int j = 0; j < names.length; j++) {
                        mProviderMap.putProviderByName(names[j], dst);
                    }
                    int launchingCount = mLaunchingProviders.size();
                    int j;
                    boolean wasInLaunchingProviders = false;
                    for (j = 0; j < launchingCount; j++) {
                        if (mLaunchingProviders.get(j) == dst) {
                            // Remove the provider from the mLaunchingProviders queue
                            mLaunchingProviders.remove(j);
                            wasInLaunchingProviders = true;
                            j--;
                            launchingCount--;
                        }
                    }
                    // Published successfully, so remove the timeout message
                    if (wasInLaunchingProviders) {
                        mHandler.removeMessages(CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG, r);
                    }
                    synchronized (dst) {
                        dst.provider = src.provider;
                        dst.proc = r;
                        // Wake up clients blocked in wait()
                        dst.notifyAll();
                    }
                    ...
                }
            }
        }
    }
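The last step of publishing wakes any client blocked in wait() on the provider record. The sketch below shows that wait / notifyAll handshake in plain Java. `ProviderHolderSketch` is a hypothetical class, and the bounded wait is an addition made here so that the example always terminates; it is not taken from the framework code.

```java
/** Hypothetical provider record: clients wait on it until a provider is published. */
class ProviderHolderSketch {
    private Object provider; // stays null until publish() is called

    /** Publisher side: store the provider and wake every waiting client. */
    synchronized void publish(Object published) {
        provider = published;
        notifyAll();
    }

    /** Client side: block until published or until timeoutMs elapses; returns null on timeout. */
    synchronized Object awaitProvider(long timeoutMs) {
        long deadlineNs = System.nanoTime() + timeoutMs * 1_000_000L;
        while (provider == null) {
            long remainingMs = (deadlineNs - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return null;
            }
            try {
                wait(remainingMs); // releases the monitor while waiting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return provider;
    }

    public static void main(String[] args) throws InterruptedException {
        ProviderHolderSketch holder = new ProviderHolderSketch();
        Thread publisher = new Thread(() -> holder.publish("provider-binder"));
        publisher.start();
        System.out.println(holder.awaitProvider(1_000L));
        publisher.join();
    }
}
```

The loop re-checks `provider` after every wake-up, which guards against spurious wake-ups.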
Detonating the bomb:
The system_server process has a Handler thread named "ActivityManager". When the countdown ends, the CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message is delivered to that Handler thread:
    final class MainHandler extends Handler {
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG: {
                    ...
                    ProcessRecord app = (ProcessRecord)msg.obj;
                    synchronized (ActivityManagerService.this) {
                        processContentProviderPublishTimedOutLocked(app);
                    }
                } break;
                ...
            }
            ...
        }
    }
Summary:
For Service, Broadcast, and Input timeouts, an ANR always ends up in AMS.appNotResponding().
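Timeout durations
- for a foreground service, the timeout is SERVICE_TIMEOUT = 20s;
- for a background service, the timeout is SERVICE_BACKGROUND_TIMEOUT = 200s;
- for a foreground broadcast, the timeout is BROADCAST_FG_TIMEOUT = 10s;
- for a background broadcast, the timeout is BROADCAST_BG_TIMEOUT = 60s;
- for a ContentProvider, the timeout is CONTENT_PROVIDER_PUBLISH_TIMEOUT = 10s.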
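Timeout detection
Service timeout detection:
- if the operation does not finish within the timeout and therefore never removes the delayed message, an ANR is triggered.
BroadcastReceiver timeout detection:
- if the total execution time of an ordered broadcast exceeds 2 * number of receivers * timeout, an ANR is triggered;
- if any single receiver of an ordered broadcast runs longer than the timeout, an ANR is triggered.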
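In addition:
- for Service, Broadcast, and Input, an ANR always ends in a call to AMS.appNotResponding;
- for a provider, an ANR can occur during publish while its process is starting. In that case the process is killed directly and its state is cleaned up, without showing the ANR dialog. The appNotRespondingViaProvider() path does go through appNotResponding().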