Android ANR log trace日志文件分析
-
什么是ANR?
ANR:Application Not Responding,即应用无响应
-
ANR日志Trace文件获取
系统生成的Trace文件保存在data/anr,可以用过命令adb pull data/anr/取出
traces.txt只保留最后一次ANR的信息,Android系统有个DropBox功能功能,它能记录系统出现的crash错误.因此保留有发生过的ANR的信息.(log路径:/data/system/dropbox)
获取系统crash log: adb shell dumpsys dropbox --print >>log.txt
-
Trace文件怎么生成的?
-
当APP(包括系统APP和用户APP)进程出现ANR、应用响应慢或WatchDog的监视没有得到回馈时,
-
系统会dump此时的top进程,进程中Thread的运行状态就都dump到这个Trace文件中了.
-
导致ANR的常见几种情况:
1:Input dispatching timed out(5 seconds) 按键或触摸事件处理超时(一般是UI主线程做了耗时的操作,这类ANR最常见)
2:BroadcastTimeout(10 seconds) 广播的分发和处理超时(一般是onReceiver执行时间过长)
3:ServiceTimeout(20 seconds) Service的启动和执行超时
另外还有ProviderTimeout和WatchDog等导致的ANR.还有当系统内存或CPU资源不足时容易出现ANR,一般这种情况会有lowmemorykill的log打印.
应用ANR产生的时候,ActivityManagerService的appNotResponding方法就会被调用,然后在/data/anr/traces.txt文件中写入ANR相关信息.
-
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
-
ActivityRecord parent, boolean aboveSystem, final String annotation) {
-
// ... ...
-
if (MONITOR_CPU_USAGE) {
-
updateCpuStatsNow(); // 更新CPU使用率
-
}
-
// ... ...
-
final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
-
// dumpStackTraces是输出traces文件的函数
-
File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
-
NATIVE_STACKS_OF_INTEREST);
-
-
String cpuInfo = null;
-
if (MONITOR_CPU_USAGE) {
-
updateCpuStatsNow(); // 再次更新CPU信息
-
synchronized (mProcessCpuTracker) {
-
// 输出ANR发生前一段时间内的CPU使用率
-
cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
-
}
-
info.append(processCpuTracker.printCurrentLoad());
-
info.append(cpuInfo);
-
}
-
// 输出ANR发生后一段时间内的CPU使用率
-
info.append(processCpuTracker.printCurrentState(anrTime));
-
-
Slog.e(TAG, info.toString());
-
if (tracesFile == null) {
-
// There is no trace file, so dump (only) the alleged culprit's threads to the log
-
Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
-
}
-
// 将ANR信息同时输出到DropBox中
-
addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
-
cpuInfo, tracesFile, null);
-
// ... ...
-
synchronized (this) {
-
// 显示ANR提示对话框
-
// Bring up the infamous App Not Responding dialog
-
Message msg = Message.obtain();
-
HashMap<String, Object> map = new HashMap<String, Object>();
-
msg.what = SHOW_NOT_RESPONDING_MSG;
-
msg.obj = map;
-
msg.arg1 = aboveSystem ? 1 : 0;
-
map.put("app", app);
-
if (activity != null) {
-
map.put("activity", activity);
-
}
-
mUiHandler.sendMessage(msg);
-
}
-
}
避免ANR?
-
UI线程尽量只做跟UI相关的工作
-
耗时的工作(比如数据库操作,I/O,连接网络或者别的有可能阻碍UI线程的操作)把它放入单独的线程处理
-
尽量用Handler来处理UIthread和别的thread之间的交互
分析ANR的Log:
出现ANR的log如下:
-
06-22 10:37:46.204 3547 3604 E ActivityManager: ANR in org.code:MessengerService // ANR出现的进程包名
-
-
E ActivityManager: PID: 17027 // ANR进程ID
-
-
E ActivityManager: Reason: executing service org.code/.ipc.MessengerService //导致ANR的原因
-
-
E ActivityManager: Load: 8.31 / 8.12 / 8.47
-
-
E ActivityManager: CPU usage from 0ms to 6462ms later: //CPU在ANR发生后的使用情况
-
-
E ActivityManager: 61% 3547/system_server: 21% user + 39% kernel / faults: 13302 minor 2 major
-
-
E ActivityManager: 0.2% 475/debuggerd: 0% user + 0.1% kernel / faults: 6086 minor 1 major
-
-
E ActivityManager: 10% 5742/com.android.phone: 5.1% user + 5.1% kernel / faults: 7597 minor
-
-
E ActivityManager: 6.9% 5342/com.android.systemui: 2.1% user + 4.8% kernel / faults: 4158 minor
-
-
E ActivityManager: 0.1% 477/debuggerd64: 0% user + 0.1% kernel / faults: 4013 minor
-
-
E ActivityManager: 5.7% 16313/org.code: 3.2% user + 2.4% kernel / faults: 2412 minor
-
-
E ActivityManager: 3.7% 17027/org.code:MessengerService: 1.7% user + 2% kernel / faults: 2571 minor 6 major
-
-
E ActivityManager: 2.6% 306/cfinteractive: 0% user + 2.6% kernel
-
... ...
-
E ActivityManager: +0% 17168/kworker/0:1: 0% user + 0% kernel
-
-
E ActivityManager: 0% TOTAL: 0% user + 0% kernel + 0% softirq // CUP占用情况
-
-
E ActivityManager: CPU usage from 5628ms to 6183ms later:
-
-
E ActivityManager: 42% 3547/system_server: 17% user + 24% kernel / faults: 11 minor
-
-
E ActivityManager: 12% 3604/ActivityManager: 1.7% user + 10% kernel
-
-
E ActivityManager: 12% 3609/android.display: 8.7% user + 3.5% kernel
-
-
E ActivityManager: 3.5% 5304/Binder_6: 1.7% user + 1.7% kernel
-
-
E ActivityManager: 3.5% 5721/Binder_A: 1.7% user + 1.7% kernel
-
-
E ActivityManager: 3.5% 5746/Binder_C: 3.5% user + 0% kernel
-
-
E ActivityManager: 1.7% 3599/Binder_1: 0% user + 1.7% kernel
-
-
E ActivityManager: 1.7% 3600/Binder_2: 0% user + 1.7% kernel
-
-
I ActivityManager: Killing 17027:org.code:MessengerService/u0a85 (adj 1): bg anr
-
-
I art : Wrote stack traces to '/data/anr/traces.txt' //art这个TAG打印对traces操作的信息
-
-
D GraphicsStats: Buffer count: 9
-
-
W ActivityManager: Scheduling restart of crashed service org.code/.ipc.MessengerService in 1000ms
-
log打印了ANR的基本信息,我们可以分析CPU使用率推测ANR发生的时候设备在做什么工作;如果CPU使用率很高,接近100%,可能是在进行大规模的计算更可能是陷入死循环;如果CUP使用率很低,说明主线程被阻塞了,并且当IOwait很高,可能是主线程在等待I/O操作的完成.
对于ANR只是分析Log很难知道问题所在,我们还需要通过Trace文件分析stack调用情况.
-
-
----- pid 17027 at 2017-06-22 10:37:39 ----- // ANR出现的进程pid和时间(和上述log中pid一致)
-
Cmd line: org.code:MessengerService // ANR出现的进程名
-
Build fingerprint: 'Homecare/qucii8976v3_64:6.0.1/pansen06141150:eng/test-keys' // 下面记录系统版本,内存等状态信息
-
ABI: 'arm64'
-
Build type: optimized
-
Zygote loaded classes=6576 post zygote classes=13
-
Intern table: 13780 strong; 17 weak
-
JNI: CheckJNI is on; globals=283 (plus 158 weak)
-
Libraries: /system/lib64/libandroid.so /system/lib64/libcompiler_rt.so /system/lib64/libjavacrypto.so /system/lib64/libjnigraphics.so /system/lib64/libmedia_jni.so /system/lib64/libwebviewchromium_loader.so libjavacore.so (7)
-
Heap: 29% free, 8MB/12MB; 75731 objects
-
Dumping cumulative Gc timings
-
Total number of allocations 75731
-
Total bytes allocated 8MB
-
Total bytes freed 0B
-
Free memory 3MB
-
Free memory until GC 3MB
-
Free memory until OOME 183MB
-
Total memory 12MB
-
Max memory 192MB
-
Zygote space size 3MB
-
Total mutator paused time: 0
-
Total time waiting for GC to complete: 0
-
Total GC count: 0
-
Total GC time: 0
-
Total blocking GC count: 0
-
Total blocking GC time: 0
-
-
suspend all histogram: Sum: 76us 99% C.I. 0.100us-28us Avg: 7.600us Max: 28us
-
DALVIK THREADS (15):
-
// Signal Catcher是记录traces信息的线程
-
// Signal Catche(线程名)、(daemon)表示守护进程、prio(线程优先级,默认是5)、tid(线程唯一标识ID)、Runnable(线程当前状态)
-
"Signal Catcher" daemon prio=5 tid=3 Runnable
-
//线程组名称、suspendCount、debugSuspendCount、线程的Java对象地址、线程的Native对象地址
-
| group="system" sCount=0 dsCount=0 obj=0x12d8f0a0 self=0x5598ae55d0
-
//sysTid是线程号(主线程的线程号和进程号相同)
-
| sysTid=17033 nice=0 cgrp=default sched=0/0 handle=0x7fb2350450
-
| state=R schedstat=( 4348125 172343 3 ) utm=0 stm=0 core=1 HZ=100
-
| stack=0x7fb2254000-0x7fb2256000 stackSize=1013KB
-
| held mutexes= "mutator lock"(shared held)
-
native: #00 pc 0000000000489e28 /system/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, int, char const*, art::ArtMethod*, void*)+236)
-
native: #01 pc 0000000000458fe8 /system/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&) const+220)
-
native: #02 pc 0000000000465bc8 /system/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+688)
-
native: #03 pc 0000000000466ae0 /system/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*)+276)
-
native: #04 pc 000000000046719c /system/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+188)
-
native: #05 pc 0000000000467a84 /system/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+492)
-
native: #06 pc 0000000000431194 /system/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+96)
-
native: #07 pc 000000000043e604 /system/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1256)
-
native: #08 pc 000000000043f214 /system/lib64/libart.so (art::SignalCatcher::Run(void*)+452)
-
native: #09 pc 0000000000068714 /system/lib64/libc.so (__pthread_start(void*)+52)
-
native: #10 pc 000000000001c604 /system/lib64/libc.so (__start_thread+16)
-
(no managed stack frames)
-
-
//main(线程名)、prio(线程优先级,默认是5)、tid(线程唯一标识ID)、Sleeping(线程当前状态)
-
"main" prio=5 tid=1 Sleeping
-
| group="main" sCount=1 dsCount=0 obj=0x73132d10 self=0x5598a5f5e0
-
//sysTid是线程号(主线程的线程号和进程号相同)
-
| sysTid=17027 nice=0 cgrp=default sched=0/0 handle=0x7fb6db6fe8
-
| state=S schedstat=( 420582038 5862546 143 ) utm=24 stm=18 core=6 HZ=100
-
| stack=0x7fefba3000-0x7fefba5000 stackSize=8MB
-
| held mutexes=
-
// java 堆栈调用信息(这里可查看导致ANR的代码调用流程)(分析ANR最重要的信息)
-
at java.lang.Thread.sleep!(Native method)
-
- sleeping on <0x0c60f3c7> (a java.lang.Object)
-
at java.lang.Thread.sleep(Thread.java:1031)
-
- locked <0x0c60f3c7> (a java.lang.Object) // 锁住对象0x0c60f3c7
-
at java.lang.Thread.sleep(Thread.java:985)
-
at android.os.SystemClock.sleep(SystemClock.java:120)
-
at org.code.ipc.MessengerService.onCreate(MessengerService.java:63) //导致ANR的代码
-
at android.app.ActivityThread.handleCreateService(ActivityThread.java:2877)
-
at android.app.ActivityThread.access$1900(ActivityThread.java:150)
-
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1427)
-
at android.os.Handler.dispatchMessage(Handler.java:102)
-
at android.os.Looper.loop(Looper.java:148)
-
at android.app.ActivityThread.main(ActivityThread.java:5417)
-
at java.lang.reflect.Method.invoke!(Native method)
-
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:726)
-
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
在log中显示的pid在traces文件中与之对应,trace log中会打印当前所有线程的堆栈信息,一般我们主要关注main线程的堆栈(也有分析其他线程的情况,通过分析ANR发生时系统状态推测出设备正在进行操作)
而这里然后通过查看堆栈调用信息分析ANR的代码.上述ANR实际上在org.code.ipc.MessengerService.onCreate中63行调用SystemClock.sleep(10000)代码导致的;这是比较简单ANR示例.
以上仅为解决ANR问题提供一个思路,具体问题还需要根据实际情况具体分析
线程状态的分类: java中定义的线程状态:
-
// libcore/libart/src/main/java/java/lang/Thread.java
-
/**
-
* A representation of a thread's state. A given thread may only be in one
-
* state at a time.
-
*/
-
public enum State {
-
/**
-
* The thread has been created, but has never been started.
-
*/
-
NEW,
-
/**
-
* The thread may be run.
-
*/
-
RUNNABLE,
-
/**
-
* The thread is blocked and waiting for a lock.
-
*/
-
BLOCKED,
-
/**
-
* The thread is waiting.
-
*/
-
WAITING,
-
/**
-
* The thread is waiting for a specified amount of time.
-
*/
-
TIMED_WAITING,
-
/**
-
* The thread has been terminated.
-
*/
-
TERMINATED
-
}
C代码中定义的线程状态:
-
// /art/runtime/thread_state.h
-
enum ThreadState {
-
// Thread.State JDWP state
-
kTerminated = 66, // TERMINATED TS_ZOMBIE Thread.run has returned, but Thread* still around
-
kRunnable, // RUNNABLE TS_RUNNING runnable
-
kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout
-
kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep()
-
kBlocked, // BLOCKED TS_MONITOR blocked on a monitor
-
kWaiting, // WAITING TS_WAIT in Object.wait()
-
kWaitingForGcToComplete, // WAITING TS_WAIT blocked waiting for GC
-
kWaitingForCheckPointsToRun, // WAITING TS_WAIT GC waiting for checkpoints to run
-
kWaitingPerformingGc, // WAITING TS_WAIT performing GC
-
kWaitingForDebuggerSend, // WAITING TS_WAIT blocked waiting for events to be sent
-
kWaitingForDebuggerToAttach, // WAITING TS_WAIT blocked waiting for debugger to attach
-
kWaitingInMainDebuggerLoop, // WAITING TS_WAIT blocking/reading/processing debugger events
-
kWaitingForDebuggerSuspension, // WAITING TS_WAIT waiting for debugger suspend all
-
kWaitingForJniOnLoad, // WAITING TS_WAIT waiting for execution of dlopen and JNI on load code
-
kWaitingForSignalCatcherOutput, // WAITING TS_WAIT waiting for signal catcher IO to complete
-
kWaitingInMainSignalCatcherLoop, // WAITING TS_WAIT blocking/reading/processing signals
-
kWaitingForDeoptimization, // WAITING TS_WAIT waiting for deoptimization suspend all
-
kWaitingForMethodTracingStart, // WAITING TS_WAIT waiting for method tracing to start
-
kWaitingForVisitObjects, // WAITING TS_WAIT waiting for visiting objects
-
kWaitingForGetObjectsAllocated, // WAITING TS_WAIT waiting for getting the number of allocated objects
-
kStarting, // NEW TS_WAIT native thread started, not yet ready to run managed code
-
kNative, // RUNNABLE TS_RUNNING running in a JNI native method
-
kSuspended, // RUNNABLE TS_RUNNING suspended by GC or debugger
-
};
两者可以看出在C代码中定义更为详细,Traces中显示的线程状态都是C代码定义的.我们可以通过查看线程状态对应的信息分析ANR问题.
如: TimedWaiting对应的线程状态是TIMED_WAITING
kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout 执行了无超时参数的wait函数
kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep() 执行了带有超时参数的sleep函数
-
ZOMBIE 线程死亡,终止运行
-
RUNNING/RUNNABLE 线程可运行或正在运行
-
TIMED_WAIT 执行了带有超时参数的wait、sleep或join函数
-
MONITOR 线程阻塞,等待获取对象锁
-
WAIT 执行了无超时参数的wait函数
-
INITIALIZING 新建,正在初始化,为其分配资源
-
STARTING 新建,正在启动
-
NATIVE 正在执行JNI本地函数
-
VMWAIT 正在等待VM资源
-
SUSPENDED 线程暂停,通常是由于GC或debug被暂停