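A while ago I posted some impressions of benchmarking. After a month of going back and forth, I finally tracked down the problem.
1. Loading a profiling/tracing tool adds load of its own
The PerfMon Metrics Collector, which I covered in an earlier post on JMeter, adds only a little load: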
bash startAgent.sh
JMeterPlugins Agent version 1.3.2
No port specified, the default value is used: 4444
------ File Systems init: ------
File System detected: /dev/mapper/VolGroup-lv_root
File System detected: /dev/sda1
File System detected: /dev/mapper/VolGroup-lv_home
--------------------------------
--- Network Interfaces init: ---
Network interface detected: lo
Network interface detected: eth0
--------------------------------
Waiting for incoming connections...
Client id=0 connected!
Client id=1 connected!
Client id=1 disconnected!
Client id=0 disconnected!
Client id=2 connected!
Client id=3 connected!
Client id=3 disconnected!
Client id=2 disconnected!
Client id=4 connected!
Client id=5 connected!
Client id=4 disconnected!
Client id=5 disconnected!
Client id=6 connected!
Client id=7 connected!
Client id=6 disconnected!
Client id=7 disconnected!
Client id=8 connected!
Client id=9 connected!
Client id=8 disconnected!
Client id=9 disconnected!
Client id=10 connected!
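The agent log above records one line per connect and disconnect event. As a minimal sketch, assuming the log has been saved as text in exactly the format shown, the following helper (the name open_clients is hypothetical, not part of JMeterPlugins) reports which collector clients are still attached:

```python
import re

# Matches agent log lines such as "Client id=3 connected!" or "Client id=3 disconnected!"
EVENT_RE = re.compile(r"^Client id=(\d+) (connected|disconnected)!$")


def open_clients(log_text):
    """Return the set of client ids that connected but have not yet disconnected."""
    still_open = set()
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # banner lines, file system and interface listings
        client_id, event = int(match.group(1)), match.group(2)
        if event == "connected":
            still_open.add(client_id)
        else:
            still_open.discard(client_id)
    return still_open


# A shortened excerpt in the same format as the log above
sample = """Waiting for incoming connections...
Client id=0 connected!
Client id=1 connected!
Client id=1 disconnected!
Client id=0 disconnected!
Client id=2 connected!
"""
print(open_clients(sample))  # {2}
```

Applied to the full log above, only client 10 would remain open, since every earlier id has a matching disconnect line.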
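Later I switched to an in-house CPU profiling tool. Once JMeter pushed the load past the 100,000 (10W) level, requests started coming back with errors.
2. Dump core files and analyze them with gdb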
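With requests failing, I checked Apache first and found nothing wrong: it had previously coped with loads at the 2,000,000 (200W) level, and CPU and memory usage on the Apache server were both low.
The service under test also looked fine, and CPU and memory on its server were normal as well.
So I enabled core dumps for both Apache's httpd and the service under test to see whether anything crashed mid-run. It also helps to turn on the debug log at the same time.
In a terminal, enter ulimit -c unlimited (the limit applies to the current shell and to processes started from it)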
ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63369
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 2048
cpu time (seconds, -t) unlimited
max user processes (-u) 30000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This shows that core file size is now unlimited.
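The same limit can also be inspected and raised from inside a process. The following is a minimal sketch using Python's standard resource module, not what was actually done in this test. Unlike ulimit -c unlimited, it only raises the soft limit as far as the existing hard limit, and the function name enable_core_dumps is hypothetical:

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process.

    Child processes started afterwards inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = enable_core_dumps()
# resource.RLIM_INFINITY corresponds to "unlimited" in the ulimit output
print("unlimited" if soft == resource.RLIM_INFINITY else soft)
```

A launcher script could call this before starting the service, so that the child process inherits the raised limit.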
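Rerun the load test. When it finishes, check both the Apache directory and the service's directory for core files. In the end, the service under test turned out to have produced one.
The log file also showed that a child process of the service had died. The system is fault-tolerant, though: the child was restarted a while after it went down.
Debug the core file with gdb: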
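Checking several directories for core files by hand gets tedious across repeated runs. Here is a minimal sketch of that check, assuming core files are written straight into the directories being searched with names starting with "core" (like the core.1733 used in this post). The real location and naming depend on the kernel's core_pattern setting, and find_core_files is a hypothetical helper:

```python
import glob
import os


def find_core_files(directories, pattern="core*"):
    """Return core files found directly inside the given directories, newest first."""
    found = []
    for directory in directories:
        for path in glob.glob(os.path.join(directory, pattern)):
            if os.path.isfile(path):  # skip directories that happen to match
                found.append(path)
    return sorted(found, key=os.path.getmtime, reverse=True)


# Example: search the current directory (substitute the Apache and service paths)
for path in find_core_files(["."]):
    print(path)
```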
gdb ../sbin/MyServer.bin core.1733
GNU gdb (GDB) Fedora (7.1-34.fc13)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/build/80release/de81-64-n/sbin/MyServer.bin...done.
[New Thread 1884]
[New Thread 1898]
[New Thread 1885]
[New Thread 1887]
...
Missing separate debuginfo for /home/build/80release/de81-64-n/lib/libunwind.so.7
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/21/aaf76a7962367031098e7ba887012284e13839
Missing separate debuginfo for /home/build/80release/de81-64-n/lib/libprofiler.so.0
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/38/70cb8c5ff8dc39bb9346b5d3572208a539248b
Missing separate debuginfo for /home/build/80release/de81-64-n/lib/libtcmalloc.so.0
...
Reading symbols from /home/build/80release/de81-64-n/lib/libunwind.so.7...done.
Loaded symbols for /home/build/80release/de81-64-n/lib/libunwind.so.7
Reading symbols from /home/build/80release/de81-64-n/lib/libprofiler.so.0...done.
Loaded symbols for /home/build/80release/de81-64-n/lib/libprofiler.so.0
...
Core was generated by `../sbin/MyServer.bin -child'.
Program terminated with signal 11, Segmentation fault.
#0 access_mem (as=<value optimized out>, addr=9, val=0x7f13f1d4a8b8, write=<value optimized out>, arg=<value optimized out>) at x86_64/Ginit.c:164
164 x86_64/Ginit.c: No such file or directory.
in x86_64/Ginit.c
Missing separate debuginfos, use: debuginfo-install glibc-2.12.2-1.x86_64 libgcc-4.4.5-2.fc13.x86_64 libstdc++-4.4.5-2.fc13.x86_64
(gdb)
At this point, type where to see exactly where the error occurred:
(gdb) where
#0 access_mem (as=<value optimized out>, addr=9, val=0x7f13f1d4a8b8, write=<value optimized out>, arg=<value optimized out>) at x86_64/Ginit.c:164
#1 0x00007f158b909bcd in dwarf_get (c=0x7f13f1d4afa0, rs=<value optimized out>) at ../include/tdep/libunwind_i.h:137
#2 apply_reg_state (c=0x7f13f1d4afa0, rs=<value optimized out>) at dwarf/Gparser.c:766
#3 0x00007f158b90a0fb in _ULx86_64_dwarf_find_save_locs (c=0x7f13f1d4afa0) at dwarf/Gparser.c:849
#4 0x00007f158b90a3f9 in _ULx86_64_dwarf_step (c=0x7f13f1d4afa0) at dwarf/Gstep.c:35
#5 0x00007f158b90c59a in _ULx86_64_step (cursor=<value optimized out>) at x86_64/Gstep.c:42
...
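When the same core file has to be examined repeatedly, the interactive session can be replaced by a single non-interactive gdb invocation. The sketch below relies on gdb's -batch and -ex options and on bt, which is equivalent to the where command used above. The helper names are hypothetical, and the paths in the usage comment are simply the ones from this post:

```python
import shutil
import subprocess


def gdb_backtrace_command(binary, core, all_threads=False):
    """Build a gdb command line that prints a backtrace from a core file and exits."""
    bt = "thread apply all bt" if all_threads else "bt"
    return ["gdb", "-batch", "-ex", bt, binary, core]


def print_backtrace(binary, core):
    """Run gdb in batch mode if it is installed; return its output, or None otherwise."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        gdb_backtrace_command(binary, core), capture_output=True, text=True
    )
    return result.stdout


# Usage with the files from this post:
#   print(print_backtrace("../sbin/MyServer.bin", "core.1733"))
print(" ".join(gdb_backtrace_command("../sbin/MyServer.bin", "core.1733")))
```

Passing all_threads=True switches to thread apply all bt, which can be useful here because the core file contains many threads.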
For more on GDB, see http://sources.redhat.com/gdb/
I also recommend Chen Hao's (陈皓) series on debugging programs with GDB:
http://blog.csdn.net/haoel/article/details/2879