Valgrind Usage Guide and Error Analysis


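Valgrind is GPL-licensed software for memory debugging and code profiling of Linux programs (x86, amd64 and ppc32). You run your program inside its environment and it monitors memory usage, such as malloc and free in C or new and delete in C++. With the Valgrind tool suite you can automatically detect many memory-management and threading bugs, spend less time hunting them down, and make your programs more robust.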

Main features of Valgrind
The Valgrind suite contains several tools, including Memcheck, Cachegrind, Helgrind, Callgrind and Massif. Each tool is described below:

Memcheck mainly detects the following program errors:

Use of uninitialised memory
Reading/writing memory after it has been free'd
Reading/writing off the end of malloc'd blocks
Reading/writing inappropriate areas on the stack
Memory leaks, where pointers to malloc'd blocks are lost forever
Mismatched use of malloc/new/new [] vs free/delete/delete []
Overlapping src and dst pointers in memcpy() and related functions
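
To make a few of the error classes above concrete, here is a minimal sketch of a heap routine written so that Memcheck should have nothing to report. The function name sum_buffer is hypothetical; the comments note which Memcheck error each line would trigger if it were written carelessly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: stores 0..n-1 in a heap buffer and returns their sum. */
long sum_buffer(size_t n)
{
    long *buf;
    long total = 0;

    assert(n > 0);

    /* calloc zero-fills the block. Reading a malloc'd block before writing
       to it would be "use of uninitialised memory". */
    buf = calloc(n, sizeof *buf);
    if (buf == NULL)
        return -1;

    /* i < n, not i <= n: touching buf[n] would read or write off the end
       of the malloc'd block. */
    for (size_t i = 0; i < n; i++)
        buf[i] = (long)i;
    for (size_t i = 0; i < n; i++)
        total += buf[i];

    /* Freed exactly once, after the last access. Omitting this is a leak;
       using buf afterwards is a read/write after free. */
    free(buf);
    return total;
}
```

Calling sum_buffer(4) from a small main() built with gcc -g and running the result under valgrind is one way to confirm that a clean run produces no error reports.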
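Callgrind
Callgrind collects runtime data about the program, such as the function call graph, and can optionally perform cache simulation. When the run ends it writes the profile data to a file; callgrind_annotate converts that file into a readable form.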

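Cachegrind
Cachegrind simulates the CPU's first-level caches (I1 and D1) and the L2 cache, and can pinpoint exactly where cache misses and hits occur in the program. If required, it also reports the number of cache misses, the number of memory references, and the number of instructions executed for each line of code, each function, each module and the whole program. This is a great help when optimising a program.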
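
As an illustration of the kind of difference Cachegrind can expose, the sketch below sums the same matrix in two traversal orders. The function names are hypothetical. On a large matrix the column-major version would normally be expected to show more D1 misses, although the exact counts depend on the simulated cache configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Walks the matrix in the order it is laid out in memory (cache friendly). */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same result, but each step jumps a whole row ahead in memory. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Running each variant on a matrix much larger than the cache under valgrind --tool=cachegrind and comparing the reported miss counts shows the effect.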

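Helgrind
Helgrind mainly checks multithreaded programs for data races. It looks for memory locations that are accessed by more than one thread without consistent locking. Such locations often mark places where threads fail to synchronise, and they lead to bugs that are hard to track down. Helgrind implements the race-detection algorithm known as "Eraser", with further refinements that reduce the number of false reports.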
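
The "consistent locking" that Helgrind looks for can be sketched as follows. This is only an illustration with hypothetical names (run_counter, worker, MAX_THREADS): every access to the shared counter happens with the same mutex held. If the lock/unlock pair were removed, this is the kind of access Helgrind is designed to flag as a possible race.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the shared counter, always under counter_lock. */
static void *worker(void *arg)
{
    long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the final count. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter = 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    /* The main thread reads counter only after every worker has been joined. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return counter;
}
```

Build with gcc -g -pthread and run the program under valgrind --tool=helgrind.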

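Massif
Massif is a heap profiler. It measures how much heap memory the program uses, and reports the sizes of heap blocks, heap administration overhead and the stack. Massif can help reduce memory usage; on modern systems with virtual memory this can also speed the program up and make it less likely to be pushed into swap space.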
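
A small workload of the following shape gives Massif something to measure. It is a sketch with hypothetical names (struct node, build_and_free): heap usage grows while the list is built, peaks just before the cleanup loop, and drops back once every node has been freed.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *next;
    char payload[1024];   /* makes each heap block easy to spot in a profile */
};

/* Builds a list of up to count nodes, frees it, and returns how many nodes
   were actually allocated. */
size_t build_and_free(size_t count)
{
    struct node *head = NULL;
    size_t built = 0;

    assert(count > 0);

    for (size_t i = 0; i < count; i++) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            break;
        n->next = head;
        head = n;
        built++;
    }

    /* Heap usage is at its peak here, just before the list is released. */
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return built;
}
```

Running a program that calls build_and_free under valgrind --tool=massif should show this rise and fall in heap usage.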

Installing Valgrind
1. Download the latest release from www.valgrind.org (valgrind-3.2.3.tar.bz2 in this guide)
2. Unpack the archive: tar -jxvf valgrind-3.2.3.tar.bz2
3. This creates the directory valgrind-3.2.3
4. cd valgrind-3.2.3
5. ./configure
6. make; make install

Note: do not move Valgrind to a directory other than the one specified with --prefix. Doing so causes obscure errors, mostly when Valgrind handles fork/exec calls.

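1. Checking for memory errors
For example, suppose we have a program, sec_infod, compiled with gcc -g into a.out. Normally we run it with:
# ./a.out
To run it under Valgrind's memory checker, invoke it like this:
# valgrind --leak-check=full --show-reachable=yes --trace-children=yes ./a.out
(Appending 2>logfile is a good idea: Valgrind writes to stderr while the program runs, and the output is fairly verbose.)

Here --leak-check=full requests a full memory-leak check, --show-reachable=yes also lists blocks that are still reachable when the program exits, and --trace-children=yes follows child processes.

If the program exits normally, Valgrind prints the leak report when it exits. If the program is a daemon, that is no problem either: simply kill the memcheck process from another terminal (Valgrind uses the memcheck tool by default, that is, --tool=memcheck is the default):
# killall memcheck
This kills our program (./a.out) as well.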

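2. Checking code coverage and performance bottlenecks
Run the program under Valgrind's callgrind tool:
# valgrind --tool=callgrind ./sec_infod

This creates callgrind.out.<pid> in the current directory (callgrind.out.19689 in this run). To stop the program:
# killall callgrind
Then view the results:
# callgrind_annotate --auto=yes callgrind.out.19689 > log
# vim log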
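
To see what a bottleneck looks like in the annotated output, it helps to profile a program with an obvious hot spot. The sketch below uses two hypothetical functions that compute the same Fibonacci number. In a Callgrind profile, nearly all of the instruction cost would be expected to land in fib_slow.

```c
#include <assert.h>

/* Deliberately slow: the number of recursive calls grows exponentially. */
unsigned long fib_slow(unsigned n)
{
    return n < 2 ? n : fib_slow(n - 1) + fib_slow(n - 2);
}

/* Same result in a single linear pass. */
unsigned long fib_fast(unsigned n)
{
    unsigned long a = 0, b = 1;

    assert(n <= 47);   /* fib(47) still fits in a 32-bit unsigned long */

    for (unsigned i = 0; i < n; i++) {
        unsigned long next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Calling both from main() with the same argument and running the program under valgrind --tool=callgrind should make the difference between the two functions stand out in the callgrind_annotate listing.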

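3. Valgrind options
          --log-fd=N
By default output goes to standard error (stderr). With --log-fd=8, for example, it is written to file descriptor 8.
          --log-file=filename
Writes the output to the file filename.PID, where PID is the process ID of the running program. Use --log-file-exactly=filename to write to a file named exactly filename.
          --log-file-qualifier=<VAR>
Used together with --log-file: the value of the environment variable VAR is used to qualify the log file name, for example --log-file-qualifier=FILENAME.
          --log-socket=IP:PORT
Sends the output over the network to the given IP:PORT.
          --error-limit=<yes|no> [default: yes]
Limits the number of errors reported. The limit is on by default; --error-limit=no removes it.
          --tool=<toolname> [default: memcheck]
--tool=memcheck analyses the program with the memcheck tool.
          --leak-check=yes
Reports the details of each leak.
          --trace-children=<yes|no> [default: no]
Follows child processes; by default they are not traced.
          --xml=<yes|no> [default: no]
Writes the output in XML format; only memcheck supports this.
          --gen-suppressions=<yes|no|all> [default: no]
With yes, Valgrind pauses after each error it finds and asks whether to print a suppression for it; with all, it prints a suppression for every error without asking.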

For more options see http://www.valgrind.org/docs/manual/manual-core.html. Frequently used default options can be placed in the file ~/.valgrindrc.

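The sections above cover two Valgrind tools, memcheck and callgrind. Valgrind has several more, such as "cachegrind", which examines cache usage, and "helgrind", which detects races for shared resources between threads.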

Error analysis

1. The default tool is memcheck.

2. Writing the output to an XML file: valgrind --leak-check=full --xml=yes --log-file="log.xml" myprog arg1 arg2

3. Error explanations

3.1 Illegal read / Illegal write errors

For example:

Invalid read of size 4
at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
by 0x40B07FF4: read_png_image(QImageIO *) (kernel/qpngio.cpp:326)
by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
This error occurs when the program reads or writes memory that memcheck believes it should not access.

3.2 Use of uninitialised values

For example:

Conditional jump or move depends on uninitialised value(s)
at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
by 0x402E8476: _IO_printf (printf.c:36)
by 0x8048472: main (tests/manuel1.c:8)
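This error occurs when uninitialised data is used. It typically arises in two situations:
A local variable in the program was not initialised;
Memory obtained from malloc in C was not initialised, or the members of an object created with new in C++ were not initialised.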
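
Both situations can be avoided with explicit initialisation. The following is a minimal sketch with hypothetical names (struct record, record_new, count_positive), not code from the Valgrind manual.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct record {
    int  id;
    char name[16];
};

/* Heap case: malloc returns undefined bytes, so define all of them at once. */
struct record *record_new(int id, const char *name)
{
    struct record *r;

    assert(name != NULL);

    r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    memset(r, 0, sizeof *r);                     /* every byte is now defined */
    r->id = id;
    strncpy(r->name, name, sizeof r->name - 1);  /* last byte stays '\0' */
    return r;
}

/* Local-variable case: without "= 0" the counter would start out undefined. */
int count_positive(const int *v, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > 0)
            count++;
    return count;
}
```

The caller owns the result of record_new and is expected to release it with free().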

3.3 Illegal frees
For example:
Invalid free()
at 0x4004FFDF: free (vg_clientmalloc.c:577)
by 0x80484C7: main (tests/doublefree.c:10)
Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
at 0x4004FFDF: free (vg_clientmalloc.c:577)
by 0x80484C7: main (tests/doublefree.c:10)
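
One common defensive pattern against the double free reported above is to reset the pointer as soon as the block is released. The helper below, with the hypothetical name release, is a sketch of that idea. It only protects the one pointer variable passed in; other copies of the same address are not covered.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees *p and sets it to NULL, so a second call on the same variable is harmless. */
void release(char **p)
{
    assert(p != NULL);
    free(*p);      /* free(NULL) is defined to do nothing */
    *p = NULL;
}
```

With this helper, calling release(&buf) twice in a row frees the block once and then does nothing.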

3.4 When a block is freed with an inappropriate deallocation function
For example:
Mismatched free() / delete / delete []
at 0x40043249: free (vg_clientfuncs.c:171)
by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
at 0x4004318C: operator new[](unsigned int) (vg_clientfuncs.c:152)
by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)

  • If allocated with malloc, calloc, realloc, valloc or memalign, you must deallocate with free.

  • If allocated with new[], you must deallocate with delete[].

  • If allocated with new, you must deallocate with delete.

Linux may tolerate these mismatches, but the same code can fail when ported to other platforms.

3.5 Passing system call parameters with inadequate read/write permissions


Memcheck checks all parameters to system calls:

  • It checks all the direct parameters themselves.

  • Also, if a system call needs to read from a buffer provided by your program, Memcheck checks that the entire buffer is addressable and has valid data, i.e., it is readable.

  • Also, if the system call needs to write to a user-supplied buffer, Memcheck checks that the buffer is addressable.

For example:

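#include <stdlib.h>
#include <unistd.h>
int main( void )
{
    char* arr = malloc(10);            /* contents are never initialised */
    int* arr2 = malloc(sizeof(int));   /* contents are never initialised */
    write( 1 /* stdout */, arr, 10 );  /* hands uninitialised bytes to write() */
    exit(arr2[0]);                     /* hands an uninitialised value to exit() */
}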

Error messages:

Syscall param write(buf) points to uninitialised byte(s)
at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
at 0x259852B0: malloc (vg_replace_malloc.c:130)
by 0x80483F1: main (a.c:5)
Syscall param exit(error_code) contains uninitialised byte(s)
at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
by 0x8048426: main (a.c:8)

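Invalid parameters were passed to system calls: neither the buffer given to write() nor the value given to exit() was ever initialised.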
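
A version of the write() call that should not trigger this report simply fills the buffer before handing it to the kernel. The sketch below assumes a hypothetical wrapper named write_initialised; the fill character is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes 10 initialised bytes to fd; returns the byte count, or -1 on failure. */
long write_initialised(int fd)
{
    char *arr;
    ssize_t written;

    assert(fd >= 0);

    arr = malloc(10);
    if (arr == NULL)
        return -1;

    memset(arr, '.', 10);           /* every byte passed to write() is now defined */
    written = write(fd, arr, 10);

    free(arr);
    return (long)written;
}
```

The same principle applies to the exit() call: the status should come from a variable that has been assigned a value.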

3.6 Overlapping source and destination blocks

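The following C library functions copy data from one block of memory to another: memcpy(), strcpy(), strncpy(), strcat() and strncat(). The source and destination blocks must not overlap.

For example: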

==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492== at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492== by 0x804865A: main (overlap.c:40)
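
When the two regions really do need to overlap, the standard C answer is memmove(), which is defined for overlapping blocks. The helper below, with the hypothetical name shift_right, is a small sketch of that substitution.

```c
#include <assert.h>
#include <string.h>

/* Moves the first len bytes of buf forward by `by` bytes.
   The caller must make sure buf has room for at least len + by bytes. */
void shift_right(char *buf, size_t len, size_t by)
{
    assert(buf != NULL);
    /* Source and destination overlap whenever by < len, so memcpy() would be
       undefined here; memmove() handles the overlap correctly. */
    memmove(buf + by, buf, len);
}
```

For instance, shifting the 7 bytes of "abcdef" (including the terminating NUL) right by 2 in a 16-byte buffer leaves "ababcdef".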

 

3.7 Memory leak detection

The leak report classifies blocks as follows:

Still reachable: A pointer to the start of the block is found. This usually indicates programming sloppiness. Since the block is still pointed at, the programmer could, at least in principle, free it before program exit. Because these are very common and arguably not a problem, Memcheck won't report such blocks unless --show-reachable=yes is specified.

 

Possibly lost, or "dubious": A pointer to the interior of the block is found. The pointer might originally have pointed to the start and have been moved along, or it might be entirely unrelated. Memcheck deems such a block as "dubious", because it's unclear whether or not a pointer to it still exists.

 

Definitely lost, or "leaked": The worst outcome is that no pointer to the block can be found. The block is classified as "leaked", because the programmer could not possibly have freed it at program exit, since no pointer to it exists. This is likely a symptom of having lost the pointer at some earlier point in the program.
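
The three categories can be reproduced with a few lines of C. The sketch below uses hypothetical names (make_blocks, still_reachable, interior_only). Called with leak = 1, it leaves one block in each state, and Memcheck would be expected to classify them as the comments indicate, though the exact report can vary with compiler and optimisation level. Called with leak = 0, it frees everything.

```c
#include <assert.h>
#include <stdlib.h>

static char *still_reachable;   /* keeps a pointer to the start of a block */
static char *interior_only;     /* keeps only a pointer into the middle of a block */

/* Returns 0 after a clean run, 3 after leaving three blocks behind, -1 on failure. */
int make_blocks(int leak)
{
    char *a = malloc(32);
    char *b = malloc(32);
    char *c = malloc(32);

    assert(leak == 0 || leak == 1);

    if (a == NULL || b == NULL || c == NULL || !leak) {
        int status = (a && b && c) ? 0 : -1;
        free(a);
        free(b);
        free(c);
        return status;
    }

    still_reachable = a;     /* start pointer survives: "still reachable" */
    interior_only = b + 8;   /* only an interior pointer survives: "possibly lost" */
    /* c goes out of scope with no copy kept anywhere: "definitely lost" */
    return 3;
}
```

Calling make_blocks(1) from main() and running the program with valgrind --leak-check=full --show-reachable=yes should list all three blocks.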

 

posted on 2018-03-02 21:21 by tigerloveapple