Valgrind is a powerful tool for detecting memory management problems in programs. The kinds of problems it can detect are often very difficult to find by other means and often cause crashes that are hard to diagnose. Valgrind can be used with existing executables without recompiling or relinking, although the output it produces will be much more useful if you have compiled with the -g flag.
Valgrind is basically an x86 emulator that checks all reads and writes of memory and intercepts all calls to allocate and deallocate memory. The memcheck tool of valgrind (which is the main tool and the only one covered in this chapter) can detect the following:
A note to those running the 2.6 version of the Linux kernel: Valgrind versions ≤ 2.1.1 do not work on the 2.6 kernel. You need either a later version (which has not been released as of the time of this writing) or a version compiled from CVS. You also need to run sysctl -w kernel.vdso=0 as root in order for valgrind to run.
A simple program that contains nine different tests, each of which shows an example of an error that valgrind can catch, is included in an appendix. You can compile it with the command g++ -g valgrind-tests.cc -o valgrind-tests, and can run it by specifying the test number (1-9) at the command line (e.g. valgrind-tests 2).
Let me walk through running one of the tests under valgrind and the output it produces. To run test 2 of the valgrind-tests program in the appendix under valgrind, run valgrind --logfile=valgrind.output ./valgrind-tests 2. (Note that in the development versions of valgrind, --tool=memcheck must also appear on the command line; it is no longer assumed by default.) The logfile is not necessary, but I prefer the output of my program and the output from valgrind to be separated. Also, the --logfile option is slightly misleading in that the string you specify is not actually the logfile used, but rather is just part of the name of the logfile. A string of the form .pid5313, where 5313 is the process id number of the valgrind-tests program when it runs, will be appended. The code that this specific test runs is
int * i = new int;
delete i;
*i = 4;   // Error, i was already freed
After you launch valgrind like this from a terminal, the log file contains a number of messages. The beginning of the logfile will look something like
==5313== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==5313== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
==5313== Using valgrind-2.0.0, a program supervision framework for x86-linux.
==5313== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
==5313==
==5313== My PID = 5313, parent PID = 4709. Prog and args are:
==5313==    ./valgrind-tests
==5313==    2
==5313== Estimated CPU clock rate is 803 MHz
==5313== For more details, rerun with: -v
Note that each line begins with a ==5313== (the 5313 is the process id number and thus will be different each time the program runs). The reason for this flag on each line is that the output of the program is normally interspersed with the output of valgrind, and a way to tell the output from the two programs apart is needed. Of course, that need does not exist when a log file is specified as we have done, but valgrind still includes it anyway. The other information on the lines printed by valgrind should be clear.
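As a small illustration of how that prefix can be put to use, here is a sketch of a C++ helper that decides whether a line of mixed output came from valgrind. The name is_valgrind_line is my own invention for this example and is not part of valgrind; the only thing it assumes is the ==PID== convention described above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of valgrind): returns true if the line
// starts with "==<digits>==", the marker valgrind puts on its own output.
bool is_valgrind_line(const std::string& line) {
    // The shortest possible marker is "==1==", i.e. five characters.
    if (line.size() < 5 || line.compare(0, 2, "==") != 0) {
        return false;
    }
    std::size_t i = 2;
    while (i < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[i]))) {
        ++i;
    }
    // Require at least one digit, followed by the closing "==".
    return i > 2 && line.compare(i, 2, "==") == 0;
}
```

A filter built on this could read interleaved output line by line and send the lines for which the function returns true to one file and everything else to another.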
After this header, valgrind prints any errors that it comes across. The test case we ran contained an invalid write to an already freed chunk of memory, so the messages from valgrind in the log file reflect that:
==5313== Invalid write of size 4
==5313==    at 0x8048A27: test_2() (valgrind-tests.cc:37)
==5313==    by 0x8048CDF: main (valgrind-tests.cc:134)
==5313==    by 0x215BBE: __libc_start_main (in /lib/libc-2.3.2.so)
==5313==    by 0x8048910: (within /home/newren/examples/valgrind-tests)
==5313==    Address 0x1B3E024 is 0 bytes inside a block of size 4 free'd
==5313==    at 0x5419C5: __builtin_delete (vg_replace_malloc.c:244)
==5313==    by 0x5419E3: operator delete(void*) (vg_replace_malloc.c:253)
==5313==    by 0x8048A20: test_2() (valgrind-tests.cc:36)
==5313==    by 0x8048CDF: main (valgrind-tests.cc:134)
==5313==    by 0x215BBE: __libc_start_main (in /lib/libc-2.3.2.so)
==5313==    by 0x8048910: (within /home/newren/examples/valgrind-tests)
The way to read the first five lines of this is that an invalid write occurred (to some data structure that was four bytes long) at line 37 of valgrind-tests.cc, in test_2(), which was called from line 134 of valgrind-tests.cc, in main, which was called from __libc_start_main, the C library routine that invokes main. Line 37 of valgrind-tests.cc corresponds to the *i = 4 statement, which came right after the delete i statement in the code. Valgrind then tries to be more helpful and state why the write was invalid. The extra information it provides states that the invalid write was to a freed block (valgrind even goes so far as to state how big that freed block was and where inside it the invalid write occurred), and then lists the stack trace of functions involved in freeing that block: the block was freed at line 244 of vg_replace_malloc.c, which was called from line 253 of vg_replace_malloc.c, which was called from line 36 of valgrind-tests.cc (which is where the delete i statement is), which was called by...I think you get the picture.
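If you find yourself reading many of these traces, the (file:line) part at the end of each frame is the piece you usually want. The following is only a sketch of how it could be pulled out of a frame line in C++. SourceLocation and parse_frame are names I made up for this example, and the code assumes the frame layout shown in the log above, which may differ in other valgrind versions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this example only.
struct SourceLocation {
    std::string file;  // e.g. "valgrind-tests.cc"
    int line;          // e.g. 37
    bool valid;        // false if the frame has no (file:line) part
};

// Hypothetical helper: extracts the trailing "(file:line)" from a frame.
// Frames such as "(in /lib/libc-2.3.2.so)" carry no line number and
// are reported as invalid.
SourceLocation parse_frame(const std::string& frame) {
    SourceLocation loc{"", 0, false};
    const std::size_t close = frame.rfind(')');
    if (close == std::string::npos) return loc;
    const std::size_t open = frame.rfind('(', close);
    if (open == std::string::npos) return loc;

    const std::string inside = frame.substr(open + 1, close - open - 1);
    const std::size_t colon = inside.rfind(':');
    if (colon == std::string::npos || colon + 1 >= inside.size()) return loc;

    // Everything after the colon must be a short run of digits.
    const std::string digits = inside.substr(colon + 1);
    if (digits.size() > 9) return loc;
    for (char c : digits) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return loc;
    }

    loc.file = inside.substr(0, colon);
    loc.line = std::stoi(digits);
    loc.valid = true;
    return loc;
}
```

Applied to the first frame of the trace above, this yields the file valgrind-tests.cc and line 37, while the __libc_start_main frame is reported as having no source location.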
At the end of the log file, a summary of errors that valgrind found is printed.
==5313== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==5313== malloc/free: in use at exit: 0 bytes in 0 blocks.
==5313== malloc/free: 1 allocs, 1 frees, 4 bytes allocated.
==5313== For counts of detected errors, rerun with: -v
==5313== No malloc'd blocks -- no leaks are possible.
This output is self-explanatory for the most part. The "suppressed" count refers to the fact that valgrind allows errors to be suppressed--something that can come in handy since valgrind also reports on errors in all libraries that your application is linked to.
At this point, you really know everything you need to know to get started with valgrind; it does not require much more than adding "valgrind" (plus maybe some options) at the beginning of your command line and then reading the output that is produced. A good thing to do at this point is to run the nine tests in the valgrind-tests program in the appendix under valgrind and get used to valgrind's output for different types of errors.
Valgrind does have a few notable disadvantages which are helpful to be aware of:
There are many other options and ways in which you can run valgrind. You can learn more at http://valgrind.kde.org/. Also note that there are graphical frontends to valgrind, among them alleyoop, a front-end using the Gnome libraries. In the remainder of the section, I will try to list some of the more common options to pass to valgrind.
Errors from the usage of uninitialized memory are a little bit different in that they are not triggered instantly, meaning that they do not occur when uninitialized values are merely copied from one location to another. This means that you may need to check one of the functions further down the stack trace than the top one listed by valgrind. Sometimes, the default of valgrind to list no more than 4 functions in the stack trace will not be enough to see where the real cause of the problem was. You can increase the number of functions that valgrind lists with the --num-callers= option. Also, note that using a single uninitialized variable can result in many errors if that value is copied multiple times--something that can easily happen if that value is passed to another function. An example of this is the first test case of the valgrind-tests program.
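To make the delayed reporting concrete, here is a minimal sketch of my own (it is not the test from the appendix). All of the function names are hypothetical. The value is left uninitialized in one function, copied in a second, and only inspected in a third, so the place where memcheck would be expected to complain is removed from the place where the mistake was made.

```cpp
#include <cstring>

// Copies one int to another location. Merely moving the value around is
// not expected to trigger a report, even if *src was never initialized.
void copy_value(int* dst, const int* src) {
    std::memcpy(dst, src, sizeof(int));
}

// First point at which the value is actually inspected. A report about
// uninitialized data would be expected here or where the result of this
// comparison is later used, not where the value was created.
bool is_positive(const int* v) {
    return *v > 0;
}

// With initialize == false the original int is never written, which is
// the bug this sketch is meant to show. Only run that path under
// valgrind; its behaviour is undefined.
bool demo_uninitialized(bool initialize) {
    int* original = new int;  // contents indeterminate at this point
    int* copy = new int;
    if (initialize) {
        *original = 5;
    }
    copy_value(copy, original);
    const bool result = is_positive(copy);
    delete original;
    delete copy;
    return result;
}
```

Calling demo_uninitialized(false) from main and running the program under valgrind with a larger --num-callers value should make the whole chain of callers visible; demo_uninitialized(true) is the correct variant and simply returns true.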
By default, valgrind does not check for memory leaks. This can be changed by specifying the --leak-check=yes option. Even without this option, however, valgrind's end summary will still state whether memory is still in use, and, if there is, suggest that the user rerun with --leak-check turned on. Note that valgrind has a somewhat confusing default of merging reported leaks based on ignoring all but a few frames of the stack trace. This can be turned off by also specifying --leak-resolution=high.
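For readers who want something small to try --leak-check=yes on, the following sketch shows the kind of code the option is meant to catch, next to a corrected version. Both functions are hypothetical examples of mine and are not taken from the valgrind-tests program.

```cpp
#include <cstddef>

// Sums 0..n-1 through a heap buffer that is never released. The pointer
// is lost when the function returns, so the block can no longer be freed.
int sum_leaky(std::size_t n) {
    int* buf = new int[n];
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
        total += buf[i];
    }
    return total;  // missing delete[] buf
}

// Same computation, but the buffer is released before returning.
int sum_fixed(std::size_t n) {
    int* buf = new int[n];
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
        total += buf[i];
    }
    delete[] buf;
    return total;
}
```

Both functions return the same value, so the difference only shows up in valgrind's end-of-run summary: a program that calls sum_leaky should still have memory in use at exit, while one that calls only sum_fixed should not.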
There is also a --gdb-attach=yes option which allows one to attach gdb to the running program when an error is encountered in order to learn more about what is going on. Note that this option conflicts with the use of any log file, and it will probably require that the path to gdb be specified by using --gdb-path=/path/to/gdb.
Finally, a few other options of note are the -v option for more verbosity; the -fno-inline option for C++ (a compiler option passed to g++, not to valgrind), which makes it easier to see the function-call chain; the --gen-suppressions=yes option, which helps in the generation of suppressions files; and the --skin=addrcheck option (the syntax has changed to --tool=addrcheck in development versions), which causes valgrind to do fewer checks but run about twice as fast and use less memory. Suppressions files are a longer topic than I want to cover in this short tutorial, but if you ever find that valgrind displays lots of errors for a library that you are linked to and you do not want to see those errors, then --gen-suppressions=yes can come in handy.
Gnome applications tend to have deep stack traces, much of which comes from the Glib main loop. So it tends to be important to specify a large value for --num-callers (say, 40 or so, just to be safe). Also, if checking for leaks, be sure to specify --leak-resolution=high.
Common valgrind options
Option : --num-callers=number
Purpose : Determines the number of function calls (i.e. depth of stacktrace) to display as part of showing where an error occurs within a program. The default is a measly 4.
Option : --leak-check=yes
Purpose : Enabling leak checking has valgrind search for memory leaks (i.e. allocated memory that has not been released) when the program finishes.
Option : --leak-resolution=high
Purpose : An option that should be used when doing leak checking, since the other settings merge leak reports in a way that makes them confusing.
Option : --show-reachable=yes
Purpose : An option that makes leak checking more helpful by requesting that valgrind report whether pointers to unreleased blocks are still held by the program.
Option : -v
Purpose : Run in more verbose mode.
Option : -fno-inline
Purpose : A compiler option (given to g++ rather than to valgrind) for C++ programs which makes it easier to see the function-call chain.
Option : --gen-suppressions=yes
Purpose : A simple way to generate a suppressions file in order to facilitate ignoring certain errors in future runs of the same code.
Option : --skin=addrcheck
Purpose : Selects the specific valgrind tool that will run. Memcheck (the only tool covered here) is the default. (Note that this option has been renamed --tool and is mandatory in the development release.)
Option : --logfile=file-basename
Purpose : Record all errors and warnings to file-basename.pidPID, where PID is the process id number of the program being checked.
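Since the --logfile naming rule trips people up, here it is written as a line of C++. This is only a sketch of the rule as described in this chapter for the valgrind version covered here; the function valgrind_log_name is a hypothetical helper of mine, not something valgrind provides.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the name of the file that valgrind writes
// when given --logfile=<base>, following the "<base>.pid<PID>" rule.
std::string valgrind_log_name(const std::string& base, long pid) {
    assert(pid > 0);  // process ids are positive
    return base + ".pid" + std::to_string(pid);
}
```

For the run shown earlier, valgrind_log_name("valgrind.output", 5313) gives valgrind.output.pid5313, which is the file to open after the program exits.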