
Debugging with the GNU Debugger gdb

When a program behaves unexpectedly, you often want to probe its state and behavior more interactively. The GNU debugger, gdb, lets us step inside the write–compile–run cycle so we can fix errors more quickly.

A Debugging Philosophy

A philosophy of debugging begins with the observation that a "bug" is some program behavior that runs counter to our expectations. (The reading on testing reminds us we must first be clear on what those expectations are.) In any case, expectation-violating behavior generally means that at least one of our assumptions (another word for expectations) is not being met. Thus, it is the programmer's task to first generate a set of potentially relevant assumptions being made about the program, and then to eliminate each of them from suspicion. (The assert macro is a good way to do this.)

This philosophy is analogous to detective work and the scientific process. A detective wrangles a large set of suspects, so as not to miss the likely culprit, and then works to find incriminating clues while eliminating some individuals from further consideration. Similarly, a diligent scientist will concoct many hypothetical theories to explain some behavior and then conduct experiments to eliminate certain theories as inconsistent with observed behavior.
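The assert suggestion above can be made concrete with a minimal sketch. The function name averageOf and its parameters are hypothetical and do not come from the course materials. The two assertions state the assumptions explicitly: the array pointer is valid, and there is at least one element. If either assumption is violated, the program halts at the assertion and reports the file and line, rather than misbehaving somewhere downstream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: compute the mean of count values.
   The assertions document (and check) our assumptions. */
double
averageOf (const double* values, size_t count)
{
   assert (values != NULL);   /* assumption: caller passed a real array */
   assert (count > 0);        /* assumption: there is something to average */

   double sum = 0.0;
   for (size_t i = 0; i < count; i++)
      sum += values[i];

   return sum / (double) count;
} // averageOf
```

Each assertion that passes removes one suspect from consideration; one that fails points directly at the violated assumption.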

Debugging != Tweaking

While the initial running of a program has been known to produce helpful and correct results, your past programming experience probably suggests that some errors usually arise somewhere in the problem-solving process. Specifications may be incomplete or inaccurate, algorithms may contain flaws, or the coding process may be incorrect. Edsger Dijkstra, a very distinguished computer scientist, once observed¹ that in most disciplines such difficulties are called errors or mistakes, but that in computing this terminology is usually softened, and flaws are called bugs. (It seems that people are often more willing to tolerate errors in computer programs than in other products.)²

Novice programmers sometimes approach the task of finding and correcting an error by trial and error, making successive small changes in the source code ("tweaking" it), and reloading and re-testing it after each change, without giving much thought to the probable cause of the error or to how making the change will affect its operation. This approach to debugging is ineffective for two reasons:

Tweaking is time-consuming.
Novice programmers tend to have a naïve confidence that the next small change in the source code, whatever it is, will fix the problem. This is seldom the case. If you detect an error in a procedure, and the first tweak doesn't fix it, the next twelve tweaks probably won't either—so don't bother with them. Push yourself away from the keyboard and study the context. Don't make even one more change in the source code until you're ready to test a well-thought-out hypothesis about the cause of the error. (This is also a good time to make a separate copy of the procedure, in Emacs, so that you can backtrack to the current version if subsequent experimentation requires extensive temporary rewriting.)
Tweaking usually fixes only a specific, local problem.
Very often an error is a symptom of a general misunderstanding on the part of the programmer, one that affects the operation of the procedure in cases other than the one being tested. Unless you address this general problem, tweaking a procedure in such a way that it passes the particular test that it formerly failed is likely to make your program worse instead of better.

A much more time-efficient approach to debugging is to examine exactly what code is doing. While a variety of tools can help you analyze code, a primary technique involves carefully tracing through what a procedure is actually doing. In the remainder of this reading, we discuss the most powerful tool for targeted code debugging, the GNU Debugger gdb.

What is GDB?

The GDB manual tells us:

The purpose of a debugger such as GDB is to allow you to see what is going on "inside" another program while it executes—or what another program was doing at the moment it crashed.

GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:

  • Start your program, specifying anything that might affect its behavior.
  • Make your program stop on specified conditions.
  • Examine what has happened, when your program has stopped.
  • Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

An Example

Many common errors in C programs arise from the use of pointers and addresses. As an instructive case, we simply and subtly modify a program from the reading introducing pointer parameters.

/* Demonstrate incorrect use of pointer (address) parameters to functions to
   change variables
 */

#include <stdio.h>

void
addressAsParameter (double* addressParameter)
{
   printf ("value of valueParameter at start of addressAsParameter: %lf\n",
          *addressParameter);

   *addressParameter = 543.21;

   printf ("value of valueParameter at end of addressAsParameter: %lf\n", 
          *addressParameter);
} // addressAsParameter

int
main (void)
{
   double* number;
   addressAsParameter (number);
   printf ("value of number after addressAsParameter completed:  %lf\n",
	   *number);
} // main

As it turns out, this code can be made to compile successfully, but when run it produces a so-called "segmentation fault".

Launching GDB

While one can run GDB on any executable, the most useful information is garnered from a program compiled with debugging symbols included. That means you get helpful information about the names of your functions and variables, as well as the corresponding source code line numbers during program execution.

To include such debugging symbols, one provides the -g flag to the clang compiler, like so:

clang -g -o address-param address-param.c

Alternatively (and more appropriate for this class), you can add the -g flag to the CFLAGS variable in your Makefile. For example:

# Set appropriate compiler flags
CFLAGS+=-Wall -Werror -std=c11 -g

Your build process would then be the same as normal, using make.

make address-param
clang -I/home/walker/MyroC/include -Wall -std=c11 -g
-L/home/walker/MyroC/lib -lm -lMyroC -lbluetooth -ljpeg -o address-param
address-param.c:23:24: error: variable 'number' is uninitialized when used
      here [-Werror,-Wuninitialized]
   addressAsParameter (number);
                       ^~~~~~
address-param.c:22:18: note: initialize the variable 'number' to silence this
      warning
   double* number;
                 ^
                  = NULL
1 error generated.

Of course, compiling with warnings as errors has caught the underlying issue. (Whew!) Let us suppose for a moment we had forged ahead without the -Werror flag and ignored any warnings.

It is very important to note that adding debugging symbols to your compiled programs is not free. The executable files are bigger (taking more disk space), can use more memory when run, and can be noticeably slower. Thus it is good practice not to enable debugging symbols when you are shipping programs to be used in general scenarios long after development.

However, for this class it is perfectly reasonable to build with debugging symbols as your default. Thus, leaving the -g flag in your class Makefile is fine.

Now that we can build debuggable programs, we look at two ways to launch the debugger, from the command line and within Emacs.

From Terminal

Invoking the gdb program simply requires that you provide the name of the compiled executable (not the source code file):

gdb address-param
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/address-param...done.
(gdb) 

This shows us the GDB program (not address-param) is running and waiting for us to give it a command. GDB presents the prompt (gdb).

From Emacs

Because GDB will interact with Emacs to show you where in your program code the debugger is currently operating, many developers prefer to use GDB from within Emacs.

First, you need to be sure you have compiled your executable. Once you have done that, with your source code file as the active buffer, navigate to the Tools menu and select Debugger (GDB)... or simply type M-x gdb and press Enter.

The command window will activate and present you with a prompt that looks remarkably similar to the command we used directly within the terminal, with the exception of an additional -i flag that signals GDB to produce additional output for Emacs to use:

Run gdb (like this): gdb -i=mi address-param

Assuming the default executable name matches, you may simply press Enter to issue the command to launch GDB.

You should now have a new buffer open in the same Emacs window for the GDB session containing the same content we saw above. While it is relatively straightforward to arrange to see both the GDB and your source buffers in the same window, we'll see soon that Emacs will do this for you automatically.

Using GDB for Basic Debugging

There are four basic commands essential to doing the most common bug fixes.

Running your program

Now that we have GDB running, we need to tell it to run our program. This is accomplished with the run or r (abbreviated) command. (We omit the (gdb) prompt in the remaining examples.)

run
Starting program: /tmp/address-param address-param

Program received signal SIGSEGV, Segmentation fault.
0x00000000004006fa in addressAsParameter (addressParameter=0x0)
    at address-param.c:10
10	   printf ("value of valueParameter at start of addressAsParameter: %lf\n",

It might not be comforting now, but it's good to know GDB seems to run into the same problem we saw earlier. (As you might imagine, it can be very inconvenient when "Heisenbugs" disappear the moment you try to inspect them.) Let's examine the meaning of this initial output.

The first line simply tells us GDB is running our program. The next line tells us quite a bit more:

Program received signal SIGSEGV, Segmentation fault.
A segmentation fault happens when a program accesses an invalid memory address. GDB informs us that the operating system sent the SIGSEGV "signal", which corresponds to this error. The default recourse is to immediately stop the program.
0x00000000004006fa
This is an address in memory written in hexadecimal (base sixteen), which is more convenient than decimal and more compact than binary. In this case, it is the address of the offending machine instruction within the compiled code of the function, which is not very helpful to us on its own. (More on that later.)
in addressAsParameter (addressParameter=0x0)
This more conveniently tells us the particular function being run when the fault occurred, addressAsParameter. In addition, the values of any function parameters are given. Because addressParameter is a pointer, GDB displays the address in hexadecimal form. Other parameter types, like int or double, are printed in their natural, intuitive format.
at address-param.c:10
Finally, this helpful bit informs us exactly which line of code (in which file) contains the offending expression.

If you launched GDB from within Emacs, you should find that your window is now split between GDB (on the top) and your source code file (on the bottom). Moreover, Emacs displays an arrow next to the offending line of code and has moved your cursor to that line.

Releasing a Process

Normally, the operating system takes over from a process that has terminated and cleans up after it, freeing up the memory that the process was using, closing any files that it was holding open, removing its entry in the process table, and so on. However, when the debugger is running the program, it blocks this normal cleanup process, making it possible for the user to collect some post-mortem data (such as the values of variables at the time of the crash). The cleanup won't proceed until the user either quits the debugger (with the quit command) or releases the process.

To release a process, type the command kill (or just k) at the GDB prompt. Because you lose access to potentially valuable data by doing this, GDB requests confirmation:

kill
Kill the program being debugged? (y or n)

Answering yes (or y) at this point releases the process.

Trying to run the executable a second time without releasing the process in which it ran the first time also loses access to potentially valuable data, so it requires a similar confirmation. (In effect, if you give the run command a second time without first doing kill, gdb inserts the kill operation automatically, and it's this implicit process release that requires confirmation.)

For security, the Linux kernel and perhaps other operating systems randomize the portion of the address space used by each invocation of a program by default. Thus, you might worry that memory addresses reported by GDB might be different with subsequent executions of the program. However, GDB disables this address randomization (though you may re-enable it, as that often proves necessary to reproducing bugs). Therefore, you will see the same addresses until you recompile the program, even across different invocations of GDB (not just executions of your program within a single GDB session).

Printing values

One of the simplest capabilities of GDB is also the most powerful: you can interactively probe the values of variables or the results of certain expressions with a print command, abbreviated p.

In our example above, there's not much going on in the expression beginning on line 10:

   printf ("value of valueParameter at start of addressAsParameter: %lf\n",
          *addressParameter);

printf takes a format string, which we have given it, and the conversion specifier %lf matches the type double of the value that addressParameter points to. The format is therefore not at fault; the problem lies in the pointer itself.

What address does addressParameter refer to? The invocation shown above tells us, but if it did not, we would want a way to directly inquire of any variables or expressions. We do this with the print or p command, which takes as an argument a C expression, evaluating and printing the value of the result. In our case,

print addressParameter
$1 = (double *) 0x0

Here $1 is simply a serial number for the interaction, and the value of the variable appears on the right-hand side of the equals sign. The debugger tries to select the format that is most appropriate to the type of the variable. Since addressParameter is a pointer, GDB specifies its type and writes out the address to which it points as a hexadecimal (base-sixteen) numeral, as in printf's "%p" format.

This command shows us the type (double*) and value (0x0, which is hexadecimal for zero or the NULL pointer) of the address. The special address NULL cannot be dereferenced by any program in C, which leads to the segmentation fault.

Why did this happen? We'll investigate that after mentioning a few other print capabilities.
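As an aside on the NULL dereference just described, one way to guard against this fault is to test the pointer before dereferencing it. The sketch below assumes the caller would rather receive an error code than crash. The function name storeIfValid and its return convention are illustrative only and do not appear in the reading.

```c
#include <stddef.h>

/* Hypothetical helper: store newValue at target only if target is usable.
   Returns 1 on success and 0 if target is NULL. */
int
storeIfValid (double* target, double newValue)
{
   if (target == NULL)
      return 0;               /* dereferencing NULL would fault, so refuse */

   *target = newValue;
   return 1;
} // storeIfValid
```

Note that such a check only catches NULL. An uninitialized pointer may hold any value at all, so the check is no substitute for initializing the pointer properly in the first place.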

Variables and Arguments

GDB can also neatly summarize the values of all the local variables or function arguments using the command info, which might save you the need for printing them individually.

info args
addressParameter = 0x0
info locals
No locals.

Advanced Printing

The print command can also evaluate simple expressions containing the variables, including pointer dereferencing, array subscripting, and arithmetic.

The ptype command displays the type of a given variable or simple expression:

ptype *addressParameter
type = double

The command printf allows the user to control the format in which values are printed: It takes any positive number of arguments, separated by commas but not enclosed in parentheses, of which the first is a format-control string following the same conventions as in the standard C library function printf, and the remaining arguments are simple expressions.

Backtrace

As most program problems occur within functions called from other functions called from yet other functions (and so on), it is often very useful to know the complete chain of calls giving rise to the current context. This chain is called the backtrace or stack trace. Like the stacks of trays in the dining hall, it is called such because every time we enter a new function all the bookkeeping information gets stacked on top of the current context, leading to deeper and deeper (or taller and taller, depending on your perspective) stacks.

The GDB manual explains it this way:

The call stack is divided up into contiguous pieces called stack frames, or frames for short; each frame is the data associated with one call to one function. The frame contains the arguments given to the function, the function's local variables, and the address at which the function is executing.

When your program is started, the stack has only one frame, that of the function main. This is called the initial frame or the outermost frame. Each time a function is called, a new frame is made. Each time a function returns, the frame for that function invocation is eliminated. If a function is recursive, there can be many frames for the same function. The frame for the function in which execution is actually occurring is called the innermost frame. This is the most recently created of all the stack frames that still exist.

Inside your program, stack frames are identified by their addresses. A stack frame consists of many bytes, each of which has its own address; each kind of computer has a convention for choosing one byte whose address serves as the address of the frame. Usually this address is kept in a register called the frame pointer register (see $fp) while execution is going on in that frame.

GDB assigns numbers to all existing stack frames, starting with zero for the innermost frame, one for the frame that called it, and so on upward. These numbers do not really exist in your program; they are assigned by GDB to give you a way of designating stack frames in GDB commands.

GDB allows us to query the whole stack at its current context with the backtrace or (abbreviated) bt command.

Continuing the example,

backtrace
#0  0x00000000004006fa in addressAsParameter (addressParameter=0x0)
    at address-param.c:10
#1  0x0000000000400751 in main () at address-param.c:23

This tells us that the innermost frame is addressAsParameter as we saw before. Below that, we see frame 1 was main, which called addressAsParameter at line 23.

In some circumstances, this location (the call from main) may be a useful place to go snooping for answers.
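To see a deeper stack than the two frames shown above, one could experiment with a recursive function. The sketch below is hypothetical (the name factorial is not from the reading). When it is compiled with -g and without optimization, and stopped at a breakpoint on the base-case return, the backtrace command should list one frame for each active call to factorial, with the caller's frame outermost.

```c
/* Hypothetical example: each call creates a new stack frame. */
unsigned long
factorial (unsigned int n)
{
   if (n <= 1)
      return 1;                   /* innermost frame when n reaches 1 */

   return n * factorial (n - 1);  /* each call here adds a frame */
} // factorial
```

Calling factorial (3) from main, for instance, would leave three factorial frames above main when the base case is reached, each showing a different value of n.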

Navigating the stack

The debugger has captured all of the current program state, which means we can probe variables active in other functions on the stack. How do we get there?

A common way to proceed through the stack is by navigating from the innermost to the outermost frame. Unfortunately, our analogy of dining hall trays breaks down because to move up is to move toward the outermost frame, while down is to move toward the innermost frame.

In our example, we move up one frame (the default argument) to reach main:

up
#1  0x0000000000400751 in main () at address-param.c:23
23	   addressAsParameter (number);
info locals
number = 0x0

If you're operating GDB within Emacs, you should see that the cursor of your source code changes (along with the display arrow) to the line calling the function from which you just came (addressAsParameter), which reads

   addressAsParameter (number);

We can now use the print command to tell us the value of the variable number, which we had used as a parameter.

p number
$1 = (double *) 0x0

Not surprisingly (parameters in one context should match their values in another), we see that the double* pointer is also NULL or 0x0.

The proximal cause of the bug is dereferencing a NULL pointer in addressAsParameter. The ultimate cause is failing to initialize number with a sensible value in main. If we had compiled with all warnings turned on, and paid heed to them, we likely could have avoided this issue. However, the epilogue points out the deeper issues.

Advanced Debugging with GDB

It is also helpful to be aware of several additional advanced debugging activities available within GDB:

Setting and removing breakpoints

GDB can interrupt the execution of the program at any point, allowing you to inspect the values of variables before resuming. You arrange in advance for this to happen by setting a breakpoint for the interruption before telling GDB to run the program. To set a breakpoint, type the command break (or just b) at the gdb prompt, followed by the location at which to stop: a function name, a line number, a file name, colon, and line number, or a star and an address.

You can set as many breakpoints as you want, and you can add new breakpoints whenever you get the gdb prompt.

Here is how to set a breakpoint just before the dereference of the variable addressParameter, for instance:

break 10
Breakpoint 1 at 0x4006f6: file address-param.c, line 10.

If you switch to the Emacs buffer with your program source, you should see a red circle appear alongside the line with your breakpoint. Clicking the red circle toggles the breakpoint off. Conversely, you may graphically enable (and disable) breakpoints in your code by clicking in the left-most column of any line of code.

Now, when we run the program, it stops just before executing that statement:

run
Starting program: /tmp/address-param 

Breakpoint 1, addressAsParameter (addressParameter=0x0) at address-param.c:10
10	   printf ("value of valueParameter at start of addressAsParameter: %lf\n",

When using GDB, it's usually best to do so within Emacs, which will automatically show you the context of each breakpoint it hits. In addition, Emacs will allow you to set breakpoints graphically during a debugging session. Alternatively, if you give the command list (or just l), followed by a function name, or a line number, or a file name, colon, and line number, or a star and address, GDB displays a numbered listing of the source-code file, showing a few lines before the specified point and a few lines after. You can also type list, with no argument, to see the context of the breakpoint at which you're currently stopped.

To resume execution of the program from the breakpoint, type continue (or just c) at the gdb prompt. Within Emacs, you may also use the navigation buttons that appear at the top of the window when the GDB buffer is active.

If you set a breakpoint on a statement that is inside a loop or in a function that is called repeatedly, execution is interrupted each time the breakpoint is reached.
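As a small illustration of a breakpoint inside a loop, consider the hypothetical function below (sumUpTo is not part of the reading). A breakpoint set on the marked line would interrupt execution once per iteration, so a call with limit equal to 4 would stop four times, with continue resuming execution each time.

```c
/* Hypothetical example: sum the integers 1 through limit. */
int
sumUpTo (int limit)
{
   int total = 0;
   for (int i = 1; i <= limit; i++)
      total += i;     /* a breakpoint here is hit once per iteration */

   return total;
} // sumUpTo
```

At each stop, printing i and total shows how the running sum evolves, which is often enough to spot an off-by-one error in the loop bounds.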

The command to remove a breakpoint is clear. You can specify the location of the breakpoint to be removed in any of the four now-familiar ways: by function name, by line number, by file name, colon, and line number, or by star and address. If you give the clear command with no argument, it removes the breakpoint at which you're currently stopped.

The debugger discards all uncleared breakpoints when it exits, but retains them until then, even if it repeatedly kills and restarts the executable that is being debugged.

If you set a number of breakpoints, it is easy to lose track of them. The command info breakpoints (or just i b) displays information about your current breakpoints, including a serial number for each one (in the column headed Num).

Without removing a breakpoint completely, it is possible to disable it temporarily and later re-enable it, provided that you know its serial number. The command disable, followed by a breakpoint serial number, deactivates it, so that it no longer interrupts the program, but does not remove it from GDB's internal tables. The command enable, again followed by a breakpoint serial number, reactivates a disabled breakpoint.

Single stepping

In collecting data about bugs, it is often helpful to walk through some key part of the program one statement at a time, following the control flow carefully and occasionally inspecting variables to keep an eye on the side effects of those statements.

Typing next (or just n) at the GDB prompt directs the debugger to execute a single line of code and then stop again for further directions. Repeating this command takes you through the execution line by line.

Some lines include function calls, and the next command treats those as part of the single statement to be executed, even if the functions themselves are long and complex. Alternatively, it is possible to extend the single-stepping idea to those functions as they are invoked, walking through them one line at a time as well. The step (or s) command is similar to next, but single-steps into function bodies as well.

After executing a line of code, step and next display the next line to be executed before returning you to the gdb prompt.
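The difference between next and step is easiest to see on a line that contains a function call. In the hypothetical sketch below (neither function appears in the reading), suppose execution is stopped on the marked line in sumOfSquares. Typing next should run both calls to square to completion and stop on the return statement; typing step should instead stop at the first line inside square.

```c
/* Hypothetical helper: square a value. */
double
square (double x)
{
   return x * x;
} // square

/* Hypothetical caller: the marked line contains two function calls. */
double
sumOfSquares (double a, double b)
{
   double result = square (a) + square (b);   /* stopped here */
   return result;
} // sumOfSquares
```

A reasonable habit is to use next by default and reach for step only when the called function is itself under suspicion.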

Monitoring a variable

If you're at a breakpoint inside a function, you can set watchpoints on the parameters and variables declared in that function. (You can set watchpoints for global variables at any time.) A watchpoint monitors the storage location associated with the variable and interrupts the execution of the program every time some operation on that storage location occurs.

The most common kind of watchpoint interrupts execution only when a value is stored into the specified location. To create this kind of “write” watchpoint, type watch, followed by the variable that is bound to that location, at the GDB prompt. For example,

watch tulip
Hardware watchpoint 7: tulip

The similar command rwatch (“read watch”) sets up to interrupt execution every time the value stored in the location is recovered and loaded into the processor. Finally, the command awatch (“access watch”) sets up to interrupt execution whenever the specified storage location is accessed either for reading or for writing.

As an alternative to setting a watchpoint, you may want to use the display command, which asks gdb to print out the value of a variable every time a breakpoint is reached (or, while single-stepping, after each line of code). For example,

display tulip
1: tulip = (unsigned long *) 0xb7ff1380

The analogous undisplay command turns off this facility. Like watch, the display and undisplay commands can be given only while the variables in question are defined in the current scope.
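A watchpoint tends to be most helpful when a variable changes in a place where its name does not appear. In the hypothetical function below (countPositives and alias are illustrative names only), counter is modified solely through a pointer. Once stopped at a breakpoint inside the function, the command watch counter should interrupt execution each time the increment through alias changes the stored value.

```c
/* Hypothetical example: counter is changed indirectly through alias. */
int
countPositives (const int* values, int length)
{
   int counter = 0;
   int* alias = &counter;      /* second name for the same storage */

   for (int i = 0; i < length; i++)
   {
      if (values[i] > 0)
         (*alias)++;           /* modifies counter without naming it */
   }

   return counter;
} // countPositives
```

Searching the source for assignments to counter would find only its initialization here, whereas the watchpoint follows the storage location itself.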

Setting variables

The set command allows you to change the value of a variable while the program is stopped. If you type the word set, followed by a C assignment expression, GDB performs the side effect of that expression:

p *zinnia
$8 = 953
set *zinnia += 2
p *zinnia
$9 = 955

Command history

Like the shell, GDB has a command history. Use the up and down arrows (↑ and ↓) to navigate linearly through the history. Within Emacs, these keys will move you around in the (mostly write-only) buffer. Instead, you need to pair them with the "meta" keyboard modifier usually used in Emacs (typically Esc or Alt keys).

Epilogue

Early in the semester it will be common for us to pass addresses into procedures. However, one will more rarely be declaring pointer variables. The proper way to have written this code was given in the reading introducing pointer parameters.

When we declare a pointer variable, we only get space for storing an address (i.e., a pointer to a double). We do not get a valid memory address in which to store a value (i.e., a double). Thus, the proper paradigm now is to declare the double

   double number;

and find its address with the ampersand operator, &.

   addressAsParameter (&number);

The original, buggy version is rather like having a paper form with the little set of boxes on it where you can write the address of your residence—you could put anything there, but it doesn't mean it will refer to an actual physical space. The correct way given above gets you an actual house!
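Putting the two corrected fragments together gives something like the sketch below. The function addressAsParameter is copied unchanged from the example above. The wrapper correctedCaller is hypothetical and stands in for the body of main, and the initial value 0.0 is an assumption made here so that the first printf reads a defined value.

```c
#include <stdio.h>

void
addressAsParameter (double* addressParameter)
{
   printf ("value of valueParameter at start of addressAsParameter: %lf\n",
          *addressParameter);

   *addressParameter = 543.21;

   printf ("value of valueParameter at end of addressAsParameter: %lf\n",
          *addressParameter);
} // addressAsParameter

/* Hypothetical stand-in for main: declare a real double and pass its address. */
double
correctedCaller (void)
{
   double number = 0.0;             /* assumed initial value; storage for a double */
   addressAsParameter (&number);    /* a valid address, so the dereference is safe */
   return number;
} // correctedCaller
```

Because number now names actual storage, the assignment inside addressAsParameter changes it, and the caller sees the new value after the call returns.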

Summary reference

GDB Command | Abbreviation | Result | Manual Entry
run [args] | r [args] | Run program (using command line arguments [args] if given) | Running
backtrace | bt | Print a backtrace of the entire stack | Backtrace
print expr | p expr | Print the evaluation of expr | Data
info locals | i locals | Display the names and values of local variables | Frame Info
info args | i args | Display the names and values of function arguments | Frame Info
up [n] | | Move n frames up the stack; if no argument, n defaults to 1 | Selection
down [n] | | Move n frames down the stack; if no argument, n defaults to 1 | Selection
step [count] | s [count] | Continue to next line of execution (in any stack frame), (optionally) repeating count times | Continuing and Stepping
next [count] | n [count] | Continue to next line in the current stack frame, (optionally) repeating count times | Continuing and Stepping
continue | c | Resume program execution | Continuing and Stepping
break [location] | | Set a breakpoint at location or the next instruction to be executed | Breakpoints
watch [expr] | | Set a watchpoint for an expression (e.g., variable) | Setting Watchpoints

In addition to these basic after-the-fact tools, before running your program it can be very useful to set breakpoints, which allow you to suspend execution before the program crashes and step through bit by bit as you investigate. You can even set watchpoints so that the program is suspended when an expression's value changes.

Emacs provides useful graphical versions of these capabilities natively, but the GDB manual also provides useful, direct commands for doing this:

Inside GDB, your program may stop for any of several reasons, such as a signal, a breakpoint, or reaching a new line after a GDB command such as step. You may then examine and change variables, set new breakpoints or remove old ones, and then continue execution. Usually, the messages shown by GDB provide ample explanation of the status of your program—but you can also explicitly request this information at any time.

PostScript: A final "note"

Finally, to truly understand the debugging process, you must sing the GDB song.


Notes

  1. Edsger Dijkstra, "On the Cruelty of Really Teaching Computer Science," Communications of the ACM, Volume 32, Number 12, December 1989, p. 1402.
  2. Paragraph modified from Henry M. Walker, The Limits of Computing, Jones and Bartlett, 1994, p. 6.