Earlier, while writing my compare strings method, I made a mistake in the code and came across a segmentation fault. Based on how the program executed I was pretty sure of approximately where the error was occurring, but rather than go and find the mistake I thought it would be a lot more useful to step through the program in the debugger and examine the problem that way. By doing this I’ll make it easier for myself to debug similar (more complex) problems in the future.
Segmentation Faults
What are they?
Wikipedia more or less defines a segfault as an attempt to access memory that the program is not allowed to access, or to access it in a way that is not allowed (writing to read-only memory, for example). Typically the hardware notifies the operating system about the memory access violation, and the kernel then sends a signal (SIGSEGV on Linux) to the process that caused it.
In English
Your program is trying to access something in memory. The hardware, OS, or some other component has decided that the memory you want to access does not belong to you or could be potentially harmful for you to access. So it politely tells you that you are not allowed.
How does this happen?
Well, it could be that you’re just being a dick and trying to access memory that doesn’t belong to you. Is that what you’re doing? … No? … OK, well then probably you just made a mistake when you were performing some memory-related operation. For instance, perhaps you treated an integer as a pointer and passed it to a string-related operation. Or maybe you copied 150 bytes of data into a 100-byte buffer and smashed the stack. Whatever the case may be, you can be certain it’s related to some sort of memory operation; unfortunately programming involves a lot of those.
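To make the "integer treated as a pointer" case concrete, here is a minimal sketch in the same assembly dialect used in the rest of this post. It is not the program I was debugging; it is a made-up illustration, and it assumes a typical Linux system where nothing is mapped at very low addresses.

```asm
# Hypothetical illustration: use a length as if it were an address.
.text
.globl _start

_start:
    # 25 is meant to be a string length, not an address
    movq $25, %rbx

    # dereferencing it tries to read one byte from address 0x19;
    # on a typical Linux system that page is not mapped, so the
    # kernel delivers SIGSEGV at this instruction
    movb (%rbx), %al

    # not reached if the read above faults: exit(0)
    movl $1, %eax
    movl $0, %ebx
    int $0x80
```

Nothing here overflows a buffer or corrupts the stack; a single read through a bogus pointer is enough.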
Example code
Instead of using the large code sample that I was working on when the problem occurred, I’ve created a shorter sample in assembly that will trigger the same kind of segmentation fault. The sample could be made even shorter, but I wanted a realistic example and I also wanted to keep the comments in the code so it’s easier to follow.
    .data
    Str1:
        .asciz "Segfault's are awesome!\n"

    .text

    .globl _start
    .type PrintString, @function

    _start:
        # for stepping through the debugger
        nop

        # string length is stored in rbx
        movq $25, %rbx

        # push the arguments onto the stack
        pushq %rbx
        pushq $Str1

        # print the string
        call PrintString

        # restore the stack pointer
        addq $8, %rsp

        # exit()
        jmp ExitProgram

    ######################################
    # print the string
    # @param str1
    # @param strlength
    ######################################
    PrintString:
        # save the current base pointer by pushing it onto the stack
        pushq %rbp

        # move the base pointer to the top of the stack
        movq %rsp, %rbp

        # retrieve our arguments from the stack
        movq 16(%rbp), %rcx
        movq 24(%rbp), %rdx

        # print the string
        movl $4, %eax
        movl $1, %ebx
        int $0x80

        # return
        ret

    ExitProgram:
        movl $1, %eax
        movl $0, %ebx
        int $0x80
Running the example
Here is how to run the example and what that looks like:
    # as -gstabs -o Segfault.o Segfault.s
    # ld -o Segfault Segfault.o
    # ./Segfault
    Segfault's are awesome!
    Segmentation fault
Stepping through with the debugger (gdb)
Digging in
So clearly we have a problem in this code. Let’s load up gdb and find out why.
gdb ./Segfault
Setting a breakpoint
Let’s make some inferences about where to set our breakpoint. Str1 definitely prints out before the program crashes, so the crash happens at or after the write. The program is pretty short, so why not just set our breakpoint on the line that prints the string and step forward from there? We’ll use list and break to do this.
    (gdb) list PrintString
    32      # @param str1
    33      # @param strlength
    34      ######################################
    35      PrintString:
    36          # save the current base pointer by pushing it onto the stack
    37          pushq %rbp
    38
    39          # move the base pointer to the top of the stack
    40          movq %rsp, %rbp
    41
    (gdb)
    42          # retrieve our arguments from the stack
    43          movq 16(%rbp), %rcx
    44          movq 24(%rbp), %rdx
    45
    46          # print the string
    47          movl $4, %eax
    48          movl $1, %ebx
    49          int $0x80
    50
    51          # return
    (gdb) break 49
    Breakpoint 1 at 0x4000df: file Segfault.s, line 49.
Find the segfault
Now that we have a breakpoint just before the string is printed out, let’s run the program and find the exact line that causes the segfault.
    (gdb) run
    Starting program: /root/assembly/Segfault
    Breakpoint 1, PrintString () at Segfault.s:49
    49          int $0x80
    (gdb) s
    Segfault's are awesome!
    PrintString () at Segfault.s:52
    52          ret
    (gdb) s
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000000000 in ?? ()
So after hitting the breakpoint we stepped twice, and we can see that line 52 is where things go wrong: executing the ret instruction results in the segmentation fault. More specifically, it looks like the instruction pointer (RIP on this 64-bit machine, EIP on 32-bit) ends up pointing at 0x0.
Examining the stack
Well, we know that when a function is called, the address of the next instruction in the calling function is pushed onto the stack. That way, when the function returns, it can simply load that address from the stack into the instruction pointer (RIP on this 64-bit machine, EIP on 32-bit) and we’re suddenly back where the function was called from. In other words, ret pops the return address off the stack and into RIP. Why don’t we restart the program and examine the memory at the stack pointer (RSP) just before the ret instruction runs?
    (gdb) r
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /root/assembly/Segfault
    Breakpoint 1, PrintString () at Segfault.s:49
    49          int $0x80
    (gdb) s
    Segfault's are awesome!
    PrintString () at Segfault.s:52
    52          ret
    (gdb) x/2xg $rsp
    0x7fffffffe4a0: 0x0000000000000000      0x00000000004000c3
This is a 64-bit machine, so here we are examining two 64-bit values on the stack. We can see that the first value is 0x0; this is the value at the top of the stack. The following value, 0x00000000004000c3, is the next value on the stack, and we could keep going to review the full stack (including other frames) if we wanted to. For now let’s focus on 0x0, since that seems to be what is popped off and what’s causing our problem.
Verify the stack value is being popped into EIP/RIP
Let’s demonstrate that executing the ret really does pop that value into RIP: print RIP, step once, then look at the stack and RIP again.
    (gdb) print /x $rip
    $1 = 0x4000e1
    (gdb) s
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000000000 in ?? ()
    (gdb) x/2xg $rsp
    0x7fffffffe4a8: 0x00000000004000c3      0x00000000006000f0
    (gdb) print /x $rip
    $2 = 0x0
We can see that the RIP register clearly changes from 0x4000e1 to 0x0 and that 0x0 was removed from the stack. Notice how 0x00000000004000c3 is now at the top of the stack instead of being the second value on the stack like we saw before.
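That behaviour matches the usual description of call and ret: call pushes the address of the next instruction and jumps, and ret pops whatever is on top of the stack into RIP. The sketch below spells both out by hand. The labels AfterCall and Callee are made up for this illustration, %rdx is used as a scratch register (the real ret needs none), and it assumes the same non-PIE as/ld build used for Segfault.s so that the label address fits in a 32-bit immediate.

```asm
# Hypothetical illustration: call and ret written out by hand.
.text
.globl _start

_start:
    # roughly what "call Callee" does:
    # push the address of the instruction after the call, then jump
    pushq $AfterCall
    jmp Callee

AfterCall:
    # exit(0)
    movl $1, %eax
    movl $0, %ebx
    int $0x80

Callee:
    # roughly what "ret" does:
    # pop the value on top of the stack and jump to it
    popq %rdx
    jmp *%rdx
```

If Callee pushed something and forgot to pop it, the popq would pick up that leftover value instead of AfterCall, which is the same situation the gdb session shows.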
Looking up the memory address
Based on the fact that we know 0x0 is an invalid memory location, let’s see if this next value on the stack is valid using the disassemble command in gdb.
    (gdb) disassemble 0x00000000004000c3
    Dump of assembler code for function _start:
       0x00000000004000b0 <+0>:     nop
       0x00000000004000b1 <+1>:     mov    $0x19,%rbx
       0x00000000004000b8 <+8>:     push   %rbx
       0x00000000004000b9 <+9>:     pushq  $0x6000f0
       0x00000000004000be <+14>:    callq  0x4000c9 <PrintString>
       0x00000000004000c3 <+19>:    add    $0x8,%rsp
       0x00000000004000c7 <+23>:    jmp    0x4000e2 <ExitProgram>
Well, what do you know. The line at offset <+19> above shows that 0x00000000004000c3 is the address of the instruction right after our call to PrintString. So when PrintString executes ret, it should actually be popping off 0x00000000004000c3 and not 0x0.
Finding the mistake
At this point we know that something has been added to the stack AFTER the return address and has not been popped back off. Since it was added after the return address we can be pretty confident that it was added inside of the PrintString function. Let’s take a look at the PrintString code.
    (gdb) list PrintString
    32      # @param str1
    33      # @param strlength
    34      ######################################
    35      PrintString:
    36          # save the current base pointer by pushing it onto the stack
    37          pushq %rbp
    38
    39          # move the base pointer to the top of the stack
    40          movq %rsp, %rbp
    41
    (gdb)
    42          # retrieve our arguments from the stack
    43          movq 16(%rbp), %rcx
    44          movq 24(%rbp), %rdx
    45
    46          # print the string
    47          movl $4, %eax
    48          movl $1, %ebx
    49          int $0x80
    50
    51          # return
Sure enough, on line 37 above you can see where we pushed the base pointer onto the stack using pushq %rbp. At the time the base pointer was set to 0x0, so that’s what we’re seeing on the stack in gdb. Unfortunately we never popped this value back off, so ret pops it instead of the return address, and the jump to 0x0 causes the segfault.
Fixing the code
Fixing this problem is incredibly easy as it turns out. Just pop the stored value of the RBP register back into RBP before returning from the function. It’s a one-line fix that should be placed just before the ret. Now when ret executes it will pull the correct return address off the stack and everything will run as expected.
    popq %rbp
    ret
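For reference, here is a sketch of what the whole file might look like with the fix applied. It is the sample from earlier in this post plus the popq, with one extra change of my own that is not part of the original fix: the caller cleans up with addq $16 instead of addq $8, since two 8-byte arguments were pushed. That difference is harmless in the original because the program exits right afterwards.

```asm
.data
Str1:
    .asciz "Segfault's are awesome!\n"

.text
.globl _start
.type PrintString, @function

_start:
    # string length is stored in rbx
    movq $25, %rbx

    # push the arguments onto the stack
    pushq %rbx
    pushq $Str1

    call PrintString

    # my adjustment (not in the original): drop both 8-byte arguments
    addq $16, %rsp

    jmp ExitProgram

PrintString:
    # prologue: save the caller's base pointer, then set up our own
    pushq %rbp
    movq %rsp, %rbp

    # retrieve our arguments from the stack
    movq 16(%rbp), %rcx
    movq 24(%rbp), %rdx

    # print the string
    movl $4, %eax
    movl $1, %ebx
    int $0x80

    # epilogue: restore the caller's base pointer so the return
    # address is back on top of the stack when ret runs
    popq %rbp
    ret

ExitProgram:
    movl $1, %eax
    movl $0, %ebx
    int $0x80
```

A bare popq %rbp is enough here because PrintString never moves %rsp after the prologue. A function that reserves stack space for locals would first need movq %rbp, %rsp, or could use leave, which does both steps in one instruction.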
You’re done!
Hopefully this guide helped you understand how you can examine the stack to find segfaults and other problems with your code. The GNU debugger is a powerful tool and the more you know the easier it becomes and the faster you can get back to writing code!
Have fun!