Address Space Layout

Nini: “Niko, come here for a moment. I have a small task for you.”

She points to a short C program on her screen:

#include <stdio.h>
#include <unistd.h>

int global_variable = 43;

void zzz(void) {
    int local_variable = 42;
    printf("local_variable addr = %p\n", (void *)&local_variable );
    printf("global_variable addr = %p\n", (void *)&global_variable );
    fflush(stdout);
    while (1) { }   /* spin forever so the process stays alive for inspection */
}

int main(void) {
    zzz();
    return 0;
}

Press Ctrl+C to terminate the process, otherwise it will keep running in the while loop.

Nini: “Compile and run this program twice. Pay close attention to the addresses it prints.”

Niko types the commands and screams: “Le Mao! Nini, this is strange. The addresses of local_variable and global_variable are different each time I run the program! Why does that happen?”

Nini: “What you see is not a bug, but a very important security feature. Let’s walk through this together. This is a piece of fish.”

You will now follow along with Niko to see how a process’s memory is organized.

Exercise: Address Space Layout

We will investigate how a program’s memory is organized. Figure 4.1 in OSTEP shows the classic layout of a process: code, data, heap, and stack. We’ll use the program Nini showed Niko to pause execution and see this structure for ourselves.

Write a sleeping program

“Alright,” Nini begins, “First, you need to create that program on your own machine to experiment. Go ahead and create a file named zzz.c with the code I just showed you.”

Now, compile the program. The -g flag is important because it adds debugging information.

gcc -g -o zzz zzz.c

Run and get the process id

“Good! Now, run your program. We’ll put it in the background with & so we can continue using the terminal. Then, we immediately ask the shell for the Process ID (PID) of the job we just started using $!.”

In your terminal, you should run these commands:

./zzz &          # run in the background
echo $!          # prints the PID; copy this number

Make sure to copy the PID. You will need it for the next steps. Niko scribbles it down on a notepad. “Okay, I have the PID! Now what?”

Look at the stack content with gdb

“Shio Dan Ji Lie!! What are we doing now?” asks Niko, excitedly.

“Patience, Niko. We will use gdb to look inside its memory.” Remember to replace <PID> with the actual PID you copied.

gdb -q -p <PID>

Once inside gdb, Nini wants you to inspect a few things:

  • The stack pointer register (rsp on x86-64), which points to the top of the stack.
  • The location of our global_variable.
  • The complete memory map of the process.
  • The raw data currently on the stack.

Type the following commands into the gdb prompt:

(gdb) info registers rsp            # See the current stack pointer address
(gdb) info address global_variable  # Which memory segment is this in?
(gdb) info proc mappings            # Show the full memory map
(gdb) x/10gx $rsp                   # Examine 10 quad-words (64-bit values) from the stack
(gdb) detach                        # Detach from the process
(gdb) quit                          # Exit gdb

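When you run x/10gx $rsp, look closely at the output. You should see the value 42 (0x2a in hexadecimal) somewhere in it. An int occupies only 4 bytes, while each quad-word gdb prints covers 8 bytes, so 42 shares a quad-word with whatever sits next to it on the stack. Because x86-64 is little-endian, it shows up as either the upper or the lower half of a larger number, such as 0x0000002a........ or 0x........0000002a. That’s your local_variable!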

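Note: The next section needs the process to still be running. Once you have finished the whole exercise, run fg and then press Ctrl+C to stop the while loop and free your CPU.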

View the address-space layout

“Linux provides a virtual file system called /proc. It lets us see information about running processes. Let’s use my favorite command, cat, to view the memory map file for your process.”

Replace <PID> with your process’s PID again.

cat /proc/<PID>/maps

You will see output showing the different memory regions, their address ranges, and their permissions (r=read, w=write, x=execute, p=private, i.e. copy-on-write). It should look something like this:

55ee05f42000-55ee05f43000 r--p … /path/to/zzz      # ELF header and program headers
55ee05f43000-55ee05f44000 r-xp … /path/to/zzz      # .text (code)
55ee05f44000-55ee05f45000 r--p … /path/to/zzz      # .rodata (read-only data)
55ee05f45000-55ee05f46000 rw-p … /path/to/zzz      # .data (global_variable is here)
7ffcd2f9d000-7ffcd2fbe000 rw-p … [stack]          # local_variable is here

There’s another useful command, pmap, that shows similar information in a slightly different format.

pmap -x <PID>

“Do you see, Niko?” Nini asks. “The global_variable is in the .data segment, and the local_variable is on the [stack]. The layout is just like the textbook diagram, but with real addresses.”

Questions

“Now, Niko, to prove you’ve understood the reason for your original confusion,” Nini says, looking at him expectantly, “answer these questions for me.”

  1. Why do the numeric addresses of both local_variable and global_variable change each time you rerun the program, even though the data segment still appears below the stack segment?
  2. In /proc/<PID>/maps, which segment normally holds global_variable, and what are its typical access permissions?

  3. Bonus: The stack region is usually marked non-executable by the kernel to prevent inject-and-run shellcode. To sidestep this protection, attackers use Return-Oriented Programming (ROP), which chains together short instruction snippets (gadgets) that already exist in the .text region. The attack typically starts with a buffer overflow that overwrites return addresses on the stack; ROP then runs the attacker's chosen sequence of gadgets instead of injected code. Explain what a buffer overflow is and how ROP works after reading the two links.
