A Radare2-based Analysis of Pointers to an Array in C

Pointers and arrays are a notoriously confusing subject in C. Pointers come in different types, and the type determines how pointer arithmetic steps through the elements of an array. Consider the following examples:
int *A[5]; This is an array of 5 pointers to integers.
int (*B)[5]; This is a pointer to an array of 5 integers.
We are interested in the latter, the pointer to an array, so let's analyze the following code:

It will print all elements of the array using different pointer dereferencing strategies.

Looking at how these variables are laid out in memory, from the assembly perspective, makes it very clear what pointers actually do.

To begin with, we should consider the notion of a variable and of lvalues. In the C code above, the variable names "arr", "pi" and "pa" are each a synonym for an address. So after the array "arr" is initialized on the stack, the memory will look like this:

So "arr" is just a synonym for 0x7fffffffe2d0. Similarly, initialized pointers will look like this:

and should be read as:

This is because "pa" is just a name for 0x7fffffffe2f8 and "pi" is a name for 0x7fffffffe2f0. Notice that both "pa" and "pi" store the same value: the address (0x7fffffffe2d0) of the first element (10, hexadecimal "A") of the array. So both pointers point to the same place, the beginning of the array. However, they have different types (int * and int (*)[5]), so incrementing "pi" by 1 yields 0x7fffffffe2d4, while incrementing "pa" by 1 yields 0x7fffffffe2e4. In the first case 4 bytes (the size of one integer) are added; in the second, 20 bytes (the size of 5 integers).

Now, back to the original code. Before returning, the main function calls "printf" with 8 arguments, which makes it a good example of how the argument-passing conventions are applied. According to the "System V Application Binary Interface: AMD64 Architecture Processor Supplement" (the Linux AMD64 ABI), the arguments to a function are passed via registers and the stack. Our printf will be called with the following arguments:

Arg. #     1        2     3     4        5     6         7       8
Arg.       string   i     i     arr[i]   i     *(pi+i)   i       (*pa)[i]
Register   RDI      RSI   RDX   RCX      R8    R9        Stack   Stack

Note that the last two arguments are pushed onto the stack in reverse order (the eighth first, then the seventh), so when the called function takes control of execution, it finds the first stack argument nearest the top of the stack, then the second, and so on. This ordering was chosen to handle functions with a variable number of arguments (like printf).

Finally, I analyzed the code with radare2 and renamed all local variables on the stack. On the right side are debugging comments from GDB, interleaved with my own comments on pointers. The RBX register and the variable "N_Minus_One" are not used, so treat them as junk.

How about removing initialized but unused variables? Let's compile with an optimization flag:

This will yield the following assembly code:

That's amazing: no stack frame is set up, the unnecessary pointer calculations are omitted, the variable "i" is kept in register RBX, and only a single method is used to access the array elements.
