Sunday, August 14, 2016

Transition from C to Assembly: The Basics (x86 64bit)

In this post, I will go over how a simple C program translates to x86 64bit assembly.

Consider the classic helloworld.c program below:


#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}

Let us compile it.
$ gcc -g helloworld.c

Now, we will output its assembly using gdb:
$ gdb a.out -q
(gdb) disass main
Dump of assembler code for function main:
   0x0000000100000f6b <+0>: push   rbp
   0x0000000100000f6c <+1>: mov    rbp,rsp
   0x0000000100000f6f <+4>: lea    rdi,[rip+0x2c]        # 0x100000fa2
   0x0000000100000f76 <+11>: call   0x100000f82
   0x0000000100000f7b <+16>: mov    eax,0x0
   0x0000000100000f80 <+21>: pop    rbp
   0x0000000100000f81 <+22>: ret
End of assembler dump.

Note that you may get a different result, depending on your compiler and operating system. I am compiling with gcc 5.3.0 on 64-bit Mac OS X. On other Unix systems the result should be very similar, although on a 32-bit machine you will see ebp, esp, etc. instead of rbp, rsp, etc. Lastly, if you want to switch between Intel and AT&T style assembly code, please refer to my previous post.

Let us go over each instruction step by step.
<+0> push rbp simply pushes the value of rbp, the frame pointer (or base pointer), onto the stack. rbp holds the base address of the current function's stack frame. Since main() has just been called, rbp still holds the caller's frame base, so main() saves it on the stack in order to restore it before returning. The caller here is not a system call but the runtime startup code that sets up the process and then calls main().

<+1> mov rbp, rsp copies rsp into rbp. Now that the old rbp has been saved by the previous instruction, it is safe to overwrite rbp with the base of the current frame. Note that rsp is the stack pointer, which holds the address of the top of the stack. On x86-64, each push decrements rsp by 8 and each pop increments it by 8, because the stack grows from high addresses to low addresses. After this instruction, rsp and rbp hold the same value, which is the base of the main() frame.

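<+4> lea rdi, [rip+0x2c] loads the address 0x100000fa2 into the rdi register. rip-relative addressing uses the address of the next instruction, so the result is 0x100000f76 + 0x2c = 0x100000fa2. Under the calling convention used here, rdi carries the first argument of a function call. This address holds the string we want to print, "hello world". To see this, simply run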
(gdb) x/s 0x100000fa2
0x100000fa2: "hello world"

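<+11> call 0x100000f82 pushes the return address (0x100000f7b, the next instruction) onto the stack and jumps to 0x100000f82. The target is not a system call but a stub that forwards to a C library function, which in turn asks the kernel to write the string. Since the string stored at 0x100000fa2 has no trailing newline, gcc has most likely replaced printf with puts, which appends the newline itself.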

<+16> mov eax, 0x0 moves 0 into the eax register, which holds the return value of main().

<+21> pop rbp pops the value at the address held in rsp into rbp. This is the previous value of rbp, pushed onto the stack by instruction <+0>, so the caller's frame pointer is restored.

<+22> ret finally returns from the function by popping the value at the top of the stack into rip, the instruction pointer. Note that this return address was pushed onto the stack by the call instruction that invoked main().
