Peeking Into The Assembly


Assembly is tough. How does C code get translated into assembly? What kinds of instructions will we see? What is actually happening in this gigantic mesh of lines?

This is my small step toward reading the assembly of a small program. It replicates a short piece of code that Ben Eater explains very well in his YouTube video Comparing C to Machine Language (link in the reference section).

[Image: functions involved from the start of execution to the end]

[Image: the C code]

[Image: the same code in assembly]

Before we jump into that, let’s look at some of the registers that we will encounter throughout this blog.

%rbp and %rsp are special-purpose registers.

%rbp is the base pointer, which points to the base of the current stack frame.

%rsp is the stack pointer, which points to the top of the stack, i.e. the lowest address currently in use.

In a function that sets up its frame this way, %rbp is never lower than %rsp, because the stack starts at a high memory address and grows downwards.

%eax and %ecx are general-purpose registers.

EAX = Extended AX register (a 32-bit register). It is the low 32 bits of RAX.

AX is 16 bits wide (the low 16 bits of EAX). Its high byte can be accessed as AH and its low byte as AL.

RAX is the full 64-bit register.
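To make the register naming concrete, here is a minimal C sketch that treats a 64-bit integer as if it were RAX and extracts the narrower views with masks and shifts. The function names (view_eax, view_ax, view_ah, view_al) are made up for this illustration, since C has no direct access to these registers.

```c
#include <stdint.h>

/* Pretend `rax` holds the full 64-bit register value. */

/* EAX: the low 32 bits of RAX. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }

/* AX: the low 16 bits. */
uint16_t view_ax(uint64_t rax) { return (uint16_t)(rax & 0xFFFFu); }

/* AH: the high byte of AX (bits 8..15). */
uint8_t view_ah(uint64_t rax) { return (uint8_t)((rax >> 8) & 0xFFu); }

/* AL: the low byte of AX (bits 0..7). */
uint8_t view_al(uint64_t rax) { return (uint8_t)(rax & 0xFFu); }
```

For example, with rax = 0x1122334455667788 the views are EAX = 0x55667788, AX = 0x7788, AH = 0x77 and AL = 0x88.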

Now coming to the code… (Please refer to the image and the code side by side.)

push %rbp
mov %rsp, %rbp


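These two instructions form the function prologue (or preamble). In this syntax the source operand comes first and the destination second.
First, the old base pointer is pushed onto the stack so that it can be restored later.
Then the value of the stack pointer is copied into the base pointer.
Now %rbp points to the base of main’s stack frame.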
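The effect of these two instructions can be sketched with a toy model in C. This is only an illustration under simplifying assumptions: the stack is an array of 8-byte slots, rsp and rbp are slot indices rather than real addresses, and the names ToyCpu, toy_push and toy_prologue are hypothetical.

```c
#include <stdint.h>

/* A toy machine: the stack is an array of 8-byte slots, and rsp/rbp are
   slot indices rather than real addresses. Lower index = lower address. */
typedef struct {
    uint64_t mem[64];
    uint64_t rsp; /* index of the slot at the top of the stack */
    uint64_t rbp; /* index of the slot at the base of the current frame */
} ToyCpu;

/* push: move the stack pointer down one slot, then store the value there. */
void toy_push(ToyCpu *cpu, uint64_t value) {
    cpu->rsp -= 1;
    cpu->mem[cpu->rsp] = value;
}

/* The two-instruction prologue: push %rbp ; mov %rsp,%rbp */
void toy_prologue(ToyCpu *cpu) {
    toy_push(cpu, cpu->rbp); /* save the caller's base pointer */
    cpu->rbp = cpu->rsp;     /* the new frame starts at the current top */
}
```

Starting from rsp = 40 and rbp = 48, the prologue leaves both at 39, with the caller's base pointer (48) saved in slot 39.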

sub $0x10,%rsp

This instruction allocates space on the stack: subtracting 0x10 from %rsp reserves 16 bytes for main’s local variables.
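Why 0x10 (16) bytes when the three 4-byte ints x, y and z need only 12? A likely reason, assuming the usual x86-64 convention of keeping the stack 16-byte aligned, is that the compiler rounds the frame size up. The helper round_up below is hypothetical and only shows the arithmetic.

```c
#include <stddef.h>

/* Round `size` up to the next multiple of `align`.
   `align` must be a power of two for the mask trick to work. */
size_t round_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}
```

With this helper, round_up(12, 16) gives 16, which matches the 0x10 in the instruction.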

movl  $0x0, -0xc(%rbp)
movl  $0x1, -0x8(%rbp)


The parentheses indicate a memory access. Here %rbp is the base register and -0xc is the displacement, so the address is %rbp + (-0xc), i.e. 0xc (12) bytes below the base pointer, inside the current stack frame. That is where the value 0 is stored. Comparing this with the C code, we find that x is 0 and y is 1, i.e. x is stored at %rbp – 0xc and y is stored at %rbp – 0x8. The l suffix in movl means a 32-bit operand, which matches the size of an int.
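Base-plus-displacement addressing can be mimicked in C with a byte array standing in for the stack frame. This is a sketch, not what the compiler emits: the helper names frame_store32 and frame_load32 are invented here, and the displacements are the ones from the walkthrough.

```c
#include <stdint.h>
#include <string.h>

/* Displacements from the walkthrough: where each local lives relative to rbp. */
enum { DISP_X = -0xc, DISP_Y = -0x8, DISP_Z = -0x4 };

/* Like `movl $value, disp(%rbp)`: store 4 bytes at rbp + disp. */
void frame_store32(uint8_t *rbp, int disp, int32_t value) {
    memcpy(rbp + disp, &value, sizeof value);
}

/* Like `mov disp(%rbp), %eax`: load 4 bytes from rbp + disp. */
int32_t frame_load32(const uint8_t *rbp, int disp) {
    int32_t value;
    memcpy(&value, rbp + disp, sizeof value);
    return value;
}
```

If frame is a 16-byte array and rbp points just past its end, frame_store32(rbp, DISP_X, 0) and frame_store32(rbp, DISP_Y, 1) reproduce the two movl instructions. The slots are 4 bytes apart because each int is 4 bytes wide.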

mov -0xc(%rbp),%eax
mov %eax,%esi
lea 0xe95(%rip),%rax
mov %rax,%rdi
mov $0x0,%eax
call 0x1050 <printf@plt>



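In the first line, the value stored at -0xc(%rbp), i.e. the value of x, is loaded into %eax, one of the general-purpose registers.

In the second line, the value in %eax is copied into %esi. Historically, ESI is the source index register used by string and memory-copy instructions, but that is not its role here. Under the System V x86-64 calling convention used on Linux, %rsi (of which %esi is the low 32 bits) carries the second integer or pointer argument of a call. So x is being set up as the second argument to printf.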

In the third line, lea means load effective address. Unlike mov, lea does not read memory: it computes the address described by its source operand and places that address in the target register. Here %rip is the instruction pointer, a special-purpose register, so 0xe95(%rip) means the address 0xe95 bytes past the next instruction. This instruction therefore puts an address into %rax, not the value stored there. Since this address is about to become the first argument to printf, it is the address of the format string.
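The difference between computing an address and reading from it is the same as the difference between pointer arithmetic and a dereference in C. A minimal sketch, with the hypothetical helper names lea_like and mov_like:

```c
#include <stddef.h>

/* Like lea: compute base + disp and return the address. No memory is read. */
const char *lea_like(const char *base, ptrdiff_t disp) {
    return base + disp;
}

/* Like a mov from memory: compute the same address, then read the byte there. */
char mov_like(const char *base, ptrdiff_t disp) {
    return *(base + disp);
}
```

Given a string such as "%d\n" (an assumed example, not necessarily the string in the original program), lea_like(text, 1) returns a pointer to its second character, while mov_like(text, 1) returns the character 'd' itself.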

In the fourth line, that address is copied from %rax into %rdi. Registers such as %rdi are caller-saved, i.e. their values are not necessarily preserved across function calls. To call a function, the program places the first six integer or pointer arguments in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, in that order, and %rax is usually used for the function’s return value. So %rdi carries the first argument here: the address of the format string.

In the fifth line, 0 is written into %eax. The calling convention does return a function’s value in %eax, but that is not what this instruction is for. printf is a variadic function, and for such calls the convention uses %al to give an upper bound on how many vector registers hold floating-point arguments. None are being passed, so it is set to 0.

The sixth line calls the printf function through its entry in the procedure linkage table (the @plt suffix).
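The register order for integer and pointer arguments can be written down as a small lookup. This is a sketch that assumes the System V x86-64 convention used on Linux; the names INT_ARG_REGS and arg_position are made up for this example.

```c
#include <string.h>

/* Integer/pointer argument registers, in order, under the System V x86-64
   calling convention (assumed here because the code calls printf@plt on Linux). */
static const char *const INT_ARG_REGS[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Return the 0-based argument position that `reg` carries,
   or -1 if it is not one of the six argument registers. */
int arg_position(const char *reg) {
    for (int i = 0; i < 6; i++) {
        if (strcmp(reg, INT_ARG_REGS[i]) == 0) {
            return i;
        }
    }
    return -1;
}
```

In the walkthrough, the format string's address goes to %rdi (position 0) and x goes to %esi, the low half of %rsi (position 1). %rax is not an argument register, so arg_position("rax") returns -1.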

mov -0xc(%rbp),%edx
mov -0x8(%rbp),%eax
add %edx,%eax
mov %eax, -0x4(%rbp)


Now we are doing the z=x+y; operation. The values of x, at -0xc(%rbp), and y, at -0x8(%rbp), are copied into %edx and %eax respectively.

Once that’s done, the add instruction adds %edx to %eax and leaves the result in %eax. That result is then copied to -0x4(%rbp).

mov -0x8(%rbp),%eax
mov %eax,-0xc(%rbp)

mov -0x4(%rbp),%eax
mov %eax,-0x8(%rbp)


In these instructions, each value is loaded from its memory location into %eax and then stored to -0xc(%rbp) and -0x8(%rbp) respectively. The detour through %eax is needed because mov cannot copy directly from one memory location to another. Those two addresses are the locations of x and y, and -0x4(%rbp) is where z is stored, so the following segment of code is being executed:

x=y;
y=z;
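The three statements and their instructions can be lined up in C by naming temporaries after the registers. This is an illustrative sketch: the function fib_step is hypothetical, and x, y and z stand for the three stack slots.

```c
/* One pass through the arithmetic, with locals named after the registers
   the compiler used. x, y and z stand for the three stack slots. */
void fib_step(int *x, int *y, int *z) {
    int edx, eax;

    /* z = x + y; */
    edx = *x;        /* mov -0xc(%rbp),%edx */
    eax = *y;        /* mov -0x8(%rbp),%eax */
    eax = eax + edx; /* add %edx,%eax       */
    *z = eax;        /* mov %eax,-0x4(%rbp) */

    /* x = y; */
    eax = *y;        /* mov -0x8(%rbp),%eax */
    *x = eax;        /* mov %eax,-0xc(%rbp) */

    /* y = z; */
    eax = *z;        /* mov -0x4(%rbp),%eax */
    *y = eax;        /* mov %eax,-0x8(%rbp) */
}
```

Starting from x = 0 and y = 1, one call leaves x = 1, y = 1, z = 1, and a second call leaves x = 1, y = 2, z = 2.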


Once that’s done, execution moves on to the loop condition, while(x<255); . The assembly instruction for this line is cmpl $0xfe, -0xc(%rbp). cmpl compares 32-bit (double word) operands, and here it compares x with 0xfe (254). For integers, x < 255 is the same as x <= 254, which is why the compiler can pair this comparison with a jump-if-less-or-equal.
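The source says x < 255 but the instruction compares against 0xfe (254). For integers the two forms are interchangeable, which a short C sketch can confirm. The function names cond_source and cond_compiled are invented for this comparison.

```c
#include <stdbool.h>

/* The condition as written in the C source. */
bool cond_source(int x) { return x < 255; }

/* The condition in the form the assembly uses:
   compare with 0xfe, then continue if less than or equal. */
bool cond_compiled(int x) { return x <= 0xfe; }
```

Both functions return true for 254 and false for 255, and they agree for every other int as well.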

jle 0x1163 <main+26>
jmp 0x1155 <main+15>


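The last two instructions are jle, i.e. jump if less than or equal, and jmp, an unconditional jump to the given label. If x is at most 254, jle jumps back so the loop body runs again; otherwise execution falls through to jmp, which always jumps to its target.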
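Putting the pieces together, one pass of the inner loop can be sketched in C. This is a reconstruction inferred from the instructions discussed above, not the original listing. To keep it easy to check, the hypothetical function fib_pass stores each value of x in an array instead of calling printf, and it runs the loop only once rather than restarting it.

```c
#include <stddef.h>

/* One pass of the do/while loop, reconstructed from the walkthrough.
   Each value of x that would be printed is written to out[] instead.
   Returns how many values the pass produced; at most `cap` are stored. */
size_t fib_pass(int *out, size_t cap) {
    int x = 0; /* movl $0x0,-0xc(%rbp) */
    int y = 1; /* movl $0x1,-0x8(%rbp) */
    int z;
    size_t n = 0;

    do {
        if (n < cap) {
            out[n] = x; /* stands in for the printf call */
        }
        n++;
        z = x + y;
        x = y;
        y = z;
    } while (x < 255); /* cmpl $0xfe,-0xc(%rbp) ; jle */

    return n;
}
```

A pass yields 14 values, 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144 and 233, and stops once x reaches 377.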

So, that was a short description of the assembly generated for a small piece of code. I do agree there are many stones left unturned here. It was a good exploration for me. Hope you find this helpful.

The links in the reference section will help you dive deeper.

Thanks Y’all. Do share your opinion and don’t forget to follow to grab the first notification of my new blog posts.

References:
1) Ben Eater – Comparing C to Machine Language: https://www.youtube.com/watch?v=yOyaJXpAYZQ
2) x64 cheat sheet: https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf
3) https://www.recurse.com/blog/7-understanding-c-by-learning-assembly
4) https://medium.com/swlh/how-does-hello-world-actually-work-73a557be16eb
5) https://cs61.seas.harvard.edu/site/2018/Asm1/
6) A good number of tabs of stack overflow.
