Thursday, October 31, 2013

C to Assembly

From C To Assembly Language LG #94

1. Overview

What is a microcomputer system made up of? A typical answer one can expect is that a microcomputer system is made up of a microprocessor unit (MPU), a bus system, a memory subsystem, an I/O subsystem and an interface among all components.

This is only the hardware side. Every microcomputer system requires software to direct each of the hardware components while they are performing their respective tasks. Computer software can be viewed from the system side (system software) and the user side (user software).

The user software may include some in-built libraries and user created libraries in the form of subroutines which may be needed in preparing programs for execution.

The system software may encompass a variety of high-level language translators, an assembler, a text editor, and several other programs for aiding in the preparation of other programs. We already know that there are three levels of programming and they are Machine language, Assembly language and High-level language.

Machine language programs are programs that the computer can understand and execute directly (think of programming in any microprocessor kit). Assembly language instructions match machine language instructions on a more or less one-for-one basis, but are written using character strings so that they are more easily understood. High-level language instructions are much closer to the English language and are structured so that they naturally correspond to the way programmers think. Ultimately, an assembly language or high-level language program must be converted into machine language by programs called translators. They are referred to as an assembler and a compiler or interpreter, respectively.

Compilers for high-level languages like C/C++ have the ability to translate high-level language into assembly code. The -S option of the GNU C and C++ compilers generates assembly code equivalent to the corresponding source program. Knowing how the most rudimentary constructs like loops, function calls and variable declarations are mapped into assembly language is one way to achieve the goal of mastering C internals. Before proceeding further, make sure that you are familiar with computer architecture and Intel x86 assembly language, which will help you follow the material presented here.

2. Getting Started

To begin with, write a small program in C to print hello world and compile it with the -S option. The output is assembler code for the input file specified. By default, GCC makes the assembler file name by replacing the suffix `.c' with `.s'. Try to interpret the few lines at the end of the assembler file.

The 80386 and above family of processors have myriads of registers, instructions and addressing modes. A basic knowledge about only a few simple instructions is sufficient to understand the code generated by the GNU compiler.

Generally, any assembly language instruction includes a label, a mnemonic, and operands. An operand's notation is sufficient to decipher the operand's addressing mode. The mnemonics operate on the information contained in the operands. In fact, assembly language instructions operate on registers and memory locations. The 80386 family has general purpose registers (32 bit) called eax, ebx, ecx etc. Two registers, ebp and esp are used for manipulating the stack. A typical instruction, written in GNU Assembler (GAS) syntax, would look like this:

movl $10, %eax
This instruction stores the value 10 in the eax register. The prefix `%' to the register name and `$' to the immediate value are essential assembler syntax. It is to be noted that not all assemblers follow the same syntax.

Our first assembly language program, stored in a file named first.s is shown in Listing 1.

#Listing 1
.globl main
main:
  movl $20, %eax
  ret
This file can be assembled and linked to generate an a.out by giving the command cc first.s. The GNU compiler front end cc identifies files with the extension `.s' as assembly language files and invokes the assembler and linker, skipping the compilation phase.

The first line of the program is a comment. The .globl assembler directive serves to make the symbol main visible to the linker. This is vital as your program will be linked with the C startup library which will contain a call to main. The linker will complain about 'undefined reference to symbol main' if that line is omitted (try it). The program simply stores the value 20 in register eax and returns to the caller.

3. Arithmetic, Comparison, Looping

Our next program is Listing 2 which computes the factorial of a number stored in eax. The factorial is stored in ebx.

#Listing 2
.globl main
main: 
 movl $5, %eax
 movl $1, %ebx
L1: cmpl $0, %eax  #compare 0 with value in eax
 je L2   #jump to L2 if 0==eax (je - jump if equal)
 imull %eax, %ebx # ebx = ebx*eax
 decl %eax  #decrement eax
 jmp L1   # unconditional jump to L1
L2:  ret
L1 and L2 are labels. When control flow reaches L2, ebx contains the factorial of the number originally stored in eax (eax itself has been counted down to zero by then).

4. Subroutines

When implementing complicated programs, we split the tasks to be solved in systematic order. We write subroutines and functions for each of the tasks, which are called whenever required. Listing 3 illustrates subroutine call and return in assembly language programs.

#Listing 3
.globl main
main:
 movl $10, %eax
 call foo
 ret
foo:
 addl $5, %eax
 ret
The instruction call transfers control to subroutine foo. The ret instruction in foo transfers control back to the instruction after the call in main.

Generally, each function defines the scope of variables it uses in each call of the routine. To maintain the scopes of variables you need space. The stack can be used to maintain values of the variables in each call of the routine. It is important to know the basics of how the activation records can be maintained for repeated, recursive calls or any other possible calls in the execution of the program. Knowing how to manipulate registers like esp and ebp and making use of instructions like push and pop which operate on the stack are central to understanding the subroutine call and return mechanism.

5. Using The Stack

A section of your program's memory is reserved for use as a stack. The Intel 80386 and above microprocessors contain a register called the stack pointer, esp, which stores the address of the top of stack. Figure 1 below shows three integer values, 49, 30 and 72, stored on the stack (each integer occupying four bytes) with the esp register holding the address of the top of stack.

[Figure 1: three integers stored on the stack, with esp holding the address of the top of stack]

Unlike a pile of bricks, which grows upwards, the stack on Intel machines grows downwards. Figure 2 shows the stack layout after the execution of the instruction pushl $15.

[Figure 2: the stack layout after pushl $15]

The stack pointer register is decremented by four and the number 15 is stored as four bytes at locations 1988, 1989, 1990 and 1991.

The instruction popl %eax copies the value at top of stack (four bytes) to the eax register and increments esp by four. What if you do not want to copy the value at top of stack to any register? You just execute the instruction addl $4, %esp which simply increments the stack pointer.

In Listing 3, the instruction call foo pushes the address of the instruction after the call in the calling program on to the stack and branches to foo. The subroutine ends with ret which transfers control to the instruction whose address is taken from the top of stack. Obviously, the top of stack must contain a valid return address.

6. Allocating Space for Local Variables

It is possible to have a C program manipulating hundreds and thousands of variables. The assembly code for the corresponding C program will give you an idea of how the variables are accommodated and how the registers are used for manipulating the variables without causing any conflicts in the final result that is to be obtained.

The registers are few in number and cannot be used for holding all the variables in a program. Local variables are allotted space within the stack. Listing 4 shows how it is done.

#Listing 4
.globl main
main:
 call foo
 ret
foo:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp
 movl $10, -4(%ebp)
 movl %ebp, %esp
 popl %ebp
 ret
First, the value of the stack pointer is copied to ebp, the base pointer register. The base pointer is used as a fixed reference to access other locations on the stack. In the program, ebp may be used by the caller of foo also, and hence its value is copied to the stack before it is overwritten with the value of esp. The instruction subl $4, %esp creates enough space (four bytes) to hold an integer by decrementing the stack pointer. In the next line, the value 10 is copied to the four bytes whose address is obtained by subtracting four from the contents of ebp. The instruction movl %ebp, %esp restores the stack pointer to the value it had after executing the first line of foo and popl %ebp restores the base pointer register. The stack pointer now has the same value which it had before executing the first line of foo.

The table below displays the contents of registers ebp, esp and stack locations from 3988 to 3999 at the point of entry into main and after the execution of every instruction in Listing 4 (except the return from main). We assume that ebp and esp have values 7000 and 4000 stored in them and stack locations 3988 to 3999 contain some arbitrary values 219986, 1265789 and 86 before the first instruction in main is executed. It is also assumed that the address of the instruction after call foo in main is 30000.

Table 1

7. Parameter Passing and Value Return

The stack can be used for passing parameters to functions. We will follow a convention (which is used by our C compiler) that the value stored by a function in the eax register is taken to be the return value of the function. The calling program passes a parameter to the callee by pushing its value on the stack. Listing 5 demonstrates this with a simple function called sqr.

#Listing 5
.globl main
main:
 movl $12, %ebx
 pushl %ebx
 call sqr
 addl $4, %esp       #adjust esp to its value before the push
 ret
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax    #compute eax * eax, store result in eax 
 ret
Read the first line of sqr carefully. The calling function pushes the content of ebx on the stack and then executes a call instruction. The call will push the return address on the stack. So inside sqr, the parameter is accessible at an offset of four bytes from the top of stack.

8. Mixing C and Assembler

Listing 6 shows a C program and an assembly language function. The C function is defined in a file called main.c and the assembly language function in sqr.s. You compile and link the files together by typing cc main.c sqr.s.

The reverse is also pretty simple. Listing 7 demonstrates a C function print and its assembly language caller.

#Listing 6
//main.c
main()
{
 int i = sqr(11);
 printf("%d\n",i);
}

//sqr.s
.globl sqr
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax
 ret


#Listing 7
//print.c
print(int i)
{
 printf("%d\n",i);
}

//main.s
.globl main
main:
 movl $123, %eax
 pushl %eax
 call print
 addl $4, %esp
 ret

9. Assembler Output Generated by GNU C

I guess this much reading is sufficient for understanding the assembler output produced by gcc. Listing 8 shows the file add.s generated by gcc -S add.c. Note that add.s has been edited to remove many assembler directives (mostly for alignments and other things of that sort).

#Listing 8
//add.c
int add(int i,int j)
{
 int p = i + j;
 return p;
}

//add.s
.globl add
add:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp  #create space for integer p
 movl 8(%ebp),%edx #8(%ebp) refers to i
 addl 12(%ebp), %edx #12(%ebp) refers to j
 movl %edx, -4(%ebp) #-4(%ebp) refers to p
 movl -4(%ebp), %eax #store return value in eax
 leave   #i.e. movl %ebp, %esp; popl %ebp
 ret
The program makes sense once you realize that the C statement add(10,20) is translated into the following assembler code:

pushl $20
pushl $10
call add
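Note that the second parameter is pushed first: arguments are pushed from right to left, so the first parameter ends up closest to the return address, at 8(%ebp).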

10. Global Variables

Space is created for local variables on the stack by decrementing the stack pointer and the allotted space is reclaimed by simply incrementing the stack pointer. So what is the equivalent GNU C generated code for global variables? Listing 9 provides the answer.

#Listing 9
//glob.c
int foo = 10;
main()
{
 int p = foo;
}

//glob.s
.globl foo
foo:
 .long 10
.globl main
main:
 pushl %ebp
 movl %esp,%ebp
 subl $4,%esp
 movl foo,%eax
 movl %eax,-4(%ebp)
 leave
 ret
The statement foo: .long 10 defines a block of 4 bytes named foo and initializes the block with the value 10. The .globl foo directive makes foo accessible from other files. Now try this out. Change the statement int foo to static int foo. See how it is represented in the assembly code. You will notice that the assembler directive .globl is missing. Try this out for different types and qualifiers (double, long, short, const etc.).

11. System Calls

Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this it will need to call on OS services. In fact, programming in assembly language is quite the same in different OSes, unless OS services are touched.

There are two common ways of performing a system call in Linux: through the C library (libc) wrapper, or directly.

Libc wrappers are made to protect programs from possible system call convention changes, and to provide POSIX compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more-or-less POSIX compliant: this means that the syntax of most libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). But the main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, like printf(), malloc() and similar.

System calls in Linux are done through int 0x80. Linux differs from the usual Unix calling convention, and features a "fastcall" convention for system calls. The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments, passed in ebx, ecx, edx, esi, edi and ebp, in that order. If there are more arguments, they are passed through a structure given as the first argument. The result is returned in eax, and the stack is not touched at all.

Consider Listing 10 given below.

#Listing 10
/* fork.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
 fork();
 printf("Hello\n");
 return 0;
}
Compile this program with the command cc -g fork.c -static. Use the gdb tool and type the command disassemble fork. You can see the assembly code used for fork in the program. The -static is the static linker option of GCC (see man page). You can test this for other system calls and see how the actual functions work.

There have been several attempts to write an up-to-date documentation of the Linux system calls and I am not making this another of them.

12. Inline Assembly Programming

GNU C supports the x86 architecture quite well, and includes the ability to insert assembly code within C programs, such that register allocation can be either specified or left to GCC. Of course, the assembly instructions are architecture dependent.

The asm statement allows you to insert assembly instructions into your C or C++ programs. For example the instruction:

asm ("fsin" : "=t" (answer) : "0" (angle));
is an x86-specific way of coding this C statement:

answer = sin(angle);
You can notice that unlike ordinary assembly code instructions asm statements permit you to specify input and output operands using C syntax. Asm statements should not be used indiscriminately. So, when should we use them?

  • Asm statements allow your programs to access the computer hardware directly. This can produce programs that execute quickly. You can use them when writing operating system code that directly needs to interact with the hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly.
  • Inline assembly instructions can also speed up the innermost loops of programs. For instance, the sine and cosine of the same angle can be found with the single x86 instruction fsincos. The two listings given below illustrate this point.
#Listing 11
/* Name : bit-pos-loop.c
   Description : Find bit position using a loop */

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
 long max = atoi (argv[1]);
 long number;
 long i;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ++number) {
  for (i=(number>>1), position=0; i!=0; ++position)
   i >>= 1;
  result = position;
 }
 return 0;
}

#Listing 12
/* Name : bit-pos-asm.c
   Description : Find bit position using bsrl */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
 long max = atoi(argv[1]);
 long number;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ++number) {
  asm("bsrl %1, %0" : "=r" (position) : "r" (number));
  result = position;
 }
 return 0;
}
Compile the two versions with full optimizations as given below:

$ cc -O2 -o bit-pos-loop bit-pos-loop.c
$ cc -O2 -o bit-pos-asm bit-pos-asm.c
Measure the running time for each version by using the time command and specifying a large value as the command-line argument to make sure that each version takes at least a few seconds to run.

$ time ./bit-pos-loop 250000000
and

$ time ./bit-pos-asm 250000000
The results will vary from machine to machine. However, you should notice that the version that uses inline assembly executes a great deal faster.

GCC's optimizer attempts to rearrange and rewrite a program's code to minimize execution time even in the presence of asm expressions. If the optimizer determines that an asm's output values are not used, the instruction will be omitted unless the keyword volatile occurs between asm and its arguments. (As a special case, GCC will not move an asm without any output operands outside a loop.) Any asm can be moved in ways that are difficult to predict, even across jumps. The only way to guarantee a particular assembly instruction ordering is to include all the instructions in the same asm.

Using asm's can restrict the optimizer's effectiveness because the compiler does not know the asms' semantics. GCC is forced to make conservative guesses that may prevent some optimizations.

13. Exercises

  1. Interpret the assembly code for the C program in Listing 6. Modify the program to eliminate the warnings that are reported when generating assembly code with the -Wall option. Compare the two assembly codes. What changes do you observe?
  2. Compile several small C programs with and without optimization options (like -O2). Read the resulting assembly codes and find out some common optimization tricks used by the compiler.
  3. Interpret assembly code for switch statement.
  4. Compile several small C programs with inline asm statements. What differences do you observe in the assembly code for such programs?
  5. A nested function is defined inside another function (the "enclosing function"), such that:
    • the nested function has access to the enclosing function's variables; and
    • the nested function is local to the enclosing function, that is, it cannot be called from elsewhere unless the enclosing function gives you a pointer to the nested function.

    Nested functions can be useful because they help control the visibility of a function.
    Consider Listing 13 given below:

    #Listing 13
    /* myprint.c */
    #include <stdio.h>
    #include <stdlib.h>
    
    int main()
    {
     int i;
     void my_print(int k)
     {
      printf("%d\n",k);
     }
     scanf("%d",&i);
     my_print(i);
     return 0;
    }
    
    Compile this program with cc -S myprint.c and interpret the assembly code. Also try compiling the program with the command cc -pedantic myprint.c. What do you observe?
 

OS Process States

OS Process States:




How VTABLE works?

In C++, what’s a vtable and how does it work?

Vtables: Known by Many Different Names

It’s worth a few brain cells to remember that a vtable is known by many different names: virtual function table, virtual method table, and even as a dispatch table. When interviewing, it’s always a good idea to be familiar with the terminology.

Vtables: Used Behind the Scenes of Polymorphism



When working with virtual functions in C++, it’s the vtable that’s being used behind the scenes to help achieve polymorphism. And, although you can understand polymorphism without understanding vtables, many interviewers like to ask this question just to see if you really know your stuff.

Vtables contain pointers to virtual functions

Whenever a class itself contains virtual functions or overrides virtual functions from a parent class the compiler builds a vtable for that class. This means that not all classes have a vtable created for them by the compiler. The vtable contains function pointers that point to the virtual functions in that class. There can only be one vtable per class, and all objects of the same class will share the same vtable.

Vpointers point to the vtable

Associated with every vtable is what’s called a vpointer. Each object of a class that has a vtable carries a vpointer, which points to that vtable and is used to access the functions inside it. The vtable would be useless without a vpointer.
For any class that contains a vtable, the compiler will also add "hidden" code to the constructor of that class to initialize the vpointers of its objects to the address of the corresponding vtable.
Because you’re probably confused now, let’s take a look at an example so that we can explain vtables more clearly and in more detail. Take a look at the code below:
class Animal // base class
{
public:
    int weight;

    virtual int getWeight() { return weight; }
};

// Obviously, Tiger derives from the Animal class
class Tiger: public Animal {
public:
    int weight;
    int height;

    int getWeight() { return weight; }  // overrides Animal::getWeight
    int getHeight() { return height; }  // not virtual
};

int main()
{
    Tiger t1;

    /* below, an Animal object pointer is set to point
       to an object of the derived Tiger class  */

    Animal *a1 = &t1;

    /*  below, how does this know to call the
        definition of getWeight in the Tiger class,
        and not the definition provided in the Animal
        class  */

    a1 -> getWeight();
}

Only one vtable per class

The vtable contains function pointers that point to the virtual functions in that class. It’s important to note that there can only be one vtable per class, and all objects of the same class will share the same vtable. This means that in the example above, the Animal and Tiger classes will each have their very own vtable, and any objects of the Animal or Tiger classes will use their respective class’s vtables.

Vtables are used at runtime



In the example above, because the Tiger class overrides the getWeight virtual function provided in the Animal class, the compiler will construct a vtable for the Tiger class. This vtable will only contain a function pointer for the getWeight function. The getHeight function will not be put inside the vtable because it is not virtual, nor does it override a virtual function in the Animal base class.
The question that the code above raises is: at runtime, how does the call "a1 -> getWeight()" know to use the version of getWeight provided in the Tiger class, and not the one in the Animal class? The answer, as you probably guessed, is by using vtables.
We now know that a vtable will be created for the Tiger class. And every vtable must have a vpointer that points to the vtable (otherwise the vtable can not be referenced). Let’s call the vpointer that belongs to the Tiger class vptr1 – this is just a name we created so we can show our point.
Take a look at the code below again.
// Obviously, Tiger derives from the Animal class
class Tiger: public Animal {

/*
...some code left out here

*/

};

int main()
{ 
 Tiger t1;
 
 /* below, an Animal object pointer is set to point
    to an object of the derived Tiger class  */
 
 Animal *a1 = &t1; 
 
 /*  below, how does this know to call the 
  definition of getWeight in the Tiger class, 
  and not the definition provided in the Animal 
  class  */
  
 a1 -> getWeight(); 
}

Because every Tiger object contains a pointer to the vtable, which we called vptr1, the call "a1 -> getWeight()" will conceptually be translated to "(*(a1 -> vptr1 -> getWeight))()".
Tiger t1;
 
/* below, an Animal object pointer is set to point
to an object of the derived Tiger class  */
 
Animal *a1 = &t1;       
a1 -> getWeight(); 

/* 
   The call above conceptually gets translated to this,
   assuming the pointer to the vtable for the
   Tiger class is called vptr1
   :  
   (*(a1 -> vptr1 -> getWeight))()

*/


This means that the definition of getWeight() provided in the Tiger class will be called, which is what we want.

What if vtables were not used?

Now that we’ve seen vtables in action, we should ask ourselves again why C++ uses vtables. Hopefully at this point you can answer that question yourself. But, just to review, vtables are used so that the correct definition of the function can be called at runtime, basically to help achieve polymorphism. In our example, we wanted the definition of the getWeight() function provided in the Tiger class to be called. And that is exactly what will happen with the help of vtables.

From C To Assembly Language

From C To Assembly Language

1. Overview

What is a microcomputer system made up of? A microcomputer system is made up of a microprocessor unit (MPU), a bus system, a memory subsystem, an I/O subsystem and an interface among all components. A typical answer one can expect.
This is only the hardware side. Every microcomputer system requires a software so as to direct each of the hardware components while they are performing their respective tasks. Computer software can be thought about at system side (system software) and user side (user software).
The user software may include some in-built libraries and user created libraries in the form of subroutines which may be needed in preparing programs for execution.
The system software may encompass a variety of high-level language translators, an assembler, a text editor, and several other programs for aiding in the preparation of other programs. We already know that there are three levels of programming and they are Machine language, Assembly language and High-level language.
Machine language programs are programs that the computer can understand and execute directly (think of programming in any microprocessor kit). Assembler language instructions match machine language instructions on a more or less one-for-one basis, but are written using character strings so that they are more easily understood, and high-level language instructions are much closer to the English language and are structured so that they naturally correspond to the way programmers think. Ultimately, an assembler language or high-level language program must be converted into machine language by programs called translators. They are referred to as assembler and compiler or interpreter respectively.
Compilers for high-level languages like C/C++ have the ability to translate high-level language into assembly code. The GNU C and C++ Compiler option of -S will generate an assembly code equivalent to that of the corresponding source program. Knowing how the most rudimentary constructs like loops, function calls and variable declaration are mapped into assembly language is one way to achieve the goal of mastering C internals. Before proceeding further, you must make it a point that you are familiar with Computer Architecture and Intel x86 assembly language to help you follow the material presented here.

2. Getting Started

To begin with, write a small program in C to print hello world and compile it with -S options. The output is an assembler code for the input file specified. By default, GCC makes the assembler file name by replacing the suffix `.c', with `.s'. Try to interpret the few lines at the end of the assembler file.
The 80386 and above family of processors have myriads of registers, instructions and addressing modes. A basic knowledge about only a few simple instructions is sufficient to understand the code generated by the GNU compiler.
Generally, any assembly language instruction includes a label, a mnemonic, and operands. An operand's notation is sufficient to decipher the operand's addressing mode. The mnemonics operate on the information contained in the operands. In fact, assembly language instructions operate on registers and memory locations. The 80386 family has general purpose registers (32 bit) called eaxebxecx etc. Two registers, ebp and esp are used for manipulating the stack. A typical instruction, written in GNU Assembler (GAS) syntax, would look like this:

movl $10, %eax
This instruction stores the value 10 in the eax register. The prefix `%' to the register name and `$' to the immediate value are essential assembler syntax. It is to be noted that not all assemblers follow the same syntax.
Our first assembly language program, stored in a file named first.s is shown in Listing 1.

#Listing 1
.globl main
main:
  movl $20, %eax
  ret
This file can be assembled and linked to generate an a.out by giving the command cc first.s. The extensions `.s' are identified by the GNU compiler front end cc as assembly language files and invokes the assembler and linker, skipping the compilation phase.
The first line of the program is a comment. The .globl assembler directive serves to make the symbol main visible to the linker. This is vital as your program will be linked with the C startup library which will contain a call to main. The linker will complain about 'undefined reference to symbol main' if that line is omitted (try it). The program simply stores the value 20 in register eax and returns to the caller.

3. Arithmetic, Comparison, Looping

Our next program is Listing 2 which computes the factorial of a number stored in eax. The factorial is stored in ebx.

#Listing 2
.globl main
main: 
 movl $5, %eax
 movl $1, %ebx
L1: cmpl $0, %eax  //compare 0 with value in eax
 je L2   //jump to L2 if 0==eax (je - jump if equal)
 imull %eax, %ebx // ebx = ebx*eax
 decl %eax  //decrement eax
 jmp L1   // unconditional jump to L1
L2:  ret
L1 and L2 are labels. When control flow reaches L2ebx would contain the factorial of the number stored in eax.

4. Subroutines

When implementing complicated programs, we split the tasks to be solved in systematic order. We write subroutines and functions for each of the tasks which are called when ever required. Listing 3illustrates subroutine call and return in assembly language programs.

#Listing 3
.globl main
main:
 movl $10, %eax
 call foo
 ret
foo:
 addl $5, %eax
 ret
The instruction call transfers control to subroutine foo. The ret instruction in foo transfers control back to the instruction after the call in main.
Generally, each function defines the scope of variables it uses in each call of the routine. To maintain the scopes of variables you need space. The stack can be used to maintain values of the variables in each call of the routine. It is important to know the basics of how the activation records can be maintained for repeated, recursive calls or any other possible calls in the execution of the program. Knowing how to manipulate registers like esp and ebp and making use of instructions like push and pop which operate on the stack are central to understanding the subroutine call and return mechanism.

5. Using The Stack

A section of your program's memory is reserved for use as a stack. The Intel 80386 and above microprocessors contain a register called the stack pointer, esp, which stores the address of the top of stack. Figure 1 below shows three integer values, 49, 30 and 72, stored on the stack (each integer occupying four bytes) with the esp register holding the address of the top of stack.
Figure 1
Unlike a pile of bricks, which grows upwards, the stack on Intel machines grows downwards. Figure 2 shows the stack layout after the execution of the instruction pushl $15.
Figure 2
The stack pointer register is decremented by four and the number 15 is stored as four bytes at locations 1988, 1989, 1990 and 1991.
The instruction popl %eax copies the value at top of stack (four bytes) to the eax register and increments esp by four. What if you do not want to copy the value at top of stack to any register? You just execute the instruction addl $4, %esp which simply increments the stack pointer.
In Listing 3, the instruction call foo pushes the address of the instruction after the call in the calling program on to the stack and branches to foo. The subroutine ends with ret which transfers control to the instruction whose address is taken from the top of stack. Obviously, the top of stack must contain a valid return address.
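The behaviour of pushl and popl described above can be modelled in a few lines of C. This is a toy model under stated assumptions: the struct toy_stack, the functions toy_push and toy_pop, and the size of the memory window are all invented for illustration, and a byte array stands in for real memory.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of a downward-growing stack. All names are illustrative;
   mem stands in for a small window of memory. */
#define MEM_SIZE 2048

struct toy_stack {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp;               /* address (index) of the top of stack */
};

/* pushl: decrement esp by four, then store the value there. */
void toy_push(struct toy_stack *s, uint32_t value)
{
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* popl: load the value at the top of stack, then increment esp by four. */
uint32_t toy_pop(struct toy_stack *s)
{
    uint32_t value;
    memcpy(&value, &s->mem[s->esp], sizeof value);
    s->esp += 4;
    return value;
}
```

Starting with esp at 1992, toy_push(&s, 15) leaves esp at 1988 with the value stored in bytes 1988 to 1991, and toy_pop(&s) returns 15 and puts esp back at 1992.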

6. Allocating Space for Local Variables

A C program may manipulate hundreds or thousands of variables. The assembly code for such a program shows how the variables are accommodated and how the registers are used to manipulate them without conflicts.
The registers are few in number and cannot be used for holding all the variables in a program. Local variables are allotted space within the stack. Listing 4 shows how it is done.

#Listing 4
.globl main
main:
 call foo
 ret
foo:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp
 movl $10, -4(%ebp)
 movl %ebp, %esp
 popl %ebp
 ret
First, the value of ebp, the base pointer register, is saved on the stack, because the caller of foo may be using it. The stack pointer is then copied to ebp, which serves as a fixed reference for accessing other locations on the stack. The instruction subl $4, %esp creates enough space (four bytes) to hold an integer by decrementing the stack pointer. In the next line, the value 10 is copied to the four bytes whose address is obtained by subtracting four from the contents of ebp. The instruction movl %ebp, %esp restores the stack pointer to the value it had after executing the first line of foo, and popl %ebp restores the base pointer register. The stack pointer now has the same value which it had before executing the first line of foo.
The table below displays the contents of the registers ebp and esp and of stack locations 3988 to 3999 at the point of entry into main and after the execution of every instruction in Listing 4 (except the return from main). We assume that ebp and esp initially hold 7000 and 4000, and that stack locations 3988 to 3999 contain the arbitrary values 219986, 1265789 and 86 before the first instruction in main is executed. It is also assumed that the address of the instruction after call foo in main is 30000.

Table 1
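The same sequence can be replayed in a few lines of C. This is a minimal sketch under the assumptions stated in the text (ebp = 7000, esp = 4000, return address 30000). The struct toy_cpu and the function trace_listing4 are invented for illustration and model only the two registers and a small window of memory.

```c
#include <stdint.h>

/* Toy machine state for tracing Listing 4. Only esp, ebp and a small
   word-addressed memory window are modelled; all names are illustrative. */
struct toy_cpu {
    uint32_t esp, ebp;
    uint32_t mem[1024];          /* mem[a / 4] holds the word at address a */
};

static void push(struct toy_cpu *c, uint32_t v)
{
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

static uint32_t pop(struct toy_cpu *c)
{
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* Executes main's "call foo" and all of foo, and returns the address
   that foo's ret jumps back to. The values in the comments assume the
   starting state esp = 4000, ebp = 7000. */
uint32_t trace_listing4(struct toy_cpu *c, uint32_t return_address)
{
    push(c, return_address);       /* call foo:          esp = 3996 */
    push(c, c->ebp);               /* pushl %ebp:        esp = 3992 */
    c->ebp = c->esp;               /* movl %esp, %ebp:   ebp = 3992 */
    c->esp -= 4;                   /* subl $4, %esp:     esp = 3988 */
    c->mem[(c->ebp - 4) / 4] = 10; /* movl $10, -4(%ebp) */
    c->esp = c->ebp;               /* movl %ebp, %esp:   esp = 3992 */
    c->ebp = pop(c);               /* popl %ebp:         ebp = 7000, esp = 3996 */
    return pop(c);                 /* ret:               esp = 4000 */
}
```

After the call, esp and ebp are back at 4000 and 7000, while locations 3988, 3992 and 3996 still hold 10, 7000 and 30000, the values left behind by foo.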

7. Parameter Passing and Value Return

The stack can be used for passing parameters to functions. We will follow a convention (which is used by our C compiler) that the value stored by a function in the eax register is taken to be the return value of the function. The calling program passes a parameter to the callee by pushing its value on the stack. Listing 5 demonstrates this with a simple function called sqr.

#Listing 5
.globl main
main:
 movl $12, %ebx
 pushl %ebx
 call sqr
 addl $4, %esp       //adjust esp to its value before the push
 ret
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax    //compute eax * eax, store result in eax 
 ret
Read the first line of sqr carefully. The calling function pushes the content of ebx on the stack and then executes a call instruction. The call will push the return address on the stack. So inside sqr, the parameter is accessible at an offset of four bytes from the top of stack.

8. Mixing C and Assembler

Listing 6 shows a C program and an assembly language function. The C function is defined in a file called main.c and the assembly language function in sqr.s. You compile and link the files together by typing cc main.c sqr.s.
The reverse is also pretty simple. Listing 7 demonstrates a C function print and its assembly language caller.

#Listing 6
//main.c
main()
{
 int i = sqr(11);
 printf("%d\n",i);
}

//sqr.s
.globl sqr
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax
 ret

#Listing 7
//print.c
print(int i)
{
 printf("%d\n",i);
}

//main.s
.globl main
main:
 movl $123, %eax
 pushl %eax
 call print
 addl $4, %esp
 ret

9. Assembler Output Generated by GNU C

I guess this much reading is sufficient for understanding the assembler output produced by gcc. Listing 8 shows the file add.s generated by gcc -S add.c. Note that add.s has been edited to remove many assembler directives (mostly for alignments and other things of that sort).

#Listing 8
//add.c
int add(int i,int j)
{
 int p = i + j;
 return p;
}

//add.s
.globl add
add:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp  //create space for integer p
 movl 8(%ebp),%edx //8(%ebp) refers to i
 addl 12(%ebp), %edx //12(%ebp) refers to j
 movl %edx, -4(%ebp) //-4(%ebp) refers to p
 movl -4(%ebp), %eax //store return value in eax
 leave   //i.e. movl %ebp, %esp; popl %ebp
 ret
The offsets make sense once you see how the C statement add(10,20) is translated into assembler code:

pushl $20
pushl $10
call add
Note that the second parameter is pushed first, so the first parameter ends up nearest the top of stack.
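The offsets 8(%ebp) and 12(%ebp) in Listing 8 follow directly from this push order. The sketch below replays the caller's pushes and the first two instructions of add on a toy stack. The names toy_frame, enter_add and at_ebp are hypothetical, and the starting value of esp and the return address passed in are arbitrary.

```c
#include <stdint.h>

/* Toy model of the stack just after add's prologue. Names are
   illustrative; mem[a / 4] holds the 32-bit word at address a. */
struct toy_frame {
    uint32_t esp, ebp;
    uint32_t mem[64];
};

static void push(struct toy_frame *f, uint32_t v)
{
    f->esp -= 4;
    f->mem[f->esp / 4] = v;
}

/* Reads the word at offset(%ebp). */
uint32_t at_ebp(const struct toy_frame *f, int32_t offset)
{
    return f->mem[(f->ebp + offset) / 4];
}

/* Replays the caller's pushes, the call, and add's first two instructions. */
void enter_add(struct toy_frame *f, uint32_t i, uint32_t j,
               uint32_t return_address)
{
    push(f, j);                  /* pushl $20  (second argument first) */
    push(f, i);                  /* pushl $10 */
    push(f, return_address);     /* call add */
    push(f, f->ebp);             /* pushl %ebp */
    f->ebp = f->esp;             /* movl %esp, %ebp */
}
```

After enter_add(&f, 10, 20, ...), 0(%ebp) holds the saved ebp, 4(%ebp) the return address, 8(%ebp) the value 10 (i) and 12(%ebp) the value 20 (j).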

10. Global Variables

Space is created for local variables on the stack by decrementing the stack pointer and the allotted space is reclaimed by simply incrementing the stack pointer. So what is the equivalent GNU C generated code for global variables? Listing 9 provides the answer.

#Listing 9
//glob.c
int foo = 10;
main()
{
 int p = foo;
}

//glob.s
.globl foo
foo:
 .long 10
.globl main
main:
 pushl %ebp
 movl %esp,%ebp
 subl $4,%esp
 movl foo,%eax
 movl %eax,-4(%ebp)
 leave
 ret
The statement foo: .long 10 defines a block of 4 bytes named foo and initializes the block with the value 10. The .globl foo directive makes foo accessible from other files. Now try this out. Change the statement int foo = 10 to static int foo = 10 and see how it is represented in the assembly code. You will notice that the assembler directive .globl is missing. Try this out for different types and qualifiers (double, long, short, const etc.).
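The difference between stack storage and the storage used for globals can also be seen from the C side. In the sketch below, the names hidden, next_id and sum_globals are invented for illustration. Compiling it with gcc -S is one way to check which symbols receive a .globl directive.

```c
/* Sketch: objects with static storage duration are not kept on the
   stack. All names other than foo are illustrative. */
int foo = 10;                 /* external linkage: visible to other files */
static int hidden = 20;       /* internal linkage: private to this file */

/* The static local keeps its value between calls because it is not
   part of any activation record. */
int next_id(void)
{
    static int counter = 0;
    return ++counter;
}

int sum_globals(void)
{
    return foo + hidden;
}
```

Successive calls to next_id return 1, 2, 3 and so on, which would not be possible if counter were allocated afresh on the stack at each call.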

11. System Calls

Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this it will need to call on OS services. In fact, programming in assembly language is much the same across different OSes until OS services are involved.
There are two common ways of performing a system call in Linux: through the C library (libc) wrapper, or directly.
Libc wrappers are made to protect programs from possible system call convention changes, and to provide POSIX compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more-or-less POSIX compliant: this means that the syntax of most libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). But the main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, like printf(), malloc() and similar.
System calls in Linux are done through int 0x80. Linux differs from the usual Unix calling convention, and features a "fastcall" convention for system calls. The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments, in ebx, ecx, edx, esi, edi and ebp respectively. If there are more arguments, they are simply passed through a structure as the first argument. The result is returned in eax, and the stack is not touched at all.
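A direct system call can also be tried from C without writing any assembly, using the generic syscall() function of the C library. The sketch below assumes Linux with glibc or a compatible library. The function names raw_getpid and raw_write are made up here, and the loading of registers is done inside syscall() rather than by an explicit int 0x80.

```c
#define _GNU_SOURCE           /* exposes syscall() in <unistd.h> on glibc */
#include <unistd.h>
#include <sys/syscall.h>

/* Assumes Linux with a C library that provides the generic syscall()
   wrapper. The wrapper places the call number and the arguments where
   the kernel expects them, so no assembly is written here. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, (long)fd, buf, len);
}
```

raw_getpid() should return the same value as the libc wrapper getpid(), and raw_write(2, "ok\n", 3) behaves like write(2, "ok\n", 3).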
Consider Listing 10 given below.

#Listing 10
//fork.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
 fork();
 printf("Hello\n");
 return 0;
}
Compile this program with the command cc -g fork.c -static. Then load the resulting a.out into gdb and type the command disassemble fork to see the assembly code used for fork in the program. The -static option tells GCC to link statically (see the man page). You can test this for other system calls and see how the actual functions work.
There have been several attempts to write an up-to-date documentation of the Linux system calls and I am not making this another of them.

12. Inline Assembly Programming

GNU C supports the x86 architecture quite well, and includes the ability to insert assembly code within C programs, such that register allocation can be either specified or left to GCC. Of course, the assembly instructions are architecture dependent.
The asm statement allows you to insert assembly instructions into your C or C++ programs. For example, the statement:

asm ("fsin" : "=t" (answer) : "0" (angle));
is an x86-specific way of coding this C statement:

answer = sin(angle);
Notice that, unlike ordinary assembly instructions, asm statements let you specify input and output operands using C syntax. Asm statements should not be used indiscriminately. So, when should we use them?

  • Asm statements allow your programs to access the computer hardware directly. This can produce programs that execute quickly. You can use them when writing operating system code that directly needs to interact with the hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly.
  • Inline assembly instructions can also speed up the innermost loops of programs. For instance, the sine and cosine of the same angle can be computed together by the single x86 instruction fsincos. The two listings given below should help you understand this point better.

#Listing 11
//Name : bit-pos-loop.c
//Description : Find bit position using a loop

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
 long max = atoi (argv[1]);
 long number;
 long i;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ++number) {
  for (i=(number>>1), position=0; i!=0; ++position)
   i >>= 1;
  result = position;
 }
 return 0;
}

#Listing 12
//Name : bit-pos-asm.c
//Description : Find bit position using bsrl

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
 long max = atoi(argv[1]);
 long number;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ++number) {
  asm("bsrl %1, %0" : "=r" (position) : "r" (number));
  result = position;
 }
 return 0;
}
Compile the two versions with full optimizations as given below:

$ cc -O2 -o bit-pos-loop bit-pos-loop.c
$ cc -O2 -o bit-pos-asm bit-pos-asm.c
Measure the running time for each version by using the time command and specifying a large value as the command-line argument to make sure that each version takes at least a few seconds to run.

$ time ./bit-pos-loop 250000000
and

$ time ./bit-pos-asm 250000000
The results will vary from machine to machine. However, you will notice that the version that uses the inline assembly executes a great deal faster.
GCC's optimizer attempts to rearrange and rewrite a program's code to minimize execution time even in the presence of asm expressions. If the optimizer determines that an asm's output values are not used, the instruction will be omitted unless the keyword volatile occurs between asm and its arguments. (As a special case, GCC will not move an asm without any output operands outside a loop.) Any asm can be moved in ways that are difficult to predict, even across jumps. The only way to guarantee a particular assembly instruction ordering is to include all the instructions in the same asm.
Using asm's can restrict the optimizer's effectiveness because the compiler does not know the asms' semantics. GCC is forced to make conservative guesses that may prevent some optimizations.
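One way to keep the optimizer fully informed is to use a compiler builtin instead of an asm statement. The sketch below is not the method of Listings 11 and 12 but an alternative, assuming GCC or Clang: it uses __builtin_clzl, which counts leading zero bits and which the compiler typically maps to a single instruction where the hardware offers one. The function names are invented for illustration, and the loop version is included for cross-checking.

```c
#include <limits.h>

/* Same computation as the inner loop of Listing 11: index of the
   highest set bit, counting from 0. */
unsigned bit_pos_loop(unsigned long number)
{
    unsigned position = 0;
    unsigned long i;

    for (i = number >> 1; i != 0; i >>= 1)
        ++position;
    return position;
}

/* Alternative to inline asm, assuming GCC or Clang: __builtin_clzl
   counts leading zero bits and is undefined for 0, hence the guard. */
unsigned bit_pos_builtin(unsigned long number)
{
    if (number == 0)
        return 0;
    return (unsigned)(sizeof number * CHAR_BIT - 1)
           - (unsigned)__builtin_clzl(number);
}
```

Both functions return 0 for the input 1 and 3 for the input 8. Unlike the asm version, the builtin carries semantics the compiler understands, so it does not restrict optimization in the way described above.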

13. Exercises

  1. Interpret the assembly code for the C program in Listing 6. Modify the program to eliminate the warnings reported when generating assembly code with the -Wall option. Compare the two assembly codes. What changes do you observe?
  2. Compile several small C programs with and without optimization options (like -O2). Read the resulting assembly codes and find out some common optimization tricks used by the compiler.
  3. Interpret the assembly code generated for a switch statement.
  4. Compile several small C programs with inline asm statements. What differences do you observe in the assembly codes for such programs?
  5. A nested function is defined inside another function (the "enclosing function"), such that:
    • the nested function has access to the enclosing function's variables; and
    • the nested function is local to the enclosing function, that is, it cannot be called from elsewhere unless the enclosing function gives you a pointer to the nested function.

    Nested functions can be useful because they help control the visibility of a function.
    Consider Listing 13 given below:

    #Listing 13
    /* myprint.c */
    #include <stdio.h>
    #include <stdlib.h>
    
    int main()
    {
     int i;
     void my_print(int k)
     {
      printf("%d\n",k);
     }
     scanf("%d",&i);
     my_print(i);
     return 0;
    }
    
    Compile this program with cc -S myprint.c and interpret the assembly code. Also try compiling the program with the command cc -pedantic myprint.c. What do you observe?

    LINUX GAZETTE
    ...making Linux just a little more fun!
    From C To Assembly Language
    By Hiran Ramankutty


    3. Arithmetic, Comparison, Looping

    Our next program is Listing 2 which computes the factorial of a number stored in eax. The factorial is stored in ebx.

    #Listing 2
    .globl main
    main: 
     movl $5, %eax
     movl $1, %ebx
    L1: cmpl $0, %eax  //compare 0 with value in eax
     je L2   //jump to L2 if 0==eax (je - jump if equal)
     imull %eax, %ebx // ebx = ebx*eax
     decl %eax  //decrement eax
     jmp L1   // unconditional jump to L1
    L2:  ret
    
    L1 and L2 are labels. When control flow reaches L2ebx would contain the factorial of the number stored in eax.

    4. Subroutines

    When implementing complicated programs, we split the tasks to be solved in systematic order. We write subroutines and functions for each of the tasks which are called when ever required. Listing 3illustrates subroutine call and return in assembly language programs.

    #Listing 3
    .globl main
    main:
     movl $10, %eax
     call foo
     ret
    foo:
     addl $5, %eax
     ret
    
    The instruction call transfers control to subroutine foo. The ret instruction in foo transfers control back to the instruction after the call in main.
    Generally, each function defines the scope of variables it uses in each call of the routine. To maintain the scopes of variables you need space. The stack can be used to maintain values of the variables in each call of the routine. It is important to know the basics of how the activation records can be maintained for repeated, recursive calls or any other possible calls in the execution of the program. Knowing how to manipulate registers like esp and ebp and making use of instructions like push and pop which operate on the stack are central to understanding the subroutine call and return mechanism.

    5. Using The Stack

    A section of your program's memory is reserved for use as a stack. The Intel 80386 and above microprocessors contain a register called stack pointer, esp, which stores the address of the top of stack.Figure 1 below shows three integer values, 49,30 and 72, stored on the stack (each integer occupying four bytes) with esp register holding the address of the top of stack.
    Figure 1Unlike the stack analogous to a pile of bricks growing up wards, on Intel machines stack grows down wards. Figure 2 shows the stack layout after the execution of the instruction pushl $15.
    Figure 2The stack pointer register is decremented by four and the number 15 is stored as four bytes at locations 1988, 1989, 1990 and 1991.
    The instruction popl %eax copies the value at top of stack (four bytes) to the eax register and increments esp by four. What if you do not want to copy the value at top of stack to any register? You just execute the instruction addl $4, %esp which simply increments the stack pointer.
    In Listing 3, the instruction call foo pushes the address of the instruction after the call in the calling program on to the stack and branches to foo. The subroutine ends with ret which transfers control to the instruction whose address is taken from the top of stack. Obviously, the top of stack must contain a valid return address.

    6. Allocating Space for Local Variables

    It is possible to have a C program manipulating hundreds and thousands of variables. The assembly code for the corresponding C program will give you an idea of how the variables are accommodated and how the registers are used for manipulating the variables without causing any conflicts in the final result that is to be obtained.
    The registers are few in number and cannot be used for holding all the variables in a program. Local variables are allotted space within the stack. Listing 4 shows how it is done.

    #Listing 4
    .globl main
    main:
     call foo
     ret
    foo:
     pushl %ebp
     movl %esp, %ebp
     subl $4, %esp
     movl $10, -4(%ebp)
     movl %ebp, %esp
     popl %ebp
     ret
    
    First, the value of the stack pointer is copied to ebp, the base pointer register. The base pointer is used as a fixed reference to access other locations on the stack. In the program, ebp may be used by the caller of foo also, and hence its value is copied to the stack before it is overwritten with the value of esp. The instruction subl $4, %esp creates enough space (four bytes) to hold an integer by decrementing the stack pointer. In the next line, the value 10 is copied to the four bytes whose address is obtained by subtracting four from the contents of ebp. The instruction movl %ebp, %esp restores the stack pointer to the value it had after executing the first line of foo and popl %ebp restores the base pointer register. The stack pointer now has the same value which it had before executing the first line of foo. The table below displays the contents of registers ebpesp and stack locations from 3988 to 3999 at the point of entry into main and after the execution of every instruction in Listing 4 (except the return from main). We assume that ebp and esp have values 7000 and 4000 stored in them and stack locations 3988 to 3999 contain some arbitrary values 219986, 1265789 and 86 before the first instruction in main is executed. It is also assumed that the address of the instruction after call foo in main is 30000.

    Table 1

    6. Parameter Passing and Value Return

    The stack can be used for passing parameters to functions. We will follow a convention (which is used by our C compiler) that the value stored by a function in the eax register is taken to be the return value of the function. The calling program passes a parameter to the callee by pushing its value on the stack. Listing 5 demonstrates this with a simple function called sqr.

    #Listing 5
    .globl main
    main:
     movl $12, %ebx
     pushl %ebx
     call sqr
     addl $4, %esp       //adjust esp to its value before the push
     ret
    sqr:
     movl 4(%esp), %eax
     imull %eax, %eax    //compute eax * eax, store result in eax 
     ret
    
    Read the first line of sqr carefully. The calling function pushes the content of ebx on the stack and then executes a call instruction. The call will push the return address on the stack. So inside sqr, the parameter is accessible at an offset of four bytes from the top of stack.

    8. Mixing C and Assembler

    Listing 6 shows a C program and an assembly language function. The C function is defined in a file called main.c and the assembly language function in sqr.s. You compile and link the files together by typingcc main.c sqr.s.
    The reverse is also pretty simple. Listing 7 demonstrates a C function print and its assembly language caller.

    #Listing 6
    //main.c
    main()
    {
     int i = sqr(11);
     printf("%d\n",i);
    }
    
    //sqr.s
    .globl sqr
    sqr:
     movl 4(%esp), %eax
     imull %eax, %eax
     ret
    

    #Listing 7
    //print.c
    print(int i)
    {
     printf("%d\n",i);
    }
    
    //main.s
    .globl main
    main:
     movl $123, %eax
     pushl %eax
     call print
     addl $4, %esp
     ret
    

    9. Assembler Output Generated by GNU C

    I guess this much reading is sufficient for understanding the assembler output produced by gccListing 8 shows the file add.s generated by gcc -S add.c. Note that add.s has been edited to remove many assembler directives (mostly for alignments and other things of that sort).

    #Listing 8
    //add.c
    int add(int i,int j)
    {
     int p = i + j;
     return p;
    }
    
    //add.s
    .globl add
    add:
     pushl %ebp
     movl %esp, %ebp
     subl $4, %esp  //create space for integer p
     movl 8(%ebp),%edx //8(%ebp) refers to i
     addl 12(%ebp), %edx //12(%ebp) refers to j
     movl %edx, -4(%ebp) //-4(%ebp) refers to p
     movl -4(%ebp), %eax //store return value in eax
     leave   //i.e. to movl %ebp, %esp; popl %ebp ret
    
    The program will make sense upon realizing the C statement add(10,20) which gets translated into the following assembler code:

    pushl $20
    pushl $10
    call add
    
    Note that the second parameter is passed first.

    10. Global Variables

    Space is created for local variables on the stack by decrementing the stack pointer and the allotted space is reclaimed by simply incrementing the stack pointer. So what is the equivalent GNU C generated code for global variables? Listing 9 provides the answer.

    #Listing 9
    //glob.c
    int foo = 10;
    main()
    {
     int p foo;
    }
    
    //glob.s
    .globl foo
    foo:
     .long 10
    .globl main
    main:
     pushl %ebp
     movl %esp,%ebp
     subl $4,%esp
     movl foo,%eax
     movl %eax,-4(%ebp)
     leave
     ret
    
    The statement foo: .long 10 defines a block of 4 bytes named foo and initializes the block with zero. The .globl foo directive makes foo accessible from other files. Now try this out. Change the statement int foo to static int foo. See how it is represented in the assembly code. You will notice that the assembler directive .globl is missing. Try this out for different storage classes (double, long, short, const etc.).

    11. System Calls

    Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this it will need to call on OS services. In fact, programming in assembly language is quite the same in different OSes, unless OS services are touched.
    There are two common ways of performing a system call in Linux: through the C library (libc) wrapper, or directly.
    Libc wrappers are made to protect programs from possible system call convention changes, and to provide POSIX compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more-or-less POSIX compliant: this means that the syntax of most libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). But the main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, like printf(), malloc() and similar.
    System calls in Linux are done through int 0x80. Linux differs from the usual Unix calling convention, and features a "fastcall" convention for system calls. The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments in ebx, ecx, edx, esi, edi, ebp consequently. If there are more arguments, they are simply passed though the structure as first argument. The result is returned in eax, and the stack is not touched at all.
    Consider Listing 10 given below.

    #Listing 10
    #fork.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>
    
    int main()
    {
     fork();
     printf("Hello\n");
     return 0;
    }
    
    Compile this program with the command cc -g fork.c -static. Use the gdb tool and type the command disassemble fork. You can see the assembly code used for fork in the program. The -static is the static linker option of GCC (see man page). You can test this for other system calls and see how the actual functions work.
    There have been several attempts to write an up-to-date documentation of the Linux system calls and I am not making this another of them.

    11. Inline Assembly Programming

    The GNU C supports the x86 architecture quite well, and includes the ability to insert assembly code within C programs, such that register allocation can be either specified or left to GCC. Of course, the assembly instruction are architecture dependent.
    The asm instruction allows you to insert assembly instructions into your C or C++ programs. For example the instruction:

    asm ("fsin" : "=t" (answer) : "0" (angle));
    
    is an x86-specific way of coding this C statement:

    answer = sin(angle);
    
    You can notice that unlike ordinary assembly code instructions asm statements permit you to specify input and output operands using C syntax. Asm statements should not be used indiscriminately. So, when should we use them?

    • Asm statements allow your programs to access the computer hardware directly. This can produce programs that execute quickly. You can use them when writing operating system code that directly needs to interact with the hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly.
    • Inline assembly instructions also speed up the innermost loops of the programs. For instance, sine and cosine of the same angles can be found by fsincos x86 instruction. Probably, the two listings given below will help you understand this factor better.
    #Listing 11
    #Name : bit-pos-loop.c 
    #Description : Find bit position using a loop
    
    #include <stdio.h>
    #include <stdlib.h>
    
    int main (int argc, char *argv[])
    {
     long max = atoi (argv[1]);
     long number;
     long i;
     unsigned position;
     volatile unsigned result;
    
     for (number = 1; number <= max; ; ++number) {
      for (i=(number>>1), position=0; i!=0; ++position)
       i >>= 1;
      result = position;
     }
     return 0;
    }
    

    #Listing 12
    #Name : bit-pos-asm.c
    #Description : Find bit position using bsrl
    
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char *argv[])
    {
     long max = atoi(argv[1]);
     long number;
     unsigned position;
     volatile unsigned result;
    
     for (number = 1; number <= max; ; ++number) {
      asm("bsrl %1, %0" : "=r" (position) : "r" (number));
      result = position;
     }
     return 0;
    }
    
    Compile the two versions with full optimizations as given below:

    $ cc -O2 -o bit-pos-loop bit-pos-loop.c
    $ cc -O2 -o bit-pos-asm bit-pos-asm.c
    
    Measure the running time of each version with the time command, passing a large value as the command-line argument so that each version takes at least a few seconds to run.

    $ time ./bit-pos-loop 250000000
    
    and

    $ time ./bit-pos-asm 250000000
    
    The results will vary from machine to machine. However, you will notice that the version that uses the inline assembly executes a great deal faster.
    GCC's optimizer attempts to rearrange and rewrite a program's code to minimize execution time even in the presence of asm expressions. If the optimizer determines that an asm's output values are not used, the instruction will be omitted unless the keyword volatile occurs between asm and its arguments. (As a special case, GCC will not move an asm without any output operands outside a loop.) Any asm can be moved in ways that are difficult to predict, even across jumps. The only way to guarantee a particular assembly instruction ordering is to include all the instructions in the same asm.
    Using asm's can restrict the optimizer's effectiveness because the compiler does not know the asms' semantics. GCC is forced to make conservative guesses that may prevent some optimizations.
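One way to keep the optimizer informed is to use a compiler builtin in place of an asm where one exists. The following is a minimal sketch, assuming GCC or Clang (both provide __builtin_clz) and an unsigned type made of 8-bit bytes. The function names bit_pos_loop and bit_pos_builtin are hypothetical, chosen to mirror Listings 11 and 12. It is written so that it also compiles as C++.

```cpp
#include <cassert>

// Position of the highest set bit, found with the loop from Listing 11.
unsigned bit_pos_loop(unsigned number)
{
    unsigned position = 0;
    for (unsigned i = number >> 1; i != 0; i >>= 1)
        ++position;
    return position;
}

// Same value computed with a compiler builtin instead of inline asm.
// Assumption: GCC or Clang, which provide __builtin_clz (count leading zeros).
unsigned bit_pos_builtin(unsigned number)
{
    assert(number != 0);  // __builtin_clz(0) is undefined
    const unsigned bits = sizeof(unsigned) * 8;  // assumes 8-bit bytes
    return bits - 1 - static_cast<unsigned>(__builtin_clz(number));
}
```

Both functions return 3 for the argument 8. Because the compiler understands what the builtin computes, it remains free to optimize around the call, which is not the case for an opaque asm.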

    12. Exercises

    1. Interpret the assembly code for the C program in Listing 6. Modify the program to eliminate the errors that are reported when generating assembly code with the -Wall option. Compare the two assembly codes. What changes do you observe?
    2. Compile several small C programs with and without optimization options (like -O2). Read the resulting assembly codes and find out some common optimization tricks used by the compiler.
    3. Interpret the assembly code for a switch statement.
    4. Compile several small C programs with inline asm statements. What differences do you observe in the assembly code for such programs?
    5. A nested function is defined inside another function (the "enclosing function"), such that:
      • the nested function has access to the enclosing function's variables; and
      • the nested function is local to the enclosing function, that is, it cannot be called from elsewhere unless the enclosing function gives you a pointer to the nested function.

      Nested functions can be useful because they help control the visibility of a function.
      Consider Listing 13 given below:

      #Listing 13
      /* myprint.c */
      #include <stdio.h>
      #include <stdlib.h>
      
      int main()
      {
       int i;
       void my_print(int k)
       {
        printf("%d\n",k);
       }
       scanf("%d",&i);
       my_print(i);
       return 0;
      }
      
      Compile this program with cc -S myprint.c and interpret the assembly code. Also try compiling the program with the command cc -pedantic myprint.c. What do you observe?

Wednesday, October 30, 2013

Pure Virtual Functions


What is a pure virtual function?

A pure virtual function is a function that has the notation "= 0" in the declaration of that function. Why we would want a pure virtual function and what a pure virtual function looks like is explored in more detail below.
Here is a simple example of what a pure virtual function in C++ would look like:

Simple Example of a pure virtual function in C++

class SomeClass {
public:
   virtual void pure_virtual() = 0;  // a pure virtual function
   // note that there is no function body  
};

The pure specifier



The "= 0" portion of a pure virtual function is also known as the pure specifier, because it’s what makes a pure virtual function “pure”. Although the pure specifier appended to the end of the virtual function declaration may look like the function is being assigned a value of 0, that is not true. The notation "= 0" is just there to indicate that the virtual function is a pure virtual function, and that the class is not required to provide a body for it. Also note that we named the function “pure_virtual” only to make the example easier to understand; a pure virtual function can have any valid name.

Can a pure virtual function have an implementation?

The quick answer to that question is yes! A pure virtual function can have an implementation in C++ – which is something that even many veteran C++ developers do not know. So, using the SomeClass class from our example above, we can have the following code:
#include <iostream>

class SomeClass {
public:
   virtual void pure_virtual() = 0;  // a pure virtual function
   // note that there is no function body inside the class
};

/* This is an implementation of the pure_virtual function,
   which is declared as a pure virtual function.
   This is perfectly legal:
*/
void SomeClass::pure_virtual() {
    std::cout << "This is a test" << std::endl;
}

Why would you want a pure virtual function to have an implementation?

It is actually pretty rare to see a pure virtual function with an implementation in real-world code, but having that implementation may be desirable when you think that classes which derive from the base class may need some sort of default behavior for the pure virtual function. So, for example, if we have a class that derives from our SomeClass class above, we can write some code like this – where the derived class actually makes a call to the pure virtual function implementation that is inherited:
//this class derives from SomeClass
class DerivedClass: public SomeClass {

   virtual void pure_virtual() {

      /*
      Makes a call to the pure virtual function
      implementation that is inside the SomeClass
      class. This can happen because DerivedClass
      may not have anything appropriate to define
      for this function, so it just calls the SomeClass's
      implementation
      */

      SomeClass::pure_virtual();

   }

};
Something else worth noting in the code above is that the call “SomeClass::pure_virtual();” is valid because pure_virtual is declared public in the SomeClass class. That call would also be valid if pure_virtual were declared protected, because DerivedClass derives from SomeClass. However, if pure_virtual were declared private in SomeClass, then the call “SomeClass::pure_virtual();” in DerivedClass would result in a compiler error, because DerivedClass would not have access to that function.

Pure virtual functions can not have a definition inside the function declaration

If you mistakenly try to give a pure virtual function a definition inside its own declaration, the compiler will report an error when it comes across that code. A definition written inside the class like this is also known as an inline definition, which is completely different from the use of the inline keyword (see Inline vs macro). Note that Microsoft’s Visual C++ implementation is an exception and specifically allows this. So, suppose we have the following code:
class SomeClass {
public:
  /*note that we added braces that are normally
     associated with a function body and definition:
  */
   virtual void pure_virtual() = 0 { }; //ERROR (except in MS VC++)
};
The code above is considered ill formed by the C++ 03 standard in Clause 10.4, paragraph 2, which says that “a function declaration cannot provide both a pure-specifier and a definition”.
Compiling the code above will result in a compiler error, because a pure virtual function cannot have a definition within the declaration of the pure virtual function.

A class with a pure virtual function is called an abstract class

Any class that has at least one pure virtual function is called an abstract class. This means that in our example above, the SomeClass class is an abstract class. An abstract class cannot have an instance of itself created. So, using our example class from above, the following code would not work:
SomeClass aClass;  //Error!  SomeClass is abstract
This also means that any class that derives from an abstract class must override the definition of the pure virtual function in the base class, and if it doesn’t then the derived class becomes an abstract class as well.
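The rule can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler and using std::is_abstract from <type_traits>. The class names StillAbstract and Concrete and the function check_abstract_example are hypothetical, and pure_virtual returns a std::string here (rather than void, as in the examples above) only so that the result can be inspected.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

class SomeClass {
public:
    virtual ~SomeClass() {}
    virtual std::string pure_virtual() = 0;  // pure virtual function
};

// Optional out-of-class implementation of the pure virtual function.
std::string SomeClass::pure_virtual() { return "default"; }

// Derives from SomeClass without overriding: it is abstract as well.
class StillAbstract : public SomeClass { };

// Overrides the pure virtual function: it can be instantiated.
class Concrete : public SomeClass {
public:
    std::string pure_virtual() override {
        return SomeClass::pure_virtual() + " from Concrete";
    }
};

static_assert(std::is_abstract<SomeClass>::value, "SomeClass is abstract");
static_assert(std::is_abstract<StillAbstract>::value, "no override, still abstract");
static_assert(!std::is_abstract<Concrete>::value, "Concrete overrides pure_virtual");

void check_abstract_example()
{
    Concrete c;            // fine: Concrete is not abstract
    SomeClass& base = c;   // references to an abstract class are allowed
    assert(base.pure_virtual() == "default from Concrete");
}
```

Declaring a variable of type SomeClass or StillAbstract in this sketch would be rejected by the compiler, while Concrete can be created and used through a SomeClass reference.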

Pure virtual functions in Java

In Java, pure virtual methods are declared using the abstract keyword. Note that C++ does not use the keyword “abstract” as part of the language itself, although C++ does have abstract classes: any class with at least one pure virtual function is considered to be abstract. So, in Java, abstract methods are the equivalent of pure virtual functions in C++. A Java abstract method cannot have a body at all, whereas a C++ pure virtual function cannot have a body inside its declaration but, as shown above, may still be defined outside the class. A class containing abstract methods must itself be declared abstract. However, an abstract class is not necessarily required to have any abstract methods. An abstract class cannot be instantiated.

When should pure virtual functions be used in C++?

In C++, a regular, "non-pure" virtual function provides a definition, which means that the class in which that virtual function is defined is not made abstract by it. You would want to create a pure virtual function when it doesn’t make sense to provide a definition for a virtual function in the base class itself, within the context of inheritance.

An example of when pure virtual functions are necessary

For example, let’s say that you have a base class called Figure. The Figure class has a function called draw. And, other classes like Circle and Square derive from the Figure class. In the Figure class, it doesn’t make sense to actually provide a definition for the draw function, because of the simple and obvious fact that a “Figure” has no specific shape. It is simply meant to act as a base class. Of course, in the Circle and Square classes it would be obvious what should happen in the draw function – they should just draw out either a Circle or Square (respectively) on the page. But, in the Figure class it makes no sense to provide a definition for the draw function. And this is exactly when a pure virtual function should be used – the draw function in the Figure class should be a pure virtual function.
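The Figure example might be sketched as follows. This is only an illustration under stated assumptions: draw returns a std::string describing the shape instead of actually drawing on a page, and the helper names draw_all and check_figures are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Base class: a Figure has no specific shape, so draw is pure virtual.
class Figure {
public:
    virtual ~Figure() {}
    virtual std::string draw() const = 0;
};

class Circle : public Figure {
public:
    std::string draw() const override { return "circle"; }
};

class Square : public Figure {
public:
    std::string draw() const override { return "square"; }
};

// Draws every figure through the base class interface, one per line.
std::string draw_all(const std::vector<std::unique_ptr<Figure>>& figures)
{
    std::string out;
    for (const auto& figure : figures) {
        out += figure->draw();
        out += '\n';
    }
    return out;
}

void check_figures()
{
    std::vector<std::unique_ptr<Figure>> figures;
    figures.push_back(std::unique_ptr<Figure>(new Circle));
    figures.push_back(std::unique_ptr<Figure>(new Square));
    assert(draw_all(figures) == "circle\nsquare\n");
}
```

The code that calls draw only needs to know about Figure. Each derived class supplies its own drawing behavior, and a Figure on its own can never be created.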

Function Template with more than one type parameter


How would you create a function template with more than one type parameter?



If you don’t know what a function template is, then it would help if you read our quick refresher on function templates.
You may find yourself needing to use more than one type parameter in a function template. If that ever occurs, then declaring multiple type parameters is actually quite simple. All you need to do is add the extra type to the template prefix, so it looks like this:
// 2 type parameters:
template<class T1, class T2>
void someFunc(T1 var1, T2 var2 )
{
// some code in here...
}

Can you have unused type parameters?

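Strictly speaking you can, but it is rarely useful. The compiler deduces each type parameter from the arguments of the call, so a type parameter that does not appear in the function's parameter list cannot be deduced, and the call will fail to compile unless that type is supplied explicitly in angle brackets. In practice, then, every type parameter you declare should appear in the parameter list, as both T1 and T2 do in the example above.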

Do template parameters have to be declared with a “T”?

No, the type parameter can actually be declared with any other identifier that you choose, as long as it is not a keyword in C++. Using “T” as the name for the type parameter is traditional, but remember that other names can be used.
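As a minimal sketch of both points, the function templates below use descriptive names instead of T1 and T2. All names here (make_entry, convert, check_templates, Key, Value, Result, Source) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Two type parameters, both deduced from the call arguments.
template<class Key, class Value>
std::string make_entry(Key key, Value value)
{
    std::ostringstream out;
    out << key << "=" << value;
    return out.str();
}

// Result appears only as the return type, so it cannot be deduced
// from the arguments and must be given explicitly by the caller.
template<class Result, class Source>
Result convert(Source value)
{
    return static_cast<Result>(value);
}

void check_templates()
{
    assert(make_entry("pi", 3.5) == "pi=3.5");  // Key = const char*, Value = double
    assert(make_entry(7, 'x') == "7=x");        // Key = int, Value = char
    assert(convert<int>(2.9) == 2);             // Result explicit, Source deduced
}
```

Writing convert(2.9) without the explicit <int> would not compile, because nothing in the call tells the compiler what Result should be.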

Memory Leak in C++


What is a memory leak in C++? Provide an example.



A memory leak occurs when a piece (or pieces) of memory that was previously allocated by a programmer is not properly deallocated by the programmer. Even though that memory is no longer in use by the program, it is still “reserved”, and that piece of memory can not be used by the program until it is properly deallocated by the programmer. That’s why it’s called a memory leak – because it’s like a leaky faucet in which water is being wasted, only in this case it’s computer memory.

What problems can be caused by memory leaks?

The problem caused by a memory leak is that it leaves chunk(s) of memory unavailable for use by the programmer. If a program has a lot of memory that hasn’t been deallocated, then that could really slow down the performance of the program. If there’s no memory left in the program because of memory leaks, then that could of course cause the program to crash.

An example of a memory leak in C++

Here is an example of a memory leak in C++:

Out of Scope Pointer

void memLeak( )
{
  int *data = new int;
  *data = 15;
}

So, the problem with the code above is that delete is never called on the “data” pointer before it goes out of scope when memLeak returns. The int that it points to is therefore never deallocated, its address is lost, and the memory is wasted.

Whenever an object is created using the new operator in C++, it must eventually be destroyed, and its memory released, using the delete operator.

If the delete operator is not called, the block of memory obtained by the new operator is still treated as being in use by the process, so it cannot be reused while the process runs.
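The difference can be made visible with a small sketch. It assumes a C++11 compiler, and every name in it (Tracked, leaks, frees_manually, frees_automatically, check_leak_example) is hypothetical. Tracked simply counts how many of its objects are currently alive.

```cpp
#include <cassert>
#include <memory>

// Counts live objects so that a leak shows up as a non-zero count.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Same shape as memLeak: the pointer goes out of scope, the object stays.
void leaks()
{
    Tracked* t = new Tracked;
    (void)t;  // never deleted
}

// Fixed by calling delete before the pointer goes out of scope.
void frees_manually()
{
    Tracked* t = new Tracked;
    delete t;
}

// Fixed by letting std::unique_ptr call delete automatically.
void frees_automatically()
{
    std::unique_ptr<Tracked> t(new Tracked);
}

void check_leak_example()
{
    frees_manually();
    assert(Tracked::live == 0);
    frees_automatically();
    assert(Tracked::live == 0);
    leaks();
    assert(Tracked::live == 1);  // one object was never destroyed
}
```

The std::unique_ptr version is usually the safer choice, because the object is also released if the function returns early or throws an exception.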

Restoring SHOW DESKTOP icon in QUICK LAUNCH toolbar

To re-create the Show desktop icon yourself, follow these steps:
  1. Click Start, click Run, type notepad in the Open box, and then click OK.
  2. Carefully copy and then paste the following text into the Notepad window:
    [Shell]
    Command=2
    IconFile=explorer.exe,3
    [Taskbar]
    Command=ToggleDesktop
  3. On the File menu, click Save As, and then save the file to your desktop as "Show desktop.scf". The Show desktop icon is created on your desktop.
  4. Click and then drag the Show desktop icon to your Quick Launch toolbar.