Wednesday, November 13, 2013

Sizeof Class / Data Members

Sizeof Class / Data Members

There are many factors that decide the size of an object of a class in C++. These factors are:
  1. Size of all non-static data members
  2. Order of data members
  3. Byte alignment or byte padding
  4. Size of its immediate base class
  5. The existence of virtual function(s) (Dynamic polymorphism using virtual functions).
  6. Compiler being used Mode of inheritance (virtual inheritance)
1. Sizeof Class with No data members:

Generally, for class without any data members, the size of the object of the class would be equals to 1.

i.e Have a look at the below example

Class Student
{
        //Function Definition
};

int main()
{
       Student S1;
       cout<<"The size of Empty class is "<<sizeof(S1);
       return 0;
}

The o/p would be 1. sizeof can never return 0. Logically, structure/class with 0 bytes does not exists. Hence minimum memory is allocated for a character data type which is one byte. That will be provided for an empty class.

2. Sizeof Class with data members:

Generally, for class with any data members, the size of the object of the class would be equals to amount of bytes the data members occupy.

i.e Have a look at the below example

Class Student
{
        private:
            int rollno;
        
        public:
            int getrollno()
            {
                 return rollno;
            }
            void setrollno(int rlno)
            {
                 rollno=rlno;
            }
};

int main()
{
       Student S1;
       cout<<"The size of this class is "<<sizeof(S1);
       return 0;
}

The output of the code would result to 4. i.e sizeof(int)= 4 bytes for the rollno.

3. Sizeof Class with data members without padding:

Generally, for class with any data members, the size of the object of the class would be equals to amount of bytes the data members occupy.

i.e Have a look at the below example

Class Student
{
        private:
            char section; 
            int rollno;
            char grade;
        
        public:
            int getrollno()
            {
                 return rollno;
            }
            void setrollno(int rlno)
            {
                 rollno=rlno;
            }
};

int main()
{
       Student S1;
       cout<<"The size of this class is "<<sizeof(S1);
       return 0;
}

If you compute the sizeof class as 6, (int-4, char-1) then you are absolutely wrong. Generally the order of data members defined place a vital role in allocating memory.

In our example, first character is declared. Although character takes only one byte, it allocates 4 bytes of block of memory in which only the first byte is used, the remaining three bytes are used with filler bits.

For integer, it will allocate 4 bytes. And again for the character 'grade', it allocates 4 bytes of memory.
The output of the above will result to 12.

Hence padding concept can be used to avoid this kind of high memory usage.

4. Sizeof Class with data members with padding:

Padding - Data members with higher memory usage are declared first followed by the lower order.

Generally, for class with any data members, the size of the object of the class would be equals to amount of bytes the data members occupy.

i.e Have a look at the below example

Class Student
{
        private:
            int rollno;
            char section;  
            char grade;
        
        public:
            int getrollno()
            {
                 return rollno;
            }
            void setrollno(int rlno)
            {
                 rollno=rlno;
            }
};

int main()
{
       Student S1;
       cout<<"The size of this class is "<<sizeof(S1);
       return 0;
}

Now integer will allocate 4 bytes, and the remaining two characters fills another two bytes.
The output would be 8.

Hence padding concept can be used to avoid this kind of high memory usage.


5. Sizeof Class with one virtual member function:

For the class with virtual functions, it allocates additional 4 bytes of memory for the pointer to virtual table.

Class Student
{
        private:
            int rollno;
        
        public:
            virtual void display()
            {
            }
};

int main()
{
       Student S1;
       cout<<"The size of this class is "<<sizeof(S1);
       return 0;
}

The output would be equals to 8.



6. Sizeof Class with two or more virtual member functions:

For the class with virtual functions, it allocates additional 4 bytes of memory for the pointer to virtual table.
Any number of virtual functions are available, the pointer remains only one.

Class Student
{
        private:
            int rollno;
        
        public:
            virtual void display()
            {
            } 

            virtual void treatsignal()
            {
            }
};

int main()
{
       Student S1;
       cout<<"The size of this class is "<<sizeof(S1);
       return 0;
}

The output would still be equals to 8.

7. Sizeof Derived Class:

The size of derived class will occupy space for data members of immediate base class + memory allocation for its own class.

Class Student
{
        public:
            int rollno;
        
        public:
            virtual void display()
            {
            }
};

Class Marks : public Student
{
        private:
            int mark1;
            int mark2;

        public:
            virtual void display()
            {
            }
};

int main()
{
       Marks m;
       cout<<"The size of this class is "<<sizeof(m);
       return 0;
}

The output would be Baseclass member (rollno)(4) + Virtualptr (4) in baseclass + Ownclass members (4 + 4) equals to 16.


8. Sizeof Derived Class inherited twice:

The size of derived class will occupy space for data members of immediate base class + memory allocation for its own class.

class Student
{
       int rollno;
};

class  marks : public Student
{
};

class performance: public marks
{
     char grade;
     char rank;
};

The size will be Baseclass member + own class will be equals to 8.

9. Size of all non-static data members:
Only non-static data members will be counted for calculating sizeof class/object.

class A 
{
private:
        float iMem1;
        const int iMem2;
        static int iMem3;
        char iMem4;
};


For an object of class A, the size will be the size of float iMem1 + size of int iMem2 + size of char iMem4. Static members are really not part of the class object. They won't be included in object's layout.

Mode of inheritance (virtual inheritance):

In C++, sometimes we have to use virtual inheritance for some reasons. (One classic example is the implementation of final class in C++.) When we use virtual inheritance, there will be the overhead of 4 bytes for a virtual base class pointer in that class.

class ABase{
        int iMem;
};

class BBase : public virtual ABase {
        int iMem;
};

class CBase : public virtual ABase {
        int iMem;
};

class ABCDerived : public BBase, public CBase {
        int iMem;
};


And if you check the size of these classes, it will be:

    Size of ABase : 4
    Size of BBase : 12
    Size of CBase : 12
    Size of ABCDerived : 24

Because BBase and CBase are derived from ABase virtually, they will also have an virtual base pointer. So, 4 bytes will be added to the size of the class (BBase and CBase). That is sizeof ABase + size of int + sizeof Virtual Base pointer.

Size of ABCDerived will be 24 (not 28 = sizeof (BBase + CBase + int member)) because it will maintain only one Virtual Base pointer (Same way of maintaining virtual table pointer).       

SS7 Layers / Architecture

SS7 Layers/Architecture :


Thursday, November 7, 2013

Script Command in UNIX for creating LOG files

 How to Use Script Command in UNIX:

:/root>script mylogfile.txt
Script started, file is mylogfile.txt
:/root>pwd
/root
:/root>cd SARAVANAN/
:/root/SARAVANAN>ls -lrt
total 20
drwxr-xr-x    5 root     root         4096 Mar 21  2013 SARAVANAN_RQ011305
drwxr-xr-x    4 root     root         4096 Apr 30  2013 65456A_T38
drwxr-xr-x    5 root     root         4096 Oct 15 06:13 SARAVANAN_RQ011308
drwxr-xr-x    3 root     root         4096 Nov  6 05:50 SARA_SIP_SIP_RQ011308
drwxr-xr-x    2 root     root         4096 Nov  8 04:24 3GPP_TEST
:/root/SARAVANAN>
:/root/SARAVANAN>
:/root/SARAVANAN>
:/root/SARAVANAN>exit
Script done, file is mylogfile.txt



mylogfile.txt in /root/ folder will contain the snapshot of all commands entered during the script is made ON/OFF

 :/root/vi mylogfile.txt
Script started on Fri Nov  8 04:56:04 2013
:/root>pwd^M
/root^M
:/root>cd A^H ^HSARAVANAN/^M
:/root/SARAVANAN>ls -lrt^M
total 20^M
drwxr-xr-x    5 root     root         4096 Mar 21  2013 SARAVANAN_RQ011305^M
drwxr-xr-x    4 root     root         4096 Apr 30  2013 65456A_T38^M
drwxr-xr-x    5 root     root         4096 Oct 15 06:13 SARAVANAN_RQ011308^M
drwxr-xr-x    3 root     root         4096 Nov  6 05:50 SARA_SIP_SIP_RQ011308^M
drwxr-xr-x    2 root     root         4096 Nov  8 04:24 3GPP_TEST^M
:/root/SARAVANAN>^M
:/root/SARAVANAN>^M
:/root/SARAVANAN>^M
:/root/SARAVANAN>exit^M

Script done on Fri Nov  8 04:56:21 2013




Note : Script captures every keystroke including Backspace, DELETE. (see cd SARAVANAN, ^H is for Backspace. The user had typed A instead of S. So he pressed backspace)

How to clear history in unix terminal

How to clear history in unix terminal

Run the command

history -c

This will clear the history of commands already entered in unix terminal.

Identifying Unix Flavour

Identifying Unix Flavour:

Below are the commands using which we can identify the working flavour of UNIX

~/>uname -a
~/>cat /proc/version
~/>cat /etc/release
~/>cat /etc/issue


Thursday, October 31, 2013

C to Assembly

From C To Assembly Language LG #94
From C To Assembly Language

1. Overview

What is a microcomputer system made up of? A microcomputer system is made up of a microprocessor unit (MPU), a bus system, a memory subsystem, an I/O subsystem and an interface among all components. A typical answer one can expect.

This is only the hardware side. Every microcomputer system requires a software so as to direct each of the hardware components while they are performing their respective tasks. Computer software can be thought about at system side (system software) and user side (user software).

The user software may include some in-built libraries and user created libraries in the form of subroutines which may be needed in preparing programs for execution.

The system software may encompass a variety of high-level language translators, an assembler, a text editor, and several other programs for aiding in the preparation of other programs. We already know that there are three levels of programming and they are Machine language, Assembly language and High-level language.

Machine language programs are programs that the computer can understand and execute directly (think of programming in any microprocessor kit). Assembler language instructions match machine language instructions on a more or less one-for-one basis, but are written using character strings so that they are more easily understood, and high-level language instructions are much closer to the English language and are structured so that they naturally correspond to the way programmers think. Ultimately, an assembler language or high-level language program must be converted into machine language by programs called translators. They are referred to as assembler and compiler or interpreter respectively.

Compilers for high-level languages like C/C++ have the ability to translate high-level language into assembly code. The GNU C and C++ Compiler option of -S will generate an assembly code equivalent to that of the corresponding source program. Knowing how the most rudimentary constructs like loops, function calls and variable declaration are mapped into assembly language is one way to achieve the goal of mastering C internals. Before proceeding further, you must make it a point that you are familiar with Computer Architecture and Intel x86 assembly language to help you follow the material presented here.

2. Getting Started

To begin with, write a small program in C to print hello world and compile it with -S options. The output is an assembler code for the input file specified. By default, GCC makes the assembler file name by replacing the suffix `.c', with `.s'. Try to interpret the few lines at the end of the assembler file.

The 80386 and above family of processors have myriads of registers, instructions and addressing modes. A basic knowledge about only a few simple instructions is sufficient to understand the code generated by the GNU compiler.

Generally, any assembly language instruction includes a label, a mnemonic, and operands. An operand's notation is sufficient to decipher the operand's addressing mode. The mnemonics operate on the information contained in the operands. In fact, assembly language instructions operate on registers and memory locations. The 80386 family has general purpose registers (32 bit) called eax, ebx, ecx etc. Two registers, ebp and esp are used for manipulating the stack. A typical instruction, written in GNU Assembler (GAS) syntax, would look like this:

movl $10, %eax
This instruction stores the value 10 in the eax register. The prefix `%' to the register name and `$' to the immediate value are essential assembler syntax. It is to be noted that not all assemblers follow the same syntax.

Our first assembly language program, stored in a file named first.s is shown in Listing 1.

#Listing 1
.globl main
main:
  movl $20, %eax
  ret
This file can be assembled and linked to generate an a.out by giving the command cc first.s. The extensions `.s' are identified by the GNU compiler front end cc as assembly language files and invokes the assembler and linker, skipping the compilation phase.

The first line of the program is a comment. The .globl assembler directive serves to make the symbol main visible to the linker. This is vital as your program will be linked with the C startup library which will contain a call to main. The linker will complain about 'undefined reference to symbol main' if that line is omitted (try it). The program simply stores the value 20 in register eax and returns to the caller.

3. Arithmetic, Comparison, Looping

Our next program is Listing 2 which computes the factorial of a number stored in eax. The factorial is stored in ebx.

#Listing 2
.globl main
main: 
 movl $5, %eax
 movl $1, %ebx
L1: cmpl $0, %eax  //compare 0 with value in eax
 je L2   //jump to L2 if 0==eax (je - jump if equal)
 imull %eax, %ebx // ebx = ebx*eax
 decl %eax  //decrement eax
 jmp L1   // unconditional jump to L1
L2:  ret
L1 and L2 are labels. When control flow reaches L2, ebx would contain the factorial of the number stored in eax.

4. Subroutines

When implementing complicated programs, we split the tasks to be solved in systematic order. We write subroutines and functions for each of the tasks which are called when ever required. Listing 3 illustrates subroutine call and return in assembly language programs.

#Listing 3
.globl main
main:
 movl $10, %eax
 call foo
 ret
foo:
 addl $5, %eax
 ret
The instruction call transfers control to subroutine foo. The ret instruction in foo transfers control back to the instruction after the call in main.

Generally, each function defines the scope of variables it uses in each call of the routine. To maintain the scopes of variables you need space. The stack can be used to maintain values of the variables in each call of the routine. It is important to know the basics of how the activation records can be maintained for repeated, recursive calls or any other possible calls in the execution of the program. Knowing how to manipulate registers like esp and ebp and making use of instructions like push and pop which operate on the stack are central to understanding the subroutine call and return mechanism.

5. Using The Stack

A section of your program's memory is reserved for use as a stack. The Intel 80386 and above microprocessors contain a register called stack pointer, esp, which stores the address of the top of stack. Figure 1 below shows three integer values, 49,30 and 72, stored on the stack (each integer occupying four bytes) with esp register holding the address of the top of stack.

Figure 1 Unlike the stack analogous to a pile of bricks growing up wards, on Intel machines stack grows down wards. Figure 2 shows the stack layout after the execution of the instruction pushl $15.
Figure 2 The stack pointer register is decremented by four and the number 15 is stored as four bytes at locations 1988, 1989, 1990 and 1991.

The instruction popl %eax copies the value at top of stack (four bytes) to the eax register and increments esp by four. What if you do not want to copy the value at top of stack to any register? You just execute the instruction addl $4, %esp which simply increments the stack pointer.

In Listing 3, the instruction call foo pushes the address of the instruction after the call in the calling program on to the stack and branches to foo. The subroutine ends with ret which transfers control to the instruction whose address is taken from the top of stack. Obviously, the top of stack must contain a valid return address.

6. Allocating Space for Local Variables

It is possible to have a C program manipulating hundreds and thousands of variables. The assembly code for the corresponding C program will give you an idea of how the variables are accommodated and how the registers are used for manipulating the variables without causing any conflicts in the final result that is to be obtained.

The registers are few in number and cannot be used for holding all the variables in a program. Local variables are allotted space within the stack. Listing 4 shows how it is done.

#Listing 4
.globl main
main:
 call foo
 ret
foo:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp
 movl $10, -4(%ebp)
 movl %ebp, %esp
 popl %ebp
 ret
First, the value of the stack pointer is copied to ebp, the base pointer register. The base pointer is used as a fixed reference to access other locations on the stack. In the program, ebp may be used by the caller of foo also, and hence its value is copied to the stack before it is overwritten with the value of esp. The instruction subl $4, %esp creates enough space (four bytes) to hold an integer by decrementing the stack pointer. In the next line, the value 10 is copied to the four bytes whose address is obtained by subtracting four from the contents of ebp. The instruction movl %ebp, %esp restores the stack pointer to the value it had after executing the first line of foo and popl %ebp restores the base pointer register. The stack pointer now has the same value which it had before executing the first line of foo. The table below displays the contents of registers ebp, esp and stack locations from 3988 to 3999 at the point of entry into main and after the execution of every instruction in Listing 4 (except the return from main). We assume that ebp and esp have values 7000 and 4000 stored in them and stack locations 3988 to 3999 contain some arbitrary values 219986, 1265789 and 86 before the first instruction in main is executed. It is also assumed that the address of the instruction after call foo in main is 30000.

Table 1

6. Parameter Passing and Value Return

The stack can be used for passing parameters to functions. We will follow a convention (which is used by our C compiler) that the value stored by a function in the eax register is taken to be the return value of the function. The calling program passes a parameter to the callee by pushing its value on the stack. Listing 5 demonstrates this with a simple function called sqr.

#Listing 5
.globl main
main:
 movl $12, %ebx
 pushl %ebx
 call sqr
 addl $4, %esp       //adjust esp to its value before the push
 ret
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax    //compute eax * eax, store result in eax 
 ret
Read the first line of sqr carefully. The calling function pushes the content of ebx on the stack and then executes a call instruction. The call will push the return address on the stack. So inside sqr, the parameter is accessible at an offset of four bytes from the top of stack.

8. Mixing C and Assembler

Listing 6 shows a C program and an assembly language function. The C function is defined in a file called main.c and the assembly language function in sqr.s. You compile and link the files together by typing cc main.c sqr.s.

The reverse is also pretty simple. Listing 7 demonstrates a C function print and its assembly language caller.

#Listing 6
//main.c
main()
{
 int i = sqr(11);
 printf("%d\n",i);
}

//sqr.s
.globl sqr
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax
 ret


#Listing 7
//print.c
print(int i)
{
 printf("%d\n",i);
}

//main.s
.globl main
main:
 movl $123, %eax
 pushl %eax
 call print
 addl $4, %esp
 ret

9. Assembler Output Generated by GNU C

I guess this much reading is sufficient for understanding the assembler output produced by gcc. Listing 8 shows the file add.s generated by gcc -S add.c. Note that add.s has been edited to remove many assembler directives (mostly for alignments and other things of that sort).

#Listing 8
//add.c
int add(int i,int j)
{
 int p = i + j;
 return p;
}

//add.s
.globl add
add:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp  //create space for integer p
 movl 8(%ebp),%edx //8(%ebp) refers to i
 addl 12(%ebp), %edx //12(%ebp) refers to j
 movl %edx, -4(%ebp) //-4(%ebp) refers to p
 movl -4(%ebp), %eax //store return value in eax
 leave   //i.e. to movl %ebp, %esp; popl %ebp ret
The program will make sense upon realizing the C statement add(10,20) which gets translated into the following assembler code:

pushl $20
pushl $10
call add
Note that the second parameter is passed first.

10. Global Variables

Space is created for local variables on the stack by decrementing the stack pointer and the allotted space is reclaimed by simply incrementing the stack pointer. So what is the equivalent GNU C generated code for global variables? Listing 9 provides the answer.

#Listing 9
//glob.c
int foo = 10;
main()
{
 int p foo;
}

//glob.s
.globl foo
foo:
 .long 10
.globl main
main:
 pushl %ebp
 movl %esp,%ebp
 subl $4,%esp
 movl foo,%eax
 movl %eax,-4(%ebp)
 leave
 ret
The statement foo: .long 10 defines a block of 4 bytes named foo and initializes the block with zero. The .globl foo directive makes foo accessible from other files. Now try this out. Change the statement int foo to static int foo. See how it is represented in the assembly code. You will notice that the assembler directive .globl is missing. Try this out for different storage classes (double, long, short, const etc.).

11. System Calls

Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this it will need to call on OS services. In fact, programming in assembly language is quite the same in different OSes, unless OS services are touched.

There are two common ways of performing a system call in Linux: through the C library (libc) wrapper, or directly.

Libc wrappers are made to protect programs from possible system call convention changes, and to provide POSIX compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more-or-less POSIX compliant: this means that the syntax of most libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). But the main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, like printf(), malloc() and similar.

System calls in Linux are done through int 0x80. Linux differs from the usual Unix calling convention, and features a "fastcall" convention for system calls. The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments in ebx, ecx, edx, esi, edi, ebp consequently. If there are more arguments, they are simply passed though the structure as first argument. The result is returned in eax, and the stack is not touched at all.

Consider Listing 10 given below.

#Listing 10
#fork.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
 fork();
 printf("Hello\n");
 return 0;
}
Compile this program with the command cc -g fork.c -static. Use the gdb tool and type the command disassemble fork. You can see the assembly code used for fork in the program. The -static is the static linker option of GCC (see man page). You can test this for other system calls and see how the actual functions work.

There have been several attempts to write an up-to-date documentation of the Linux system calls and I am not making this another of them.

11. Inline Assembly Programming

The GNU C supports the x86 architecture quite well, and includes the ability to insert assembly code within C programs, such that register allocation can be either specified or left to GCC. Of course, the assembly instruction are architecture dependent.
The asm instruction allows you to insert assembly instructions into your C or C++ programs. For example the instruction:

asm ("fsin" : "=t" (answer) : "0" (angle));
is an x86-specific way of coding this C statement:

answer = sin(angle);
You can notice that unlike ordinary assembly code instructions asm statements permit you to specify input and output operands using C syntax. Asm statements should not be used indiscriminately. So, when should we use them?

  • Asm statements allow your programs to access the computer hardware directly. This can produce programs that execute quickly. You can use them when writing operating system code that directly needs to interact with the hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly.
  • Inline assembly instructions also speed up the innermost loops of the programs. For instance, sine and cosine of the same angles can be found by fsincos x86 instruction. Probably, the two listings given below will help you understand this factor better.
#Listing 11
#Name : bit-pos-loop.c 
#Description : Find bit position using a loop

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
 long max = atoi (argv[1]);
 long number;
 long i;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ; ++number) {
  for (i=(number>>1), position=0; i!=0; ++position)
   i >>= 1;
  result = position;
 }
 return 0;
}

#Listing 12
#Name : bit-pos-asm.c
#Description : Find bit position using bsrl

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
 long max = atoi(argv[1]);
 long number;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ; ++number) {
  asm("bsrl %1, %0" : "=r" (position) : "r" (number));
  result = position;
 }
 return 0;
}
Compile the two versions with full optimizations as given below:

$ cc -O2 -o bit-pos-loop bit-pos-loop.c
$ cc -O2 -o bit-pos-asm bit-pos-asm.c
Measure the running time for each version by using the time command and specifying a large value as the command-line argument to make sure that each version takes at least few seconds to run.

$ time ./bit-pos-loop 250000000
and

$ time ./bit-pos-asm 250000000
The results will be varying in different machines. However, you will notice that the version that uses the inline assembly executes a great deal faster.

GCC's optimizer attempts to rearrange and rewrite program' code to minimize execution time even in the presence of asm expressions. If the optimizer determines that an asm's output values are not used, the instruction will be omitted unless the keyword volatile occurs between asm and its arguments. (As a special case, GCC will not move an asm without any output operands outside a loop.) Any asm can be moved in ways that are difficult to predict, even across jumps. The only way to guarantee a particular assembly instruction ordering is to include all the instructions in the same asm.

Using asm's can restrict the optimizer's effectiveness because the compiler does not know the asms' semantics. GCC is forced to make conservative guesses that may prevent some optimizations.

12. Exercises

  1. Interpret the assembly code for C program in Listing 6. Modify it for eliminating errors that are obtained when generating assembly code with -Wall option. Compare the two assembly codes. What changes do you observe?
  2. Compile several small C programs with and without optimization options (like -O2). Read the resulting assembly codes and find out some common optimization tricks used by the compiler.
  3. Interpret assembly code for switch statement.
  4. Compile several small C programs with inline asm statements. What differences do you observe in assembly codes for such programs.
  5. A nested function is defined inside another function (the "enclosing function"), such that:
    • the nested function has access to the enclosing function's variables; and
    • the nested function is local to the enclosing function, that is, it can be called from elsewhere unless the enclosing function gives you a pointer to the nested function.

    Nested functions can be useful because they help control the visibility of a function.
    Consider Listing 13 given below:

    #Listing 13
    /* myprint.c */
    #include <stdio.h>
    #include <stdlib.h>
    
    int main()
    {
     int i;
     void my_print(int k)
     {
      printf("%d\n",k);
     }
     scanf("%d",&i);
     my_print(i);
     return 0;
    }
    
    Compile this program with cc -S myprint.c and interpret the assembly code. Also try compiling the program with the command cc -pedantic myprint.c. What do you observe?
 

OS Process States

OS Process States:




How VTABLE works?

In C++, what’s a vtable and how does it work?

Vtables: Known by Many Different Names

It’s worth a few brain cells to remember that a vtable is known by many different names: virtual function table, virtual method table, and even as a dispatch table. When interviewing, it’s always a good idea to be familiar with the terminology.

Vtables: Used Behind the Scenes of Polymorphism



When working with virtual functions in C++, it’s the vtable that’s being used behind the scenes to help achieve polymorphism. And, although you can understand polymorphism without understanding vtables, many interviewers like to ask this question just to see if you really know your stuff.

Vtables contain pointers to virtual functions

Whenever a class itself contains virtual functions or overrides virtual functions from a parent class the compiler builds a vtable for that class. This means that not all classes have a vtable created for them by the compiler. The vtable contains function pointers that point to the virtual functions in that class. There can only be one vtable per class, and all objects of the same class will share the same vtable.

Vpointers point to the vtable

Associated with every vtable is what’s called a vpointer. The vpointer points to the vtable, and is used to access the functions inside the vtable. The vtable would be useless without a vpointer.
Subscribe to our newsletter for more free interview questions.
For any class that contains a vtable, the compiler will also add "hidden" code to the constructor of that class to initialize the vpointers of its objects to the address of the corresponding vtable.
Because you’re probably confused now, let’s take a look at an example so that we can explain vtables more clearly and in more detail. Take a look at the code below:
class Animal // base class
{

public:

int weight;

virtual int getWeight() {};
}


// Obviously, Tiger derives from the Animal class
class Tiger: public Animal {

public:

int weight;

int height; 

int getWeight() {return weight;};

int getHeight() {return height;};

int main()
{ 
 Tiger t1;
 
 /* below, an Animal object pointer is set to point
    to an object of the derived Tiger class  */
 
 Animal *a1 = &t1; 
 
 /*  below, how does this know to call the 
  definition of getWeight in the Tiger class, 
  and not the definition provided in the Animal 
  class  */
  
 a1 -> getWeight(); 
}   
}

Only one vtable per class

The vtable contains function pointers that point to the virtual functions in that class. It’s important to note that there can only be one vtable per class, and all objects of the same class will share the same vtable. This means that in the example above, the Animal and Tiger classes will each have their very own vtable, and any objects of the Animal or Tiger classes will use their respective class’s vtables.

Vtables are used at runtime



In the example above, because the Tiger class overrides the getWeight virtual function provided in the Animal class, the compiler will construct a vtable for the Tiger class. This vtable will only contain a function pointer for the getWeight function. The getHeight function will not be put inside the vtable because it is not virtual, nor does it override a virtual function in the Animal base class.
The question that the code above raises is at runtime how does the call "a1 -> getWeight()" know to use the version of getWeight provided in the Tiger – and not the Animal class. The answer, as you probably guessed, is by using vtables.
We now know that a vtable will be created for the Tiger class. And every vtable must have a vpointer that points to the vtable (otherwise the vtable can not be referenced). Let’s call the vpointer that belongs to the Tiger class vptr1 – this is just a name we created so we can show our point.
Take a look at the code below again.
// Obviously, Tiger derives from the Animal class
class Tiger: public Animal {

/*
...some code left out here

*/

int main()
{ 
 Tiger t1;
 
 /* below, an Animal object pointer is set to point
    to an object of the derived Tiger class  */
 
 Animal *a1 = &t1; 
 
 /*  below, how does this know to call the 
  definition of getWeight in the Tiger class, 
  and not the definition provided in the Animal 
  class  */
  
 a1 -> getWeight(); 
}   
} 

Because the Tiger class contains a pointer to the vtable called vptr1, the call "a1 -> getWeight()" will actually be translated to "(*(a1 -> vptr1 -> getWeight())".
Tiger t1;
 
/* below, an Animal object pointer is set to point
to an object of the derived Tiger class  */
 
Animal *a1 = &t1;       
a1 -> getWeight(); 

/* 
   The call above gets translated to this,
   assuming the pointer to the vtable for the
   Tiger class is called vptr1
   :  
   *(a1 -> vptr1 -> getWeight())

*/


This means that the definition of getWeight() provided in the Tiger class will be called, which is what we want.

What if vtables were not used?

Now that we’ve seen vtables in action we should ask ourselves again why C++ would use vtables? Hopefully at this point you can answer that question yourself. But, just to review, vtables are used so that the correct definition of the function can be called at runtime – basically to help achieve polymorphism. In our example, we wanted the definition of the getWeight() function provided in the Tiger class to be called. And that is exactly what will happen with the help of vtables. 

From C To Assembly Language

From C To Assembly Language

1. Overview

What is a microcomputer system made up of? A microcomputer system is made up of a microprocessor unit (MPU), a bus system, a memory subsystem, an I/O subsystem and an interface among all components. A typical answer one can expect.
This is only the hardware side. Every microcomputer system requires a software so as to direct each of the hardware components while they are performing their respective tasks. Computer software can be thought about at system side (system software) and user side (user software).
The user software may include some in-built libraries and user created libraries in the form of subroutines which may be needed in preparing programs for execution.
The system software may encompass a variety of high-level language translators, an assembler, a text editor, and several other programs for aiding in the preparation of other programs. We already know that there are three levels of programming and they are Machine language, Assembly language and High-level language.
Machine language programs are programs that the computer can understand and execute directly (think of programming in any microprocessor kit). Assembler language instructions match machine language instructions on a more or less one-for-one basis, but are written using character strings so that they are more easily understood, and high-level language instructions are much closer to the English language and are structured so that they naturally correspond to the way programmers think. Ultimately, an assembler language or high-level language program must be converted into machine language by programs called translators. They are referred to as assembler and compiler or interpreter respectively.
Compilers for high-level languages like C/C++ have the ability to translate high-level language into assembly code. The GNU C and C++ Compiler option of -S will generate an assembly code equivalent to that of the corresponding source program. Knowing how the most rudimentary constructs like loops, function calls and variable declaration are mapped into assembly language is one way to achieve the goal of mastering C internals. Before proceeding further, you must make it a point that you are familiar with Computer Architecture and Intel x86 assembly language to help you follow the material presented here.

2. Getting Started

To begin with, write a small program in C to print hello world and compile it with -S options. The output is an assembler code for the input file specified. By default, GCC makes the assembler file name by replacing the suffix `.c', with `.s'. Try to interpret the few lines at the end of the assembler file.
The 80386 and above family of processors have myriads of registers, instructions and addressing modes. A basic knowledge about only a few simple instructions is sufficient to understand the code generated by the GNU compiler.
Generally, any assembly language instruction includes a label, a mnemonic, and operands. An operand's notation is sufficient to decipher the operand's addressing mode. The mnemonics operate on the information contained in the operands. In fact, assembly language instructions operate on registers and memory locations. The 80386 family has general purpose registers (32 bit) called eaxebxecx etc. Two registers, ebp and esp are used for manipulating the stack. A typical instruction, written in GNU Assembler (GAS) syntax, would look like this:

movl $10, %eax
This instruction stores the value 10 in the eax register. The prefix `%' to the register name and `$' to the immediate value are essential assembler syntax. It is to be noted that not all assemblers follow the same syntax.
Our first assembly language program, stored in a file named first.s is shown in Listing 1.

#Listing 1
.globl main
main:
  movl $20, %eax
  ret
This file can be assembled and linked to generate an a.out by giving the command cc first.s. The extensions `.s' are identified by the GNU compiler front end cc as assembly language files and invokes the assembler and linker, skipping the compilation phase.
The first line of the program is a comment. The .globl assembler directive serves to make the symbol main visible to the linker. This is vital as your program will be linked with the C startup library which will contain a call to main. The linker will complain about 'undefined reference to symbol main' if that line is omitted (try it). The program simply stores the value 20 in register eax and returns to the caller.

3. Arithmetic, Comparison, Looping

Our next program is Listing 2 which computes the factorial of a number stored in eax. The factorial is stored in ebx.

#Listing 2
.globl main
main: 
 movl $5, %eax
 movl $1, %ebx
L1: cmpl $0, %eax  //compare 0 with value in eax
 je L2   //jump to L2 if 0==eax (je - jump if equal)
 imull %eax, %ebx // ebx = ebx*eax
 decl %eax  //decrement eax
 jmp L1   // unconditional jump to L1
L2:  ret
L1 and L2 are labels. When control flow reaches L2ebx would contain the factorial of the number stored in eax.

4. Subroutines

When implementing complicated programs, we split the tasks to be solved in systematic order. We write subroutines and functions for each of the tasks which are called when ever required. Listing 3illustrates subroutine call and return in assembly language programs.

#Listing 3
.globl main
main:
 movl $10, %eax
 call foo
 ret
foo:
 addl $5, %eax
 ret
The instruction call transfers control to subroutine foo. The ret instruction in foo transfers control back to the instruction after the call in main.
Generally, each function defines the scope of variables it uses in each call of the routine. To maintain the scopes of variables you need space. The stack can be used to maintain values of the variables in each call of the routine. It is important to know the basics of how the activation records can be maintained for repeated, recursive calls or any other possible calls in the execution of the program. Knowing how to manipulate registers like esp and ebp and making use of instructions like push and pop which operate on the stack are central to understanding the subroutine call and return mechanism.

5. Using The Stack

A section of your program's memory is reserved for use as a stack. The Intel 80386 and above microprocessors contain a register called stack pointer, esp, which stores the address of the top of stack.Figure 1 below shows three integer values, 49,30 and 72, stored on the stack (each integer occupying four bytes) with esp register holding the address of the top of stack.
Figure 1Unlike the stack analogous to a pile of bricks growing up wards, on Intel machines stack grows down wards. Figure 2 shows the stack layout after the execution of the instruction pushl $15.
Figure 2The stack pointer register is decremented by four and the number 15 is stored as four bytes at locations 1988, 1989, 1990 and 1991.
The instruction popl %eax copies the value at top of stack (four bytes) to the eax register and increments esp by four. What if you do not want to copy the value at top of stack to any register? You just execute the instruction addl $4, %esp which simply increments the stack pointer.
In Listing 3, the instruction call foo pushes the address of the instruction after the call in the calling program on to the stack and branches to foo. The subroutine ends with ret which transfers control to the instruction whose address is taken from the top of stack. Obviously, the top of stack must contain a valid return address.

6. Allocating Space for Local Variables

It is possible to have a C program manipulating hundreds and thousands of variables. The assembly code for the corresponding C program will give you an idea of how the variables are accommodated and how the registers are used for manipulating the variables without causing any conflicts in the final result that is to be obtained.
The registers are few in number and cannot be used for holding all the variables in a program. Local variables are allotted space within the stack. Listing 4 shows how it is done.

#Listing 4
.globl main
main:
 call foo
 ret
foo:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp
 movl $10, -4(%ebp)
 movl %ebp, %esp
 popl %ebp
 ret
First, the value of the stack pointer is copied to ebp, the base pointer register. The base pointer is used as a fixed reference to access other locations on the stack. In the program, ebp may be used by the caller of foo also, and hence its value is copied to the stack before it is overwritten with the value of esp. The instruction subl $4, %esp creates enough space (four bytes) to hold an integer by decrementing the stack pointer. In the next line, the value 10 is copied to the four bytes whose address is obtained by subtracting four from the contents of ebp. The instruction movl %ebp, %esp restores the stack pointer to the value it had after executing the first line of foo and popl %ebp restores the base pointer register. The stack pointer now has the same value which it had before executing the first line of foo. The table below displays the contents of registers ebpesp and stack locations from 3988 to 3999 at the point of entry into main and after the execution of every instruction in Listing 4 (except the return from main). We assume that ebp and esp have values 7000 and 4000 stored in them and stack locations 3988 to 3999 contain some arbitrary values 219986, 1265789 and 86 before the first instruction in main is executed. It is also assumed that the address of the instruction after call foo in main is 30000.

Table 1

6. Parameter Passing and Value Return

The stack can be used for passing parameters to functions. We will follow a convention (which is used by our C compiler) that the value stored by a function in the eax register is taken to be the return value of the function. The calling program passes a parameter to the callee by pushing its value on the stack. Listing 5 demonstrates this with a simple function called sqr.

#Listing 5
.globl main
main:
 movl $12, %ebx
 pushl %ebx
 call sqr
 addl $4, %esp       //adjust esp to its value before the push
 ret
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax    //compute eax * eax, store result in eax 
 ret
Read the first line of sqr carefully. The calling function pushes the content of ebx on the stack and then executes a call instruction. The call will push the return address on the stack. So inside sqr, the parameter is accessible at an offset of four bytes from the top of stack.

8. Mixing C and Assembler

Listing 6 shows a C program and an assembly language function. The C function is defined in a file called main.c and the assembly language function in sqr.s. You compile and link the files together by typingcc main.c sqr.s.
The reverse is also pretty simple. Listing 7 demonstrates a C function print and its assembly language caller.

#Listing 6
//main.c
main()
{
 int i = sqr(11);
 printf("%d\n",i);
}

//sqr.s
.globl sqr
sqr:
 movl 4(%esp), %eax
 imull %eax, %eax
 ret

#Listing 7
//print.c
print(int i)
{
 printf("%d\n",i);
}

//main.s
.globl main
main:
 movl $123, %eax
 pushl %eax
 call print
 addl $4, %esp
 ret

9. Assembler Output Generated by GNU C

I guess this much reading is sufficient for understanding the assembler output produced by gccListing 8 shows the file add.s generated by gcc -S add.c. Note that add.s has been edited to remove many assembler directives (mostly for alignments and other things of that sort).

#Listing 8
//add.c
int add(int i,int j)
{
 int p = i + j;
 return p;
}

//add.s
.globl add
add:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp  //create space for integer p
 movl 8(%ebp),%edx //8(%ebp) refers to i
 addl 12(%ebp), %edx //12(%ebp) refers to j
 movl %edx, -4(%ebp) //-4(%ebp) refers to p
 movl -4(%ebp), %eax //store return value in eax
 leave   //i.e. to movl %ebp, %esp; popl %ebp ret
The program will make sense upon realizing the C statement add(10,20) which gets translated into the following assembler code:

pushl $20
pushl $10
call add
Note that the second parameter is passed first.

10. Global Variables

Space is created for local variables on the stack by decrementing the stack pointer and the allotted space is reclaimed by simply incrementing the stack pointer. So what is the equivalent GNU C generated code for global variables? Listing 9 provides the answer.

#Listing 9
//glob.c
int foo = 10;
main()
{
 int p foo;
}

//glob.s
.globl foo
foo:
 .long 10
.globl main
main:
 pushl %ebp
 movl %esp,%ebp
 subl $4,%esp
 movl foo,%eax
 movl %eax,-4(%ebp)
 leave
 ret
The statement foo: .long 10 defines a block of 4 bytes named foo and initializes the block with zero. The .globl foo directive makes foo accessible from other files. Now try this out. Change the statement int foo to static int foo. See how it is represented in the assembly code. You will notice that the assembler directive .globl is missing. Try this out for different storage classes (double, long, short, const etc.).

11. System Calls

Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this it will need to call on OS services. In fact, programming in assembly language is quite the same in different OSes, unless OS services are touched.
There are two common ways of performing a system call in Linux: through the C library (libc) wrapper, or directly.
Libc wrappers are made to protect programs from possible system call convention changes, and to provide POSIX compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more-or-less POSIX compliant: this means that the syntax of most libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). But the main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, like printf(), malloc() and similar.
System calls in Linux are done through int 0x80. Linux differs from the usual Unix calling convention, and features a "fastcall" convention for system calls. The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments in ebx, ecx, edx, esi, edi, ebp consequently. If there are more arguments, they are simply passed though the structure as first argument. The result is returned in eax, and the stack is not touched at all.
Consider Listing 10 given below.

#Listing 10
#fork.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
 fork();
 printf("Hello\n");
 return 0;
}
Compile this program with the command cc -g fork.c -static. Use the gdb tool and type the command disassemble fork. You can see the assembly code used for fork in the program. The -static is the static linker option of GCC (see man page). You can test this for other system calls and see how the actual functions work.
There have been several attempts to write an up-to-date documentation of the Linux system calls and I am not making this another of them.

11. Inline Assembly Programming

The GNU C supports the x86 architecture quite well, and includes the ability to insert assembly code within C programs, such that register allocation can be either specified or left to GCC. Of course, the assembly instruction are architecture dependent.
The asm instruction allows you to insert assembly instructions into your C or C++ programs. For example the instruction:

asm ("fsin" : "=t" (answer) : "0" (angle));
is an x86-specific way of coding this C statement:

answer = sin(angle);
You can notice that unlike ordinary assembly code instructions asm statements permit you to specify input and output operands using C syntax. Asm statements should not be used indiscriminately. So, when should we use them?

  • Asm statements allow your programs to access the computer hardware directly. This can produce programs that execute quickly. You can use them when writing operating system code that directly needs to interact with the hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly.
  • Inline assembly instructions also speed up the innermost loops of the programs. For instance, sine and cosine of the same angles can be found by fsincos x86 instruction. Probably, the two listings given below will help you understand this factor better.
#Listing 11
#Name : bit-pos-loop.c 
#Description : Find bit position using a loop

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
 long max = atoi (argv[1]);
 long number;
 long i;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ; ++number) {
  for (i=(number>>1), position=0; i!=0; ++position)
   i >>= 1;
  result = position;
 }
 return 0;
}

#Listing 12
#Name : bit-pos-asm.c
#Description : Find bit position using bsrl

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
 long max = atoi(argv[1]);
 long number;
 unsigned position;
 volatile unsigned result;

 for (number = 1; number <= max; ; ++number) {
  asm("bsrl %1, %0" : "=r" (position) : "r" (number));
  result = position;
 }
 return 0;
}
Compile the two versions with full optimizations as given below:

$ cc -O2 -o bit-pos-loop bit-pos-loop.c
$ cc -O2 -o bit-pos-asm bit-pos-asm.c
Measure the running time for each version by using the time command and specifying a large value as the command-line argument to make sure that each version takes at least few seconds to run.

$ time ./bit-pos-loop 250000000
and

$ time ./bit-pos-asm 250000000
The results will be varying in different machines. However, you will notice that the version that uses the inline assembly executes a great deal faster.
GCC's optimizer attempts to rearrange and rewrite program' code to minimize execution time even in the presence of asm expressions. If the optimizer determines that an asm's output values are not used, the instruction will be omitted unless the keyword volatile occurs between asm and its arguments. (As a special case, GCC will not move an asm without any output operands outside a loop.) Any asm can be moved in ways that are difficult to predict, even across jumps. The only way to guarantee a particular assembly instruction ordering is to include all the instructions in the same asm.
Using asm's can restrict the optimizer's effectiveness because the compiler does not know the asms' semantics. GCC is forced to make conservative guesses that may prevent some optimizations.

12. Exercises

  1. Interpret the assembly code for C program in Listing 6. Modify it for eliminating errors that are obtained when generating assembly code with -Wall option. Compare the two assembly codes. What changes do you observe?
  2. Compile several small C programs with and without optimization options (like -O2). Read the resulting assembly codes and find out some common optimization tricks used by the compiler.
  3. Interpret assembly code for switch statement.
  4. Compile several small C programs with inline asm statements. What differences do you observe in assembly codes for such programs.
  5. A nested function is defined inside another function (the "enclosing function"), such that:
    • the nested function has access to the enclosing function's variables; and
    • the nested function is local to the enclosing function, that is, it can be called from elsewhere unless the enclosing function gives you a pointer to the nested function.

    Nested functions can be useful because they help control the visibility of a function.
    Consider Listing 13 given below:

    #Listing 13
    /* myprint.c */
    #include <stdio.h>
    #include <stdlib.h>
    
    int main()
    {
     int i;
     void my_print(int k)
     {
      printf("%d\n",k);
     }
     scanf("%d",&i);
     my_print(i);
     return 0;
    }
    
    Compile this program with cc -S myprint.c and interpret the assembly code. Also try compiling the program with the command cc -pedantic myprint.c. What do you observe?

    << Prev  |  TOC  |  Front Page  |  Talkback  |  FAQ  |  Next >>
    LINUX GAZETTE
    ...making Linux just a little more fun!
    From C To Assembly Language
    By Hiran Ramankutty


    1. Overview

    What is a microcomputer system made up of? A microcomputer system is made up of a microprocessor unit (MPU), a bus system, a memory subsystem, an I/O subsystem and an interface among all components. A typical answer one can expect.
    This is only the hardware side. Every microcomputer system requires a software so as to direct each of the hardware components while they are performing their respective tasks. Computer software can be thought about at system side (system software) and user side (user software).
    The user software may include some in-built libraries and user created libraries in the form of subroutines which may be needed in preparing programs for execution.
    The system software may encompass a variety of high-level language translators, an assembler, a text editor, and several other programs for aiding in the preparation of other programs. We already know that there are three levels of programming and they are Machine language, Assembly language and High-level language.
    Machine language programs are programs that the computer can understand and execute directly (think of programming in any microprocessor kit). Assembler language instructions match machine language instructions on a more or less one-for-one basis, but are written using character strings so that they are more easily understood, and high-level language instructions are much closer to the English language and are structured so that they naturally correspond to the way programmers think. Ultimately, an assembler language or high-level language program must be converted into machine language by programs called translators. They are referred to as assembler and compiler or interpreter respectively.
    Compilers for high-level languages like C/C++ have the ability to translate high-level language into assembly code. The GNU C and C++ Compiler option of -S will generate an assembly code equivalent to that of the corresponding source program. Knowing how the most rudimentary constructs like loops, function calls and variable declaration are mapped into assembly language is one way to achieve the goal of mastering C internals. Before proceeding further, you must make it a point that you are familiar with Computer Architecture and Intel x86 assembly language to help you follow the material presented here.

    2. Getting Started

    To begin with, write a small program in C to print hello world and compile it with -S options. The output is an assembler code for the input file specified. By default, GCC makes the assembler file name by replacing the suffix `.c', with `.s'. Try to interpret the few lines at the end of the assembler file.
    The 80386 and above family of processors have myriads of registers, instructions and addressing modes. A basic knowledge about only a few simple instructions is sufficient to understand the code generated by the GNU compiler.
    Generally, any assembly language instruction includes a label, a mnemonic, and operands. An operand's notation is sufficient to decipher the operand's addressing mode. The mnemonics operate on the information contained in the operands. In fact, assembly language instructions operate on registers and memory locations. The 80386 family has general purpose registers (32 bit) called eaxebxecx etc. Two registers, ebp and esp are used for manipulating the stack. A typical instruction, written in GNU Assembler (GAS) syntax, would look like this:

    movl $10, %eax
    
    This instruction stores the value 10 in the eax register. The prefix `%' to the register name and `$' to the immediate value are essential assembler syntax. It is to be noted that not all assemblers follow the same syntax.
    Our first assembly language program, stored in a file named first.s is shown in Listing 1.

    #Listing 1
    .globl main
    main:
      movl $20, %eax
      ret
    
    This file can be assembled and linked to generate an a.out by giving the command cc first.s. The extensions `.s' are identified by the GNU compiler front end cc as assembly language files and invokes the assembler and linker, skipping the compilation phase.
    The first line of the program is a comment. The .globl assembler directive serves to make the symbol main visible to the linker. This is vital as your program will be linked with the C startup library which will contain a call to main. The linker will complain about 'undefined reference to symbol main' if that line is omitted (try it). The program simply stores the value 20 in register eax and returns to the caller.

    3. Arithmetic, Comparison, Looping

    Our next program is Listing 2 which computes the factorial of a number stored in eax. The factorial is stored in ebx.

    #Listing 2
    .globl main
    main: 
     movl $5, %eax
     movl $1, %ebx
    L1: cmpl $0, %eax  //compare 0 with value in eax
     je L2   //jump to L2 if 0==eax (je - jump if equal)
     imull %eax, %ebx // ebx = ebx*eax
     decl %eax  //decrement eax
     jmp L1   // unconditional jump to L1
    L2:  ret
    
    L1 and L2 are labels. When control flow reaches L2ebx would contain the factorial of the number stored in eax.

    4. Subroutines

    When implementing complicated programs, we split the tasks to be solved in systematic order. We write subroutines and functions for each of the tasks which are called when ever required. Listing 3illustrates subroutine call and return in assembly language programs.

    #Listing 3
    .globl main
    main:
     movl $10, %eax
     call foo
     ret
    foo:
     addl $5, %eax
     ret
    
    The instruction call transfers control to subroutine foo. The ret instruction in foo transfers control back to the instruction after the call in main.
    Generally, each function defines the scope of variables it uses in each call of the routine. To maintain the scopes of variables you need space. The stack can be used to maintain values of the variables in each call of the routine. It is important to know the basics of how the activation records can be maintained for repeated, recursive calls or any other possible calls in the execution of the program. Knowing how to manipulate registers like esp and ebp and making use of instructions like push and pop which operate on the stack are central to understanding the subroutine call and return mechanism.

    5. Using The Stack

    A section of your program's memory is reserved for use as a stack. The Intel 80386 and above microprocessors contain a register called stack pointer, esp, which stores the address of the top of stack.Figure 1 below shows three integer values, 49,30 and 72, stored on the stack (each integer occupying four bytes) with esp register holding the address of the top of stack.
    Figure 1Unlike the stack analogous to a pile of bricks growing up wards, on Intel machines stack grows down wards. Figure 2 shows the stack layout after the execution of the instruction pushl $15.
    Figure 2The stack pointer register is decremented by four and the number 15 is stored as four bytes at locations 1988, 1989, 1990 and 1991.
    The instruction popl %eax copies the value at top of stack (four bytes) to the eax register and increments esp by four. What if you do not want to copy the value at top of stack to any register? You just execute the instruction addl $4, %esp which simply increments the stack pointer.
    In Listing 3, the instruction call foo pushes the address of the instruction after the call in the calling program on to the stack and branches to foo. The subroutine ends with ret which transfers control to the instruction whose address is taken from the top of stack. Obviously, the top of stack must contain a valid return address.

    6. Allocating Space for Local Variables

    It is possible to have a C program manipulating hundreds and thousands of variables. The assembly code for the corresponding C program will give you an idea of how the variables are accommodated and how the registers are used for manipulating the variables without causing any conflicts in the final result that is to be obtained.
    The registers are few in number and cannot be used for holding all the variables in a program. Local variables are allotted space within the stack. Listing 4 shows how it is done.

    #Listing 4
    .globl main
    main:
     call foo
     ret
    foo:
     pushl %ebp
     movl %esp, %ebp
     subl $4, %esp
     movl $10, -4(%ebp)
     movl %ebp, %esp
     popl %ebp
     ret
    
    First, the value of the stack pointer is copied to ebp, the base pointer register. The base pointer is used as a fixed reference to access other locations on the stack. In the program, ebp may be used by the caller of foo also, and hence its value is copied to the stack before it is overwritten with the value of esp. The instruction subl $4, %esp creates enough space (four bytes) to hold an integer by decrementing the stack pointer. In the next line, the value 10 is copied to the four bytes whose address is obtained by subtracting four from the contents of ebp. The instruction movl %ebp, %esp restores the stack pointer to the value it had after executing the first line of foo and popl %ebp restores the base pointer register. The stack pointer now has the same value which it had before executing the first line of foo. The table below displays the contents of registers ebpesp and stack locations from 3988 to 3999 at the point of entry into main and after the execution of every instruction in Listing 4 (except the return from main). We assume that ebp and esp have values 7000 and 4000 stored in them and stack locations 3988 to 3999 contain some arbitrary values 219986, 1265789 and 86 before the first instruction in main is executed. It is also assumed that the address of the instruction after call foo in main is 30000.

    Table 1

    6. Parameter Passing and Value Return

    The stack can be used for passing parameters to functions. We will follow a convention (which is used by our C compiler) that the value stored by a function in the eax register is taken to be the return value of the function. The calling program passes a parameter to the callee by pushing its value on the stack. Listing 5 demonstrates this with a simple function called sqr.

    #Listing 5
    .globl main
    main:
     movl $12, %ebx
     pushl %ebx
     call sqr
     addl $4, %esp       //adjust esp to its value before the push
     ret
    sqr:
     movl 4(%esp), %eax
     imull %eax, %eax    //compute eax * eax, store result in eax 
     ret
    
    Read the first line of sqr carefully. The calling function pushes the content of ebx on the stack and then executes a call instruction. The call will push the return address on the stack. So inside sqr, the parameter is accessible at an offset of four bytes from the top of stack.

    8. Mixing C and Assembler

    Listing 6 shows a C program and an assembly language function. The C function is defined in a file called main.c and the assembly language function in sqr.s. You compile and link the files together by typingcc main.c sqr.s.
    The reverse is also pretty simple. Listing 7 demonstrates a C function print and its assembly language caller.

    #Listing 6
    //main.c
    main()
    {
     int i = sqr(11);
     printf("%d\n",i);
    }
    
    //sqr.s
    .globl sqr
    sqr:
     movl 4(%esp), %eax
     imull %eax, %eax
     ret
    

    #Listing 7
    //print.c
    print(int i)
    {
     printf("%d\n",i);
    }
    
    //main.s
    .globl main
    main:
     movl $123, %eax
     pushl %eax
     call print
     addl $4, %esp
     ret
    

    9. Assembler Output Generated by GNU C

    I guess this much reading is sufficient for understanding the assembler output produced by gccListing 8 shows the file add.s generated by gcc -S add.c. Note that add.s has been edited to remove many assembler directives (mostly for alignments and other things of that sort).

    #Listing 8
    //add.c
    int add(int i,int j)
    {
     int p = i + j;
     return p;
    }
    
    //add.s
    .globl add
    add:
     pushl %ebp
     movl %esp, %ebp
     subl $4, %esp  //create space for integer p
     movl 8(%ebp),%edx //8(%ebp) refers to i
     addl 12(%ebp), %edx //12(%ebp) refers to j
     movl %edx, -4(%ebp) //-4(%ebp) refers to p
     movl -4(%ebp), %eax //store return value in eax
     leave   //i.e. to movl %ebp, %esp; popl %ebp ret
    
    The program will make sense upon realizing the C statement add(10,20) which gets translated into the following assembler code:

    pushl $20
    pushl $10
    call add
    
    Note that the second parameter is passed first.

    10. Global Variables

    Space is created for local variables on the stack by decrementing the stack pointer and the allotted space is reclaimed by simply incrementing the stack pointer. So what is the equivalent GNU C generated code for global variables? Listing 9 provides the answer.

    #Listing 9
    //glob.c
    int foo = 10;
    main()
    {
     int p foo;
    }
    
    //glob.s
    .globl foo
    foo:
     .long 10
    .globl main
    main:
     pushl %ebp
     movl %esp,%ebp
     subl $4,%esp
     movl foo,%eax
     movl %eax,-4(%ebp)
     leave
     ret
    
    The statement foo: .long 10 defines a block of 4 bytes named foo and initializes the block with zero. The .globl foo directive makes foo accessible from other files. Now try this out. Change the statement int foo to static int foo. See how it is represented in the assembly code. You will notice that the assembler directive .globl is missing. Try this out for different storage classes (double, long, short, const etc.).

    11. System Calls

    Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this it will need to call on OS services. In fact, programming in assembly language is quite the same in different OSes, unless OS services are touched.
    There are two common ways of performing a system call in Linux: through the C library (libc) wrapper, or directly.
    Libc wrappers are made to protect programs from possible system call convention changes, and to provide POSIX compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more-or-less POSIX compliant: this means that the syntax of most libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). But the main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, like printf(), malloc() and similar.
    System calls in Linux are done through int 0x80. Linux differs from the usual Unix calling convention, and features a "fastcall" convention for system calls. The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments in ebx, ecx, edx, esi, edi, ebp consequently. If there are more arguments, they are simply passed though the structure as first argument. The result is returned in eax, and the stack is not touched at all.
    Consider Listing 10 given below.

    #Listing 10
    #fork.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>
    
    int main()
    {
     fork();
     printf("Hello\n");
     return 0;
    }
    
    Compile this program with the command cc -g fork.c -static. Use the gdb tool and type the command disassemble fork. You can see the assembly code used for fork in the program. The -static is the static linker option of GCC (see man page). You can test this for other system calls and see how the actual functions work.
    There have been several attempts to write an up-to-date documentation of the Linux system calls and I am not making this another of them.

    11. Inline Assembly Programming

    The GNU C supports the x86 architecture quite well, and includes the ability to insert assembly code within C programs, such that register allocation can be either specified or left to GCC. Of course, the assembly instruction are architecture dependent.
    The asm instruction allows you to insert assembly instructions into your C or C++ programs. For example the instruction:

    asm ("fsin" : "=t" (answer) : "0" (angle));
    
    is an x86-specific way of coding this C statement:

    answer = sin(angle);
    
    You can notice that unlike ordinary assembly code instructions asm statements permit you to specify input and output operands using C syntax. Asm statements should not be used indiscriminately. So, when should we use them?

    • Asm statements allow your programs to access the computer hardware directly. This can produce programs that execute quickly. You can use them when writing operating system code that directly needs to interact with the hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly.
    • Inline assembly instructions also speed up the innermost loops of the programs. For instance, sine and cosine of the same angles can be found by fsincos x86 instruction. Probably, the two listings given below will help you understand this factor better.
    #Listing 11
    #Name : bit-pos-loop.c 
    #Description : Find bit position using a loop
    
    #include <stdio.h>
    #include <stdlib.h>
    
    int main (int argc, char *argv[])
    {
     long max = atoi (argv[1]);
     long number;
     long i;
     unsigned position;
     volatile unsigned result;
    
     for (number = 1; number <= max; ; ++number) {
      for (i=(number>>1), position=0; i!=0; ++position)
       i >>= 1;
      result = position;
     }
     return 0;
    }
    

    #Listing 12
    #Name : bit-pos-asm.c
    #Description : Find bit position using bsrl
    
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char *argv[])
    {
     long max = atoi(argv[1]);
     long number;
     unsigned position;
     volatile unsigned result;
    
     for (number = 1; number <= max; ; ++number) {
      asm("bsrl %1, %0" : "=r" (position) : "r" (number));
      result = position;
     }
     return 0;
    }
    
    Compile the two versions with full optimizations as given below:

    $ cc -O2 -o bit-pos-loop bit-pos-loop.c
    $ cc -O2 -o bit-pos-asm bit-pos-asm.c
    
    Measure the running time for each version by using the time command and specifying a large value as the command-line argument to make sure that each version takes at least few seconds to run.

    $ time ./bit-pos-loop 250000000
    
    and

    $ time ./bit-pos-asm 250000000
    
    The results will be varying in different machines. However, you will notice that the version that uses the inline assembly executes a great deal faster.
    GCC's optimizer attempts to rearrange and rewrite program' code to minimize execution time even in the presence of asm expressions. If the optimizer determines that an asm's output values are not used, the instruction will be omitted unless the keyword volatile occurs between asm and its arguments. (As a special case, GCC will not move an asm without any output operands outside a loop.) Any asm can be moved in ways that are difficult to predict, even across jumps. The only way to guarantee a particular assembly instruction ordering is to include all the instructions in the same asm.
    Using asm's can restrict the optimizer's effectiveness because the compiler does not know the asms' semantics. GCC is forced to make conservative guesses that may prevent some optimizations.

    12. Exercises

    1. Interpret the assembly code for C program in Listing 6. Modify it for eliminating errors that are obtained when generating assembly code with -Wall option. Compare the two assembly codes. What changes do you observe?
    2. Compile several small C programs with and without optimization options (like -O2). Read the resulting assembly codes and find out some common optimization tricks used by the compiler.
    3. Interpret assembly code for switch statement.
    4. Compile several small C programs with inline asm statements. What differences do you observe in assembly codes for such programs.
    5. A nested function is defined inside another function (the "enclosing function"), such that:
      • the nested function has access to the enclosing function's variables; and
      • the nested function is local to the enclosing function, that is, it can be called from elsewhere unless the enclosing function gives you a pointer to the nested function.

      Nested functions can be useful because they help control the visibility of a function.
      Consider Listing 13 given below:

      #Listing 13
      /* myprint.c */
      #include <stdio.h>
      #include <stdlib.h>
      
      int main()
      {
       int i;
       void my_print(int k)
       {
        printf("%d\n",k);
       }
       scanf("%d",&i);
       my_print(i);
       return 0;
      }
      
      Compile this program with cc -S myprint.c and interpret the assembly code. Also try compiling the program with the command cc -pedantic myprint.c. What do you observe?