ECE 272 Laboratory - Lab 2

ECE L272 Lab 2
Simple Assignments

Objectives

Background

In this lab we will begin to explore the details of assembly language by looking at simple expression evaluation. We will provide you with a C program that calls assembly language routines that you will write. You need not worry about the interface between the C code and the assembly code; we have taken care of that. In each case, the code you provide will be one or more assignment statements. There will be no branching, no arrays, no structures, and no pointers. We will get to those soon enough.

Computer programs are composed of two basic elements: memory locations for storing data (variables) and instructions (statements) for manipulating the data. Assembly language programs have these same features plus one more: registers. Like memory variables, registers can hold values and be used in computations. The microprocessor and the memory are two different entities within the computer. Registers are fast, temporary variables located within the microprocessor, whereas memory variables (or simply ``variables'') exist in the computer's memory, which is more complicated to access. It is best to use registers as much as possible, but there are only a limited number of them. We can't use only memory variables either, because the processor places a limitation on us: you can use only one memory variable in a computation.

There is also a difference in the instructions provided in assembly language: You may only perform one computation per statement. For example, in C you can say:

        int a,b,c,d,e;
        a = ((b + c) - (d + e)) - 10;
This expression performs four computations in one statement using four operand variables and a constant. In assembly language you cannot write such a complex statement. The exact restrictions vary from computer to computer (unlike high-level languages), but we will always use the Intel 80386 as our model. In 80386 assembly language each instruction may perform only one computation at a time and may reference only one memory variable per computation. The other required data must be in a register. The general purpose registers we will use on the 80386 are A, B, C, and D, written %eax, %ebx, %ecx, and %edx in their 32-bit form. Thus, the previous example would look like this:

        .comm   a, 4
        .comm   b, 4
        .comm   c, 4
        .comm   d, 4
        .comm   e, 4
        .text
        movl    b,%eax        # move b into register %eax
        addl    c,%eax        # add c to register %eax
        movl    d,%ebx        # move d into register %ebx
        addl    e,%ebx        # add e to register %ebx
        subl    %ebx,%eax     # subtract register %ebx from register %eax
        subl    $10,%eax      # subtract 10 from register %eax
        movl    %eax,a        # move register %eax out to a
Let's break this down piece by piece. First of all, in order to declare a variable, you use a statement that will ``define storage'' and assign a name or symbol to that storage. Actually, this isn't an instruction at all but an ``assembler directive.'' Directives are commands to the assembler program (as opposed to the computer running the program) to perform some action, in this case to reserve memory for a variable. There are several directives that reserve memory, depending on what size storage you want to define (just like char, short, int, and long in C). There are directives for uninitialized variables and directives for initialized variables. The most common directive, used here, is .comm, which creates a symbol with the name given as the first argument and reserves the number of bytes given as the second argument. Note that there is no type information associated with the memory or the symbol. Alternatively, we could have chosen to initialize the variables to some value. In C we could have said:

        int a;          /* uninitialized */
        int b = 10;     /* decimal */
        int c = 0x20;   /* hexadecimal */
        int d = 'a';    /* ascii */
        int e = 040;    /* octal */
        int f = 024;    /* C does not have a binary representation; */
                        /* this is octal */
which, in assembly language would have been:

        .comm   a, 4
b:      .int    10
c:      .int    0x20
d:      .int    'a'
e:      .int    040
f:      .int    0b000010100
Note the syntax for expressing values in different number bases, including the binary syntax, which does not exist in C. The symbol created is defined by the label to the left of the colon on each line. The value it is initialized to is listed to the right of the directive. Other directives include .byte, .hword, .int, .quad, and .octa to initialize 1, 2, 4, 8, or 16 byte integers (on the x86, .word also emits 2 bytes and .long emits 4 bytes), and .float, .single, and .double to initialize 4, 4, and 8 byte floating point numbers.
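
As a quick check of the number bases above, here is a small C sketch. The function names are made up for illustration and are not part of the lab files. Each function returns the value that the matching initializer stores, assuming the ASCII character set for 'a'.

```c
/* Hypothetical helpers, not part of the lab files: each returns the
   value that the matching initializer above stores. */
int value_b(void) { return 10;   }  /* decimal                      */
int value_c(void) { return 0x20; }  /* hexadecimal: 2*16 = 32       */
int value_d(void) { return 'a';  }  /* character code, 97 in ASCII  */
int value_e(void) { return 040;  }  /* octal: 4*8 = 32              */
int value_f(void) { return 024;  }  /* octal: 2*8 + 4 = 20          */

/* The assembler's 0b000010100 is binary 10100 = 16 + 4 = 20.
   It is spelled out with shifts here so it compiles as standard C. */
int value_f_from_bits(void) { return (1 << 4) | (1 << 2); }
```

Note that c and e hold the same value (32), as do the octal and binary spellings of f (20); only the notation differs.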

Now let us consider the evaluation of the assignment statement. You should notice that assembly language does not use the standard mathematical symbols for addition, multiplication, and so on as high-level languages do. Instead, each operation has its own instruction: addl, mull, subl, etc. One thing to notice is that you have to tell the assembler the size of the data you are operating on. Data can be 1, 2, or 4 bytes. This is specified with an opcode ``suffix'' of b, w, or l, respectively. All of the instructions so far in these examples have used 4-byte ``long-words'', and so all of the opcodes have had an l suffix, as in addl, mull, subl. There are also corresponding instructions for the other sizes, such as addw, mulb, subw. There is also a new operation not seen in C, the movl (move) instruction. You will find that most assembly language programs have a lot of movl, movw, and movb instructions. As you may have guessed, the move instruction simply moves data from one place to another, and thus the instruction:

        movl        src, dst
is equivalent to the simple assignment statement:

        dst = src;
The main limitation, however, is that dst and src cannot both be memory variables. In other words, there can be no more than one memory variable in a movl instruction, but both operands can be registers if you want. The order of the operands determines which one is the source and which is the destination: the source comes first, and the destination comes second. By placing a register second, we load a memory variable into that register. By placing the memory variable second, we store a register into that memory variable. We can, of course, move a register into a register as well, but we cannot move a memory variable directly into another memory variable.

The next thing to notice is that the arithmetic instructions only have two operands. We cannot in one instruction say:

        dst = src1 + src2;
even if all of dst, src1, and src2 are registers. In assembly language we always ``add to'' a value as in:

        dst = dst + src;
or

        dst += src;
which in assembly language is just:

        addl     src, dst
but remember, dst and src cannot both be memory variables. Thus, if we want to add one variable to another, we must first move one into a register and then perform the addition:

        int a,b;
        a += b;
is

        .comm   a, 4
        .comm   b, 4
        movl    b, %eax
        addl    %eax, a
and

        int a,b,c;
        a = b + c;
is

        .comm   a, 4
        .comm   b, 4
        .comm   c, 4
        movl    b,%eax
        addl    c,%eax
        movl    %eax,a
Finally, just as in C you can specify a constant to use in a computation, you can specify a constant in assembly language:

        int a;
        a += 2;
is

        .comm   a, 4
        addl    $2, a
Now, once you understand these ideas, you should be able to go back to the first example and see if you can understand how it implements the original C expression (do it!).
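
If you want to check your reading of the first example, one option is to rewrite it in C with one statement per instruction. The sketch below does that. The function name evaluate and the locals eax and ebx are made up for illustration; they merely stand in for the registers.

```c
/* Hypothetical C model of the first assembly example. The locals eax and
   ebx stand in for the registers; every statement does one computation
   and names at most one memory variable. */
int a, b, c, d, e;          /* the five .comm variables */

void evaluate(void)
{
    int eax, ebx;

    eax  = b;               /* movl b,%eax    */
    eax += c;               /* addl c,%eax    */
    ebx  = d;               /* movl d,%ebx    */
    ebx += e;               /* addl e,%ebx    */
    eax -= ebx;             /* subl %ebx,%eax */
    eax -= 10;              /* subl $10,%eax  */
    a    = eax;             /* movl %eax,a    */
}
```

With b = 7, c = 5, d = 1, and e = 2, calling evaluate() leaves a equal to ((7 + 5) - (1 + 2)) - 10, which is -1.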

Now it is time to expand our horizons a little more. The first thing to consider is that the 80386 can operate on several different sizes of data. The primary data sizes are 32, 16, and 8 bits. In support of this, the registers can be referenced as 32-bit registers, 16-bit registers, or 8-bit registers. Each of the four 32-bit registers we have seen (%eax, %ebx, %ecx, %edx) has a corresponding 16-bit register (%ax, %bx, %cx, %dx). These four registers are the least-significant 16 bits of their corresponding 32-bit ``extended'' registers. In addition, each of these 16-bit registers can be thought of as a pair of 8-bit registers (%ah, %al, %bh, %bl, %ch, %cl, %dh, %dl). These are NOT different registers; they simply refer to part of the 16-bit registers (which are, in turn, part of the 32-bit registers). Thus, if you store a value in %ax and then a value in %ah, the value in %ax will no longer be the original value you put there. Specifically, %ah refers to the ``high order byte'' of %ax and %al refers to the ``low order byte'' of %ax. This means %ah is the 8 most significant bits of %ax and %al is the 8 least significant bits of %ax. When you refer to a register in an instruction, the size of the register must match the size of the opcode. Thus the instruction addb $2,%al is an 8-bit instruction, whereas addw $2,%ax is a 16-bit instruction.
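
The overlap between %eax, %ax, %ah, and %al can be modeled in C with masks and shifts. This is only a sketch: the helper names are made up for illustration, and a single 32-bit value stands in for the register.

```c
#include <stdint.h>

/* Hypothetical model: one 32-bit value stands in for %eax, and the
   helpers below read or write the parts named %ax, %ah and %al. */
uint32_t get_ax(uint32_t eax) { return eax & 0xFFFFu; }        /* low 16 bits */
uint32_t get_ah(uint32_t eax) { return (eax >> 8) & 0xFFu; }   /* bits 15..8  */
uint32_t get_al(uint32_t eax) { return eax & 0xFFu; }          /* bits 7..0   */

/* Like movw $v,%ax: replaces the low 16 bits, upper 16 bits unchanged. */
uint32_t set_ax(uint32_t eax, uint16_t v)
{
    return (eax & 0xFFFF0000u) | v;
}

/* Like movb $v,%ah: replaces bits 15..8 only. */
uint32_t set_ah(uint32_t eax, uint8_t v)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)v << 8);
}
```

Starting from zero, set_ax with 0x1234 followed by set_ah with 0xAB leaves %ax holding 0xAB34: the high byte was overwritten and the low byte 0x34 survived.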

Multiplication and division are different, however, since they can operate on more than one data size at a time. These instructions are somewhat more complex than the other operations. First, there are two versions: imull for multiplication of signed integers and mull for multiplication of unsigned numbers. The simplest form of these two instructions has a single operand (which can be a memory variable or a register). The ``a'' register (%al, %ax, or %eax) is multiplied by the value of this operand, and the result is placed in the ``a'' register and, potentially, the ``d'' register. Yes, that's right: in the ``mul'' statement we have no control over the destination register and no control over one of the values being multiplied. This means that if you want to multiply the contents of register B and register C you cannot write mull %ebx, %ecx. You have to move the contents of one of them to register A, as in movl %ebx, %eax, and then multiply by register C with mull %ecx.

When we multiply two 8-bit numbers, we get a 16-bit result. When we multiply two 16-bit numbers, we get a 32-bit result, and when we multiply two 32-bit numbers, we get a 64-bit result. Thus, in each case the result can take up to twice as much space as the operands. If this doesn't make sense to you, try multiplying the maximum integer we can represent in 8 bits by itself (the largest multiplication we could do on two 8-bit values). How many bits does the result require?
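
You can also let the computer do this experiment. The C sketch below uses made-up helper names and unsigned values only. It squares the largest value of a given width and counts the bits in the product.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to write v in binary. */
int bits_needed(uint64_t v)
{
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Largest unsigned value that fits in `bits` bits (bits must be < 64). */
uint64_t max_unsigned(int bits)
{
    return ((uint64_t)1 << bits) - 1;
}

/* Bits needed for the largest product of two `bits`-bit operands
   (bits must be at most 32 so the product fits in 64 bits). */
int product_bits(int bits)
{
    uint64_t m = max_unsigned(bits);
    return bits_needed(m * m);
}
```

For 8 bits the largest operand is 255, and 255 * 255 = 65025, which needs 16 bits. The same doubling holds for 16-bit and 32-bit operands.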

In the 8-bit case the result is in %ah:%al (otherwise known as %ax). For 16 bits, the result is in %dx:%ax (which is weird, but it was done this way to be compatible with pre-32-bit code). Finally, in the 32-bit case, the result is in %edx:%eax.

This means that if you do the example above, multiplying the 32-bit C register by the 32-bit B register (by first moving B into A), your result will be 64 bits and will occupy BOTH the A register and the D register, as %edx:%eax. So make sure you do not lose data you wanted to keep in %edx.

The imull instruction also has two other forms that take 2 and 3 operands. In each of these, the last operand (the destination) must be a register, so you can multiply without using the ``a'' register. In these forms the result is truncated to the same number of bits as the operands, so additional registers are not needed. For the one-operand form, here are three formulas you can use:

mulb     X8bit     =>     %ax = %al * X8bit

mulw     X16bit     =>     %dx:%ax = %ax * X16bit

mull     X32bit     =>     %edx:%eax = %eax * X32bit

The text to the left of the => symbol above is an example of the code you type in assembly. To the right of the symbol is a description of the operation the computer performs. Notice that for the 32-bit multiply the result is placed into %edx:%eax, as explained above.

You supply X which will be either a 32-, 16- or 8-bit register or a 32-, 16- or 8-bit memory variable.
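
To make the 32-bit formula concrete, here is a C sketch of what mull does to the two registers. The struct and function names are made up for illustration. The second function models the truncating two-operand imull form, using unsigned arithmetic because the low 32 bits of a product are the same for signed and unsigned operands.

```c
#include <stdint.h>

/* Hypothetical model of the register pair written by one-operand mull. */
struct mul_regs {
    uint32_t eax;
    uint32_t edx;
};

/* mull X : %edx:%eax = %eax * X (unsigned). The old %edx is ignored
   and then overwritten with the high half of the product. */
struct mul_regs model_mull(struct mul_regs r, uint32_t x)
{
    uint64_t product = (uint64_t)r.eax * x;
    r.eax = (uint32_t)product;          /* low 32 bits  */
    r.edx = (uint32_t)(product >> 32);  /* high 32 bits */
    return r;
}

/* Two-operand imull src,dst : dst = dst * src, truncated to 32 bits.
   No second register is written. */
uint32_t model_imull2(uint32_t src, uint32_t dst)
{
    return (uint32_t)((uint64_t)dst * src);
}
```

Multiplying 7 by 6 leaves 42 in %eax and 0 in %edx, whatever %edx held before. Multiplying 0xFFFFFFFF by itself leaves 0x00000001 in %eax and 0xFFFFFFFE in %edx.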

The format of the divide instruction is similar: divl <divisor>. Like the mull instruction, divl assumes the dividend and the destinations of the quotient and remainder based on the opcode suffix. Here are the three formulas:

divb     X8bit     =>     %al = %ax / X8bit,     %ah = remainder

divw     X16bit     =>     %ax = %dx:%ax / X16bit,     %dx = remainder

divl     X32bit     =>     %eax = %edx:%eax / X32bit,     %edx = remainder

Please note that when you divide by a 32- or 16-bit number, you must make sure that %edx or %dx contains the correct value (usually zero). This is very important: by the end of the semester many of your bugs will take you a while to find, and about 25% of the time this is the bug. If you do a 32-bit divide, notice above that the dividend is implicitly %edx:%eax. What if your D register has data in it? Then the result will be wrong, because the division will use that data as the upper half of the dividend. If the quotient is then too large to fit in %eax, the processor raises a divide error instead of producing a result. So, before doing a 32-bit divide you MUST zero out the D register, unless of course you really want a dividend that large.
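
The effect of a stale %edx is easy to see in a C model of the 32-bit divide. This is a sketch with made-up names. Where the real processor would raise a divide error, the model returns -1.

```c
#include <stdint.h>

/* Hypothetical model of the register pair used by one-operand divl. */
struct div_regs {
    uint32_t eax;
    uint32_t edx;
};

/* divl X : divides the 64-bit value %edx:%eax by X (unsigned).
   Returns 0 and updates *r on success. Returns -1 without touching *r
   when X is zero or the quotient does not fit in 32 bits; a real
   processor raises a divide error in those cases. */
int model_divl(struct div_regs *r, uint32_t x)
{
    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t quotient;

    if (x == 0)
        return -1;
    quotient = dividend / x;
    if (quotient > 0xFFFFFFFFu)
        return -1;
    r->edx = (uint32_t)(dividend % x);  /* remainder */
    r->eax = (uint32_t)quotient;        /* quotient  */
    return 0;
}
```

With %edx cleared, dividing 60 by 12 gives quotient 5 and remainder 0. With a leftover 1 in %edx, the same instruction divides 4294967356 by 12 and gives quotient 357913946 and remainder 4, which is certainly not what was intended.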

Assignment - due [see schedule]

  1. Try the assignment below

THIS IS THE SPECIFICATION OF WHAT THE ASSEMBLY FUNCTIONS THAT NEED TO BE WRITTEN WILL DO. DO NOT TYPE THIS CODE IN.

/* begin assignment specification code lab2asm.s */
int digit1;
int digit2;
int digit3;
int diff;
int sum;
int product;
int remainder;

void dodiff(void) {
   diff = (digit1 * digit1) + (digit2 * digit2) - (digit3 * digit3);
}

void dosumprod(void) {
   sum = digit1 + digit2 + digit3;
   product = digit1 * digit2 * digit3;
}

void doremainder(void) {
   remainder = product % sum;
}

/* end assignment specification code lab2asm.s */

THIS IS THE C CODE WHICH ACTS AS A DRIVER FOR THE ASSEMBLY FUNCTIONS BY CALLING THEM.

lab2drv.c - the following code
/* ----- BEGIN C CODE ----- */
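/* begin driver code lab2drv.c */
#include <stdio.h>

extern int digit1;
extern int digit2;
extern int digit3;
extern int diff;
extern int sum;
extern int product;
extern int remainder;

/* assembly routines defined in lab2asm.s */
extern void dodiff(void);
extern void dosumprod(void);
extern void doremainder(void);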

int main(int argc, char **argv)
{
   for (digit1 = 0; digit1 < 10; digit1++) {
      for (digit2 = digit1; digit2 < 10; digit2++) {
         for (digit3 = digit2; digit3 < 10; digit3++) {
            dodiff();           
            if (diff == 0)      
               printf("%d%d%d PT\n",digit1,digit2,digit3);
            dosumprod();        
            if (sum && product) {
               doremainder();           
               if (remainder == 0) printf("%d%d%d ED\n",digit1,digit2,digit3);
            }                   
         }              
      }         
   }    
   return 0;
}
/* end driver code */
/* ----- END C CODE ----- */

THIS IS THE STUB OF THE ASSEMBLY CODE TO SOLVE THE PROBLEM. YOU WILL GET SOMETHING LIKE THIS DURING YOUR PRACTICAL EXAM. YOU MUST FILL IN THE BLANKS TO COMPLETE THE PROBLEM.

lab2asm.s - the following code
/* ----- BEGIN ASSEMBLY CODE STUB ----- */
   .globl   dodiff
   .type    dodiff, @function
dodiff:
   /* prolog */
   pushl    %ebp
   pushl    %ebx
   movl     %esp, %ebp

   /* put code here */

   /* epilog */
   movl     %ebp, %esp
   popl     %ebx
   popl     %ebp
   ret

   .globl   dosumprod
   .type    dosumprod, @function
dosumprod:
   /* prolog */
   pushl    %ebp
   pushl    %ebx
   movl     %esp, %ebp

   /* put code here */

   /* epilog */
   movl     %ebp, %esp
   popl     %ebx
   popl     %ebp
   ret

   .globl   doremainder
   .type    doremainder, @function
doremainder:
   /* prolog */
   pushl    %ebp
   pushl    %ebx
   movl     %esp, %ebp

   /* put code here */

   /* epilog */
   movl     %ebp, %esp
   popl     %ebx
   popl     %ebp
   ret

/* declare variables here */
/* ----- END ASSEMBLY CODE ----- */

THIS IS WHAT YOUR OUTPUT SHOULD LOOK LIKE.

000 PT
011 PT
022 PT
033 PT
044 PT
055 PT
066 PT
077 PT
088 PT
099 PT
123 ED
138 ED
145 ED
159 ED
167 ED
189 ED
224 ED
235 ED
246 ED
257 ED
268 ED
279 ED
333 ED
345 PT
345 ED
347 ED
357 ED
369 ED
448 ED
456 ED
459 ED
466 ED
578 ED
579 ED
666 ED
678 ED
789 ED
999 ED

THIS IS THE SOLUTION TO THE PROBLEM WITH THE STUBS FILLED IN. YOU SHOULD TRY TO DO THE PROBLEM WITHOUT LOOKING AT THIS ANSWER.

lab2asm-sol.s - the following code
/* ----- BEGIN ASSEMBLY CODE SOLUTION ----- */
   .globl   dodiff
   .type    dodiff, @function
dodiff:
   /* prolog */
   pushl    %ebp
   pushl    %ebx
   movl     %esp, %ebp

   /* put code here */
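   movl     $0, %edx          /* not needed by mull, which overwrites %edx */
   movl     digit1, %eax
   mull     %eax              /* %edx:%eax = digit1 * digit1 */
   movl     %eax, %ebx        /* %ebx = digit1 squared */
   movl     digit2, %eax
   mull     %eax              /* %edx:%eax = digit2 * digit2 */
   movl     %eax, %ecx        /* %ecx = digit2 squared */
   movl     digit3, %eax
   mull     %eax              /* %eax = digit3 squared */
   addl     %ebx, %ecx        /* %ecx = digit1 squared + digit2 squared */
   subl     %eax, %ecx        /* subtract digit3 squared */
   movl     %ecx, diff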

   /* epilog */
   movl     %ebp, %esp
   popl     %ebx
   popl     %ebp
   ret

   .globl   dosumprod
   .type    dosumprod, @function
dosumprod:
   /* prolog */
   pushl    %ebp
   pushl    %ebx
   movl     %esp, %ebp

   /* put code here */
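   movl     digit1, %eax
   addl     digit2, %eax
   addl     digit3, %eax      /* %eax = digit1 + digit2 + digit3 */
   movl     %eax, sum

   movl     $0, %edx          /* not needed by mull, which overwrites %edx */
   movl     digit1, %eax
   mull     digit2            /* %edx:%eax = digit1 * digit2 */
   mull     digit3            /* %edx:%eax = %eax * digit3 */
   movl     %eax, product     /* keep the low 32 bits */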

   /* epilog */
   movl     %ebp, %esp
   popl     %ebx
   popl     %ebp
   ret

   .globl   doremainder
   .type    doremainder, @function
doremainder:
   /* prolog */
   pushl    %ebp
   pushl    %ebx
   movl     %esp, %ebp

   /* put code here */
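   movl     $0, %edx          /* clear the high half of the dividend */
   movl     product, %eax     /* %edx:%eax = product */
   divl     sum               /* %eax = quotient, %edx = remainder */
   movl     %edx, remainder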

   /* epilog */
   movl     %ebp, %esp
   popl     %ebx
   popl     %ebp
   ret

/* declare variables here */
   .comm    digit1, 4
   .comm    digit2, 4
   .comm    digit3, 4
   .comm    diff, 4
   .comm    sum, 4
   .comm    product, 4
   .comm    remainder, 4
/* ----- END ASSEMBLY CODE ----- */