ECE 272 Laboratory - Lab 4

ECE L272 Lab 4
Addressing Modes: Arrays and Pointers

Objectives

Background

In this lab we begin to explore different addressing modes. An addressing mode is the means by which the computer selects the data that is used in an instruction. The addressing mode is determined by the syntax with which you specify the instruction's operand. Let's begin by clarifying these three terms: data, operand, and addressing mode.

First, data are numerical values that are used in computations. For example, if the CPU currently has the value 3 in the %eax register and the value 4 in the %ebx register and there is an instruction addl %eax, %ebx, then there are two data values involved: 3 and 4. The operands, however, are the symbols %eax and %ebx. Now if the instruction were addl $4, %eax, then the data would be the same, but the operands would be the symbol %eax and the number 4. So now that data and operand are understood, what is the addressing mode? The addressing mode describes the relationship between the operands and the data (i.e., how do we use the operands to get the correct data?). These examples include instances of the two simplest addressing modes, register addressing and immediate addressing. Register addressing means the data is in a register, and immediate addressing means the data is supplied as part of the instruction.

What about when we access a variable, as in addl %eax, VAR where VAR is a symbol that represents a memory location? This is called direct addressing, which means the memory address of the data is supplied with the instruction. How is the address supplied? When the assembler translates the program into executable machine code, it assigns an address to the symbol VAR, and it is that address that is included with the instruction in the program memory. Thus, the operand is the memory address represented by the symbol VAR, and the data is the contents of memory at that address.

Now let's look at a different kind of variable. Consider the following fragment of C code:

int a[10];   /* an array of 10 integers */
int i;
main()
{
     a[0] = 10;
     a[4] = 20;
     a[i] = 30;
}
So far we have not used arrays. First of all, how do we declare an array in assembly language? We use the same mechanism as for a regular variable but reserve enough bytes for every element (here, 10 elements of 4 bytes each) as follows:

       .comm   a, 40
This reserves ten long words (40 bytes) of memory, which are initialized to zero. a is a symbol that is equal to the address of the first word. Don't forget that! If we are going to set all of the elements of the array to the same initial value, we can also use the .fill directive, whose arguments are the repeat count, the size of each item in bytes, and the value:

a:       .fill   10, 4, 0
If we wanted to set the first three elements to the values 1,2,3 and the rest to zero, we could write:

a:       .int  1, 2, 3
         .fill 7, 4, 0
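For comparison with the C side of the lab, the declaration above corresponds roughly to a partially initialized C array. This is only an illustrative sketch; the names a_init and count_nonzero are hypothetical and do not appear in the lab files.

```c
#include <stddef.h>

/* Hypothetical C counterpart of
 *     a:  .int  1, 2, 3
 *         .fill 7, 4, 0
 * Elements without an explicit initializer are set to zero. */
int a_init[10] = {1, 2, 3};

/* Count the nonzero elements of an int array. */
size_t count_nonzero(const int *arr, size_t n)
{
    size_t count = 0;
    for (size_t k = 0; k < n; k++) {
        if (arr[k] != 0)
            count++;
    }
    return count;
}
```

Calling count_nonzero(a_init, 10) should report three nonzero elements, since the remaining seven are zero-filled just as with .fill.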
Now that we can declare the array, how do we access its elements? In the first example a[0] = 10;, the access is easy since this refers to the first element of the array. We can access it using simple direct addressing:

      movl     $10, a
Remember that the symbol a is the address of the first word of the array, so the movl $10, a simply moves a ten into the first long word of the array.

The second example a[4] = 20; is a little more tricky, since we don't have a symbol defined for the 5th element. We can, however, specify an offset from the start of the array as follows:

      movl     $20, a+16
Recall that each element of a is four bytes, so the element a[4] begins at address a+16. What addressing mode is this? This is still direct addressing mode. The assembler adds the value 16 to the address a before it includes it as part of the instruction. After assembly, there is no indication that the address supplied was originally a label or a computed value. In fact, if you wanted to read from some fixed memory address (which does come up from time to time), you can forget the label altogether and write

      movl     4096, %eax
The labels just make it easier to declare and refer to variables by letting the assembler worry about their actual addresses. For example, if array a began at memory location 0x142, the assembler would translate the line movl $10, a to a line like movl $10, 0x142, and for movl $10, a+8, it would produce movl $10, 0x14a. It would be hard for us to keep track of the addresses of our variables, so we use symbols and offsets from those symbols instead; the assembler does the hard part.
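The arithmetic the assembler performs for a label plus an offset can be sketched in C. The function names element_offset and direct_address are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` in an array whose elements are
 * `elem_size` bytes each: the constant added to the label. */
size_t element_offset(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* The absolute address that ends up in the instruction for label+offset. */
uintptr_t direct_address(uintptr_t label_address, size_t index, size_t elem_size)
{
    return label_address + element_offset(index, elem_size);
}
```

With the base address 142 hex used in the text and 4-byte elements, direct_address(0x142, 2, 4) yields 14a hex, and element_offset(4, 4) yields the 16 used for a[4].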

Finally we come to the last example: a[i] = 30; This example is different from the rest. It is not possible to know the address of the data when the program is assembled (because i can change during execution), so the address cannot be specified in the instruction. To handle this type of access, the x86 gives us several more addressing modes. The best mode for this example is indexed (sometimes called direct indexed) addressing. In indexed addressing the processor uses the contents of a register, multiplied by a scale factor of 1, 2, 4, or 8, along with a constant to compute the memory address of the data. The constant is called the displacement, which can be a number or a label. This displacement is typically the starting address of the array, and the register then acts like an index into the array. In this lab we use the two index registers %esi and %edi for indexed addressing. %esi stands for "source index" and %edi stands for "destination index," but you may use the two interchangeably except in certain instructions which will be covered in a later lab. Thus, to implement the example C fragment, we would write:

         .comm    a, 40
         .comm    i, 4
         movl     $10, a          # a[0] = 10;
         movl     $20, a+16       # a[4] = 20;
         movl     i, %edi
         movl     $30, a(,%edi,4) # a[i] = 30;
Here, the last instruction uses indexed addressing. The address of the memory location accessed by that instruction is found by taking the contents of %edi, multiplying them by the scale factor 4 (the size of one element in bytes), and adding the result to the address represented by the symbol a.
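The address computation for indexed addressing can be imitated in C with byte-level pointer arithmetic. This is a sketch only; store_indexed is a hypothetical helper, not part of the lab code.

```c
#include <stddef.h>
#include <string.h>

/* Store `value` at byte address base + index * scale, mimicking
 *     movl $value, a(,%edi,4)
 * with base = a, index = %edi, and scale = 4. */
void store_indexed(void *base, size_t index, size_t scale, int value)
{
    unsigned char *addr = (unsigned char *)base + index * scale;
    memcpy(addr, &value, sizeof value);  /* write one int at the computed address */
}
```

Passing sizeof(int) as the scale makes the call equivalent to an ordinary a[index] = value assignment in C.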

There is another situation in which the address of the data is not known at assembly time, and that is when a program uses a pointer. In this situation, the address of the data is specified entirely in a register. For this case, the x86 provides register indirect addressing. In register indirect addressing, the %ebx register holds the address of the data, as in the following example:

int *p;
main()
{
     *p = 40;
}
which can be encoded in assembly language as:

        .comm   p, 4
        movl    p, %ebx
        movl    $40, (%ebx)
Note that you may also use %esi and %edi in register indirect addressing.

One might think that using a variable as a pointer would look like movl $40, (p), but in x86 assembly this is not the case, because no addressing mode dereferences a pointer that is held in memory. Pointers can be stored in any register except %esp. If we have a pointer stored in memory, we must move it into one of the registers before we can use it as a pointer.
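The two-step nature of a store through a pointer can be made explicit in C by naming the intermediate value. In this sketch, target, p_demo, and store_through_pointer are hypothetical names standing in for the lab's p and the location it points to.

```c
/* Hypothetical globals: a pointer variable in memory and its target. */
int target;
int *p_demo = &target;

/* *p = value, split into the two steps the assembly needs. */
void store_through_pointer(int value)
{
    int *reg = p_demo;  /* like movl p, %ebx       : load the pointer itself   */
    *reg = value;       /* like movl $40, (%ebx)   : store through the register */
}
```

A C compiler performs the same split when it translates *p = 40; the local variable reg plays the role of %ebx.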

Finally, there is the case where we have a pointer to an array, as is often the case in C programs. Here we use base indexed addressing which allows us to specify two registers, %ebx which holds the base address of the array, and %esi or %edi, which holds the index. If the array is an array of records, a constant offset may also be specified. Here is an example:

int    *ap;
struct {int a,b;} *asp;
int    i;
main()
{
       ap[3] = 50;
       ap[i] = 60;
       asp[i].b = 70;
}
which is coded in assembly language as:

        .comm    ap, 4
        .comm    asp, 4
        .comm    i, 4
        . . .
        movl     ap, %ebx    
        movl     $50, 12(%ebx)         # ap[3] = 50;
        movl     i, %edi      
        movl     $60, (%ebx, %edi, 4)  # ap[i] = 60;
        movl     asp, %ebx             # i is still in %edi
        movl     $70, 4(%ebx, %edi, 8) # asp[i].b = 70; each element is 8 bytes
One final note about these addressing modes: in 32-bit x86 code, one can implement most of these expressions using %esi, %edi, and %ebx interchangeably. The rule that one of the two registers in base indexed mode must be %ebx comes from the original 16-bit 8086; in 32-bit code any general-purpose register may serve as the base, and any except %esp as the index. This lab keeps the convention of %ebx as the base and %esi or %edi as the index. Either way, there is a distinction between a base register, which holds a pointer, and an index register, which is used to compute an offset from the base.
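For an array of records, the offset from the base register is index times element size plus the offset of the field within one element. The sketch below computes that offset in C; struct pair and offset_of_b are hypothetical names mirroring the anonymous struct in the example.

```c
#include <stddef.h>

/* Same layout as the struct in the example: two ints, a then b. */
struct pair { int a, b; };

/* Byte offset from the base pointer to field b of element i:
 * index * scale + displacement, where the scale is the element size
 * and the displacement is the offset of b within one element. */
size_t offset_of_b(size_t i)
{
    return i * sizeof(struct pair) + offsetof(struct pair, b);
}
```

On a platform with 4-byte ints, the element size is 8 and the displacement of b is 4, so offset_of_b(2) gives 20.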

So, to recap addressing modes:

immediate addressing
    The instruction itself contains the data rather than an address or other information describing where the data is.
        movl $4, %eax
register addressing
    The data is in a register.
        movl %eax, %ebx
direct addressing
    The memory address of the data is supplied with the instruction. The assembler assigns an address to the symbol, adds any constant offset, and places the result in the instruction.
        movl %eax, mem_variable
    or
        movl %eax, mem_variable+16
indexed addressing
    The contents of a register, multiplied by a scale factor, are added to a constant to compute the memory address of the data. The constant is called the displacement.
        movl $30, a(,%edi,4)
register indirect addressing
    The %ebx register holds the address of the data to be addressed.
        movl $40, (%ebx)
base-indexed addressing
    The %ebx register holds the base address (such as the start of an array) and %esi or %edi holds the index. The scale factor (4 in the following example) corresponds to the number of bytes in each element of the array.
        movl $60, (%ebx, %edi, 4)
Your assignment for this lab will give you a chance to work with arrays and pointers and the control structure most often associated with them: loops. You will implement a program that is already written for you in C. The program is "atoi", which converts an ASCII string of digits into a numerical value. It is an integral part of any program's I/O library.

Assignment - due [see schedule]

  1. Try the assignment below

THIS IS THE SPECIFICATION OF WHAT THE ASSEMBLY FUNCTION THAT NEEDS TO BE WRITTEN WILL DO. DO NOT TYPE THIS CODE IN.

/* begin assignment specification code lab4asm.s */
AtoI()
{
     sign = 1;
     /* skip over leading spaces and tabs */
     while (*ascii == ' ' || *ascii == '\t')
          ascii++;
     /* check for a plus or minus sign */
     if (*ascii == '+')
          ascii++;
     else if (*ascii == '-')
          {
          sign = -1;
          ascii++;
          }
     /* convert characters until a non-digit is found */
     *intptr = 0;
     for(i = 0; ascii[i] >= '0' && ascii[i] <= '9'; i++);

     i--;    /* back up one to the last digit */
     multiplier = 1;
     for(; i>=0 ; i--)
          {
          *intptr += multiplier * (ascii[i] - '0');
          multiplier *= 10;
          }
     /* multiply in the sign */
     *intptr = *intptr * sign;
}
/* end assignment specification code lab4asm.s */
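To predict what the assembly version should produce for a given input, the same steps can be written as a function that takes the string as a parameter. This is a sketch under that assumption and not part of the lab files; atoi_sketch is a hypothetical name, and like the specification it does not check for overflow.

```c
/* Hypothetical parameterized version of the specification: same steps,
 * but the string is passed in and the result is returned. */
int atoi_sketch(const char *ascii)
{
    int sign = 1;
    int result = 0;
    int multiplier = 1;
    int i;

    /* skip over leading spaces and tabs */
    while (*ascii == ' ' || *ascii == '\t')
        ascii++;

    /* check for a plus or minus sign (at most one is consumed) */
    if (*ascii == '+') {
        ascii++;
    } else if (*ascii == '-') {
        sign = -1;
        ascii++;
    }

    /* advance i to the first non-digit */
    for (i = 0; ascii[i] >= '0' && ascii[i] <= '9'; i++)
        ;

    /* accumulate from the least significant digit upward */
    for (i--; i >= 0; i--) {
        result += multiplier * (ascii[i] - '0');
        multiplier *= 10;
    }

    /* multiply in the sign */
    return result * sign;
}
```

Because of the else-if, an input such as "+-5" consumes only the plus sign and then finds no digits, so the result is 0; the assembly version needs to behave the same way.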

THIS IS THE C CODE WHICH ACTS AS A DRIVER FOR THE ASSEMBLY FUNCTION BY CALLING IT.

lab4drv.c - the following code
----- BEGIN C CODE -----
/* begin driver code lab4drv.c */
#include <stdio.h>

char input[10];
int output;
char *ascii;
int  *intptr;
int  sign, multiplier, i;

void AtoI(void);   /* implemented in lab4asm.s */

int main(int argc, char **argv)
{
     printf("Enter a number -> ");
     fgets(input, 10, stdin);
     ascii = input;
     intptr = &output;
     AtoI();
     printf("\nAscii is     -> %sInteger is   -> %d\n", input, output);
 
     return 0;
}
/* end driver code */
----- END C CODE -----

THIS IS THE STUB OF THE ASSEMBLY CODE TO SOLVE THE PROBLEM. YOU WILL GET SOMETHING LIKE THIS DURING YOUR PRACTICAL EXAM. YOU MUST FILL IN THE BLANKS TO COMPLETE THE PROBLEM.

lab4asm.s - the following code
----- BEGIN ASSEMBLY CODE STUB -----
.globl   AtoI
.type    AtoI,@function

AtoI:
   /* prolog */
   pushl %ebp
   movl  %esp, %ebp
   pushl %ebx
   pushl %esi
   pushl %edi

   /* put code here */

return:
   /* epilog */
   popl  %edi
   popl  %esi
   popl  %ebx
   movl  %ebp, %esp
   popl  %ebp
   ret
----- END ASSEMBLY CODE -----


THIS IS THE SOLUTION TO THE PROBLEM WITH THE STUBS FILLED IN. YOU SHOULD TRY AND DO THE PROBLEM WITHOUT LOOKING AT THIS ANSWER.

lab4asm-sol.s - the following code
.globl   AtoI
.type    AtoI,@function

AtoI:
   /* prolog */
   pushl %ebp
   movl  %esp, %ebp
   pushl %ebx
   pushl %esi
   pushl %edi

   /* put code here */
   movl  $1, sign

## while(*ascii == ' ' || *ascii == '\t') ascii++; #####################
CheckWhiteSpace:
   movl  ascii, %eax # put the char array pointer in %eax
   cmpb  $32, (%eax) # (deref.) comp this element to a space
   jne   CheckTab # if not equal, check if it's a tab
   addl  $1, ascii   # if equal, move ahead one space
   jmp   CheckWhiteSpace   # reloop and check next space

CheckTab:
   movl  ascii, %eax # put the char array pointer in %eax
   cmpb  $9, (%eax)  # (deref.) comp this element to a tab
   jne   CheckPlus   # if not equal, goto the next checks
   addl  $1, ascii   # if equal, move ahead one space
   jmp   CheckWhiteSpace   # reloop and check next space
#######################################################################

## if(*ascii == '+') ascii++; #########################################
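CheckPlus:
   movl  ascii, %eax
   cmpb  $43, (%eax) # is this character a '+'?
   jne   CheckMinus  # if not, check for a minus sign
   addl  $1, ascii   # skip the plus sign
   jmp   ChecksDone  # else-if: do not also test for '-'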
#######################################################################

## else if(*ascii == '-') { sign = -1; ascii++; } #####################
CheckMinus:
   movl  ascii, %eax
   cmpb  $45, (%eax)
   jne   ChecksDone
   movl  $-1, sign
   addl  $1, ascii
#######################################################################

ChecksDone:
   movl  intptr, %eax
   movl  $0, (%eax)
   movl  $0, i

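LoopSearch:
   movl  $0, %eax          # clear %eax so the upper bytes are zero
   movl  ascii, %ebx       # base: pointer to the string
   movl  i, %esi           # index: current position
   movb  (%ebx,%esi,1),%al # load ascii[i]
   cmpl  $48,%eax          # below '0'?
   jl LoopSearchDone
   cmpl  $57,%eax          # above '9'?
   jg LoopSearchDone
   addl  $1, i             # still a digit, advance i
   jmp   LoopSearch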

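LoopSearchDone:
   addl  $-1, i            # back up one to the last digit
   movl  $1, multiplier

LoopCalculate:
   cmpl  $0, i             # done when i < 0
   jl FinishUp

   movl  ascii, %ebx
   movl  i, %esi
   movl  $0, %eax
   movb  (%ebx, %esi, 1), %al # load ascii[i]
   subl  $48, %eax         # convert the ASCII digit to its value
   mull  multiplier        # %eax = digit * multiplier (clobbers %edx)
   movl  intptr, %ebx
   addl  %eax, (%ebx)      # *intptr += digit * multiplier
   movl  multiplier, %eax
   movl  $10, %edx
   mull  %edx              # %eax = multiplier * 10
   movl  %eax, multiplier
   addl  $-1, i
   jmp   LoopCalculate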

FinishUp:
   movl  intptr, %ebx
   movl  (%ebx), %eax
   imull sign              # signed multiply: *intptr * sign
   movl  %eax, (%ebx)      # store the low 32 bits back

return:
   /* epilog */
   popl  %edi
   popl  %esi
   popl  %ebx
   movl  %ebp, %esp
   popl  %ebp
   ret