In this lab we begin to explore different addressing modes. An addressing mode is the means by which the computer selects the data that is used in an instruction. The addressing mode is determined by the syntax with which you specify the instruction's operands. Let's begin by clarifying three terms: data, operand, and addressing mode.
First, data are the numerical values that are used in computations. For example, suppose the CPU currently holds the value 3 in the %eax register and the value 4 in the %ebx register, and it executes the instruction addl %eax, %ebx (which adds %eax to %ebx and leaves the sum, 7, in %ebx). Two data values are involved: 3 and 4. The operands, however, are the symbols %eax and %ebx. If the instruction were instead addl $4, %eax, the data would be the same (3 and 4), but the operands would be the number $4 and the symbol %eax. Now that data and operand are understood, what is the addressing mode? The addressing mode describes the relationship between the operands and the data; that is, how do we use the operands to get the correct data? These examples include instances of the two simplest addressing modes, register addressing and immediate addressing. Register addressing means the data is in a register, and immediate addressing means the data is supplied as part of the instruction (in AT&T syntax, an immediate operand is written with a leading $).
What about when we access a variable, as in addl %eax, VAR, where VAR is a symbol that represents a memory location? This is called direct addressing, which means the memory address of the data is supplied with the instruction. How is the address supplied? When the assembler translates the program into executable machine code, it assigns an address to the symbol VAR, and that address is encoded as part of the instruction in program memory. Thus, the operand is the memory address represented by the symbol VAR, and the data is the contents of memory at that address.
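To make the distinction between operand and data concrete, here is a minimal C sketch of a toy machine. The register file, the 256-byte memory, and the names fetch, load32, and struct operand are all hypothetical, invented for illustration; nothing here is part of the lab. The point is that an addressing mode is just the rule fetch uses to turn an operand into a data value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy machine state: four 32-bit registers and 256 bytes of memory. */
enum { EAX, EBX, ECX, EDX, NREGS };
static uint32_t regs[NREGS];
static uint8_t memory[256];

enum mode { IMMEDIATE, REGISTER, DIRECT };

/* An operand is a mode plus one number whose meaning depends on the mode. */
struct operand {
    enum mode mode;
    uint32_t value;   /* the constant, a register number, or an address */
};

/* Read a 4-byte word (host byte order) starting at addr. */
static uint32_t load32(uint32_t addr)
{
    uint32_t word;
    assert(addr <= sizeof memory - sizeof word);
    memcpy(&word, &memory[addr], sizeof word);
    return word;
}

/* The addressing mode is the rule for turning an operand into data. */
static uint32_t fetch(struct operand op)
{
    switch (op.mode) {
    case IMMEDIATE:
        return op.value;            /* data is inside the instruction */
    case REGISTER:
        return regs[op.value];      /* data is in a register */
    case DIRECT:
        return load32(op.value);    /* data is at a fixed memory address */
    }
    return 0;                       /* not reached for valid modes */
}
```

With regs[EAX] set to 3, fetching the operand {REGISTER, EAX} yields 3, while {IMMEDIATE, 4} yields 4 without touching any register or memory.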
Now let's look at a different kind of variable. Consider the following fragment of C code:
int a[10]; /* an array of 10 integers */
int i;
int main(void)
{
    a[0] = 10;
    a[4] = 20;
    a[i] = 30;
    return 0;
}
So far we have not used arrays. First of all, how do we declare an array
in assembly language? We use the same .comm directive as for a regular
variable, but reserve enough bytes for every element. Ten 4-byte
integers need 40 bytes:
.comm a, 40
This reserves ten long words (40 bytes) of memory, which are zero-filled
when the program is loaded. The symbol a is equal to the address of the
first word. Don't forget that! If we are going to set all of the
elements of the array to the same initial value, we can instead place a
label in the .data section and use the .fill directive (repeat count,
size in bytes, value) as follows:
a: .fill 10, 4, 0
If we wanted to set the
first three elements to the values 1,2,3 and the rest to zero, we
could write:
a: .int 1, 2, 3
.fill 7, 4, 0
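In C terms, these declarations correspond roughly to zero-initialized and partially initialized global arrays. This sketch assumes a 4-byte int, as on IA-32; the names zeroed, partly, and all_equal are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* .comm a, 40 (or a: .fill 10, 4, 0): ten zero-initialized 4-byte ints. */
int zeroed[10];

/* a: .int 1, 2, 3 followed by .fill 7, 4, 0: the other seven become 0. */
int partly[10] = {1, 2, 3};

/* Returns 1 if arr[0..n-1] all equal value, else 0. */
static int all_equal(const int *arr, size_t n, int value)
{
    for (size_t k = 0; k < n; k++)
        if (arr[k] != value)
            return 0;
    return 1;
}
```

C guarantees that elements missing from an initializer list are zeroed, which is exactly what the trailing .fill 7, 4, 0 spells out by hand.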
Now that we can declare the array, how do we access its elements? In the
first example a[0] = 10;, the access is easy since this refers to the
first element of the array. We can access it using simple direct addressing:
movl $10, a
Remember that the symbol a is the address of the first word of the
array, so movl $10, a simply stores the value 10 in the first long word of the array.
The second example, a[4] = 20;, is a little trickier, since we don't have a symbol defined for the fifth element. We can, however, specify a byte offset from the start of the array as follows:
movl $20, a+16
Recall that each element of a is four bytes, so the element
a[4] begins at address a+16. What addressing mode is this? This
is still direct addressing mode. The assembler adds the value
16 to the address a before it includes it as part of the
instruction. After assembly, there is no indication that the address
supplied was originally a label or a computed value. In fact, if you
wanted to read from some fixed memory address (which does come up from
time to time), you can forget the label altogether and write
movl 4096, %eax
Note the absence of a $: this loads the long word stored at address
4096, whereas movl $4096, %eax would load the number 4096 itself.
The labels just make it easier to declare and refer to variables by
letting the assembler worry about their actual addresses. For
example, if array a began at memory location 0x142 (142H), the
assembler would translate movl $10, a into the equivalent of
movl $10, 0x142, and for movl $10, a+8 it would produce the equivalent
of movl $10, 0x14a. It would be hard for us to keep track of the
addresses of all our variables, so we use symbols and offsets from
those symbols instead; the assembler does the hard part.
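The offset arithmetic the assembler performs can be checked in C. This is a sketch assuming a 4-byte int; byte_offset and direct_address are hypothetical helper names, and the base 0x142 is the example address from the text, not a real placement.

```c
#include <assert.h>
#include <stddef.h>

int a[10];

/* Byte distance from the start of a to a[k]; this is the constant the
   assembler folds into a direct address such as a+16. */
static ptrdiff_t byte_offset(int k)
{
    assert(k >= 0 && k < 10);
    return (char *)&a[k] - (char *)&a[0];
}

/* If a were placed at address base, the direct address of a[k]. */
static unsigned long direct_address(unsigned long base, int k)
{
    return base + (unsigned long)byte_offset(k);
}
```

With 4-byte elements, byte_offset(4) is 16 (the a+16 above) and direct_address(0x142, 2) is 0x14a, matching the movl $10, a+8 example.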
Finally we come to the last example: a[i] = 30;. This example is different from the rest. It is not possible to know the address of the data when the program is assembled (because i can change during execution), so the address cannot be specified in the instruction. For this type of access, the x86 gives us several more addressing modes. The best mode for this example is indexed (sometimes called direct indexed) addressing. In indexed addressing the processor uses the contents of a register, multiplied by a scale factor of 1, 2, 4, or 8, along with a constant to compute the memory address of the data. The constant is called the displacement, and it can be a number or a label. The displacement is typically the starting address of the array, and the register then acts like an index into the array. In this lab we use the two index registers %esi and %edi for indexed addressing. %esi stands for "source index" and %edi stands for "destination index," but you may use the two interchangeably except in certain instructions that will be covered in a later lab. (The original 16-bit 8086 allowed only %si and %di here, with no scale factor; in 32-bit mode any general-purpose register except %esp can serve as the index.) Thus, to implement the example C fragment, we would write:
.comm a, 40
.comm i, 4
movl $10, a             # a[0] = 10;
movl $20, a+16          # a[4] = 20;
movl i, %edi            # load the index into a register
movl $30, a(,%edi,4)    # a[i] = 30;
Here, the last instruction uses indexed addressing. The address of
the memory location accessed by that instruction is found by taking the
contents of %edi, multiplying it by the scale factor 4 (the size of each
element), and adding the result to the address represented by the symbol a.
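The effective-address computation for a(,%edi,4) can be modeled in C as follows. This is a sketch, not how the hardware is specified: indexed_ea and store_a_i are hypothetical names, a local variable named edi stands in for the register, and it assumes a flat address space where integer addresses can be converted back to pointers.

```c
#include <assert.h>
#include <stdint.h>

int a[10];
int i;

/* Effective address of disp(,index,scale): disp + index * scale.
   x86 only encodes scale factors of 1, 2, 4, and 8. */
static uintptr_t indexed_ea(uintptr_t disp, uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return disp + (uintptr_t)index * scale;
}

/* a[i] = value, done the way the assembly does it. */
static void store_a_i(int value)
{
    uint32_t edi = (uint32_t)i;                       /* movl i, %edi */
    int *dest = (int *)indexed_ea((uintptr_t)a, edi, sizeof a[0]);
    *dest = value;                                    /* movl $30, a(,%edi,4) */
}
```

The scale factor is what lets the register hold the element index i rather than the byte offset 4*i; without it, the program would have to multiply i by 4 itself before using it.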
There is another situation in which the address of the data is not known at assembly time, and that is when a program uses a pointer. In this situation, the address of the data is held entirely in a register. For this case, the x86 provides register indirect addressing. In register indirect addressing a register (here %ebx) holds the address of the data, as in the following example:
int *p;
int main(void)
{
    *p = 40;
    return 0;
}
which can be encoded in assembly language as:
.comm p, 4
movl p, %ebx
movl $40, (%ebx)
Note that you may also use %esi and %edi (in fact, in 32-bit mode, any
general-purpose register) in register indirect addressing.
One might think that using a pointer variable directly would look like movl $40, (p), but x86 assembly has no such mode: a memory operand cannot fetch a pointer from memory and then follow it. A pointer can be held in any general-purpose register, although %esp should be left alone because it is the stack pointer. If we have a pointer stored in memory, we must first move it into a register before we can use it as a pointer.
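In C the two steps are hidden behind *p = 40;. The sketch below writes them out, with a local variable named ebx standing in for the register; the function name store_through_p is hypothetical, and this models the two-instruction sequence above rather than what any particular compiler is guaranteed to emit.

```c
#include <assert.h>
#include <stddef.h>

int *p;          /* the pointer variable, stored in memory */

/* *p = value; spelled out as the two steps the x86 needs. */
static void store_through_p(int value)
{
    int *ebx = p;    /* movl p, %ebx     : direct addressing fetches the pointer */
    assert(ebx != NULL);
    *ebx = value;    /* movl $40, (%ebx) : register indirect follows it */
}
```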
Finally, there is the case where we have a pointer to an array, as is often the case in C programs. Here we use base indexed addressing, which allows us to specify two registers: a base register (here %ebx), which holds the base address of the array, and an index register (here %esi or %edi), which holds the index. The index is scaled by 1, 2, 4, or 8, and if the array is an array of records, a constant displacement may also be specified to select a field within the record. Here is an example:
int *ap;
struct {int a, b;} *asp;
int i;
int main(void)
{
    ap[3] = 50;
    ap[i] = 60;
    asp[i].b = 70;
    return 0;
}
which is coded in assembly language as:
.comm ap, 4
.comm asp, 4
.comm i, 4
. . .
movl ap, %ebx
movl $50, 12(%ebx) # ap[3] = 50;
movl i, %edi
movl $60, (%ebx, %edi, 4) # ap[i] = 60;
movl asp, %ebx               # i is still in %edi
movl $70, 4(%ebx, %edi, 8)   # asp[i].b = 70; each record is 8 bytes, b is at offset 4
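The displacement 4 and scale 8 in the last instruction come straight from the record layout. The following C sketch checks that disp + base + index*scale lands on asp[i].b; struct rec and base_indexed_ea are hypothetical names, and the expected sizes assume 4-byte int with no padding between the two fields, which holds on IA-32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the anonymous struct {int a, b;} in the C fragment. */
struct rec {
    int a, b;
};

/* Effective address of disp(base,index,scale): disp + base + index*scale. */
static uintptr_t base_indexed_ea(uintptr_t disp, uintptr_t base,
                                 uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return disp + base + (uintptr_t)index * scale;
}
```

Here offsetof(struct rec, b) gives the displacement and sizeof(struct rec) gives the scale. Using a scale of 4 for 8-byte records would step through the array at the wrong stride.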
One final note about these addressing modes: on the original 16-bit
8086, most of these expressions could be written using %si, %di, and %bx
interchangeably, but when two registers were specified (base indexed
mode) one of them had to be %bx (or %bp). In 32-bit mode that
restriction is gone: any general-purpose register can be the base, and
any except %esp can be the index. There is, however, a conceptual
distinction between a base register, which holds a pointer, and an index
register, which is used to compute an offset from the base.
So, to recap addressing modes:
movl $4, %eax                # immediate
movl %eax, %ebx              # register
movl %eax, mem_variable      # direct (also movl %eax, mem_variable+16)
movl $30, a(,%edi,4)         # indexed
movl $40, (%ebx)             # register indirect
movl $60, (%ebx, %edi, 4)    # base indexed
/* begin assignment specification code lab4asm.s */
void AtoI(void)
{
    sign = 1;
    /* skip over leading spaces and tabs */
    while (*ascii == ' ' || *ascii == '\t')
        ascii++;
    /* check for a plus or minus sign */
    if (*ascii == '+')
        ascii++;
    else if (*ascii == '-')
    {
        sign = -1;
        ascii++;
    }
    /* find the digits, then convert them from right to left */
    *intptr = 0;
    for (i = 0; ascii[i] >= '0' && ascii[i] <= '9'; i++)
        ; /* stop at the first non-digit */
    i--; /* back up one to the last digit */
    multiplier = 1;
    for (; i >= 0; i--)
    {
        *intptr += multiplier * (ascii[i] - '0');
        multiplier *= 10;
    }
    /* multiply in the sign */
    *intptr = *intptr * sign;
}
/* end assignment specification code lab4asm.s */
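Before writing the assembly, it can help to have a reference version to compare against. The sketch below is the specification's algorithm rewritten as a self-contained function; the name atoi_spec and the switch from the lab's global variables to a parameter and a return value are assumptions for illustration, not part of the assignment.

```c
#include <assert.h>

/* The AtoI algorithm from the specification, using a parameter and a
   return value instead of the globals ascii, intptr, sign, multiplier, i. */
static int atoi_spec(const char *ascii)
{
    int sign = 1, multiplier = 1, result = 0, i;

    while (*ascii == ' ' || *ascii == '\t')   /* skip leading blanks */
        ascii++;
    if (*ascii == '+')                        /* optional sign */
        ascii++;
    else if (*ascii == '-') {
        sign = -1;
        ascii++;
    }
    for (i = 0; ascii[i] >= '0' && ascii[i] <= '9'; i++)
        ;                                     /* find the end of the digits */
    for (i--; i >= 0; i--) {                  /* accumulate right to left */
        result += multiplier * (ascii[i] - '0');
        multiplier *= 10;
    }
    return result * sign;
}
```

Because of the else if, an input such as "+-5" converts to 0 rather than -5, and an assembly version that follows the specification should agree. The driver's 10-byte buffer limits input to at most nine digits, so multiplier stays within int range.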
----- BEGIN C CODE -----
/* begin driver code lab4drv.c */
#include <stdio.h>

void AtoI(void);   /* implemented in lab4asm.s */

char input[10];
int output;
char *ascii;
int *intptr;
int sign, multiplier, i;

int main(int argc, char **argv)
{
    printf("Enter a number -> ");
    if (fgets(input, sizeof input, stdin) == NULL)
        return 1;
    ascii = input;
    intptr = &output;
    AtoI();
    printf("\nAscii is -> %sInteger is -> %d\n", input, output);
    return 0;
}
/* end driver code */
----- END C CODE -----
----- BEGIN ASSEMBLY CODE STUB -----
.globl AtoI
.type AtoI,@function
AtoI:
/* prolog */
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %esi
pushl %edi
/* put code here */
return:
/* epilog */
popl %edi
popl %esi
popl %ebx
movl %ebp, %esp
popl %ebp
ret
----- END ASSEMBLY CODE STUB -----
A completed version of AtoI, filling in the stub, follows.
.globl AtoI
.type AtoI,@function
AtoI:
/* prolog */
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %esi
pushl %edi
/* put code here */
movl $1, sign
## while(*ascii == ' ' || *ascii == '\t') ascii++; #####################
CheckWhiteSpace:
movl ascii, %eax # put the char array pointer in %eax
cmpb $32, (%eax) # (deref.) comp this element to a space
jne CheckTab # if not equal, check if it's a tab
addl $1, ascii # if equal, move ahead one space
jmp CheckWhiteSpace # reloop and check next space
CheckTab:
movl ascii, %eax # put the char array pointer in %eax
cmpb $9, (%eax) # (deref.) comp this element to a tab
jne CheckPlus # if not equal, goto the next checks
addl $1, ascii # if equal, move ahead one space
jmp CheckWhiteSpace # reloop and check next space
#######################################################################
## if(*ascii == '+') ascii++; #########################################
CheckPlus:
movl ascii, %eax
cmpb $43, (%eax)        # is it '+'?
jne CheckMinus
addl $1, ascii          # skip the '+'
jmp ChecksDone          # else-if: do not also test for '-'
#######################################################################
## else if(*ascii == '-') { sign = -1; ascii++; } #####################
CheckMinus:
movl ascii, %eax
cmpb $45, (%eax)        # is it '-'?
jne ChecksDone
movl $-1, sign          # sign = -1;
addl $1, ascii          # skip the '-'
#######################################################################
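ChecksDone:
movl intptr, %eax            # *intptr = 0;
movl $0, (%eax)
movl $0, i                   # for(i = 0; ...
LoopSearch:                  # advance i while ascii[i] is a digit
movl ascii, %ebx             # base register: the string pointer
movl i, %esi                 # index register: i
movzbl (%ebx,%esi,1), %eax   # %eax = ascii[i], zero-extended
cmpl $48, %eax               # below '0'?
jl LoopSearchDone
cmpl $57, %eax               # above '9'?
jg LoopSearchDone
addl $1, i
jmp LoopSearch
LoopSearchDone:
addl $-1, i                  # i--; back up to the last digit
movl $1, multiplier
LoopCalculate:               # for(; i >= 0; i--)
cmpl $0, i                   # suffix required: no register operand
jl FinishUp
movl ascii, %ebx
movl i, %esi
movzbl (%ebx,%esi,1), %eax   # %eax = ascii[i]
subl $48, %eax               # ascii[i] - '0'
imull multiplier, %eax       # ... * multiplier
movl intptr, %ebx
addl %eax, (%ebx)            # *intptr += ...
movl multiplier, %eax        # multiplier *= 10;
imull $10, %eax
movl %eax, multiplier
addl $-1, i
jmp LoopCalculate
FinishUp:
movl intptr, %ebx            # *intptr = *intptr * sign;
movl (%ebx), %eax
imull sign, %eax             # signed multiply by +1 or -1
movl %eax, (%ebx)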
return:
/* epilog */
popl %edi
popl %esi
popl %ebx
movl %ebp, %esp
popl %ebp
ret