In this lab we will begin to explore the details of assembly language by looking at simple expression evaluation. We will provide you with a C program that calls assembly language routines that you will write. You need not worry about the interface between the C and the assembly code; we have taken care of that. In each case, the code you provide will be one or more assignment statements. There will be no branching, no arrays, no structures, and no pointers. We will get to those soon enough.
Computer programs are composed of two basic elements: memory locations for storing data (variables) and instructions (statements) for manipulating the data. Assembly language programs have these same features plus one more: registers. As with memory variables, you may store values into registers and use them in computations. The microprocessor and the memory are two different entities within the computer. Registers are fast, temporary variables located within the microprocessor, whereas memory variables (or simply ``variables'') exist in the computer's memory, which is slower and more complicated to access. It is best to use registers as much as possible, but there are a limited number of them. We can't use only memory variables either, because the 80386 places a limitation on us: you can use only one memory variable in a computation.
There is also a difference in the instructions provided in assembly language: You may only perform one computation per statement. For example, in C you can say:
int a,b,c,d,e;
a = ((b + c) - (d + e)) - 10;
This statement performs four computations using four
variables and a constant. In assembly language you cannot write such
a complex statement. The exact restrictions vary from computer to
computer (unlike high-level languages), but we will always use the
Intel 80386 as our model. In 80386 assembly language each
instruction may perform only one computation and may
reference only one memory variable. Any other
required data must be in a register or be a constant. The general purpose
registers we will use on the 80386 are A, B, C, and D (written %eax, %ebx,
%ecx, and %edx). Thus, the previous example would look like this:
.comm a, 4
.comm b, 4
.comm c, 4
.comm d, 4
.comm e, 4
.text
movl b,%eax       # move b into register eax
addl c,%eax       # add c to register eax
movl d,%ebx       # move d into register ebx
addl e,%ebx       # add e to register ebx
subl %ebx,%eax    # subtract register ebx from register eax
subl $10,%eax     # subtract 10 from register eax
movl %eax,a       # move register eax out to a
Let's break this down piece by piece. First of all, in order to declare a
variable, you use a statement that will ``define storage'' and assign a name
or symbol to that storage.
Actually, this isn't an instruction at all but an ``assembler directive.''
These are commands to the assembler program (as opposed to the computer running
the program) to perform some action, in this case to reserve memory for a variable.
There are several of these directives that reserve memory,
depending on what size storage you want to define
(just like the char, short, int, and long in C).
There are directives for uninitialized
variables, and directives to initialize variables. The most common directive, used here,
is .comm, which creates a symbol with the name given as the first argument and
reserves the number of bytes listed as the second argument. Note that there is no
type information associated with the memory or the symbol. Alternatively, we could have
chosen to initialize the variables to some value. In C we could have said:
int a; /* uninitialized */
int b = 10; /* decimal */
int c = 0x20; /* hexadecimal */
int d = 'a'; /* ascii */
int e = 040; /* octal */
int f = 024; /* C does not have a binary representation */
/* this is octal */
which, in assembly language would have been:
.comm a, 4
b: .int 10
c: .int 0x20
d: .int 'a'
e: .int 040
f: .int 0b000010100
Note the syntax for expressing values in different number bases, including
the binary syntax, which does not exist in C. The symbol created is defined
by the label to the left of the colon on each line. The value it is initialized
to is listed to the right of the directive. Other directives include .byte,
.hword, .int, .quad, and .octa to initialize 1-, 2-, 4-, 8-, or 16-byte integers
(on the x86, .word also reserves 2 bytes and .long is a synonym for .int), and
.float, .single, and .double to initialize 4-, 4-, and 8-byte floating point
numbers.
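As a quick cross-check of the number bases above, here is a small C sketch. It is not part of the lab; the helper names parse_binary and bases_agree are made up for illustration, and the check on 'a' assumes an ASCII character set.

```c
#include <stdlib.h>

/* Hypothetical helper: convert a string of binary digits to an int,
   standing in for the assembler's 0b... syntax, which C lacks. */
int parse_binary(const char *digits)
{
    return (int)strtol(digits, NULL, 2);
}

/* Returns 1 when the initializers used for c, d, e, and f above
   hold the values the text describes, 0 otherwise. */
int bases_agree(void)
{
    return 0x20 == 32                          /* hexadecimal */
        && 'a' == 97                           /* ASCII code (assumed) */
        && 040 == 32                           /* octal */
        && 024 == 20                           /* octal */
        && parse_binary("000010100") == 024;   /* binary 10100 is 20 */
}
```

Calling bases_agree() should return 1, which confirms that 024 in C and 0b000010100 in the assembler name the same value.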
Now let us consider the evaluation of the assignment statement. You should notice that assembly language does not use the standard mathematical symbols for addition, multiplication, and so on, as high-level languages do. Instead, each operation has its own instruction: addl, mull, subl, etc. You also have to tell the assembler the size of the data you are operating on. Data can be 1, 2, or 4 bytes, specified with an opcode ``suffix'' of b, w, or l respectively. All of the instructions so far in these examples have used 4-byte ``long-words'', and so all of the opcodes have had an l suffix, as in addl, mull, subl. There are also corresponding instructions for the other sizes, such as addw, mulb, subw. There is also a new operation not seen in C, the movl (move) instruction. You will find that most assembly language programs have a lot of movl, movw, and movb instructions. As you may have guessed, the move instruction simply copies data from one place to another, and thus the instruction:
movl src, dst
is equivalent to the simple assignment statement:
dst = src;
The main limitation, however, is that at least one of dst and src
must be a register. In other words, there can be no more
than one memory variable in a movl instruction, but they both can
be registers, if you want. The order of the operands determines which
one is the source and which is the destination. The source comes
first, and the destination comes second. By placing a register second,
we can load a memory variable into that register. By placing the memory variable
second, we can store a register into that memory variable. We can, of course, move
a register into a register as well, but
we cannot move a memory variable into another memory variable.
The next thing to notice is that the arithmetic instructions only have two operands. We cannot in one instruction say:
dst = src1 + src2;
even if all of dst, src1, and src2 are registers. In assembly
language we always ``add to'' a value as in:
dst = dst + src;
or
dst += src;
which in assembly language is just:
addl src, dst
but remember, either dst or src or both must be a register. Thus,
if we want to add one variable to another, we must first move one into a
register and then perform the addition:
int a,b;
a += b;
is
.comm a, 4
.comm b, 4
movl b, %eax
addl %eax, a
and
int a,b,c;
a = b + c;
is
.comm a, 4
.comm b, 4
.comm c, 4
movl b,%eax
addl c,%eax
movl %eax,a
Finally, just as in C you can specify a constant to use in a computation, you
can specify a constant in assembly language:
int a;
a += 2;
is
.comm a, 4
addl $2, a
Now, once you understand these ideas, you should be able to go back to the
first example and see if you can understand how it implements the original
C expression (do it!).
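If it helps to check your reading of the first example, the following C sketch mirrors it one instruction at a time. It is only a model: the function name eval_stepwise is invented here, and the local variables eax and ebx merely stand in for the registers.

```c
/* Model of the first example: each statement performs one computation,
   and the locals eax and ebx stand in for the registers. */
int eval_stepwise(int b, int c, int d, int e)
{
    int eax, ebx;

    eax = b;        /* movl b,%eax    */
    eax += c;       /* addl c,%eax    */
    ebx = d;        /* movl d,%ebx    */
    ebx += e;       /* addl e,%ebx    */
    eax -= ebx;     /* subl %ebx,%eax */
    eax -= 10;      /* subl $10,%eax  */
    return eax;     /* movl %eax,a    */
}
```

For b = 1, c = 2, d = 3, e = 4 the function returns -14, the same value as ((1 + 2) - (3 + 4)) - 10.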
Now it is time to expand our horizons a little more. The first thing to consider is that the 80386 can operate on several different sizes of data. The primary data sizes are 32, 16, and 8 bits. In support of this, the registers can be referenced as 32-bit registers, 16-bit registers, or 8-bit registers. Each of the four registers we have seen (%eax, %ebx, %ecx, %edx) can alternatively be referenced as a 16-bit register (%ax, %bx, %cx, %dx). These four registers represent the least-significant 16 bits of their corresponding 32-bit ``extended'' registers. In addition, each of these 16-bit registers can be thought of as a pair of 8-bit registers (%ah, %al, %bh, %bl, %ch, %cl, %dh, %dl). These are NOT different registers; they simply refer to part of the 16-bit registers (which are, in turn, part of the 32-bit registers). Thus, if you store a value in %ax, and then a value in %ah, the value in %ax will no longer be the original value you put there. Specifically, %ah refers to the ``high order byte'' of %ax and %al refers to the ``low order byte'' of %ax. This means %ah is the 8 most significant bits of %ax and %al is the 8 least significant bits of %ax. When you refer to a register in an instruction, the size of the register must match the size of the opcode. Thus the instruction addb $2,%al is an 8-bit instruction, whereas addw $2,%ax is a 16-bit instruction.
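The overlap between %eax, %ax, %ah, and %al can be modelled in C with masks and shifts. This is only a sketch under the assumption that a 32-bit unsigned integer stands in for %eax; the function names are invented for illustration.

```c
#include <stdint.h>

/* Model %eax as a 32-bit value; the smaller registers are views of it. */

/* Like "movw value,%ax": replaces the low 16 bits, keeps the rest. */
uint32_t write_ax(uint32_t eax, uint16_t value)
{
    return (eax & 0xFFFF0000u) | value;
}

/* Like "movb value,%ah": replaces bits 8..15, keeps the rest. */
uint32_t write_ah(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)value << 8);
}

/* Read back the 16-bit and 8-bit views. */
uint16_t read_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  read_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }
uint8_t  read_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
```

Storing 0x1234 with write_ax and then 0xAB with write_ah leaves read_ax reporting 0xAB34: the high byte changed, while the low byte 0x34 is untouched.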
Multiplication and division are different, however, since they operate on more than one data size at a time. These instructions are somewhat more complex than the other operations. First, there are two versions: imull for multiplication of signed integers and mull for multiplication of unsigned numbers. The simplest form of these two instructions has a single operand (which can be memory or register). The value of this operand is multiplied by the ``a'' register (%al, %ax, or %eax), and the result is placed in the ``a'' register and potentially the ``d'' register. Yes, that's right: in the ``mul'' instruction we have no control over the destination register and no control over one of the multiplication values. This means that if you want to multiply the contents of register B and register C, you cannot write mull %ebx, %ecx. You have to move the contents of one of them to register A, as in movl %ebx, %eax, and then multiply by register C, as in mull %ecx.
When we multiply two 8-bit numbers, we get a 16-bit result. When we multiply two 16-bit numbers we get a 32-bit result, and when we multiply two 32-bit numbers, we get a 64-bit result. Thus, in each case our result will take more space than the operands. If this doesn't make sense to you, try multiplying the maximum integer we can represent in 8 bits by itself (the largest multiplication we could do on two 8-bit values). How many bits does the result require?
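One way to check the claim about result sizes is to do the widening multiplications in C and count the bits in the product. The sketch below uses unsigned values only, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Number of bits needed to represent value (0 needs 0 bits here). */
int bits_needed(uint64_t value)
{
    int bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Widening unsigned multiplies: the result type is twice as wide
   as the operands, so no bits of the product are lost. */
uint16_t mul8(uint8_t x, uint8_t y)    { return (uint16_t)((uint16_t)x * y); }
uint32_t mul16(uint16_t x, uint16_t y) { return (uint32_t)x * y; }
uint64_t mul32(uint32_t x, uint32_t y) { return (uint64_t)x * y; }
```

For example, mul8(255, 255) is 65025, and bits_needed reports 16 for it; the largest 16-bit and 32-bit products likewise need 32 and 64 bits.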
In the 8-bit case the result is in %ah:%al (otherwise known as %ax). For 16 bits, the result is in %dx:%ax (which is weird, but it was done this way to be compatible with pre-32-bit code). Finally, in the 32-bit case, the result is in %edx:%eax.
So, if you do the above example, where you multiply the 32-bit C register by the 32-bit B register (by first moving B into A), your result will be 64 bits and occupy BOTH the A register and the D register, %edx:%eax. Make sure you do not lose data you wanted to keep in %edx.
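The way a 32-bit mull splits its result across %edx:%eax can be sketched in C with a 64-bit intermediate. This is a model only; the function names mull_eax and mull_edx are invented here, and they return the values the two registers would hold after the instruction.

```c
#include <stdint.h>

/* Model of "mull operand" for 32-bit operands (unsigned).
   The 64-bit product eax * operand is split across two registers. */

/* Low 32 bits of the product: what %eax holds afterwards. */
uint32_t mull_eax(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    return (uint32_t)(product & 0xFFFFFFFFu);
}

/* High 32 bits of the product: what %edx holds afterwards,
   regardless of what %edx contained before the multiply. */
uint32_t mull_edx(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    return (uint32_t)(product >> 32);
}
```

Multiplying 0x10000 by 0x10000 gives 0 in the low half and 1 in the high half, while 6 times 7 gives 42 and 0. In both cases the old contents of %edx are gone.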
The imull instruction does have two other forms that take 2 and 3 operands. In each of these, the last operand must be a register. In this case you can multiply without using the ``a'' register. Also in this case, the result is truncated to the same number of bits as the operands, so additional registers are not needed. Here are three formulas you can use:
mulb X | %ax = %al * X
mulw X | %dx:%ax = %ax * X
mull X | %edx:%eax = %eax * X
The text to the left of the | symbol above is an example of the code you type in assembly. To the right of the symbol is a description of how the computer interprets the operation. Notice that for the 32-bit multiply the result is placed into %edx:%eax, as explained above.
You supply X which will be either a 32-, 16- or 8-bit register or a 32-, 16- or 8-bit memory variable.
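By contrast, the two- and three-operand forms of imull mentioned earlier keep only as many bits as the operands have. A rough C model of the 32-bit two-operand form follows. The name imull2_model is hypothetical, and the final conversion back to a signed value assumes the usual two's-complement behaviour of common compilers.

```c
#include <stdint.h>

/* Model of "imull src, dst" on 32-bit operands: only the low 32 bits
   of the product are kept and stored in dst. */
int32_t imull2_model(int32_t src, int32_t dst)
{
    /* Unsigned arithmetic wraps modulo 2^32, which avoids
       signed-overflow undefined behaviour in C. */
    uint32_t low = (uint32_t)src * (uint32_t)dst;

    /* Assumes two's-complement conversion back to signed. */
    return (int32_t)low;
}
```

Small products come out as expected (3 times -4 is -12), but 0x10000 times 0x10000 yields 0 because the only set bit of the product lies above bit 31.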
The format of the divide instruction is similar: divl <divisor>. Like the mull instruction, divl assumes the dividend and the destinations of the quotient and remainder based on the opcode suffix. Here are the three formulas:
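divb X | %al = %ax / X, %ah = %ax % X
divw X | %ax = %dx:%ax / X, %dx = %dx:%ax % X
divl X | %eax = %edx:%eax / X, %edx = %edx:%eax % X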
Please note that when you divide by a 32- or 16-bit number, you must make sure that %edx or %dx contains the correct value (usually zero). This is very important: by the end of the semester many of your bugs will take a while to find, and about 25% of the time this is the bug. If you do a 32-bit divide, notice above that it implies you want to use %edx:%eax as the dividend. What if your D register has data in it? Then the result will be wrong, because the division will use that data as the upper 32 bits of the dividend. So, before doing a 32-bit divide you MUST zero out the D register, unless of course you really do want a dividend that large.
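To see why a stale value in %edx matters, here is a C sketch of what a 32-bit divl computes. It is a model under stated assumptions: the function names are invented, the divisor must be nonzero, and the quotient must fit in 32 bits (the real instruction raises a divide error otherwise).

```c
#include <stdint.h>

/* Model of "divl divisor": the dividend is the 64-bit value %edx:%eax.
   Assumes divisor != 0 and that the quotient fits in 32 bits. */

/* Quotient: what %eax holds after the divide. */
uint32_t divl_quotient(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    return (uint32_t)(dividend / divisor);
}

/* Remainder: what %edx holds after the divide. */
uint32_t divl_remainder(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    return (uint32_t)(dividend % divisor);
}
```

With edx = 0, dividing 60 by 12 gives quotient 5 and remainder 0. With a leftover 1 in edx, the same divide gives quotient 357913946 and remainder 4, because the dividend is really 4294967356.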
/* begin assignment specification code lab2asm.s */
int digit1;
int digit2;
int digit3;
int diff;
int sum;
int product;
int remainder;
dodiff() {
diff = (digit1 * digit1) + (digit2 * digit2) - (digit3 * digit3);
}
dosumprod() {
sum = digit1 + digit2 + digit3;
product = digit1 * digit2 * digit3;
}
doremainder() {
remainder = product % sum;
}
/* end assignment specification code lab2asm.s */
/* ----- BEGIN C CODE ----- */
/* begin driver code lab2drv.c */
#include <stdio.h>

/* variables declared in the assembly file */
extern int digit1;
extern int digit2;
extern int digit3;
extern int diff;
extern int sum;
extern int product;
extern int remainder;

/* routines written in assembly */
void dodiff(void);
void dosumprod(void);
void doremainder(void);

int main(int argc, char **argv)
{
    for (digit1 = 0; digit1 < 10; digit1++) {
        for (digit2 = digit1; digit2 < 10; digit2++) {
            for (digit3 = digit2; digit3 < 10; digit3++) {
                dodiff();
                if (diff == 0)
                    printf("%d%d%d PT\n", digit1, digit2, digit3);
                dosumprod();
                if (sum && product) {
                    doremainder();
                    if (remainder == 0)
                        printf("%d%d%d ED\n", digit1, digit2, digit3);
                }
            }
        }
    }
    return 0;
}
/* end driver code */
/* ----- END C CODE ----- */
/* ----- BEGIN ASSEMBLY CODE STUB ----- */
.globl dodiff
.type dodiff, @function
dodiff:
    /* prolog */
    pushl %ebp
    pushl %ebx
    movl %esp, %ebp

    /* put code here */

    /* epilog */
    movl %ebp, %esp
    popl %ebx
    popl %ebp
    ret

.globl dosumprod
.type dosumprod, @function
dosumprod:
    /* prolog */
    pushl %ebp
    pushl %ebx
    movl %esp, %ebp

    /* put code here */

    /* epilog */
    movl %ebp, %esp
    popl %ebx
    popl %ebp
    ret

.globl doremainder
.type doremainder, @function
doremainder:
    /* prolog */
    pushl %ebp
    pushl %ebx
    movl %esp, %ebp

    /* put code here */

    /* epilog */
    movl %ebp, %esp
    popl %ebx
    popl %ebp
    ret

/* declare variables here */

/* ----- END ASSEMBLY CODE ----- */
000 PT
011 PT
022 PT
033 PT
044 PT
055 PT
066 PT
077 PT
088 PT
099 PT
123 ED
138 ED
145 ED
159 ED
167 ED
189 ED
224 ED
235 ED
246 ED
257 ED
268 ED
279 ED
333 ED
345 PT
345 ED
347 ED
357 ED
369 ED
448 ED
456 ED
459 ED
466 ED
578 ED
579 ED
666 ED
678 ED
789 ED
999 ED
/* ----- BEGIN ASSEMBLY CODE SOLUTION ----- */
.globl dodiff
.type dodiff, @function
dodiff:
    /* prolog */
    pushl %ebp
    pushl %ebx
    movl %esp, %ebp

    /* put code here */
    movl $0, %edx
    movl digit1, %eax
    mull %eax
    movl %eax, %ebx
    movl digit2, %eax
    mull %eax
    movl %eax, %ecx
    movl digit3, %eax
    mull %eax
    addl %ebx, %ecx
    subl %eax, %ecx
    movl %ecx, diff

    /* epilog */
    movl %ebp, %esp
    popl %ebx
    popl %ebp
    ret

.globl dosumprod
.type dosumprod, @function
dosumprod:
    /* prolog */
    pushl %ebp
    pushl %ebx
    movl %esp, %ebp

    /* put code here */
    movl digit1, %eax
    addl digit2, %eax
    addl digit3, %eax
    movl %eax, sum
    movl $0, %edx
    movl digit1, %eax
    mull digit2
    mull digit3
    movl %eax, product

    /* epilog */
    movl %ebp, %esp
    popl %ebx
    popl %ebp
    ret

.globl doremainder
.type doremainder, @function
doremainder:
    /* prolog */
    pushl %ebp
    pushl %ebx
    movl %esp, %ebp

    /* put code here */
    movl $0, %edx
    movl product, %eax
    divl sum
    movl %edx, remainder

    /* epilog */
    movl %ebp, %esp
    popl %ebx
    popl %ebp
    ret

/* declare variables here */
.comm digit1, 4
.comm digit2, 4
.comm digit3, 4
.comm diff, 4
.comm sum, 4
.comm product, 4
.comm remainder, 4

/* ----- END ASSEMBLY CODE ----- */