0.7em

Logic Design for a Direct-Mapped Cache

This project, and it's companion (coming soon) will replace labs 8 and 9 from your lab manual. No wiring will be required - the circuits in this project are too complicated to fit on your breadboard. This will be a simulation only project. You will be required to demonstrate your functional simulation to your TA using the computers in the laboratory. You are expected to complete a normal prelab report, including your schematics (top-level schematics, plus macro schematics), and an English description of your goals and outline of how you did your design.

1.  Background

1.1  Memory

A computer's memory is organized as a linear collection of storage locations. Each location has an address. Typical computer memories are byte-addressable meaning that each byte in the memory has it's own unique address (even though you usually read the memory a word at a time, and a word is typically 32 bits).

The size of the address limits the amount of memory the computer can see. Most computers use at least a 32 bit address now, meaning they can address 232 bytes, or 4 gigabytes of memory. Naturally, the first byte is at address 0000000016 and the last one is at ( ffffffff16 ). One of the most fundamental things a computer has to do is get things out of the memory. Memory stores both the program the computer runs and the data it might need. In order to operate, the processor generates a stream of addresses, first to get instructions out of the memory and then to fetch the data that those instructions need (the operands).

1.2  Memory hierarchy

Unfortunately, memory is relatively slow, and relatively expensive, when compared to processors. As a result, a processor can spend a lot of its time doing nothing while it waits for memory to complete. In fact, bandwidth (the amount of data you can get in and out in a given time) to memory is one of the primary limiting factors in computer performance (the Von Neumann bottleneck). To deal with this problem, and the cost problem, a memory hierarchy has been created. The basic concept is that data you need most often will be kept in small, fast, expensive memories, and other data will be kept in progressively larger, slower, and cheaper memories. The fastest memory is one that is made out of registers and is built onto the processor chip itself - the so-called on-chip or Level 1 cache. The second level of the hierarchy is typically an off-chip, or L2 cache, made up of Static RAM. The third level is main memory (Dynamic RAM), and the fourth level is secondary storage (disk).

When the computer needs to read a byte of memory, it generates an address. The next step is to figure out where the data associated with that address is currently residing - is it in one of the caches, main memory, or on disk? The first thing to check is the L1 cache. If the location you want is there, this is known as a

cache hit, and you can read the value immediately. If the location is not there, you have a cache miss, and you must continue to search through the memory hierarchy, and usually bring that memory location (and the ones around it) into the L1 cache.

1.3  Caches

The purpose of this project is to build the logic that, given an address, determines if the corresponding memory location is or isn't in the cache. We will be looking at a scheme known as direct-mapped caching. The output of your circuit will be a signal indicating whether we have a cache hit or miss. Before we can get started, we need to examine how a direct-mapped cache works. A cache is divided into a number of lines. A line is a contiguous block of bytes in the memory that are stored at a single cache address. We will be designing circuits for a fictional computer which has 16 bit addresses and a cache made up of 16 lines of 256 bytes each. This means our computer will be able to address a total of 216 or 64Kbytes of RAM, and at any time a maximum 16*256 = 4096 or 4Kbytes of this memory will be in the cache. To use our direct-mapped cache we will, break the address into 3 fields. Since the 256 bytes that make up a line are contiguous, that means the last 8 bits ( 28 =256) of any address are the offset into the cache line. Since there are 16 lines in the cache, 4 bits of the address are used to determine which line in the cache the address will be in. The last 4 bits of the address are the cache tag. The decomposition of the address into fields is shown in the figure below:


So, if you received an address, of 1111001000110010 (or F232 in hex), that would mean for that address the tag is F, the line address is 2, and the offset is 32 hex.

This type of cache is called 'direct-mapped' because there is only one place in the cache where each address in memory can reside. However, since the cache is only 4k, and the whole memory is 64k, only 16 of the possible 256 lines can be in the cache at any one time. The mapping of 256 bytes of memory to the cache lines is shown below. Only memory addresses whose cache line address field matches can be in a particular

cache line. Therefore, if you are looking at cache line address 1, only memory addresses who have the bits 0001 in bits <11..8> can go into that cache line. If we give the memory addresses in hex (Since 4 hex digits represent 16 bits), that means 256 byte blocks whose addresses start at 0100, 1100, 2100, 3100...F100 can go into cache line 1. Blocks that fit into a particular cache line are shown across a row in the figure below.


Note that for every address that goes into a single cache line, the line address is always the same, and there is exactly one block possible for each of the 16 tags. Therefore, to see if you have a cache hit, you can ignore the offset, and simply compare the tag to the tag currently stored in the appropriate line of the cache. To make this easier, associated with the cache there is a tag memory, a small memory that stores the tag of each block currently stored in the cache line. There is also one additional bit with each tag entry, a 'valid' bit which indicates if any valid block is stored at that cache line (at power on, for instance, none of the lines are filled yet - there are several other cases when cache lines are invalid that we won't go into). For the cache described above, a 16x5 memory would be needed to store the 16 4-bit tags + valid bits.

2.  The Simulation

You are to design and simulate the circuit to determine if you get a hit or a miss for the cache above. You will have a 16x5 tag memory, which you will implement using a ROM (or RAM, they're exactly the same in

Digital Works). You will initialize the tag memory according to the table below. The most significant bit in each word will be the valid bit, and the other 4 bits will be the tag. You will receive as input a 16-bit address. You

will use the cache line address bits to retrieve the appropriate tag from your memory. Then you will compare the tag you got from the memory to the tag in your address using a 4-bit comparator (you are expected to design a 4-bit comparator macro - see below). You will have 1 output, ``hit'', which will be a 1 if the tags are equal and the valid bit is 1 - otherwise ``hit'' will be zero. Hook up 8 switches to the inputs of your simulation (corresponding to address bits 8-15) so your instructor can enter addresses to test the design. Wire the hit signal to an LED.

Addressvalid/tag
000010001
000110001
001011111
001111110
010000001
010100101
011001010
011101010
100010000
100110000
101010000
101110111
110000000
110101111
111010010
111110011

3.  A Note about Comparators

Recall that the XNOR function is also called equivalence ( (xÅy)' = xy+x'y' ). Also recall that two would be equivalent if the first bits were equivalent and the second bits were equivalent and the third bits were equivalent, and so on. I think you can take it from there.


File translated from TEX by TTH, version 0.9.