Differences between revisions 22 and 31 (spanning 9 versions)

Cache

Direct Mapped Cache

A cache is direct mapped if each memory location maps to exactly one location in the cache. A typical mapping is:

     location = (Block Address) MOD (Number of cache blocks in the cache)

In this way we can map a memory location to a cache location.

Example: Suppose we have a cache with 8 slots (2^3). Then the word at block address 45 is found at slot 45 MOD 8 = 5 in the cache.
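The mapping above can be sketched in a few lines of Python. The function names `cache_slot` and `cache_slot_low_bits` are illustrative only; the second one assumes the number of cache blocks is a power of two, in which case the MOD is simply the low-order bits of the block address.

```python
def cache_slot(block_address: int, num_cache_blocks: int) -> int:
    """Direct-mapped placement: (Block Address) MOD (Number of cache blocks)."""
    return block_address % num_cache_blocks


def cache_slot_low_bits(block_address: int, index_bits: int) -> int:
    """Same mapping when the cache has 2**index_bits blocks: keep the low bits."""
    return block_address & ((1 << index_bits) - 1)


print(cache_slot(45, 8))           # 5
print(cache_slot_low_bits(45, 3))  # 5 (45 = 0b101101, low three bits are 101)
```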

https://www.scotnpatti.com/images/directmappedcache.jpg

Tags contain the address information required to identify whether the data in a cache location corresponds to the requested address. These tags contain the upper bits of the memory address.

Byte offset contains the number of bits required to select a byte within a block. Since this example uses 4-byte (one-word) blocks, the byte offset is 2 bits.

Valid bit (or validity bit): When the computer is first started, the tags contain junk. Since we don't want to treat this junk as a match, we start with all the valid bits set to 0 (invalid). Even later on, an entry may still hold invalid information, so we have to check this bit on every access.
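As a rough illustration of the three fields, the sketch below splits a byte address into tag, index, and byte offset, and checks for a hit using the valid bit. It assumes the 8-slot cache from the earlier example (3 index bits) and one-word blocks (2 byte-offset bits); `split_address` and `is_hit` are hypothetical helper names, and the dictionary entry layout is only one possible representation.

```python
def split_address(address: int, index_bits: int = 3, byte_offset_bits: int = 2):
    """Split a byte address into (tag, index, byte_offset)."""
    byte_offset = address & ((1 << byte_offset_bits) - 1)
    index = (address >> byte_offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (byte_offset_bits + index_bits)  # the remaining upper bits
    return tag, index, byte_offset


def is_hit(entry: dict, tag: int) -> bool:
    """A hit needs a valid entry AND a matching tag; junk tags never match."""
    return entry["valid"] and entry["tag"] == tag


# Word 45 lives at byte address 45 * 4 = 180.
print(split_address(180))  # (5, 5, 0)

# At power-up the tag field may hold anything, so the valid bit starts at 0.
junk_entry = {"valid": False, "tag": 5, "data": 0xDEADBEEF}
print(is_hit(junk_entry, 5))  # False
```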

https://www.scotnpatti.com/images/directmappedcache2.jpg

To determine the size of the cache above:

cache size = (number of locations=1024 = 2^10) X 
             ( Block size=32 + Validity bit=1 + Tag size=(32 - Index bits=10 - Byte offset bits=2)) 
           = 2^10 X (32 + 1 + (32 - 10 - 2)) 
           = 1024 X (53)
           = 54,272 bits or 6,784 bytes
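The same arithmetic can be written as a small function. This is a sketch assuming one-word (32-bit) blocks, a 32-bit address, and one valid bit per entry; the name `direct_mapped_cache_bits` and its parameters are illustrative.

```python
def direct_mapped_cache_bits(index_bits: int,
                             address_bits: int = 32,
                             block_bits: int = 32,
                             byte_offset_bits: int = 2) -> int:
    """Total storage in bits: entries * (data + tag + valid bit)."""
    tag_bits = address_bits - index_bits - byte_offset_bits
    bits_per_entry = block_bits + tag_bits + 1  # +1 for the valid bit
    return (1 << index_bits) * bits_per_entry


total_bits = direct_mapped_cache_bits(index_bits=10)
print(total_bits, total_bits // 8)  # 54272 6784
```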

How many bits are required for a direct-mapped cache with 64 KB of data and one-word blocks, assuming a 32-bit address?

64 KB / 4 bytes per word = 16 K words = 2^14 blocks
Cache Size = 2^14 X (32 + (32-14-2) + 1) 
           = 2^14 X 49 = 802,816 bits = 100,352 bytes = 98 KB.
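To check this result, the sketch below starts from the data capacity instead of the index width. It assumes one-word blocks of 4 bytes and a power-of-two capacity, so 64 KB of data is 16 K = 2^14 blocks and the index needs 14 bits. The function name `cache_bits_for_data` is hypothetical.

```python
def cache_bits_for_data(data_bytes: int,
                        address_bits: int = 32,
                        bytes_per_word: int = 4) -> int:
    """Total bits for a direct-mapped cache with one-word blocks."""
    num_blocks = data_bytes // bytes_per_word
    index_bits = num_blocks.bit_length() - 1  # log2 for a power of two
    if 1 << index_bits != num_blocks:
        raise ValueError("number of blocks must be a power of two")
    byte_offset_bits = bytes_per_word.bit_length() - 1
    tag_bits = address_bits - index_bits - byte_offset_bits
    return num_blocks * (8 * bytes_per_word + tag_bits + 1)


bits = cache_bits_for_data(64 * 1024)
print(bits, bits // 8, bits / 8 / 1024)  # 802816 100352 98.0
```

So a cache holding 64 KB of data needs about 98 KB of storage once tags and valid bits are counted.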

Cache Misses

A data cache miss requires us to "freeze" the processor on the instruction that caused the miss. We then retrieve the data from memory, place it in cache and restart the instruction - this time we are guaranteed a hit.

An instruction cache miss requires us to do the following:

1. Send the original PC value (Current PC - 4) to memory.
2. Instruct main memory to perform a read and wait for the memory to complete its access.
3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on.
4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.
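The miss sequence can be modelled in software. The sketch below is a simplified illustration, not a description of real hardware: `DirectMappedCache` is a hypothetical class, main memory is modelled as a dictionary from word-aligned byte addresses to words, and the processor stall is represented only by a miss counter.

```python
class DirectMappedCache:
    """Toy direct-mapped cache with one-word blocks (illustrative only)."""

    def __init__(self, memory: dict, index_bits: int = 3, byte_offset_bits: int = 2):
        self.memory = memory  # assumed: word-aligned byte address -> word
        self.index_bits = index_bits
        self.byte_offset_bits = byte_offset_bits
        entries = 1 << index_bits
        self.valid = [False] * entries  # all entries start invalid
        self.tags = [0] * entries
        self.data = [0] * entries
        self.misses = 0

    def _split(self, address: int):
        block = address >> self.byte_offset_bits
        return block >> self.index_bits, block & ((1 << self.index_bits) - 1)

    def read(self, address: int) -> int:
        tag, index = self._split(address)
        if not (self.valid[index] and self.tags[index] == tag):
            # Miss: "stall", read the word from memory, then fill the entry.
            self.misses += 1
            word_address = address & ~((1 << self.byte_offset_bits) - 1)
            self.data[index] = self.memory[word_address]
            self.tags[index] = tag     # upper bits of the address
            self.valid[index] = True   # turn the valid bit on
        # Either a hit, or the restarted access that is now guaranteed to hit.
        return self.data[index]


memory = {addr: addr * 10 for addr in range(0, 1024, 4)}
cache = DirectMappedCache(memory)
print(cache.read(180), cache.misses)  # 1800 1  (cold miss)
print(cache.read(180), cache.misses)  # 1800 1  (hit)
print(cache.read(212), cache.misses)  # 2120 2  (same slot 5, different tag)
```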

Points to Ponder

A 32-bit processor has 32-bit registers --> The smallest unit of data to be loaded anywhere is 32 bits = 4 bytes = 1 word. A word becomes the smallest unit of the memory hierarchy.
Block size is a conversion factor:

       Bytes
     C -----
       Block

Locality

The scheme of using one-word blocks does help with temporal locality, but it does not help with spatial locality. In order to take advantage of spatial locality we must use multi-word blocks. This also increases the efficiency of the cache, because a larger share of the bits is used for data instead of overhead (tag, valid bit).
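To make the efficiency point concrete, the sketch below computes the fraction of cache bits that hold data for different block sizes. It assumes a direct-mapped cache with 64 KB of data, a 32-bit address, 4-byte words, and one valid bit per block; `data_fraction` is a hypothetical helper name.

```python
def data_fraction(words_per_block: int,
                  data_bytes: int = 64 * 1024,
                  address_bits: int = 32) -> float:
    """Fraction of each cache entry that is data rather than tag/valid overhead."""
    bytes_per_block = 4 * words_per_block
    num_blocks = data_bytes // bytes_per_block
    index_bits = num_blocks.bit_length() - 1        # log2, power of two assumed
    offset_bits = bytes_per_block.bit_length() - 1  # block offset + byte offset
    tag_bits = address_bits - index_bits - offset_bits
    data_bits = 8 * bytes_per_block
    return data_bits / (data_bits + tag_bits + 1)   # +1 for the valid bit


for words in (1, 4, 16):
    print(words, round(data_fraction(words), 3))
```

Under these assumptions, one-word blocks spend 32 of every 49 bits on data (about 65%), while four-word blocks spend 128 of every 145 bits on data (about 88%).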

Direct Mapped cache with multi-word blocks

Since we are now talking about multi-word blocks, we must have a way to identify not just the word, but the block!

                  |  Byte Address   |
  Block Address = | --------------- | or (Byte Address) DIV (Bytes Per Block)
                  | Bytes Per Block |
                  --               --

So here we have four different divisions of the address:

Tag is now the upper bits of the Block Address, that is Block Address DIV Number of Cache Blocks. We still use it to check that the block in the cache is the block that we want.

Index identifies the block within the cache, that is Block Address MOD Number of Cache Blocks (remember Blocks are multi-word now).

Block offset identifies the word within the block to the multiplexor. Its size in bits is n, where 2^n = words per block.

Byte offset identifies the byte within the word. It will probably always be 2 bits, since we are dealing with 32-bit machines with 32-bit words.
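The four fields can be extracted with shifts and masks. The sketch below assumes power-of-two sizes throughout; `split_multiword_address` is a hypothetical name, and the example numbers (64 cache blocks of four words each) are chosen only for illustration.

```python
def split_multiword_address(address: int,
                            index_bits: int,
                            block_offset_bits: int,
                            byte_offset_bits: int = 2):
    """Split a byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = address & ((1 << byte_offset_bits) - 1)
    block_offset = (address >> byte_offset_bits) & ((1 << block_offset_bits) - 1)
    # Block Address = (Byte Address) DIV (Bytes Per Block)
    block_address = address >> (byte_offset_bits + block_offset_bits)
    # Index = Block Address MOD Number of Cache Blocks
    index = block_address & ((1 << index_bits) - 1)
    # Tag = the remaining upper bits of the block address
    tag = block_address >> index_bits
    return tag, index, block_offset, byte_offset


# 64 cache blocks (6 index bits), 4 words per block (2 block-offset bits).
print(split_multiword_address(1200, index_bits=6, block_offset_bits=2))  # (1, 11, 0, 0)
print(split_multiword_address(1206, index_bits=6, block_offset_bits=2))  # (1, 11, 1, 2)
```

Byte address 1200 has block address 1200 DIV 16 = 75, which maps to cache block 75 MOD 64 = 11 with tag 1.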

https://www.scotnpatti.com/images/directmappedcache3.jpg

Revision 22 as of 2004-03-10 00:30:09 (size: 2570, editor: yakko)
Revision 31 as of 2004-03-10 01:16:25 (size: 4275, editor: yakko)