# What Is Cache Memory?

**Cache memory is ... than regular system memory.**
Cache memory is *faster* than regular system memory.

> [!example] Examples of different SRAM cache memory configurations
> * Embedded in the processor core (similar to registers).
> * On the same silicon die as the processor, but outside the core.
> * On a Printed Circuit Assembly (PCA).

**How fast is SRAM compared to DRAM?**
Compared to DRAM, SRAM is *faster.*

> **Why is DRAM slower than SRAM?**
> DRAM is slower than SRAM *because it has to be refreshed constantly, which causes wait states.*

# Cache Levels

**What are the three cache levels and their typical capacities?**
The three cache levels and their typical capacities are:

1. L1 cache - Between 256 kilobytes and 1 megabyte.
2. L2 cache - Up to 8 megabytes.
3. L3 cache - Between 4 megabytes and 50 megabytes.

> **How does the access speed change as the cache level increases?**
> As the cache level increases, the access speed *decreases.*

# Cache Function

**How are instructions and data transferred between DRAM and the caches?**
Instructions and data are transferred between DRAM and the caches like so:

```mermaid
graph LR
A(DRAM) --> B(L3 cache) --> C(L2 cache) --> D(L1 cache)
```

**What is it called when the processor finds or doesn't find instructions or data in the cache?**
When the processor finds or doesn't find instructions or data in the cache, it is called *a hit or a miss.*

**Why are instructions and data fetched from DRAM and stored in the three levels of cache?**
Instructions and data are fetched from DRAM and stored in the three levels of cache *to speed up execution.*

**What happens if the instructions or data are found in the L2 cache and not the L1 cache?**
If the instructions or data are found in the L2 cache and not the L1 cache, *the processor updates the L1 cache with them.*

**What can DRAM be used to cache and how?**
DRAM can be used to cache *data from the hard drive using swap files.*

# Why Caches Help

**How fast were early processors compared to the speed of RAM?**
Compared to the speed of RAM, early processors were *not much faster.*

> **What did faster processors become limited by?**
> Faster processors became limited by *the speed of memory.*
>
> **How much faster is cache memory compared to DRAM?**
> Compared to DRAM, cache memory is *10 to 100 times faster.*
>
> **Why are caches so much faster than DRAM?**
> Caches are so much faster than DRAM *because they're physically closer to the processor core.*

**What are the two benefits of DRAM?**
The two benefits of DRAM are:

1. It's less expensive.
2. It can be manufactured more densely for higher storage capacity.

**What is the storage capacity for SRAM and DRAM measured in?**
The storage capacity for SRAM and DRAM is measured in:

* SRAM - Kilobytes or megabytes.
* DRAM - Gigabytes.

# L1 Cache

**How much storage does L1 cache typically have in x86 processors?**
In x86 processors, L1 cache typically has *64 kilobytes per core.*

**What is L1 cache divided into?**
L1 cache is divided into:

* L1-I - Instructions.
* L1-D - Data.
**How fast does L1 cache operate?**
L1 cache operates *as fast as or faster than the maximum clock speed of the CPU.*

**What is the diagram of the sequence of transfers between DRAM and the CPU?**
The diagram of the sequence of transfers between DRAM and the CPU is:

![[Diagram Of The Transfers Between DRAM And The CPU.png]]

# L2 Cache

**How much storage does the L2 cache typically have?**
The L2 cache typically has *6 to 12 megabytes.*

> **How much storage does the L2 cache typically have in higher-end processors?**
> In higher-end processors, the L2 cache typically has *32 megabytes.*

**What is the diagram of the placement of L2 cache in a processor?**
The diagram of the placement of L2 cache in a processor is:

![[Diagram Of The Placement Of L2 Cache In A Processor.png]]

# L3 Cache

**What does the L3 cache act as in multi-core processors?**
In multi-core processors, the L3 cache acts as *a shared memory bank for all cores.*

**What do misses in the L3 cache result in?**
Misses in the L3 cache result in *transfers from slower DRAM.*

**How was early L3 cache implemented?**
Early L3 cache was implemented *as a separate chip on the motherboard.*

> **How is modern L3 cache implemented and why?**
> Modern L3 cache is implemented *inside the CPU for better performance.*

# Cache Functional Illustration

...

# Memory And Storage Pyramid

**What is the diagram for the relationship between memory size and speed?**
The diagram for the relationship between memory size and speed is:

![[Memory And Storage Pyramid Diagram.png]]

# L1 To L3 Cache Summary

...
# Memory Size/Speed Comparison

**How does the latency and capacity of memory types compare with one another?**
The latency and capacity of memory types compare with one another like so:

| Memory Type        | Latency (Cycles) | Capacity        |
| ------------------ | ---------------- | --------------- |
| Registers          | 0                | 8-256 registers |
| L1 / L2 / L3 cache | 1 to 40          | 32 KB to 32 MB  |
| RAM                | 50 to 100        | GB              |

**How does the latency and capacity of storage device types compare with one another?**
The latency and capacity of storage device types compare with one another like so:

| Storage Device Type                      | Latency               | Capacity       |
| ---------------------------------------- | --------------------- | -------------- |
| Flash memory (nonvolatile)               | 0.1 ms (~300K cycles) | 128 GB to 1 TB |
| Spinning magnetic platters, moving heads | 10 ms (~30M cycles)   | 1 TB to 10 TB  |

# Cache RAM Latency

...

# Disk Storage Latency

...

# Software Locality For Caches

**What access speed does the CPU need to run instructions at full speed?**
To run instructions at full speed, the CPU needs *sub-nanosecond access speed.*

**How does the size of storage affect access speed?**
The size of storage affects access speed like so:

* 100-1000 bytes - Sub-nanosecond speed.
* Gigabytes - 15 nanoseconds.
* Terabytes - Milliseconds.

**What is the overall goal of caching?**
The overall goal of caching is *to have a sub-nanosecond average access time while still offering many gigabytes of memory.*

**What does a program need to exhibit to benefit from proper caching?**
To benefit from proper caching, a program needs to exhibit *good locality.*

# Software Locality

**What are the two kinds of software locality?**
The two kinds of software locality are:

1. Temporal locality - If a program references `X` now, it will probably reference it again soon.
2. Spatial locality - If a program references `X` now, it will probably reference something at address `X+1` soon.
# Locality Example

> [!example] Example of a program with good locality
> ```c
> sum = 0;
> for (i = 0; i < n; i++)
>     sum += a[i];
> ```
> * Temporal locality.
>     * Data - Whenever it accesses `sum`, it accesses it again shortly after.
>     * Instructions - Whenever it does `sum += a[i];`, it does it again shortly after.
> * Spatial locality.
>     * Data - Whenever it accesses `a[i]`, it accesses `a[i+1]` shortly after.
>     * Instructions - Whenever it does `sum += a[i]`, it does `i++` shortly after.

# Cache Hits And Misses

> [!example] Example of a cache hit and miss
> ![[Example Diagram Of Cache Levels.png]]
> 1. The CPU requests block 10, which is in L1 cache. It can access block 10 right away. A hit in the first level of cache is the ideal (fastest) situation possible.
> 2. The CPU requests block 8, which is in L2 cache. It must first check L1 cache, which results in a miss. Since block 8 is not in L1 cache, one of the blocks there must be evicted to make room for it. Block 8 is then loaded from the L2 cache into the L1 cache, and the CPU can access it from there. A miss in any level of cache slows down the access time.

# Cache Eviction Policies

**What are the best and worst cache eviction policies?**
The best and worst cache eviction policies are:

1. Best - The oracle: evicting blocks that are never accessed again, or that are accessed the furthest in the future.
2. Worst - Evicting the block that will be accessed next, leading to thrashing.

> **Are the best and worst cache eviction policies possible in the general case?**
> *No*, the best and worst cache eviction policies are impossible in the general case, because both require knowing the future access pattern.
**What is a reasonable cache eviction policy?**
A reasonable cache eviction policy is *the Least Recently Used (LRU) policy, where the blocks that were accessed the longest time ago are evicted, on the assumption that they won't be accessed again soon.*

> **When can the Least Recently Used (LRU) cache eviction policy be good?**
> The LRU cache eviction policy can be good *when used with straight-line code.*
>
> **When can the Least Recently Used (LRU) cache eviction policy be bad?**
> The LRU cache eviction policy can be bad *when used with loops, especially large loops.*
>
> **Is the Least Recently Used (LRU) cache eviction policy cheap to implement?**
> *No*, the LRU cache eviction policy is expensive to implement.

# Storage Hierarchy & Caching Issues

**What is the issue concerning block size?**
The issue concerning block size is *whether it should be large or small, depending on the storage device.*

> **What are the two benefits of large block sizes?**
> The two benefits of large block sizes are:
>
> 1. They don't have to be transferred as often.
> 2. They can take advantage of spatial locality.
>
> **What are the two downsides of large block sizes?**
> The two downsides of large block sizes are:
>
> 1. They take a longer time to transfer.
> 2. They can't take advantage of temporal locality.
>
> **Are the benefits and downsides of small block sizes just the opposite of the ones for large block sizes?**
> *Yes*, the benefits and downsides of small block sizes are just the opposite of the ones for large block sizes.
**What is the typical block size for each storage device?**
The typical block size for each storage device is:

| Device                  | Block Size                  |
| ----------------------- | --------------------------- |
| Register                | 8 bytes                     |
| L1 / L2 / L3 cache line | 128 bytes                   |
| Main memory page        | 4 to 64 kilobytes           |
| Disk block              | 512 bytes to 4 kilobytes    |
| Disk transfer block     | 4 kilobytes to 64 megabytes |

**What is the issue concerning managing the cache?**
The issue concerning managing the cache is *who is responsible for managing each type of storage device.*

**Who is typically responsible for managing each storage device?**
Typically, those responsible for managing each storage device include:

| Device                  | Managed by                                                                          |
| ----------------------- | ----------------------------------------------------------------------------------- |
| Registers               | Compiler - Using complex code-analysis techniques.<br>Assembly language programmer. |
| L1 / L2 / L3 cache      | Hardware - Using simple algorithms.                                                 |
| Main memory             | Hardware and OS - Using virtual memory and complex algorithms.                      |
| Local secondary storage | End user - By deciding what files to store or delete.                               |

# Main Memory: Illusion

**What is main memory to a process?**
To a process, main memory is *an illusion.*

> **How much main memory does a process see?**
> A process sees *16 exabytes of uniform main memory.*
>
> **What does it mean for memory to be uniform?**
> Memory is uniform when *there are contiguous memory locations from 0 to the highest capacity available.*

# Main Memory: Reality

**What is memory divided into?**
Memory is divided into *pages.*

> **Can the location of each page of memory differ?**
> *Yes*, the location of each page of memory can differ.

# Virtual & Physical Addresses (cont.)
**What does a virtual memory address identify?**
A virtual memory address identifies *a location in a particular process's virtual memory.*

> **What does a virtual memory address consist of?**
> A virtual memory address consists of *a virtual page number and an offset.*
>
> **What are virtual memory addresses used by?**
> Virtual memory addresses are used by *applications and programs.*

**What does a physical memory address identify?**
A physical memory address identifies *a location in physical memory.*

> **What does a physical memory address consist of?**
> A physical memory address consists of *a physical page number and an offset.*
>
> **What are physical memory addresses used by?**
> Physical memory addresses are used by *the operating system and hardware.*

**Is the offset the same between a virtual memory address and its corresponding physical memory address?**
*Yes*, the offset is the same between a virtual memory address and its corresponding physical memory address.