# What Is Cache Memory?
**Cache memory is ... than regular system memory.**
Cache memory is *faster* than regular system memory.
> [!example] Examples of different SRAM cache memory configurations
> * Embedded in the processor core (similar to registers).
> * On the same silicon die as the processor, but outside the core.
> * On a Printed Circuit Assembly (PCA).
**How fast is SRAM compared to DRAM?**
Compared to DRAM, SRAM is *faster.*
> **Why is DRAM slower than SRAM?**
> DRAM is slower than SRAM *because it has to be refreshed constantly, which causes wait states.*
# Cache Levels
**What are the three cache levels and their typical capacities?**
The three cache levels and their typical capacities are:
1. L1 cache - Between 256 kilobytes and 1 megabyte (total, across all cores).
2. L2 cache - Up to 8 megabytes.
3. L3 cache - Between 4 megabytes and 50 megabytes.
> **How does the access speed change as the cache level increases?**
> As the cache level increases, the access speed *decreases.*
# Cache Function
**How are instructions and data transferred between DRAM and the caches?**
Instructions and data are transferred between DRAM and the caches like so:
```mermaid
graph LR
A(DRAM) --> B(L3 cache) --> C(L2 cache) --> D(L1 cache)
```
**What is it called when the processor finds or doesn't find instructions or data in the cache?**
When the processor finds or doesn't find instructions or data in the cache, it is called *a hit or a miss.*
**Why are instructions and data fetched from DRAM and stored in the three levels of cache?**
Instructions and data are fetched from DRAM and stored in the three levels of cache *to speed up execution.*
**What happens if the instructions or data are found in the L2 cache and not the L1 cache?**
If the instructions or data are found in the L2 cache and not the L1 cache, *the processor copies them into the L1 cache, then accesses them from there.*
**What can DRAM be used to cache and how?**
DRAM can be used to cache *data from the hard drive using swap files.*
# Why Caches Help
**How fast were early processors compared to the speed of RAM?**
Compared to the speed of RAM, early processors were *not much faster.*
> **What did faster processors become limited by?**
> Faster processors became limited by *the speed of memory.*
>
> **How much faster is cache memory compared to DRAM?**
> Compared to DRAM, cache memory is *10 to 100 times faster.*
>
> **Why are caches so much faster than DRAM?**
> Caches are so much faster than DRAM *because they're physically closer to the processor core.*
**What are the two benefits of DRAM?**
The two benefits of DRAM are:
1. It's less expensive.
2. It can be manufactured more densely for higher storage capacity.
**What is the storage capacity for SRAM and DRAM measured in?**
The storage capacity for SRAM and DRAM is measured in:
* SRAM - Kilobytes or megabytes.
* DRAM - Gigabytes.
# L1 Cache
**How much storage does L1 cache typically have in x86 processors?**
In x86 processors, L1 cache typically has *64 kilobytes per core.*
**What is L1 cache divided into?**
L1 cache is divided into:
* L1-I - Instructions.
* L1-D - Data.
**How fast does L1 cache operate?**
L1 cache operates *as fast as or faster than the maximum clock speed of the CPU.*
**What is the diagram of the sequence of transfers between DRAM and the CPU?**
The diagram of the sequence of transfers between DRAM and the CPU is:
![[Diagram Of The Transfers Between DRAM And The CPU.png]]
# L2 Cache
**How much storage does the L2 cache typically have?**
The L2 cache typically has *6 to 12 megabytes.*
> **How much storage does the L2 cache typically have in higher-end processors?**
> In higher-end processors, the L2 cache typically has *up to 32 megabytes.*
**What is the diagram of the placement of L2 cache in a processor?**
The diagram for the placement of L2 cache in a processor is:
![[Diagram Of The Placement Of L2 Cache In A Processor.png]]
# L3 Cache
**What does the L3 cache act as in multi-core processors?**
In multi-core processors, the L3 cache acts as *a shared memory bank for all cores.*
**What do misses in the L3 cache result in?**
Misses in the L3 cache result in *transfers from slower DRAM.*
**How was early L3 cache implemented?**
Early L3 cache was implemented *as a separate chip on the motherboard.*
> **How is modern L3 cache implemented and why?**
> Modern L3 cache is implemented *inside the CPU for better performance.*
# Cache Functional Illustration
...
# Memory And Storage Pyramid
**What is the diagram for the relationship between memory size and speed?**
The diagram for the relationship between memory size and speed is:
![[Memory And Storage Pyramid Diagram.png]]
# L1 To L3 Cache Summary
...
# Memory Size/Speed Comparison
**How does the latency and capacity of memory types compare with one another?**
The latency and capacity of memory types compare with one another like so:
| Memory Type | Latency (Cycles) | Capacity |
| ------------------ | ---------------- | --------------- |
| Registers | 0 | 8-256 registers |
| L1 / L2 / L3 cache | 1 to 40 | 32 KB to 32 MB |
| RAM | 50 to 100 | GB |
**How does the latency and capacity of storage device types compare with one another?**
The latency and capacity of storage device types compare with one another like so:
| Storage Device Type | Latency | Capacity |
| ---------------------------------------- | -------------------- | -------------- |
| Flash memory (nonvolatile)               | 0.1 ms (~300K cycles) | 128 GB to 1 TB |
| Spinning magnetic platters, moving heads | 10 ms (~30M cycles) | 1 TB to 10 TB |
# Cache RAM Latency
...
# Disk Storage Latency
...
# Software Locality For Caches
**What access speed does the CPU need to run instructions at full speed?**
To run instructions at full speed, the CPU needs *sub-nanosecond access speed.*
**How does the size of storage affect access speed?**
The size of storage affects access speed like so:
* 100-1000 bytes - Sub-nanosecond speed.
* Gigabytes - 15 nanoseconds.
* Terabytes - Milliseconds.
**What is the overall goal of caching?**
The overall goal of caching is *to have an average access time in sub-nanoseconds but with many gigabytes of memory.*
**What does a program need to exhibit to benefit from proper caching?**
To benefit from proper caching, a program needs to exhibit *good locality.*
# Software Locality
**What are the two kinds of software locality?**
The two kinds of software locality are:
1. Temporal locality - If a program references `X` now, it will probably reference it again soon.
2. Spatial locality - If a program references `X` now, it will probably reference something at address `X+1` soon.
# Locality Example
> [!example] Example of a program with good locality
> ```c
> sum = 0;
> for (i = 0; i < n; i++)
> sum += a[i];
> ```
> * Temporal locality.
> * Data - Whenever it accesses `sum`, it accesses it again shortly after.
> * Instructions - Whenever it does `sum += a[i];`, it does it again shortly after.
> * Spatial locality.
> 	* Data - Whenever it accesses `a[i]`, it accesses `a[i+1]` shortly after.
> * Instructions - Whenever it does `sum += a[i]`, it does `i++` shortly after.
# Cache Hits and Misses
> [!example] Example of a cache hit and miss
> ![[Example Diagram Of Cache Levels.png]]
> 1. The CPU requests block 10, which is in L1 cache. It can access block 10 right away. A hit in the first level of cache is the most ideal (fastest) situation possible.
> 2. The CPU requests block 8, which is in L2 cache. It must first check L1 cache, which results in a miss. Since it's not in L1 cache, one of the blocks there must be evicted to make room for block 8. Block 8 is loaded from the L2 cache to the L1 cache. The CPU can then access block 8 from the L1 cache. A miss in any level of cache causes the access time to slow down.
# Cache Eviction Policies
**What is the best and worst cache eviction policy?**
The best and worst cache eviction policies are:
1. Best (the oracle) - Evicting blocks that will never be accessed again, or whose next access is furthest in the future.
2. Worst - Evicting the block that will be accessed next, leading to thrashing.
> **Are the best and worst cache eviction policies possible in the general case?**
> *No*, the best and worst cache eviction policies are impossible in the general case.
**What is a reasonable cache eviction policy?**
A reasonable cache eviction policy is *the Least Recently Used (LRU) policy, where blocks that were accessed the longest time ago are evicted assuming they won't be accessed again soon.*
> **When can the Least Recently Used (LRU) cache eviction policy be good?**
> The LRU cache eviction policy can be good *when used with straight-line code.*
>
> **When can the Least Recently Used (LRU) cache eviction policy perform poorly?**
> The LRU cache eviction policy can perform poorly *when used with loops, especially large loops.*
>
> **Is the Least Recently Used (LRU) cache eviction policy cheap to implement?**
> *No*, the LRU cache eviction policy is expensive to implement.
# Storage Hierarchy & Caching Issues
**What is the issue concerning block size?**
The issue concerning block size is *whether blocks should be large or small, which depends on the storage device.*
> **What are the two benefits of large block sizes?**
> The two benefits of large block sizes are:
> 1. They don't have to be transferred as often.
> 2. They can take advantage of spatial locality.
>
> **What are the two downsides of large block sizes?**
> The two downsides of large block sizes are:
> 1. They take a longer time to transfer.
> 2. They can't take advantage of temporal locality.
>
> **Are the benefits and downsides of small block sizes just the opposite of ones for large block sizes?**
> *Yes*, the benefits and downsides of small block sizes are just the opposite of ones for large block sizes.
**What is the typical block size for each storage device?**
The typical block size for each storage device is:
| Device | Block Size |
| ----------------------- | --------------------------- |
| Register | 8 bytes |
| L1 / L2 / L3 cache line | 128 bytes |
| Main memory page | 4 to 64 kilobytes |
| Disk block | 512 bytes to 4 kilobytes |
| Disk transfer block | 4 kilobytes to 64 megabytes |
**What is the issue concerning managing the cache?**
The issue concerning managing the cache is *who is responsible for managing each type of storage device.*
**Who is typically responsible for managing each storage device?**
Typically, those responsible for managing each storage device include:
| Device | Managed by |
| ----------------------- | ----------------------------------------------------------------------------------- |
| Registers | Compiler - Using complex code-analysis techniques.<br>Assembly language programmer. |
| L1 / L2 / L3 cache | Hardware - Using simple algorithms. |
| Main memory | Hardware and OS - Using virtual memory and complex algorithms. |
| Local secondary storage | End user - By deciding what files to store or delete. |
# Main Memory: Illusion
**What is main memory to a process?**
To a process, main memory is *an illusion.*
> **How much main memory does a process see?**
> A process sees *16 exabytes of uniform main memory.*
> > **What does it mean for memory to be uniform?**
> > Memory is uniform when *it appears as contiguous memory locations from 0 up to the highest address available.*
# Main Memory: Reality
**What is memory divided into?**
Memory is divided into *pages.*
> **Can the location of each page of memory differ?**
> *Yes*, the location of each page of memory can differ.
# Virtual & Physical Addresses (cont.)
**What does a virtual memory address identify?**
A virtual memory address identifies *a location in a particular process's virtual memory.*
> **What does a virtual memory address consist of?**
> A virtual memory address consists of *a virtual page number and offset.*
>
> **What are virtual memory addresses used by?**
> Virtual memory addresses are used by *applications and programs.*
**What does a physical memory address identify?**
A physical memory address identifies *a location in physical memory.*
> **What does a physical memory address consist of?**
> A physical memory address consists of *a physical page number and offset.*
>
> **What are physical memory addresses used by?**
> Physical memory addresses are used by *the operating system and hardware.*
**Is the offset the same between a virtual memory address and its corresponding physical memory address?**
*Yes*, the offset is the same between a virtual memory address and its corresponding physical memory address.