The L1, L2, and L3 caches are crucial components in modern computing architectures, bridging the speed gap between the very fast CPU and the much slower main memory (RAM). Here is an overview of these cache levels.
1. Location & Proximity:
L1: Directly integrated into the processor core, ensuring minimal access latency.
It’s the primary cache used by the CPU to store instructions and data for immediate processing.
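L2: On modern CPUs, located on the same die as the cores, usually private to each core or shared by a small cluster of cores. Some older designs placed it on a separate chip close to the CPU.
L3: Located on the CPU die and usually serves multiple cores, making it a shared resource.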
2. Size & Speed:
L1: ~16KB to 128KB per core. Latency is ~0.5 to 1.5 ns. Offers the fastest access time due to its proximity to the CPU core.
L2: ~256KB to 512KB per core is common, though recent designs go up to 1MB or 2MB. Latency is roughly 3 to 14 ns, depending on clock speed and design.
L3: Ranges widely from 2MB to 50MB or more, usually shared across all cores. Latency is roughly 10 to 50 ns, depending on the architecture and distance from the core.
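To see how these latencies combine, here is a minimal sketch of the usual average memory access time calculation. The hit rates and the 100 ns main memory latency are assumptions chosen for illustration, not measurements, and the function name `average_access_time_ns` is hypothetical.

```python
def average_access_time_ns(levels, memory_latency_ns):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_latency_ns, hit_rate) tuples ordered from L1 outward.
    An access that misses a level pays that level's latency, then moves on.
    """
    total_ns = 0.0
    reach_probability = 1.0  # fraction of accesses that get as far as this level
    for hit_latency_ns, hit_rate in levels:
        total_ns += reach_probability * hit_latency_ns
        reach_probability *= 1.0 - hit_rate
    # Whatever misses every cache level has to go to main memory.
    total_ns += reach_probability * memory_latency_ns
    return total_ns


# Assumed, illustrative numbers: (latency in ns, hit rate) for L1, L2, L3.
hierarchy = [(1.0, 0.95), (10.0, 0.80), (30.0, 0.50)]
amat = average_access_time_ns(hierarchy, memory_latency_ns=100.0)
print(f"average access time: {amat:.2f} ns")
```

With these assumed numbers the average comes out at about 2.3 ns, far closer to the L1 latency than to main memory, which is the point of having the hierarchy.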
3. Structure & Organization:
L1: Often split into two:
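I-Cache (Instruction Cache): Dedicated to holding the upcoming instructions for the CPU. Small embedded cores may use a direct-mapped or 2-way set associative structure; larger cores commonly use 4-way or 8-way.
D-Cache (Data Cache): Holds the data that instructions read and write. Small embedded cores typically use a 2-way or 4-way set associative structure; larger cores commonly use 8-way or more.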
Cache line/block size is typically 32 or 64 bytes.
L2: Can be unified (holding both data and instructions) or split like L1.
Typically uses a more highly associative structure than L1, like 8-way or 16-way set associative.
Similar to L1, typically 32 or 64 bytes for cache line size.
L3: Typically unified. Often more highly associative than L2, with designs like 16-way or 32-way set associative structures being common. Generally uses a 64-byte cache line.
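The cache size, line size, and associativity together determine how an address is split into a tag, a set index, and a byte offset within the line. The sketch below assumes power-of-two sizes and a simple modulo-style index (real CPUs may hash the index, especially in L3); `split_address` is a hypothetical helper name.

```python
def split_address(address, cache_size_bytes, line_size_bytes, ways):
    """Split an address into (tag, set_index, offset) for a set-associative cache.

    Assumes cache size, line size, and the resulting set count are powers of two.
    """
    num_sets = cache_size_bytes // (line_size_bytes * ways)
    for value in (line_size_bytes, num_sets):
        if value <= 0 or value & (value - 1):
            raise ValueError("line size and set count must be powers of two")
    offset_bits = line_size_bytes.bit_length() - 1  # log2(line size)
    index_bits = num_sets.bit_length() - 1          # log2(number of sets)
    offset = address & (line_size_bytes - 1)
    set_index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, set_index, offset


# Assumed example: a 32KB, 8-way cache with 64-byte lines has 64 sets.
tag, set_index, offset = split_address(0x12345678, 32 * 1024, 64, 8)
print(hex(tag), set_index, offset)
```

For this assumed configuration the low 6 bits select the byte within the line, the next 6 bits select the set, and the remaining upper bits form the tag that is compared against each of the 8 ways.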
4. Replacement Policies:
L1 and L2: Least Recently Used (LRU), or a cheaper approximation such as pseudo-LRU, is commonly employed to decide which entries to evict when new data is brought in.
Either level may use a write-through or write-back strategy for handling writes.
L3: Often paired with advanced prefetching algorithms that anticipate data needs.
Some architectures use a portion of L3 as a “victim cache” for data evicted from L1 or L2, giving that data a second chance before it has to be fetched again from the slower main memory. Various write policies are employed, similar to L1 and L2.
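As a rough illustration of LRU replacement in a set-associative cache, the sketch below tracks only tags and hit/miss counts. It models true LRU, which real hardware often approximates; the class name `LRUSetAssociativeCache` and the tiny one-set, 2-way configuration are assumptions made for the example.

```python
from collections import OrderedDict


class LRUSetAssociativeCache:
    """Toy set-associative cache with true LRU replacement (tags only, no data)."""

    def __init__(self, num_sets, ways, line_size_bytes):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size_bytes = line_size_bytes
        # One OrderedDict per set: least recently used tag first, most recent last.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line_number = address // self.line_size_bytes
        set_index = line_number % self.num_sets
        tag = line_number // self.num_sets
        lines = self.sets[set_index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


# One set, two ways, 64-byte lines: addresses 0, 64, and 128 all compete.
cache = LRUSetAssociativeCache(num_sets=1, ways=2, line_size_bytes=64)
results = [cache.access(a) for a in (0, 64, 0, 128, 64)]
print(results, cache.hits, cache.misses)
```

In this trace the second access to address 0 hits and makes line 64 the least recently used, so bringing in address 128 evicts line 64 and the final access to 64 misses again.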
Cache Line: When data is fetched from main memory, it’s fetched in fixed-size blocks rather than as individual bytes. Each such block is referred to as a cache line.
Cache Coherency: Especially in multicore systems, ensuring that all cores have a consistent view of memory is crucial. Protocols like MESI (Modified, Exclusive, Shared, Invalid) are used to manage this.
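The MESI states can be sketched as a small state machine for a single cache line. This is a simplified textbook model, not any particular CPU's implementation: the event names (`local_read`, `remote_write`, and so on) and the helper `apply_access` are hypothetical, and bus transactions and write-backs are left implicit.

```python
def next_state(state, event, other_copies=False):
    """Next MESI state of one cache line in one core's cache.

    state: "M", "E", "S", or "I".
    event: "local_read", "local_write", "remote_read", or "remote_write".
    other_copies: whether another cache holds the line (only matters for a read miss).
    """
    if event == "local_read":
        if state == "I":
            return "S" if other_copies else "E"
        return state  # read hit: state unchanged
    if event == "local_write":
        return "M"  # writer ends up with the only, modified copy
    if event == "remote_read":
        # A valid copy drops to Shared (a Modified line is written back first).
        return "I" if state == "I" else "S"
    if event == "remote_write":
        return "I"  # another core takes ownership, so this copy is invalidated
    raise ValueError(f"unknown event: {event}")


def apply_access(states, core, op):
    """Apply a 'read' or 'write' by one core and return every core's new state."""
    others = any(s != "I" for i, s in enumerate(states) if i != core)
    return [
        next_state(s, "local_" + op, others) if i == core else next_state(s, "remote_" + op)
        for i, s in enumerate(states)
    ]


states = ["I", "I"]                       # two cores, line not cached anywhere
trace = []
for core, op in [(0, "read"), (1, "read"), (1, "write"), (0, "read")]:
    states = apply_access(states, core, op)
    trace.append(list(states))
print(trace)
```

The trace walks through the typical pattern: the first reader gets the line Exclusive, a second reader moves both copies to Shared, a write by one core invalidates the other copy, and a later read by the invalidated core brings both back to Shared.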
A cache can handle write operations in various ways:
– Write Through: Write to both the cache and main memory.
– Write Back: Write to the cache first, and write to main memory only when the modified (dirty) cache line is evicted.
– Write Allocate: On a cache miss during a write, the cache line is loaded from main memory into the cache, then written there.
– No Write Allocate: On a cache miss during a write, the data is written directly to main memory and the line is not loaded into the cache.
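The practical difference between these policies shows up in how much traffic reaches main memory. The sketch below is a toy model that assumes whole-line writes, LRU eviction, and a hypothetical `WritePolicyCache` class; it only counts memory writes and line fills.

```python
from collections import OrderedDict


class WritePolicyCache:
    """Toy cache that counts main-memory traffic caused by writes."""

    def __init__(self, capacity_lines, write_back, write_allocate):
        self.capacity_lines = capacity_lines
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.lines = OrderedDict()  # line number -> dirty flag, in LRU order
        self.memory_writes = 0
        self.line_fills = 0

    def _insert(self, line):
        if len(self.lines) >= self.capacity_lines:
            _, victim_dirty = self.lines.popitem(last=False)
            if victim_dirty:
                self.memory_writes += 1  # write back the dirty victim
        self.lines[line] = False

    def write(self, line):
        if line not in self.lines:
            if not self.write_allocate:
                self.memory_writes += 1  # no write allocate: bypass the cache
                return
            self.line_fills += 1         # write allocate: load the line first
            self._insert(line)
        self.lines.move_to_end(line)
        if self.write_back:
            self.lines[line] = True      # defer the memory write until eviction
        else:
            self.memory_writes += 1      # write through: update memory now

    def flush(self):
        """Write back every dirty line, as would happen on eviction."""
        for line, dirty in self.lines.items():
            if dirty:
                self.memory_writes += 1
                self.lines[line] = False


def count_traffic(write_back, write_allocate, writes=10):
    cache = WritePolicyCache(4, write_back, write_allocate)
    for _ in range(writes):
        cache.write(0)  # repeatedly write the same line
    cache.flush()
    return cache.memory_writes, cache.line_fills


write_through = count_traffic(write_back=False, write_allocate=True)
write_back = count_traffic(write_back=True, write_allocate=True)
no_allocate = count_traffic(write_back=False, write_allocate=False)
print(write_through, write_back, no_allocate)
```

For ten writes to the same line, this model sends every write to memory under write-through, but only one write-back under the write-back policy. Without write allocate the line never enters the cache, so every write goes straight to memory and no line fill occurs.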
LinkedIn post: https://www.linkedin.com/posts/t-yashwanth-naidu_embedded-embeddedengineers-embeddedsystems-activity-7120250898420834304-TmhQ/?utm_source=share&utm_medium=member_desktop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An Article by: Yashwanth Naidu Tikkisetty
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
