Friday 1 July 2016

performance - Cycles/cost for L1 Cache hit vs. Register on x86?




I remember assuming that an L1 cache hit is 1 cycle (i.e. identical to register access time) in my architecture class, but is that actually true on modern x86 processors?



How many cycles does an L1 cache hit take? How does it compare to register access?


Answer



Here's a great article on the subject:



http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/1



To answer your question: roughly, yes. An L1 cache hit is within a few cycles of a register access (about 4 cycles on the Core i7 figures below), so it is cheap but not literally identical. And of course a cache miss is quite costly ;)
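If you would rather measure than trust ballpark numbers, one common approach is a pointer-chasing loop: every load's address depends on the previous load's result, so the loads cannot overlap and the time per step approximates the load-to-use latency. The following is only a minimal sketch under stated assumptions; the function names (`build_chain`, `chase`, `ns_per_load`) are made up for illustration, and it assumes the chain is small enough to stay resident in L1 (for example 1024 entries of 8 bytes each).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Link n slots into one ring: slot i points to slot (i + stride) % n.
   If stride is coprime to n, the ring visits every slot exactly once. */
void build_chain(size_t *next, size_t n, size_t stride) {
    for (size_t i = 0; i < n; i++)
        next[i] = (i + stride) % n;
}

/* Follow the ring for `steps` dependent loads and return the final slot.
   Each load needs the previous load's result to form its address. */
size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t i = start;
    while (steps--)
        i = next[i];
    return i;
}

/* Average time per dependent load, in nanoseconds (hypothetical helper).
   Uses the standard C clock(), so resolution is coarse; use many steps. */
double ns_per_load(size_t n, size_t steps) {
    size_t *next = malloc(n * sizeof *next);
    assert(next != NULL);
    build_chain(next, n, 1);

    /* Warm-up pass pulls the whole ring into the cache. The volatile
       sink keeps the compiler from discarding the loop. */
    volatile size_t sink = chase(next, 0, n);

    clock_t t0 = clock();
    sink = chase(next, sink, steps);
    clock_t t1 = clock();

    free(next);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Calling something like `ns_per_load(1024, 100000000)` from an optimized build and multiplying the result by your clock speed in GHz gives an estimate in cycles. Treat the number as approximate: it includes loop and address-generation overhead, and it shifts with frequency scaling and compiler flags. There is no equivalent loop for registers, because a register operand is read as part of executing the instruction rather than through a separate load.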




PS:



The specifics will vary, but this link has some good ballpark figures:



Approximate cost to access various caches and main memory?



Core i7 Xeon 5500 Series Data Source Latency (approximate)
L1 CACHE hit                                  ~4 cycles
L2 CACHE hit                                  ~10 cycles
L3 CACHE hit, line unshared                   ~40 cycles
L3 CACHE hit, shared line in another core     ~65 cycles
L3 CACHE hit, modified in another core        ~75 cycles
Remote L3 CACHE                               ~100-300 cycles
Local DRAM                                    ~30 ns (~120 cycles)
Remote DRAM                                   ~100 ns
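The list mixes cycle counts and nanoseconds, so it can help to convert between the two. Here is a rough sketch of that arithmetic; the helper names are invented for illustration, and the 4 GHz clock is only an assumption, chosen because it is the clock implied by the "~30 ns (~120 cycles)" line above, not a quoted specification.

```c
#include <stddef.h>
#include <stdio.h>

/* 1 GHz is one cycle per nanosecond, so ns = cycles / GHz. */
double cycles_to_ns(double cycles, double clock_ghz) {
    return cycles / clock_ghz;
}

/* Inverse conversion: cycles = ns * GHz. */
double ns_to_cycles(double ns, double clock_ghz) {
    return ns * clock_ghz;
}

/* Print a few of the ballpark cycle counts from the list above in
   nanoseconds for an assumed clock speed. */
void print_latency_ns(double clock_ghz) {
    static const struct { const char *name; double cycles; } rows[] = {
        {"L1 hit",           4.0},
        {"L2 hit",          10.0},
        {"L3 hit, unshared", 40.0},
        {"Local DRAM",      120.0},
    };
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-18s ~%.1f ns\n", rows[i].name,
               cycles_to_ns(rows[i].cycles, clock_ghz));
}
```

For example, `cycles_to_ns(4, 4.0)` puts a ~4 cycle L1 hit at about 1 ns under the assumed clock; at a lower clock the same cycle count takes proportionally longer.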


PPS:




These figures represent much older, slower CPUs, but the ratios basically hold:



http://arstechnica.com/gadgets/reviews/2002/07/caching.ars/2



Level                     Access Time   Typical Size    Technology    Managed By
------------------------  ------------  --------------  ------------  ---------------------
Registers                 1-3 ns        ≤1 KB           Custom CMOS   Compiler
Level 1 Cache (on-chip)   2-8 ns        8 KB - 128 KB   SRAM          Hardware
Level 2 Cache (off-chip)  5-12 ns       0.5 MB - 8 MB   SRAM          Hardware
Main Memory               10-60 ns      64 MB - 1 GB    DRAM          Operating System
Hard Disk                 3M - 10M ns   20 - 100 GB     Magnetic      Operating System/User
