Thursday, 7 July 2016

latency - Approximate cost to access various caches and main memory?



Can anyone give me the approximate time (in nanoseconds) to access L1, L2 and L3 caches, as well as main memory on Intel i7 processors?



While this isn't specifically a programming question, knowing these kinds of speed details is necessary for some low-latency programming challenges.


Answer



Here is a Performance Analysis Guide for the i7 and Xeon range of processors. I should stress that it has what you need and more (see page 22, for example, for some timings and cycle counts).




Additionally, this page has some details on clock cycles and related figures. That second link gives the following numbers:



Core i7 Xeon 5500 Series Data Source Latency (approximate)               [Pg. 22]

local L1 CACHE hit, ~4 cycles ( 2.1 - 1.2 ns )
local L2 CACHE hit, ~10 cycles ( 5.3 - 3.0 ns )
local L3 CACHE hit, line unshared ~40 cycles ( 21.4 - 12.0 ns )
local L3 CACHE hit, shared line in another core ~65 cycles ( 34.8 - 19.5 ns )
local L3 CACHE hit, modified in another core ~75 cycles ( 40.2 - 22.5 ns )


remote L3 CACHE (Ref: Fig.1 [Pg. 5]) ~100-300 cycles ( 160.7 - 30.0 ns )

local DRAM ~60 ns
remote DRAM ~100 ns
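
The nanosecond ranges in parentheses look like the cycle counts divided by a core clock frequency. As a rough sketch, the snippet below reproduces that arithmetic. The two clock speeds (about 1.867 GHz and 3.333 GHz) are an assumption inferred from the ranges in the table, not something the source states, and the names `SLOW_GHZ`, `FAST_GHZ` and `cycles_to_ns` are made up for illustration.

```python
# Convert a latency in core clock cycles to nanoseconds: ns = cycles / GHz.
# ASSUMPTION: the two clock speeds below are inferred from the ranges quoted
# in the table; the source does not state them.
SLOW_GHZ = 1.867  # hypothetical slowest core clock
FAST_GHZ = 3.333  # hypothetical fastest core clock


def cycles_to_ns(cycles: float, ghz: float) -> float:
    """A clock running at `ghz` GHz completes `ghz` cycles per nanosecond."""
    return cycles / ghz


# Cycle counts as quoted in the table above.
LATENCY_CYCLES = {
    "local L1 hit": 4,
    "local L2 hit": 10,
    "local L3 hit, line unshared": 40,
    "local L3 hit, shared line in another core": 65,
    "local L3 hit, modified in another core": 75,
}

if __name__ == "__main__":
    for name, cycles in LATENCY_CYCLES.items():
        slow = cycles_to_ns(cycles, SLOW_GHZ)
        fast = cycles_to_ns(cycles, FAST_GHZ)
        print(f"{name}: ~{cycles} cycles ({slow:.1f} - {fast:.1f} ns)")
```

To estimate figures for a different part, replace the two constants with that processor's actual core clock. The DRAM rows are quoted directly in nanoseconds, so they are left out of the conversion.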


EDIT2:

The most important part is the notice under the cited table, which says:





"NOTE: THESE VALUES ARE ROUGH APPROXIMATIONS. THEY DEPEND ON
CORE AND UNCORE FREQUENCIES, MEMORY SPEEDS, BIOS SETTINGS,
NUMBERS OF DIMMS, ETC,ETC..YOUR MILEAGE MAY VARY."




EDIT: I should highlight that, as well as timing and cycle information, the Intel document above covers many more extremely useful details of the i7 and Xeon range of processors (from a performance point of view).

