intel - Understanding CYCLE_ACTIVITY.* Haswell Performance-Monitoring Events

Tuesday, 31 January 2017

intel - Understanding CYCLE_ACTIVITY.* Haswell Performance-Monitoring Events

I'm trying to analyse an execution on an Intel Haswell CPU (Intel® Core™ i7-4900MQ) with the Top-down Microarchitecture Analysis Method (TMAM), described in Chapters B.1 and B.4 of the Intel® 64 and IA-32 Architectures
Optimization Reference Manual. (I adjust the Sandy Bridge formulas described in B.4 to the Haswell Microarchitecture if needed.)

Therefore I perform performance counter events measurements with Perf. There are some results I don’t understand:

CPU_CLK_UNHALTED.THREAD_P < CYCLE_ACTIVITY.CYCLES_LDM_PENDING

This holds only for a few measurements, but still is weird. Does the PMU count halted cycles for CYCLE_ACTIVITY.CYCLES_LDM_PENDING?

CYCLE_ACTIVITY.CYCLES_L2_PENDING > CYCLE_ACTIVITY.CYCLES_L1D_PENDING
and CYCLE_ACTIVITY.STALLS_L2_PENDING > CYCLE_ACTIVITY.STALLS_L1D_PENDING

This applies for all measurements. When there is a L1D cache miss, the load gets transferred to the L2 cache, right? So a load missed L2 earlier also missed L1. There is the L1 instruction cache not counted here, but with *_L2_PENDING being 100x or even 1000x greater than *_L1D_PENDING it is probably not that.. Are the stalls/cycles being measured somehow separately? But than there is this formula:

%L2_Bound = (CYCLE_ACTIVITY.STALLS_L1D_PENDING - CYCLE_ACTIVITY.STALLS_L2_PENDING) / CLOCKS

Hence CYCLE_ACTIVITY.STALLS_L2_PENDING < CYCLE_ACTIVITY.STALLS_L1D_PENDING is assumed (the result of the formula must be positive). (The other thing with this formula is that it should probably be CYCLES instead of STALLS. However this wouldn't solve the problem described above.) So how can this be explained?

edit: My OS: Ubuntu 14.04.3 LTS, kernel: 3.13.0-65-generic x86_64, perf version: 3.13.11-ckt26

Blog

Tuesday, 31 January 2017

intel - Understanding CYCLE_ACTIVITY.* Haswell Performance-Monitoring Events

No comments:

Post a Comment

c++ - Does curly brackets matter for empty constructor?