Saturday, 4 June 2016

assembly - Performance Monitoring Counter (RDPMC) on a specific processor



I'm trying to use the RDPMC instruction to count retired instructions. The Intel Software Developer's Manual, Volume 3, Appendix A (in the PERFORMANCE MONITORING section) says:




• Instructions Retired — Event select C0H, Umask 00H
This event counts the number of instructions at retirement. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction. An instruction with a REP prefix counts as one instruction (not per iteration). Faults before the retirement of the last micro-op of a multi-ops instruction are not counted.




I used the answer from here to enable the performance counter from a Linux kernel-mode module.



As you can see from here (Description of RDPMC):





Loads the contents of the 40-bit performance-monitoring counter specified in the ECX register into registers EDX:EAX. The EDX register is loaded with the high-order 8 bits of the counter and the EAX register is loaded with the low-order 32 bits. The Pentium® Pro processor has two performance-monitoring counters (0 and 1), which are specified by placing 0000H or 0001H, respectively, in the ECX register.




After that, I put the 0 to RAX and execute RDPMC (in user mode), but even after executing RDPMC multiple times, EDX:EAX is still zero.



So my questions are :




  1. How to count the Retired Instructions on a specific process in user-mode?

  2. What are the differences between Event select C0H and Umask 00H and I want to know how to use C0H and 00H?



Answer




I put the 0 to RAX and execute RDPMC




The selector goes in ECX, not EAX.
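As a minimal sketch of what that looks like from C (GCC/Clang inline assembly; the function names are made up for illustration): the selector is passed in ECX via the "c" constraint, and the two output halves are combined into one 64-bit value. This assumes user-mode RDPMC has already been enabled (CR4.PCE set); otherwise the instruction faults.

```c
#include <stdint.h>

/* RDPMC returns the counter split in two: EDX = high bits, EAX = low 32 bits. */
static inline uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

#if defined(__x86_64__) || defined(__i386__)
/*
 * Read the performance counter chosen by `selector`.
 * The selector must be in ECX ("c" constraint), not EAX.
 * Faults (SIGSEGV under Linux) if user-mode RDPMC is not enabled.
 */
static inline uint64_t read_pmc(uint32_t selector)
{
    uint32_t eax, edx;
    __asm__ volatile("rdpmc" : "=a"(eax), "=d"(edx) : "c"(selector));
    return combine_edx_eax(edx, eax);
}
#endif
```

Typical use is to take the difference of two reads around the code of interest, e.g. read_pmc(0) before and after. On Intel CPUs that have fixed-function counters, setting bit 30 of the selector selects those instead of the programmable ones, so read_pmc(1u << 30) would read fixed counter 0 (instructions retired), provided that counter has been enabled.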




How to count the Retired Instructions on a specific process in user-mode?





Use perf stat ./a.out if you want Linux to virtualize the performance counters on context switches and CPU migrations to track things on per-process basis instead of a per-CPU basis. Or if you're programming the performance counters manually, make sure you pin your process to a core.
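Pinning can be done from inside the program with sched_setaffinity(2). A minimal sketch for Linux with glibc (pin_to_cpu is a hypothetical helper name, not part of any library):

```c
#define _GNU_SOURCE   /* needed for cpu_set_t macros and sched_setaffinity */
#include <sched.h>

/*
 * Restrict the calling thread to a single CPU so that every counter read
 * comes from the same core. Returns 0 on success, -1 on error (errno set).
 */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = calling thread */
}
```

Running the unmodified program under taskset -c 2 ./a.out has the same effect from the shell. Either way, pinning only keeps you on one core; other tasks scheduled on that core still increment a manually programmed counter.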



I often profile stuff with perf stat -etask-clock,context-switches,cpu-migrations,page-faults,cycles,branches,instructions,uops_issued.any,uops_executed.thread ./a.out. (e.g. see the output in Can x86's MOV really be "free"? Why can't I reproduce this at all?).



Perf's instructions event uses the Instructions Retired counter. (Actually it uses the fixed counter for that event, instead of using up a slot on one of the programmable counters.)
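If you want the same per-thread virtualized count from inside a program instead of from the perf tool, the perf_event_open(2) system call is the underlying interface. The following is a rough sketch, not a full implementation: the helper names and the spin workload are made up for illustration, and the call can fail (restrictive perf_event_paranoid setting, no PMU in a VM), in which case the sketch just returns -1.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
static int open_instructions_counter(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;  /* perf's generic "instructions" event */
    attr.disabled = 1;                         /* start stopped, enable explicitly */
    attr.exclude_kernel = 1;                   /* user-mode instructions only */
    attr.exclude_hv = 1;
    /* pid = 0, cpu = -1: follow the calling thread across CPUs */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Hypothetical workload for illustration: a loop the compiler cannot remove. */
static void spin(void *arg)
{
    volatile uint64_t sink = 0;
    for (uint64_t i = 0, n = *(uint64_t *)arg; i < n; i++)
        sink += i;
}

/* Count instructions retired while fn(arg) runs; returns -1 on any failure. */
static int64_t count_instructions(void (*fn)(void *), void *arg)
{
    int fd = open_instructions_counter();
    if (fd < 0)
        return -1;

    uint64_t count = 0;
    int64_t result = -1;
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == 0 &&
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == 0) {
        fn(arg);
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
            result = (int64_t)count;
    }
    close(fd);
    return result;
}
```

A caller would do something like uint64_t iters = 1000000; followed by count_instructions(spin, &iters). The kernel saves and restores the count on context switches, which is what makes it per-thread rather than per-CPU.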



Symbolic names for non-generic uarch-specific events like uops_issued.any used to only be available in the ocperf.py wrapper script, but perf 4.15.gd8a5b8 on Arch Linux supports them directly. I think this change was pretty recent.





What are the differences between Event select C0H and Umask 00H and I want to know how to use C0H and 00H?




You have to program the programmable counter with the right event and unit mask. The umask usually selects variations of some related thing. See http://oprofile.sourceforge.net/docs/intel-haswell-events.php for a list of what the umask values for each event do on Haswell.
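To make the two numbers concrete: on Intel, each programmable counter has an event-select register (IA32_PERFEVTSELx) in which the event select occupies bits 7:0 and the umask bits 15:8, next to control flags such as USR (bit 16), OS (bit 17) and EN (bit 22). A small sketch of composing that value (the enum and function names are made up for illustration; actually writing the register requires WRMSR in kernel mode):

```c
#include <stdint.h>

/* A few control bits of IA32_PERFEVTSELx (not the complete set). */
enum {
    PERFEVTSEL_USR = 1u << 16,  /* count while in user mode (CPL > 0) */
    PERFEVTSEL_OS  = 1u << 17,  /* count while in kernel mode (CPL == 0) */
    PERFEVTSEL_EN  = 1u << 22   /* enable the counter */
};

/* Compose the register value: event select in bits 7:0, umask in bits 15:8. */
static inline uint64_t perfevtsel(uint8_t event, uint8_t umask, uint32_t flags)
{
    return (uint64_t)event | ((uint64_t)umask << 8) | flags;
}
```

For Instructions Retired counted in user mode only, perfevtsel(0xC0, 0x00, PERFEVTSEL_USR | PERFEVTSEL_EN) gives 0x4100C0. As an example of the umask selecting a variation, the architectural last-level-cache event 2EH counts references with umask 4FH and misses with umask 41H. With perf you don't compose the register yourself; the same pair can be passed as a raw event, e.g. perf stat -e cpu/event=0xc0,umask=0x00/ ./a.out.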






Besides the large and complex perf subsystem in Linux, there are already a few open-source libraries that program the performance counters so you can read them from user space. See "Perf overcounting simple CPU-bound loop: mysterious kernel work?" for libpfc, which includes a demo.




You really don't need to write your own if you just want to use it.

