Wednesday, 24 May 2017

c++ - SSE intrinsics check zero flag




I was wondering whether it is possible to check the processor's flags register by means of Intel's SSE intrinsic functions.



For example:



int idx = _mm_cmpistri(mmrange, mmstr, 0x14);
int zero = _mm_cmpistrz(mmrange, mmstr, 0x14);


In this example the compiler is able to combine those two intrinsics into a single instruction (pcmpistri) and then test the flags register with a conditional jump (jz).
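A self-contained sketch of the first example follows. It assumes GCC or clang on x86-64; the `target("sse4.2")` attribute stands in for a global `-msse4.2`, and the function name `first_outside_range` and its buffer parameters are hypothetical. Both intrinsics get identical operands and the same immediate, which is what allows the compiler to emit one PCMPISTRI and read ZF from it.

```cpp
#include <immintrin.h>

// Hypothetical helper: index of the first byte of str16 that lies outside the
// ranges listed in range16, plus the zero flag of the same comparison.
// Assumes GCC or clang on x86-64; the target attribute enables SSE4.2 for this
// function only. Both buffers must have at least 16 readable bytes.
__attribute__((target("sse4.2")))
void first_outside_range(const char *range16, const char *str16, int *idx, int *zero)
{
    __m128i mmrange = _mm_loadu_si128(reinterpret_cast<const __m128i *>(range16));
    __m128i mmstr   = _mm_loadu_si128(reinterpret_cast<const __m128i *>(str16));

    // 0x14 = unsigned bytes | ranges | negative polarity | least-significant index
    *idx  = _mm_cmpistri(mmrange, mmstr, 0x14);  // 16 if no byte qualifies
    // ZF: set when mmstr contains a NUL byte, i.e. the string ends in this chunk
    *zero = _mm_cmpistrz(mmrange, mmstr, 0x14);
}
```

With `range16 = "az"` and `str16 = "hello World"` (both zero-padded to 16 bytes), `idx` is 5 (the space) and `zero` is 1, because the chunk contains the terminating NUL.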




However, in the following example the compiler doesn't manage to optimize the code properly:



__m128i mmmask = _mm_cmpistrm(mmoldchar, mmstr, 0x40);
int zero = _mm_cmpistrz(mmoldchar, mmstr, 0x40);


Here, the compiler generates both a pcmpistrm and a pcmpistri instruction. In my opinion, however, the second instruction is redundant, because pcmpistrm sets the flags register in the same way as pcmpistri.
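To make the two results of the second example concrete, here is a minimal sketch under the same assumptions (GCC or clang, x86-64, hypothetical function name `match_mask`). With imm8 0x40 the mask has 0xFF in every byte of `mmstr` that equals any character of `mmoldchar`, and the zero result again reports whether `mmstr` contains a NUL.

```cpp
#include <immintrin.h>

// Hypothetical wrapper around the second example (imm8 = 0x40).
// 0x40 = unsigned bytes | equal-any | positive polarity | unit (byte) mask:
// byte j of the result is 0xFF if mmstr[j] equals any char of mmoldchar.
// Assumes GCC or clang on x86-64; SSE4.2 is enabled for this function only.
__attribute__((target("sse4.2")))
__m128i match_mask(__m128i mmoldchar, __m128i mmstr, int *zero)
{
    __m128i mmmask = _mm_cmpistrm(mmoldchar, mmstr, 0x40);
    // Same operands and immediate: logically the ZF of the same comparison.
    *zero = _mm_cmpistrz(mmoldchar, mmstr, 0x40);
    return mmmask;
}
```

For example, with `mmoldchar = "l"` and `mmstr = "hello"`, bytes 2 and 3 of the mask are 0xFF (so `_mm_movemask_epi8` of it is 0x0C) and `*zero` is 1.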



So, to come back to my question, is there a way to either read the flags register directly or to instruct the compiler to only generate a pcmpistrm instruction?


Answer




Looks like just an MSVC missed-optimization bug, not anything inherent.



gcc6.2 and icc17 successfully use both results from one PCMPISTRM in a test function I wrote that branches on the zero result (on the Godbolt compiler explorer):



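#include <immintrin.h>  // SSE4.2 string intrinsics (_mm_cmpistrm, _mm_cmpistrz)

// Compile with SSE4.2 enabled, e.g. -march=nehalem or -msse4.2
__m128i foo(__m128i mmoldchar, __m128i mmstr)
{
    __m128i mmmask = _mm_cmpistrm(mmoldchar, mmstr, 0x40);  // byte mask of matches
    int zero = _mm_cmpistrz(mmoldchar, mmstr, 0x40);        // ZF of the same compare
    if (zero)
        return mmmask;
    else
        return _mm_setzero_si128();
}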

##gcc6.2 -O3 -march=nehalem
pcmpistrm xmm0, xmm1, 64
je .L5
pxor xmm0, xmm0
ret

.L5:
ret


OTOH, clang 3.9 fails to CSE the two intrinsics: it uses a PCMPISTRI for the flag, then a separate PCMPISTRM for the mask on the path that needs it.



foo:
movdqa xmm2, xmm0
pcmpistri xmm2, xmm1, 64
pxor xmm0, xmm0

jne .LBB0_2
pcmpistrm xmm2, xmm1, 64
.LBB0_2:
ret





Note that according to Agner Fog's instruction tables, PCMPISTRM has good throughput but high latency, so two of them can easily run in parallel; if latency rather than throughput is the bottleneck, the extra instruction costs little. Jumping through hoops like using __readeflags() might actually be worse.
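If you really do need a guaranteed single PCMPISTRM on GCC or clang, one such hoop is inline asm with a flag output operand (`=@ccz`, supported by GCC 6 and later and by recent clang) plus the `Yz` constraint, which pins the output to xmm0, where PCMPISTRM implicitly writes its mask. This is a sketch with a hypothetical helper name `cmpistrm_0x40_zf`; it does not work on MSVC (no inline asm on x64), and it hides the instruction from the optimizer, so measure before preferring it over the plain intrinsics.

```cpp
#include <immintrin.h>

// Hypothetical helper: one PCMPISTRM (imm8 = 0x40) yielding both the byte mask
// and ZF. Assumes GCC 6+ or a recent clang on x86-64 and an SSE4.2 CPU.
__attribute__((target("sse4.2")))
__m128i cmpistrm_0x40_zf(__m128i needles, __m128i haystack, bool *zero)
{
    __m128i mask;  // "=Yz": xmm0, the implicit destination of PCMPISTRM
    bool zf;       // "=@ccz": the zero flag as left by the asm statement
    // AT&T operand order: imm8, xmm2/m128 (haystack), xmm1 (needles)
    __asm__("pcmpistrm $0x40, %[hay], %[ndl]"
            : "=Yz"(mask), "=@ccz"(zf)
            : [ndl] "x"(needles), [hay] "x"(haystack));
    *zero = zf;
    return mask;
}
```

Usage mirrors the intrinsic version: with needles `"l"` and haystack `"hello"` the mask covers bytes 2 and 3 and `*zero` is true; with a haystack of 16 non-NUL bytes, `*zero` is false.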


Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t...