
On 28/09/2022 03.28, Aaron Puchert wrote:
On 28.09.22 at 02:58, Bernhard M. Wiedemann wrote:
On 28/09/2022 02.29, Aaron Puchert wrote:
I would love to see some kind of assembly diff for building the whole distribution with x86_64-v2, but this might be pretty hard. Perhaps Bernhard might be able to do this using the reproducible builds infrastructure.
I have seen software use -march=native and that always differed. In any non-trivial code, the compiler will find places where newer instructions can be used.
Of course on i586 we can expect -march=native to have a big impact, but x86_64 base already has SSE2 and doesn't use x87 anymore. Of course there's wider vectorization with AVX, but lots of code doesn't vectorize well.
I guess -march=xxx also implies -mtune=xxx, or at least changes scheduling decisions. So we don't only see newer instructions being used, we also see older instructions being shuffled around or replaced by different older instructions because newer hardware has different latency/throughput characteristics.
There is probably no way to reliably assess what changed due to new instructions being available.
Aaron
https://rb.zq1.de/temp/zstd-compare.out

I created this with a new, still-to-be-pushed version of my reproducibleopensuse tools with

c="-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection" ; optflags1="$c -march=x86-64" optflags2="$c -march=x86-64-v2" rbk

The gcc-11 man page says that for -march=x86-64* generic tuning is applied, and you can see a difference in the instructions generated, not just in their ordering:

https://github.com/facebook/zstd/blob/eadb6c874f9d0c9e90c835f8b0181da802361e...
--- old /usr/lib64/libzstd.a/cover.o (disasm)
+++ new /usr/lib64/libzstd.a/cover.o (disasm)
@@ -122,19 +122,19 @@
  jg <COVER_ctx_init + ofs>
  mov (%rsp),%rax
  sub %r15,%rbp
+ movd %ebx,%xmm0
  mov %r12,offset(%r13)
  lea offset(%rbp),%r14
+ pinsrd $something,%r11d,%xmm0
  mov %esi,offset(%rsp)
  mov %rax,offset(%r13)
- mov %ebx,%eax
+ mov %r10d,%eax
+ pmovzxdq %xmm0,%xmm0
  lea offset(,%r14,4),%r15
  mov %rax,offset(%r13)
- mov %r11d,%eax
  mov %r15,%rdi
- mov %rax,offset(%r13)
- mov %r10d,%eax
- mov %rax,offset(%r13)
  mov %r14,offset(%r13)
+ movups %xmm0,offset(%r13)
  mov %r9d,offset(%rsp)
  call <COVER_ctx_init + ofs>
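
For anyone who wants to skim a larger disassembly diff for this kind of change, here is a minimal sketch of one way to do it. It is not part of the reproducibleopensuse tools. It assumes a unified diff of normalized disassembly like the hunk above, where the mnemonic is the first token after the +/- marker; the helper names changed_mnemonics and only_in_new are made up for illustration. It counts mnemonics on removed and added lines and lists those that appear only on added lines. For the hunk above that list includes pinsrd and pmovzxdq, which are SSE4.1 instructions and therefore only available from x86-64-v2 on.

```python
from collections import Counter

def changed_mnemonics(diff_text):
    """Count mnemonics on removed ('-') and added ('+') lines of a unified diff.

    Assumption: each diff line holds one normalized instruction whose first
    token is the mnemonic (no addresses or opcode bytes in front of it).
    """
    removed, added = Counter(), Counter()
    for line in diff_text.splitlines():
        # Skip file headers and hunk headers.
        if line.startswith(("---", "+++", "@@")):
            continue
        if line.startswith("-"):
            target = removed
        elif line.startswith("+"):
            target = added
        else:
            continue  # context line, unchanged between the two builds
        tokens = line[1:].split()
        if tokens:
            target[tokens[0]] += 1
    return removed, added

def only_in_new(diff_text):
    """Mnemonics seen on added lines but never on removed lines."""
    removed, added = changed_mnemonics(diff_text)
    return sorted(set(added) - set(removed))

# The hunk from this mail, used as sample input.
SAMPLE = """\
--- old /usr/lib64/libzstd.a/cover.o (disasm)
+++ new /usr/lib64/libzstd.a/cover.o (disasm)
@@ -122,19 +122,19 @@
  jg <COVER_ctx_init + ofs>
  mov (%rsp),%rax
  sub %r15,%rbp
+ movd %ebx,%xmm0
  mov %r12,offset(%r13)
  lea offset(%rbp),%r14
+ pinsrd $something,%r11d,%xmm0
  mov %esi,offset(%rsp)
  mov %rax,offset(%r13)
- mov %ebx,%eax
+ mov %r10d,%eax
+ pmovzxdq %xmm0,%xmm0
  lea offset(,%r14,4),%r15
  mov %rax,offset(%r13)
- mov %r11d,%eax
  mov %r15,%rdi
- mov %rax,offset(%r13)
- mov %r10d,%eax
- mov %rax,offset(%r13)
  mov %r14,offset(%r13)
+ movups %xmm0,offset(%r13)
  mov %r9d,offset(%rsp)
  call <COVER_ctx_init + ofs>
"""

if __name__ == "__main__":
    removed, added = changed_mnemonics(SAMPLE)
    print("removed:", dict(removed))
    print("added:", dict(added))
    print("only on added lines:", only_in_new(SAMPLE))
```

Note that this only looks at changed lines: a mnemonic reported as "only on added lines" may still occur elsewhere in the old object file, and pure reordering shows up as matching removed/added counts. It cannot tell tuning effects apart from newly available instructions.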