On 08.03.20 at 21:59, Aaron Puchert wrote:
I noticed that when I compile boinc, it is always around 4x slower on
the CPU benchmarks than the version distributed by Tumbleweed. I do use
'-O3 -funroll-loops -ffast-math' as specified on the boinc website.
So, I wonder what options I might be missing. Or is the binary
distributed by TW custom optimized?
You can inspect the package sources; there is nothing unusual.
The obvious suspect would be link-time optimization (-flto=auto), which
is enabled in TW by default. My suspicion: the test for integer
operations is split among several functions that a compiler would be
unwilling to inline because of their length, but LTO can see the whole
program, observe that a function is only used once and then inline it
anyway. This might enable further optimizations, for example aliasing
information from the caller can be used in the now inlined subroutines.
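
To make the aliasing point concrete, here is a minimal sketch. It is not
taken from the BOINC sources; copy_scaled and run_once are made-up names,
and you should picture copy_scaled living in a different translation unit
than its only caller. Compiled on its own, the callee has to allow for dst
and src overlapping (or guard against it at run time). Once it is inlined
into run_once, the compiler can see that they are two distinct local
arrays.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callee; imagine it in its own translation unit.
// Seen in isolation, the compiler cannot prove that dst and src
// do not overlap.
void copy_scaled(int* dst, const int* src, std::size_t n, int factor) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * factor;
}

// Hypothetical single caller. If copy_scaled gets inlined here,
// which LTO makes possible across translation units, the compiler
// sees that in and out are separate local arrays.
int run_once() {
    int in[8];
    int out[8];
    for (int i = 0; i < 8; ++i)
        in[i] = i;
    copy_scaled(out, in, 8, 3);
    int sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += out[i];
    return sum;
}
```

Whether a given compiler actually inlines and simplifies this depends on
its heuristics; the sketch only shows what information becomes available.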
Looking at the floating-point benchmark, there seems to be just one
function, but then we have this:
// External array; store results here so that optimizing compilers
// don't do away with their computation.
// suggested by Ben Herndon
Well, this breaks down with LTO: the linker can "internalize" the array,
observing there are no accesses to it outside of this translation unit.
This is just speculation though, I didn't look at the actual assembly.
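
A minimal sketch of the pattern that comment describes, next to a sturdier
alternative. Again this is not the BOINC code; results, sink, kernel and
kernel_volatile are made-up names. The first variant relies on the array
having external linkage, which only helps as long as the optimizer cannot
see the whole program. The second stores to a volatile object, which
counts as an observable side effect and so has to be kept.

```cpp
#include <cassert>

// "External" array in the spirit of the comment: a compiler that
// sees only this file must assume another translation unit reads it.
double results[4];

// Hypothetical benchmark kernel. With LTO the optimizer may be able
// to prove that nobody reads results and treat the store as dead.
void kernel(int n) {
    assert(n >= 0);
    double x = 1.0;
    for (int i = 0; i < n; ++i)
        x = x * 0.5 + 1.0;
    results[0] = x;
}

// Alternative sink: a volatile store may not be removed, even
// under whole-program optimization.
volatile double sink;

void kernel_volatile(int n) {
    assert(n >= 0);
    double x = 1.0;
    for (int i = 0; i < n; ++i)
        x = x * 0.5 + 1.0;
    sink = x;
}
```

Note that volatile only keeps the final store alive. A compiler may still
simplify how the value is computed if it knows n at compile time.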
If you want to know more about what LTO can do, I can recommend this
talk by Teresa Johnson: <https://www.youtube.com/watch?v=p9nH2vZ2mNo>.
It's specifically about LLVM's ThinLTO, but many things apply to LTO in
general.

Oh, I missed the most obvious thing: Dhrystone is actually split into
two translation units, so LTO opens up new inlining opportunities.