I noticed that when I compile boinc myself, it is always around 4x slower on the CPU benchmarks than the version distributed by Tumbleweed. I do use '-O3 -funroll-loops -ffast-math' as specified on the boinc website.
So, I wonder what options I might be missing. Or is the binary distributed by TW custom optimized?
You can inspect the package sources; there is nothing unusual. [1] The obvious suspect would be link-time optimization (-flto=auto), which is enabled in TW by default.

My suspicion: the test for integer operations [2] is split among several functions that a compiler would be unwilling to inline because of their length. LTO, however, can see the whole program, observe that a function is only called once, and inline it anyway. This might enable further optimizations; for example, aliasing information from the caller can be used in the now-inlined subroutines.

Looking at the floating-point benchmark [3], there seems to be just one function, but then we have this:

    // External array; store results here so that optimizing compilers
    // don't do away with their computation.
    // suggested by Ben Herndon
    //
    double extern_array[12];

Well, this breaks down with LTO: the linker can "internalize" the array after observing that there are no accesses to it outside of this translation unit. Once the array is internal and never read, the stores to it, and the computations feeding them, can be removed.

This is just speculation though; I didn't look at the actual assembly.

If you want to know more about what LTO can do, I can recommend this talk by Teresa Johnson: https://www.youtube.com/watch?v=p9nH2vZ2mNo. It's specifically about LLVM's ThinLTO, but many things apply to LTO in general.

[1] https://build.opensuse.org/package/show/network/boinc-client
[2] https://github.com/BOINC/boinc/blob/master/client/dhrystone.cpp
[3] https://github.com/BOINC/boinc/blob/master/client/whetstone.cpp
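
To make the point about the external array concrete, here is a minimal sketch of the pattern and of one possible alternative. It is not BOINC's code: `work`, `store_plain`, `store_volatile` and `volatile_sink` are hypothetical names made up for illustration, and only `extern_array` is taken from whetstone.cpp. Whether a given compiler actually removes the plain store under LTO is an assumption you would have to confirm by looking at the generated assembly.

```cpp
// Sketch only: illustrates the idea, not BOINC's actual benchmark code.
// To compare builds with GCC, compile once with "-O3" and once with
// "-O3 -flto=auto", then inspect the assembly of both.
#include <cstddef>

// External-linkage array in the style of whetstone.cpp. Without LTO the
// compiler has to assume another translation unit might read it, so the
// stores into it are kept.
double extern_array[12];

// Hypothetical stand-in for benchmark work whose result should be kept.
double work(std::size_t n) {
    double x = 1.0;
    for (std::size_t i = 0; i < n; ++i) {
        x = x * 0.5 + 1.0;
    }
    return x;
}

// Variant 1: store into the external array. With whole-program visibility
// the optimizer may notice that nobody reads the array and treat the
// store as dead.
void store_plain(std::size_t n) {
    extern_array[0] = work(n);
}

// Variant 2 (an assumption, not something BOINC does): store through a
// volatile object. Volatile accesses count as observable behaviour, so
// the store itself has to be emitted even under LTO.
volatile double volatile_sink;

void store_volatile(std::size_t n) {
    volatile_sink = work(n);
}
```

Note that the volatile store only guarantees that the write happens. If the input is a compile-time constant, the compiler may still precompute the value instead of running the loop, so a benchmark would also need inputs that are not known at compile time.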