> On Thu, Sep 29, 2022 at 10:14:59AM +0100, Daniel Morris wrote:
> > On Wed, Sep 28, 2022 at 09:12:43PM +0200, Michal Suchánek wrote:
> > > Full disclosure, our company collaborated on further research to instrument a system and see if power optimisations could be written into GCC or another compiler. Initially it was an immense project, similar to "extracting a decagram of myelin from 4 tons of earthworms".
> > And do you have some data that you can share that shows which optimizations save how much power for which workloads?
> Sadly I don't think anything is as clear-cut as that, but here's a link to Bristol Uni's research (courtesy of Embecosm stumping up the open access publication fee):
That's an interesting paper, thanks. It's mostly focused on embedded systems, mostly on the ARM architecture, and is not exactly recent. A few points nonetheless come out clearly.

It is pretty much impossible to predict how optimization techniques affect performance without doing the actual measurement. For decisions argued on performance grounds solid data is a must; otherwise it's nonsense.

Shorter execution time correlates with lower power consumption in general, but there is a case where it does not: on Cortex-A8 cores, enabling NEON instruction generation switches the code from the general-purpose arithmetic unit to the SIMD unit, which does not save time but is more energy efficient on this CPU for the tested benchmarks.

This is interesting in multiple ways. The benchmarks used are mostly number crunching, and on this particular CPU it is difficult to move data between the general-purpose arithmetic unit and the SIMD unit, so you get to use one or the other. The SIMD unit is not faster, but it is more energy efficient. With an algorithm that was easier to parallelize you could use both units, but with the code in question GCC could not pull that off.

It can also be seen in a different way: the NEON unit is an accelerator which you can use to offload computation in a more efficient way, and for many common computations accelerators are becoming available, either inside the CPU core itself or as a separate device. This makes the performance of code running on the general-purpose CPU less important over time, and the chase for more CPU features enabled for all code less rewarding.

Thanks

Michal
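PS: To make the Cortex-A8 case concrete, here is a minimal sketch of the two builds being compared, assuming an arm-linux-gnueabihf cross toolchain. The triplet, the file name, and the function are made up for illustration; the flags themselves are standard GCC options for ARM.

    /* vecadd.c -- hypothetical test case: a simple loop that
       GCC's auto-vectorizer can map onto NEON. Integer math,
       so no -funsafe-math-optimizations is needed (GCC will
       not auto-vectorize floating point to NEON without it,
       because NEON flushes denormals to zero). */
    #include <stdint.h>

    void vec_add(int16_t *dst, const int16_t *a,
                 const int16_t *b, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    # SIMD unit: -O3 turns on the vectorizer, -mfpu=neon lets it emit NEON
    arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a8 -mfpu=neon \
        -mfloat-abi=hard -c vecadd.c -o vecadd-neon.o

    # general-purpose unit only: same source, vectorizer off, VFP-only FPU
    arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a8 -mfpu=vfpv3 \
        -mfloat-abi=hard -fno-tree-vectorize -c vecadd.c -o vecadd-novec.o

The same source yields two objects, one using the SIMD unit and one the general-purpose unit; which of the two wins on energy can only be settled by measuring on the actual hardware, which is the paper's point.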