On Wed, Sep 28, 2022 at 10:24:56AM +0100, Daniel Morris wrote:
On Wed, Sep 28, 2022 at 12:14:07AM +0200, Michal Suchánek wrote:
And that's the thing: for the vast majority of code, being able to run a few per cent faster makes absolutely no difference in the vast majority of use cases.
And for a lot of code it would be great to run faster, but the bottleneck is not the CPU, so the new instructions don't really help.
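For the rare hot spots where the new instructions do pay off, there is a middle ground that does not require dropping the baseline: GCC's function multi-versioning, which puts a baseline clone and e.g. an AVX2 clone of one function into the same binary and picks between them at load time. A minimal sketch, assuming GCC 6+ with glibc ifunc support on x86-64 (the dot product is purely illustrative):

/* One hot function, two clones: GCC emits a baseline x86-64 version
 * and an AVX2 version of dot(), and an ifunc resolver selects the
 * best one when the program is loaded, so the binary still runs on
 * pre-v3 hardware. Build: gcc -O2 fmv.c -o fmv
 */
#include <stdio.h>

__attribute__((target_clones("avx2", "default")))
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];   /* vectorised in the AVX2 clone */
    return sum;
}

int main(void)
{
    static double a[1024], b[1024];
    for (int i = 0; i < 1024; i++) {
        a[i] = i * 0.5;
        b[i] = i * 0.25;
    }
    printf("dot = %f\n", dot(a, b, 1024));
    return 0;
}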
Whilst not wanting to dead-end lots of very serviceable hardware, we should be careful about failing to exploit newer features too. Findings presented by Intel at a power conference a few years ago (we have seen similar from others) were that it pays to do the work as quickly as possible, then dive into a low-power mode and/or shut down subsystems of the chip - the "race to idle" approach.
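That "race to idle" advice applies at the application level as well: a blocking sleep lets the core drop into a deep C-state between jobs, where a busy-wait poll keeps it at full power for the same amount of useful work. A minimal sketch, assuming Linux/POSIX nanosleep (the loop shape and 100 ms period are illustrative):

/* Race to idle: finish the work at full speed, then block so the
 * hardware can power down, instead of spinning until the next job.
 * Build: gcc -O2 race.c -o race
 */
#define _POSIX_C_SOURCE 199309L
#include <time.h>

static void do_work(void)
{
    /* placeholder for the actual job; runs flat out */
}

int main(void)
{
    struct timespec period = { 0, 100 * 1000 * 1000 }; /* 100 ms */

    for (int tick = 0; tick < 100; tick++) {
        do_work();                /* 1. race: do the work as fast as possible */
        nanosleep(&period, NULL); /* 2. idle: block instead of busy-waiting,
                                     so the core can enter a low-power state */
    }
    return 0;
}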
One might think "who cares about power saving? This system isn't a laptop/embedded/sensor etc.", but inefficient work gets turned into heat, which then requires further management - one of our colos has charged us per U (rack space), per GiB (bandwidth) and per W (thermal cost of aircon) for two decades, which was an unusual pricing structure at the outset. Seemingly marginal gains do add up over time.
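To put rough numbers on "marginal gains do add up": with deliberately made-up figures (the rates below are hypothetical, not our colo's actual pricing), even the per-W line alone is visible money:

/* Back-of-envelope cost of avoidable draw under per-W billing.
 * Every figure here is hypothetical, purely to show the shape of
 * the sum. Build: gcc -O2 watts.c -o watts
 */
#include <stdio.h>

int main(void)
{
    double watts_per_server = 10.0;  /* hypothetical avoidable draw per box */
    double servers          = 200.0; /* hypothetical fleet size */
    double price_per_w_year = 2.0;   /* hypothetical per-W rate, $/W/year */

    double watts = watts_per_server * servers;
    printf("%.0f W avoidable -> $%.0f/year on the per-W line alone\n",
           watts, watts * price_per_w_year);
    return 0;
}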
Full disclosure: our company collaborated on further research to instrument a system and see whether power optimisations could be written into GCC (or another compiler). Initially it was an immense project, similar to "extracting a decagram of myelin from four tons of earthworms".
And do you have some data that you can share that shows which optimizations save how much power for which workloads?

That's exactly the thing I am missing here: some data showing the benefits of axing that hardware. The same applies to power savings as to time savings: optimizing code that you do not run at all, or run only rarely, saves you neither time nor power.

So far we have only seen benchmarks showing that random pieces of code compiled for x86-64-v3 mostly run faster, and that random pieces of code compiled for x86-64-v2 sometimes run faster and sometimes slower. With no summary it is not clear how often the code runs faster and how often it runs slower, nor how often you would encounter such pieces of code in real-world workloads.

Thanks

Michal
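PS: On gathering that kind of data: per-workload package energy can be read from the kernel's powercap interface on Intel machines. A minimal sketch, assuming /sys/class/powercap/intel-rapl:0/energy_uj is present and readable (often root-only), and ignoring counter wrap-around on long runs:

/* Read the RAPL package energy counter before and after running a
 * workload, and report the difference in joules.
 * Assumes Linux with the intel_rapl powercap driver loaded.
 * Build: gcc -O2 rapl.c -o rapl; run: ./rapl "make -j8"
 */
#include <stdio.h>
#include <stdlib.h>

static long long read_energy_uj(void)
{
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    long long uj;

    if (!f || fscanf(f, "%lld", &uj) != 1) {
        perror("energy_uj");
        exit(1);
    }
    fclose(f);
    return uj;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <command>\n", argv[0]);
        return 1;
    }

    long long before = read_energy_uj();
    int status = system(argv[1]);          /* run the workload */
    long long after = read_energy_uj();

    printf("exit status %d, package energy %.3f J\n",
           status, (after - before) / 1e6);
    return 0;
}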