On Wed, Sep 28, 2022 at 10:24:56AM +0100, Daniel Morris wrote:
On Wed, Sep 28, 2022 at 12:14:07AM +0200, Michal Suchánek wrote:
And that's the thing: for the vast majority of code, being able to run a few per cent faster makes absolutely no difference in the vast majority of use cases.
And for a lot of code it would be great to run faster, but the bottleneck is not the CPU, so the new instructions don't really help.
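For the rare hot spots where the new instructions do pay off, there is a middle ground that does not require dropping the baseline: GCC's function multi-versioning, which puts a baseline clone and e.g. an AVX2 clone of one function into the same binary and picks between them at load time. A minimal sketch, assuming GCC 6+ with glibc ifunc support on x86-64 (the dot product is purely illustrative):

/* One hot function, two clones: GCC emits a baseline x86-64 version
 * and an AVX2 version of dot(), and an ifunc resolver selects the
 * best one when the program is loaded, so the binary still runs on
 * pre-v3 hardware. Build: gcc -O2 fmv.c -o fmv
 */
#include <stdio.h>

__attribute__((target_clones("avx2", "default")))
double dot(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];   /* vectorised in the AVX2 clone */
    return sum;
}

int main(void)
{
    static double a[1024], b[1024];
    for (int i = 0; i < 1024; i++) {
        a[i] = i * 0.5;
        b[i] = i * 0.25;
    }
    printf("dot = %f\n", dot(a, b, 1024));
    return 0;
}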
Whilst not wanting to dead-end lots of very serviceable hardware, we should be careful about failing to exploit newer features too. Findings presented by Intel at a power conference a few years ago (we have seen similar from others) were that it pays to do the work as quickly as possible, then dive into a low-power mode and/or shut down subsystems of the chip - the "race to idle" approach.
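That "race to idle" advice applies at the application level as well: a blocking sleep lets the core drop into a deep C-state between jobs, where a busy-wait poll keeps it at full power for the same amount of useful work. A minimal sketch, assuming Linux/POSIX nanosleep (the loop shape and 100 ms period are illustrative):

/* Race to idle: finish the work at full speed, then block so the
 * hardware can power down, instead of spinning until the next job.
 * Build: gcc -O2 race.c -o race
 */
#define _POSIX_C_SOURCE 199309L
#include <time.h>

static void do_work(void)
{
    /* placeholder for the actual job; runs flat out */
}

int main(void)
{
    struct timespec period = { 0, 100 * 1000 * 1000 }; /* 100 ms */

    for (int tick = 0; tick < 100; tick++) {
        do_work();                /* 1. race: do the work as fast as possible */
        nanosleep(&period, NULL); /* 2. idle: block instead of busy-waiting,
                                     so the core can enter a low-power state */
    }
    return 0;
}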
One might think "who cares about power saving? This system isn't a laptop/embedded/sensor etc.", but inefficient work gets turned into heat, which then requires further management - one of our colos has charged us per U (rack space), per GiB (bandwidth) and per W (thermal cost of aircon) for two decades, which was an unusual pricing structure at the outset. Seemingly marginal gains do add up over time.
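To put rough numbers on "marginal gains do add up": with deliberately made-up figures (the rates below are hypothetical, not our colo's actual pricing), even the per-W line alone is visible money:

/* Back-of-envelope cost of avoidable draw under per-W billing.
 * Every figure here is hypothetical, purely to show the shape of
 * the sum. Build: gcc -O2 watts.c -o watts
 */
#include <stdio.h>

int main(void)
{
    double watts_per_server = 10.0;  /* hypothetical avoidable draw per box */
    double servers          = 200.0; /* hypothetical fleet size */
    double price_per_w_year = 2.0;   /* hypothetical per-W rate, $/W/year */

    double watts = watts_per_server * servers;
    printf("%.0f W avoidable -> $%.0f/year on the per-W line alone\n",
           watts, watts * price_per_w_year);
    return 0;
}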
Full disclosure: our company collaborated on further research to instrument a system and see whether power optimisations could be written into GCC (or another compiler). Initially it was an immense project, similar to "extracting a decagram of myelin from four tons of earthworms".
And do you have some data that you can share that shows which optimizations save how much power for which workloads?

That's exactly the thing I am missing here: some data showing the benefits of axing that hardware. The same applies to power savings as to time savings: optimizing code that you do not run at all, or run only rarely, saves you neither time nor power.

So far we have only seen benchmarks showing that random pieces of code compiled for x86-64-v3 mostly run faster, and that random pieces of code compiled for x86-64-v2 sometimes run faster and sometimes slower. With no summary it is not clear how often the code runs faster and how often it runs slower, nor how often you would encounter such pieces of code in real-world workloads.

Thanks

Michal
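PS: On gathering that kind of data: per-workload package energy can be read from the kernel's powercap interface on Intel machines. A minimal sketch, assuming /sys/class/powercap/intel-rapl:0/energy_uj is present and readable (often root-only), and ignoring counter wrap-around on long runs:

/* Read the RAPL package energy counter before and after running a
 * workload, and report the difference in joules.
 * Assumes Linux with the intel_rapl powercap driver loaded.
 * Build: gcc -O2 rapl.c -o rapl; run: ./rapl "make -j8"
 */
#include <stdio.h>
#include <stdlib.h>

static long long read_energy_uj(void)
{
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    long long uj;

    if (!f || fscanf(f, "%lld", &uj) != 1) {
        perror("energy_uj");
        exit(1);
    }
    fclose(f);
    return uj;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <command>\n", argv[0]);
        return 1;
    }

    long long before = read_energy_uj();
    int status = system(argv[1]);          /* run the workload */
    long long after = read_energy_uj();

    printf("exit status %d, package energy %.3f J\n",
           status, (after - before) / 1e6);
    return 0;
}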