> On Thu, Sep 29, 2022 at 10:14:59AM +0100, Daniel Morris wrote:
> > On Wed, Sep 28, 2022 at 09:12:43PM +0200, Michal Suchánek wrote:
> > > Full disclosure, our company collaborated on further research to instrument a system and see if power optimisations could be written into GCC or another compiler. Initially it was an immense project, similar to "extracting a decagram of myelin from 4 tons of earthworms".
> > And do you have some data that you can share that shows which optimizations save how much power for which workloads?
> Sadly I don't think anything is as clear-cut as that, but here's a link to Bristol Uni's research (courtesy of Embecosm stumping up the open access publication fee):
That's an interesting paper, thanks. It's mostly focused on embedded systems, mostly on the ARM architecture, and is not exactly recent. A few points nonetheless come out clearly.

It is pretty much impossible to predict how optimization techniques affect performance without doing the actual measurement. For decisions argued on performance grounds solid data is a must; otherwise it's nonsense.

Shorter execution time correlates with lower power consumption in general, but there is a case where it does not: on Cortex-A8 cores, enabling NEON instruction generation switches the code from the general-purpose arithmetic unit to the SIMD unit, which does not save time but is more energy efficient on this CPU for the tested benchmarks.

This is interesting in multiple ways. The benchmarks used are mostly number crunching, and on this particular CPU it is difficult to move data between the general-purpose arithmetic unit and the SIMD unit, so you get to use one or the other. The SIMD unit is not faster, but it is more energy efficient. With an algorithm that was easier to parallelize you could use both units, but with the code in question GCC could not pull that off.

It can also be seen in a different way: the NEON unit is an accelerator which you can use to offload computation in a more efficient way, and for many common computations accelerators are becoming available, either inside the CPU core itself or as a separate device. This makes the performance of code running on the general-purpose CPU less important over time, and the chase for more CPU features enabled for all code less rewarding.

Thanks

Michal
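PS: To make the Cortex-A8 case concrete, here is a minimal sketch of the two builds being compared, assuming an arm-linux-gnueabihf cross toolchain. The triplet, the file name, and the function are made up for illustration; the flags themselves are standard GCC options for ARM.

    /* vecadd.c -- hypothetical test case: a simple loop that
       GCC's auto-vectorizer can map onto NEON. Integer math,
       so no -funsafe-math-optimizations is needed (GCC will
       not auto-vectorize floating point to NEON without it,
       because NEON flushes denormals to zero). */
    #include <stdint.h>

    void vec_add(int16_t *dst, const int16_t *a,
                 const int16_t *b, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = a[i] + b[i];
    }

    # SIMD unit: -O3 turns on the vectorizer, -mfpu=neon lets it emit NEON
    arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a8 -mfpu=neon \
        -mfloat-abi=hard -c vecadd.c -o vecadd-neon.o

    # general-purpose unit only: same source, vectorizer off, VFP-only FPU
    arm-linux-gnueabihf-gcc -O3 -mcpu=cortex-a8 -mfpu=vfpv3 \
        -mfloat-abi=hard -fno-tree-vectorize -c vecadd.c -o vecadd-novec.o

The same source yields two objects, one using the SIMD unit and one the general-purpose unit; which of the two wins on energy can only be settled by measuring on the actual hardware, which is the paper's point.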