On 05.08.22 at 11:25, Richard Biener wrote:
> I'll note that while some specialized libraries come with separate code paths for the ISAs they benefit from, and dispatch appropriately via CPUID, most developers are simply too lazy to even think about that. You may say that performance doesn't matter in most cases, but performance often also translates to lower energy use, which for mobile use might be even more important than raw performance (though that's even harder to measure, of course).
Totally with you on the energy consumption, but I have to slightly disagree about how rare such dispatching is. Lots of the software that benefits most from vectorization already does something like that, most notably anything in the "multimedia" arena: audio/video codecs, filters, and so on. The same goes for cryptography. It's hard to tell exactly how widespread it is, but even though many libraries lack a .symtab, I still find a few on my system:

$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _sse2 >/dev/null && echo $lib; done
/usr/lib64/libc.so.6
/usr/lib64/libfftw3f.so.3
/usr/lib64/libfftw3.so.3
/usr/lib64/libmpeg2.so.0
/usr/lib64/libm.so.6
/usr/lib64/libmvec.so.1
/usr/lib64/libnettle.so.8
/usr/lib64/libsodium.so.23
/usr/lib64/libSvtAv1Enc.so.1
/usr/lib64/libvidstab.so.1.1
/usr/lib64/libvisual-0.4.so.0
/usr/lib64/libwebrtc_audio_processing.so.1
$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _sse3 >/dev/null && echo $lib; done
/usr/lib64/libsodium.so.23
/usr/lib64/libx265.so.199
$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _ssse3 >/dev/null && echo $lib; done
/usr/lib64/libc.so.6
/usr/lib64/libQt5Gui.so.5
/usr/lib64/libsodium.so.23
/usr/lib64/libSvtAv1Enc.so.1
/usr/lib64/libx265.so.199
$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _sse4 >/dev/null && echo $lib; done
/usr/lib64/libc.so.6
/usr/lib64/libde265.so.0
/usr/lib64/libjavascriptcoregtk-4.0.so.18
/usr/lib64/libm.so.6
/usr/lib64/libmvec.so.1
/usr/lib64/libsodium.so.23
/usr/lib64/libx265.so.199
$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _avx >/dev/null && echo $lib; done
/usr/lib64/ld-linux-x86-64.so.2
/usr/lib64/ld-lsb-x86-64.so.3
/usr/lib64/libcblas.so.3
/usr/lib64/libc.so.6
/usr/lib64/libfftw3.so.3
/usr/lib64/libjavascriptcoregtk-4.0.so.18
/usr/lib64/liblapacke.so.3
/usr/lib64/libm.so.6
/usr/lib64/libmvec.so.1
/usr/lib64/libopenblas_pthreads.so.0
/usr/lib64/libopenblas.so.0
/usr/lib64/libsodium.so.23
/usr/lib64/libvmaf.so.1
$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _popcnt >/dev/null && echo $lib; done
/usr/lib64/libjavascriptcoregtk-4.0.so.18
/usr/lib64/libzvbi.so.0

Granted, scientific software tends to be a problem, perhaps because of the implicit assumption that people will build it themselves with -march=native. But if that's common enough, I'd rather have those packages built in an additional variant (which could even be x86_64-v3) than raise the baseline requirements across the board. Even there, though, dispatching is not unheard of: just look at the madness that is GMP, which has versions of its functions for at least a handful of different CPUs. Those versions don't merely take the instruction sets into account; they are often hand-tuned for a specific chip.

Aaron