On 07.08.22 at 00:30, Aaron Puchert wrote:
On 05.08.22 at 11:25, Richard Biener wrote:
I'll note that while some specialized libraries come with separate code paths for the ISAs they benefit from and dispatch appropriately via CPUID, most developers are simply too lazy to even think about that. You may say that performance doesn't matter in most cases, but performance often also translates to lower energy use, which for mobile uses might be even more important than performance (though that's even more difficult to measure, of course).
Totally with you on the energy consumption, but I have to slightly disagree about how uncommon such dispatching is. Lots of the software that benefits most from vectorization already does something like that, most notably anything in the "multimedia" arena, such as audio/video codecs and filters. The same goes for cryptography.
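To make "dispatch via CPUID" concrete, here is the same idea reduced to a few lines of shell: look up which extensions the CPU reports, then take the most capable code path available. This is a minimal sketch assuming Linux on x86, where the kernel exposes the CPUID feature bits as the `flags` line of /proc/cpuinfo; the candidate list and the path names are hypothetical. Real libraries do the equivalent in C or assembly; glibc, for instance, picks among SSE2, AVX2 and other variants of functions like memcpy through IFUNC resolvers at symbol-resolution time.

```shell
#!/usr/bin/env bash
# Sketch of runtime dispatch, shell edition. Assumes Linux on x86, where
# /proc/cpuinfo has a "flags" line listing the CPUID feature bits.
# The code-path names are hypothetical placeholders.

# pick_path FLAGS: print the most capable code path supported by FLAGS
# (a space-separated flag list), trying newest to oldest, else "generic".
pick_path() {
  local flags=" $1 " feature
  for feature in avx2 sse4_2 ssse3 sse2; do
    case $flags in
      *" $feature "*) echo "$feature"; return 0 ;;
    esac
  done
  echo generic
}

# Flags of the first CPU listed; empty (hence "generic") if unavailable.
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
echo "selected code path: $(pick_path "$cpu_flags")"
```

Checking from newest to oldest mirrors how such dispatchers typically work: take the best supported path, and keep a plain fallback so the program still runs on CPUs without any of the extensions.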
It's hard to see exactly how widespread such dispatching is, but even though lots of libraries are missing a .symtab, I find a few on my system:
$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _sse2 >/dev/null && echo $lib; done
/usr/lib64/libc.so.6
/usr/lib64/libfftw3f.so.3
/usr/lib64/libfftw3.so.3
/usr/lib64/libmpeg2.so.0
/usr/lib64/libm.so.6
/usr/lib64/libmvec.so.1
/usr/lib64/libnettle.so.8
/usr/lib64/libsodium.so.23
/usr/lib64/libSvtAv1Enc.so.1
/usr/lib64/libvidstab.so.1.1
/usr/lib64/libvisual-0.4.so.0
/usr/lib64/libwebrtc_audio_processing.so.1

$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _sse3 >/dev/null && echo $lib; done
/usr/lib64/libsodium.so.23
/usr/lib64/libx265.so.199

$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _ssse3 >/dev/null && echo $lib; done
/usr/lib64/libc.so.6
/usr/lib64/libQt5Gui.so.5
/usr/lib64/libsodium.so.23
/usr/lib64/libSvtAv1Enc.so.1
/usr/lib64/libx265.so.199

$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _sse4 >/dev/null && echo $lib; done
/usr/lib64/libc.so.6
/usr/lib64/libde265.so.0
/usr/lib64/libjavascriptcoregtk-4.0.so.18
/usr/lib64/libm.so.6
/usr/lib64/libmvec.so.1
/usr/lib64/libsodium.so.23
/usr/lib64/libx265.so.199

$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _avx >/dev/null && echo $lib; done
/usr/lib64/ld-linux-x86-64.so.2
/usr/lib64/ld-lsb-x86-64.so.3
/usr/lib64/libcblas.so.3
/usr/lib64/libc.so.6
/usr/lib64/libfftw3.so.3
/usr/lib64/libjavascriptcoregtk-4.0.so.18
/usr/lib64/liblapacke.so.3
/usr/lib64/libm.so.6
/usr/lib64/libmvec.so.1
/usr/lib64/libopenblas_pthreads.so.0
/usr/lib64/libopenblas.so.0
/usr/lib64/libsodium.so.23
/usr/lib64/libvmaf.so.1

$ for lib in /usr/lib64/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do readelf -a --wide $lib | grep _popcnt >/dev/null && echo $lib; done
/usr/lib64/libjavascriptcoregtk-4.0.so.18
/usr/lib64/libzvbi.so.0
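The six one-liners above can be folded into a single script. This is a sketch under stated assumptions, not what produced the output above: it reads only the symbol tables (`readelf --syms`) rather than the full `readelf -a` dump, and because symbol names are useless once a library has been stripped, it adds a second heuristic that disassembles each library and counts instructions using 256-bit `%ymm` or 512-bit `%zmm` registers (AVX and later). The function names are made up for illustration, and a nonzero count only shows that such code exists, not that it ever runs.

```shell
#!/usr/bin/env bash
# Sketch: spot ISA-specific code in shared libraries. Needs bash (brace
# expansion, shopt) and binutils (readelf, objdump). Function names are
# hypothetical.
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves

# 1) Symbol-based, like the one-liners: list libraries whose symbol tables
#    mention each ISA suffix. Same .so.N / .so.NN / .so.NNN glob as above.
scan_symbols() {
  local dir=$1 tag lib
  for tag in _sse2 _sse3 _ssse3 _sse4 _avx _popcnt; do
    echo "== $tag"
    for lib in "$dir"/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do
      readelf --wide --syms "$lib" 2>/dev/null | grep -q -- "$tag" && echo "$lib"
    done
  done
}

# 2) Code-based heuristic: count disassembled instructions (AT&T syntax,
#    objdump's default on x86) that touch %ymm or %zmm registers. This works
#    on stripped libraries because it ignores symbol names entirely.
count_wide_vector_insns() {
  objdump -d --no-show-raw-insn "$1" 2>/dev/null | grep -c -E '%[yz]mm[0-9]+'
}

# Print "<count> <library>" for every library with such code, largest first.
scan_code() {
  local dir=$1 lib n
  for lib in "$dir"/*.so.{[0-9],[0-9][0-9],[0-9][0-9][0-9]}; do
    n=$(count_wide_vector_insns "$lib")
    if [ "$n" -gt 0 ]; then echo "$n $lib"; fi
  done | sort -rn
}

# Run both scans when executed directly; default directory as in the mail.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  dir=${1:-/usr/lib64}
  scan_symbols "$dir"
  echo "== wide vector instructions (%ymm/%zmm)"
  scan_code "$dir"
fi
```

The objdump pass disassembles every library in full, so it is much slower than the symbol pass; it is also the one that could catch stripped libraries such as libavcodec.so.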
Of course, to be fair, these are specialized libraries in some sense, but as I've argued in another thread, maybe the extensions between SSE2 and SSE4.2 only help in special circumstances. Certainly these code paths are small relative to the overall size of /usr/lib64, but they likely account for a considerable share of a typical user's run time. (What the grepping didn't find are libraries like libavcodec.so or libx264.so that contain AVX instructions, and I presume most video codecs use some kind of acceleration.)

When I'm watching a video in Firefox (with the default media.ffmpeg.vaapi.enabled = false) and run "perf record" on the RDD process, I see lots of AVX instructions. Which is to say: yes, newer instructions aren't widely used, but they're widely used where it matters. So at least I don't have the feeling that my new stuff (if you can count AVX as new) isn't being used properly.

Aaron