John Paul Adrian Glaubitz wrote:
On 12/9/22 17:26, Michal Suchánek wrote:
I understand what you mean now. But I'm not sure whether the x86-64 baseline with the limited set of registers on i386 will bring you any remarkable benefit. Hard to tell without measurements.

Still SSE does give you extra registers so it should help especially for 32bit where the base registers are limited.

Sure, some benefit will be there. But I doubt it's really measurable.

SSE and SSE2 are part of the x86 architecture.
https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
https://en.wikipedia.org/wiki/SSE2

Pentium III gets a ~70% speed-up with SSE for fp32-intensive calculations (AMD CPUs of that era get ~100% because of their slow FPU). Pentium 4 gets a ~30% speed-up with SSE2.

I hope that Tumbleweed 32-bit x86 is using i686 + SSE for compilation. IMHO it is possible to use i686 + SSE + SSE2 for code that will be used in 64-bit OSes.

Hint for Tumbleweed x86 supporters: some code contains SSE2 instructions. Read more here:
https://forums.opensuse.org/showthread.php/570815-Who-needs-i586-TW-for-CPUs...

I hope to get x86-64-v3 for both Tumbleweed and Leap in the near future.

Suggestion for x86-64-v4: add the AVX-512 instruction subsets {VPOPCNTDQ, IFMA, VBMI, VBMI2, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES} that are used in Ice Lake, Tiger Lake, Rocket Lake, and Zen 4.
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX-512
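
For anyone who wants to check where their own CPU lands, here is a rough Python sketch, not an official tool. It assumes Linux on x86 and uses the flag spellings from /proc/cpuinfo (for example `pni` for SSE3 and `abm` for LZCNT). The per-level flag sets follow the x86-64 psABI levels as I understand them. `PROPOSED_V4_EXTRAS` is only the wish list from this post, not part of any specification, and all function names are made up for illustration.

```python
# Flag names are the Linux /proc/cpuinfo spellings (assumption: Linux on x86).
# Each entry lists only the flags that the level adds on top of the previous one.
LEVELS = [
    ("x86-64-v1", {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]

# Hypothetical additions suggested in this post; NOT part of the real x86-64-v4.
PROPOSED_V4_EXTRAS = {
    "avx512_vpopcntdq", "avx512ifma", "avx512vbmi", "avx512_vbmi2",
    "avx512_bitalg", "avx512_vnni", "vpclmulqdq", "gfni", "vaes",
}


def highest_level(flags):
    """Return the highest level fully covered by `flags`, or None."""
    best = None
    for name, required in LEVELS:
        if not required <= flags:
            break  # levels are cumulative, so stop at the first gap
        best = name
    return best


def missing_extras(flags):
    """Return the proposed extra flags that `flags` lacks, sorted by name."""
    return sorted(PROPOSED_V4_EXTRAS - flags)


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU; empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    print("highest level:", highest_level(flags))
    print("missing proposed extras:", missing_extras(flags))
```

A CPU that reports the five x86-64-v4 flags but still shows entries under "missing proposed extras" would be excluded by the stricter v4 suggested above.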