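The ILL (illegal instruction) fault occurs because gcc or the code optimiser has changed the instructions it emits. With that change the code runs faster on compatible CPUs by using AVX, and it avoids the penalties for switching between the ordinary (legacy) and VEX encoding schemes; a CPU without AVX support cannot decode the new instructions. See https://en.wikipedia.org/wiki/VEX_prefix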
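One way to avoid running AVX code on a CPU that cannot execute it is to check for the feature at run time before taking the AVX path. The following is a minimal sketch, assuming a GCC or Clang build; the helper name `cpu_has_avx` is hypothetical and only for illustration, while `__builtin_cpu_init` and `__builtin_cpu_supports` are compiler builtins available on x86 targets.

```c
/* Hypothetical helper: returns 1 if the running CPU reports AVX support,
   0 otherwise. On non-x86 targets it always returns 0. */
static int cpu_has_avx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* GCC/Clang builtins: initialise the CPU feature data, then query it. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return 0; /* not an x86 target, so there are no VEX-encoded instructions */
#endif
}
```

A caller could test `cpu_has_avx()` once at start-up and fall back to a non-AVX code path when it returns 0. This only helps if the AVX code is confined to functions or files built with AVX enabled; if the whole program is compiled with AVX, the illegal instruction may be hit before the check runs.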