On Sun, 03 Jun 2007 11:03:34 +0100
G T Smith
Some assembler code is truly ugly (using 100s of NOPs for timing is one example I can think of). Where timing or speed is essential, one will nearly always get a better result with well-written assembler (it often is not pretty to look at).
Just a general comment on this, since I was one of the authors of the
Unix/Windows NT assembler for the Alpha chip. (This is a bit of a
generalization.) The Alpha could issue multiple instructions per cycle
(2 or 4 depending on the chip version), but the instructions had to be
ordered properly for this to happen. We ran a "scheduler" as an
optimization in the last pass of the assembler. In some cases, inserting
NOPs had a positive performance effect. Similarly, the Intel Itanium
packs instructions into bundles of 3 and can issue 2 bundles (6
instructions) together, and properly placed NOPs help performance
there too.
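
To make the NOP point concrete, here is a toy Python model. It is only
a sketch and is not how the Alpha assembler's scheduler worked. It
assumes a made-up machine that fetches aligned pairs of instructions
and issues a pair in one cycle only if the two use different units (or
one is a NOP). The unit labels, the pairing rule, and the function
names are all assumptions for illustration; the mnemonics are just
labels.

```python
from typing import List, Tuple

# (name, unit); unit is "int", "fp" or "nop". Hypothetical toy encoding.
Instr = Tuple[str, str]
NOP: Instr = ("nop", "nop")


def can_dual_issue(a: Instr, b: Instr) -> bool:
    """Toy pairing rule: a NOP pairs with anything, otherwise units must differ."""
    return "nop" in (a[1], b[1]) or a[1] != b[1]


def cycles(stream: List[Instr]) -> int:
    """Count cycles when the machine fetches aligned pairs (0-1, 2-3, ...)."""
    total = 0
    for i in range(0, len(stream), 2):
        pair = stream[i:i + 2]
        if len(pair) == 1 or can_dual_issue(pair[0], pair[1]):
            total += 1   # the pair (or a lone trailing instruction) issues together
        else:
            total += 2   # conflicting pair is split over two cycles
    return total


def pad_with_nops(stream: List[Instr]) -> List[Instr]:
    """Greedy pass: insert a NOP whenever the next instruction would land
    in the second slot of a pair and conflict with the first slot."""
    out: List[Instr] = []
    for instr in stream:
        if len(out) % 2 == 1 and not can_dual_issue(out[-1], instr):
            out.append(NOP)
        out.append(instr)
    return out


example = [("addq", "int"), ("subq", "int"), ("addt", "fp"),
           ("addq", "int"), ("mult", "fp")]

print(cycles(example))                 # 4 cycles without padding
print(cycles(pad_with_nops(example)))  # 3 cycles with one NOP inserted
```

In this model the single NOP shifts the alignment so that every later
pair mixes an integer and a floating-point instruction, which is the
kind of effect described above. A real scheduler also has to respect
data dependencies and latencies, which this sketch ignores.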
--
Jerry Feldman