Pettini, Don wrote:
Kees,
It really depends on what scenario you would like to benchmark. If you are looking at processor instruction performance, there are benchmarks like Linpack or SPEC which are a good reference in the scientific computing community. Others who want to look at total system performance look to the TPC (Transaction Processing Performance Council), www.tpc.org, for a bounded, real-world TP environment. This benchmark is far larger than you would probably consider, but all the rules, documentation, procedures, and audit requirements were necessary to remove the marketing hype from what should be a quantitative exercise. If you look at TPC, though, you can see they created a few variations to cover different common workloads:
TPC-C is the OLTP workload.
TPC-H is an ad-hoc query workload.
TPC-R is a report-generation workload.
TPC-W is a web-based transaction workload.
Remember that the folks who design these systems/processors/compilers have also had their eyes open to what benchmarks are out there, and I remember that in the days of Digital's 64-bit Alpha chip, system cache size was chosen so a given benchmark would work best. You only have to take a look at the present Itanium 2 MP processor line with its 3, 4, or 6MB L3 cache variants. Manufacturer compilers used to recognize code from standard benchmarks and inline hand-optimized routines to speed their platform past another. That is why GNU is a great equalizer.
Hmmm .... do I infer correctly that GNU is 'a great equalizer' because it compiles everything DOWN to the same level? I agree with the observation that compilers used to do essentially program-specific optimizations (Linpack particularly), but in my 1st-hand experience most of these compilers do a MUCH better job with more generic code as well. I refer specifically to SGI's (MIPSPro) FORTRAN compiler, as well as Intel's Linux FORTRAN compiler and the Portland Group's Linux FORTRAN compiler. All of these compile code that is as much as 70% faster than GNU in other-than-Linpack benchmarks, and probably average 25-30% faster.
Lastly, consider that most people will buy an AMD- or Intel-based system, not a bare processor. Operations that read and update data use paths in and out of the processor that are tens, hundreds, and even thousands of times slower when doing an I/O. Some benchmarks can exploit HyperTransport's duplex advantage, while others minimize the advantages of NUMA versus non-NUMA architectures. Some specific benchmarks can look at cache hotspots in a multiprocessor environment and at cache coherency issues (which limit the ability to scale in a linear fashion). These benchmarks will become much more interesting with the release of AMD and Intel multi-core chips.
BTW: If you would like to look at Opteron benchmarks, AMD has assembled a page of them at:
http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_8796_8800,00.html
Don