Mailinglist Archive: opensuse-amd64 (274 mails)

RE: [suse-amd64] Opteron Board preference ....
  • From: "Alan Gray" <alan.gray@xxxxxxx>
  • Date: Thu, 8 Apr 2004 02:58:11 +0000 (UTC)
  • Message-id: <C25A01AC472E4041910A7897FC500A9651F232@xxxxxxxxxxxxxx>
Very informative !!!

Thanks
Alan

> -----Original Message-----
> From: ascotti@xxxxxxxxxxxxx [mailto:ascotti@xxxxxxxxxxxxx]
> Sent: Thursday, 8 April 2004 12:16 PM
> To: William A. Mahaffey III; suse-amd64@xxxxxxxx
> Subject: Re: [suse-amd64] Opteron Board preference ....
>
> --On Wednesday, April 07, 2004 7:00 PM -0500 "William A. Mahaffey III"
> <wam@xxxxxxxxxx> wrote:
>
> > Darrell Shively wrote:
> >
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >> Hi William:
> >>
> >> On Wednesday 07 April 2004 05:54, William A. Mahaffey III wrote:
> >>
> >>
> >>> [...]
> >>>
> >>> Hmmmm .... I thought the CPUs talked to each other (at least the 200 &
> >>> 800 series) through high speed busses & could shuttle data between
> >>> each other as fast as direct memory access (except for some small
> >>> latency to start the proceedings), no ?
> >>>
> >>>
> >>
> >> Turns out no. The HyperTransport connection between the processors
> >> *is* very fast but not as fast as each processor's 128+ bit wide memory
> >> bus. This is why the processor affinity feature of NUMA kernels is
> >> important; it tries to keep a process on the processor whose RAM
> >> contains its data.
> >>
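
The affinity idea above can also be tried by hand. What follows is only a minimal sketch, not what the NUMA kernel does internally: it pins the calling process to a single CPU with Python's os.sched_setaffinity, which exists only on Linux. The helper name pin_to_one_cpu is made up for illustration, and the sketch assumes that memory the process touches after pinning is allocated near that CPU, which is the usual default on Linux but is not guaranteed.

```python
import os

def pin_to_one_cpu():
    """Restrict the calling process to one CPU it is already allowed to use.

    Returns the new affinity set, or None where the call is unavailable
    (os.sched_setaffinity exists only on Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # pid 0 means the calling process
    target = min(allowed)               # arbitrary choice: lowest allowed CPU
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("now restricted to CPUs:", pin_to_one_cpu())
```

Large arrays should be allocated and initialised after the pin, so that the pages are first touched from the CPU that will later use them.
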
> >>
> >>
> >>> I had been leaning toward some of the
> >>> balanced MP boards (TYAN S2882, Arima HDAMA) on that count.
> >>>
> >>>
> >>
> >> It depends on your needs. A second processor can be useful even if its
> >> memory access is via a HyperTransport link. It depends on what sort of
> >> jobs you are running - if stuff fits mostly in the 2nd processor's
> >> cache then there is happiness.
> >>
> >> Regards,
> >> - Darrell
> >> - --
> >> sused@xxxxxxxxx "Perfect! ....what am I doing?"
> >> -- Washu
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.0.7 (GNU/Linux)
> >>
> >> iD8DBQFAdB5Veo6c0kw6mZ0RAkurAKCiJKUfv8aSxeUjfS5hF9D4WfNf3wCgt6B1
> >> KiTOydpVERJIfX8PLiCywCU=
> >> =nCUl
> >> -----END PGP SIGNATURE-----
> >>
> >>
> >>
> > Actually most of the stuff I run would be large jobs requiring a
> > significant fraction of available RAM, too big to fit into cache. I
> > thought the actual data speed of the HyperTransport bus (6.4 GB/s) was
> > similar to the memory bus (6.4 GB/s using PC3200 RAM, 5.3 GB/s using
> > PC2700 RAM), although by different means (64 bit dual-channel DDR bus at
> > either 166 MHz or 200 MHz for the RAM, 16 bit DDR at 1600 MHz for the
> > HyperTransport bus). I would also be interested in knowing how SMP is
> > working .... just to help keep the already busy thread going :-).
>
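
The peak figures quoted above can be reproduced with back-of-the-envelope arithmetic. A minimal sketch, with a made-up helper name (peak_gb_per_s) and the assumption that "16 bit DDR at 1600 MHz" means 1600 million transfers per second on a link 16 bits wide in each direction. On that reading the 6.4 GB/s HyperTransport figure only comes out if both directions are added together, while the memory figures are for one bus.

```python
def peak_gb_per_s(width_bits, transfers_per_s):
    """Peak bandwidth in GB/s (10^9 bytes/s) for a bus of the given width."""
    return width_bits / 8 * transfers_per_s / 1e9

# Dual-channel DDR: 2 channels x 64 bits, two transfers per clock cycle.
pc3200 = peak_gb_per_s(2 * 64, 2 * 200e6)   # 200 MHz clock, about 6.4
pc2700 = peak_gb_per_s(2 * 64, 2 * 166e6)   # 166 MHz clock, about 5.3

# HyperTransport: 16 bits wide, 1600 million transfers/s (assumed per direction).
ht_one_way = peak_gb_per_s(16, 1600e6)      # about 3.2
ht_both_ways = 2 * ht_one_way               # about 6.4

if __name__ == "__main__":
    print(f"PC3200 memory bus : {pc3200:.2f} GB/s")
    print(f"PC2700 memory bus : {pc2700:.2f} GB/s")
    print(f"HT, one direction : {ht_one_way:.2f} GB/s")
    print(f"HT, both combined : {ht_both_ways:.2f} GB/s")
```
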
> I can relate my experience with a quad Opteron. Using the STREAM benchmark,
> which calculates bandwidth by measuring the time required to perform simple
> operations on large arrays (where the bottleneck is the streaming speed), I
> get about 2 GB/s on a single CPU (PC2700 -> 333 MHz * 16 bytes = 5.3 GB/s,
> 1.4 GHz CPU speed). Under a NUMA kernel (e.g. 2.4.21-207, SLES SP3), the
> collective bandwidth gets as high as 7.5 GB/s, running on 4 CPUs, whereas
> running a non-NUMA kernel (2.4.24) it rarely exceeds 3 GB/s. I have
> observed similar trends with different codes, all constrained by the
> memory bandwidth. The bottom line is
> 1) The theoretical bandwidth is an ephemeral goal. Compilers just are not
> that tuned. I can get closer if I start writing my own assembler code, but
> that takes time.
> 2) NUMA makes a big difference.
> 3) Using standard MPICH benchmarks, CPU X to CPU Y bandwidth is of the
> order of 750 MB/s, for large packets. Why so low, I do not know. It could
> be related to the compiler used to compile the MPICH library (pgcc with
> -fastsse -Mvect=prefetch, v 5.1, 64 bit). I have read elsewhere that pgcc
> does not produce good code (I think it was for the ATLAS libraries), but
> that might have to do with the way ATLAS routines are written. Actually, it
> would be interesting to hear from other people on the best combination of
> compiler/switches. On a related note, I have not noticed much difference
> between compiling in 32 or 64 bit.
>
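
For anyone who has not seen how a STREAM-style number is obtained, here is a minimal sketch of the idea. It is not the STREAM benchmark itself and is not expected to reproduce the figures above: it times only a plain copy of one large buffer into another (similar in spirit to STREAM's Copy kernel), counts one read plus one write per byte, and keeps the best of a few trials. The function name and the default sizes are arbitrary choices for illustration.

```python
import time

def copy_bandwidth_gbs(n_bytes=64 * 1024 * 1024, trials=5):
    """Estimate copy bandwidth in GB/s from the fastest of several trials."""
    src = bytearray(b"\x01") * n_bytes   # non-zero fill so the pages really exist
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        dst[:] = src                     # bulk copy: reads n_bytes, writes n_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)               # guard against a zero timer reading
    return 2 * n_bytes / best / 1e9      # bytes read + bytes written

if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.2f} GB/s")
```

The buffers need to be much larger than the caches, otherwise the number reflects cache speed rather than memory speed.
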
> Having said that, my BIOS (Quartet motherboard, manufactured by Celestica)
> allows memory addresses to be distributed in a round-robin fashion among
> CPUs. When this option is enabled, a large array is spread uniformly among
> CPUs. While this is bad when running NUMA, it could improve performance
> with traditional SMP kernels, but I have not tried that.
>
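
The effect of such a round-robin option can be illustrated with a toy address map. This is only a sketch under stated assumptions: the 4 KiB interleave granularity, the 1 GiB of RAM per CPU and the function names are all made up for illustration, and the real BIOS may interleave differently.

```python
from collections import Counter

GRANULE = 4096  # hypothetical interleave granularity in bytes

def node_interleaved(addr, n_nodes, granule=GRANULE):
    """Round-robin map: consecutive granules go to consecutive CPUs."""
    return (addr // granule) % n_nodes

def node_contiguous(addr, n_nodes, bytes_per_node):
    """Non-interleaved map: each CPU owns one contiguous address range."""
    return min(addr // bytes_per_node, n_nodes - 1)

def spread(base, size, node_of, granule=GRANULE):
    """Count how many granules of the range [base, base + size) land on each CPU."""
    counts = Counter()
    for addr in range(base, base + size, granule):
        counts[node_of(addr)] += 1
    return dict(counts)

if __name__ == "__main__":
    n_nodes = 4
    array_bytes = 64 * 1024 * 1024
    per_node = 1024 ** 3  # assumed 1 GiB of RAM attached to each CPU
    print(spread(0, array_bytes, lambda a: node_interleaved(a, n_nodes)))
    print(spread(0, array_bytes, lambda a: node_contiguous(a, n_nodes, per_node)))
```

With interleaving the array is shared evenly among the four CPUs, so no single memory controller is the hot spot, but three quarters of the accesses from any one CPU are remote. Without it the whole array sits on one CPU's memory.
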
> Overall, I am quite happy with the machine. Based on runs of a (parallel)
> CFD code, each Opteron CPU is equivalent to 2.5 Athlon CPUs at the same
> frequency. It took almost a month to get all the pieces of software and
> firmware to work together, but it was worth the hassle.
>
> Alberto Scotti
>
