tg3 Drivers for Broadcom Network Cards on Dell 2650 and 6650 Servers.
Hello all,
Has anyone had any recent issues with the tg3 driver on either of these servers (all running various revisions of the 2.4 kernel), specifically with auto-negotiation and crashes caused by it? Or is the bcm5700 driver still the better option?
Alternatively, am I better off ripping out the cards and sticking in something that uses the e1000 driver?
Thanks,
Ben
Ben Higginbottom wrote:
Hello all,
Has anyone had any recent issues with the tg3 driver on either of these servers (all running various revisions of the 2.4 kernel), specifically with auto-negotiation and crashes caused by it? Or is the bcm5700 driver still the better option?
Alternatively, am I better off ripping out the cards and sticking in something that uses the e1000 driver?
Thanks,
Ben
I don't know those servers, but here is what I have on an x86_64 laptop that worked from 9.2 to 9.3 with various kernels (2.6) up to 2.6.12-rc5 at present. The bcm5700 driver experienced network stalls when I installed 9.2 initially, so I had to switch to tg3. You could try the alias for each module in turn.
# lsmod|grep tg3
tg3 92612 0
0000:00:0c.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit Ethernet (rev 03)
Regards
Sid.
--
Sid Boyce ... Hamradio License G3VBV, Keen licensed Private Pilot
Retired IBM Mainframes and Sun Servers Tech Support Specialist
Microsoft Windows Free Zone - Linux for all Computing Tasks
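Sid's suggestion to "try the alias for each module in turn" refers to the alias line that maps an interface to a driver module. A minimal sketch of checking which module is currently aliased, assuming a modprobe-style config file; the path is an assumption (typically /etc/modules.conf on 2.4 and /etc/modprobe.conf on 2.6, but distributions differ), and alias_for is a hypothetical helper name, not something from the thread:

```shell
#!/bin/sh
# Print the module that a modprobe-style config file aliases to an interface.
# $1 = interface name (e.g. eth0), $2 = config file path (assumed, see above).
# Commented-out alias lines are ignored because their first field is "#".
alias_for() {
    awk -v iface="$1" '$1 == "alias" && $2 == iface { print $3 }' "$2"
}
```

For example, alias_for eth0 /etc/modules.conf would print tg3 if the file contains the line "alias eth0 tg3". Switching drivers would then mean changing that line to bcm5700 and reloading the module, which is exactly the kind of experiment that needs a maintenance window.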
On 5/31/05, Sid Boyce <sboyce@blueyonder.co.uk> wrote:
I don't know those servers, but here is what I have on a X86_64 laptop that worked from 9.2 to 9.3 with various kernels (2.6) up to 2.6.12-rc5 at present. The bcm5700 driver experienced network stalls when I installed 9.2 initially, so I had to switch to tg3. You could try the alias for each module in turn.
# lsmod|grep tg3
tg3 92612 0
0000:00:0c.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit Ethernet (rev 03)
Cheers Sid,
Unfortunately it's on a production server that doesn't have a test environment, so playing about is out of the question. But knowing that the 5700 is causing network stalls certainly doesn't encourage me to start using it.
Thanks,
Ben
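Where experimenting is out of the question, read-only checks can still show which driver is bound to the card and what the link negotiated. A minimal sketch, assuming ethtool is installed and the interface is called eth0 (neither is stated in the thread); driver_of and link_summary are hypothetical helper names that only parse text and never change any setting:

```shell
#!/bin/sh
# Print the driver name from `ethtool -i <iface>` output read on stdin.
driver_of() {
    awk -F': ' '$1 == "driver" { print $2 }'
}

# Print "speed duplex autoneg" from `ethtool <iface>` output read on stdin.
link_summary() {
    awk -F': ' '
        { gsub(/^[ \t]+/, "", $1) }          # strip the leading indentation
        $1 == "Speed"            { speed = $2 }
        $1 == "Duplex"           { duplex = $2 }
        $1 == "Auto-negotiation" { autoneg = $2 }
        END { print speed, duplex, autoneg }
    '
}
```

Typical use would be "ethtool -i eth0 | driver_of" and "ethtool eth0 | link_summary". Logging the second one from cron might help show whether the early-morning lockups coincide with a renegotiation, though that is a guess about the failure rather than anything established here.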
On Tue, 31 May 2005, Sid Boyce wrote:
Ben Higginbottom wrote:
Hello all,
Has anyone had any recent issues with the tg3 driver on either of these servers (all running various revisions of the 2.4 kernel), specifically with auto-negotiation and crashes caused by it? Or is the bcm5700 driver still the better option?
Alternatively, am I better off ripping out the cards and sticking in something that uses the e1000 driver?
Google is your friend here.
The short version is that, while the cards themselves are actually pretty good, the company that owns the rights to the technology (Broadcom, formerly Altima), isn't considered exactly 'Linux friendly'. They sort-of made their driver, the bcm7500 or whatever, available but it's generally considered terrible. An Open Source effort, the tg3 ('Tigon III'), is available. It works OK but not great, with LOTS of caveats.
For instance, I have two of these cards (The D-Link version). Hooked together, they completely weird out and I get 35KB/s through them. Put a hub or switch in between, and they are happy. Using the 2.4 kernel I could get 40-45MB/s through them, but they would oops often and have strange resets. In short, I've had nothing but problems with them. It's not the fault of the chipset or the good open source folks doing work on the tg3, you can lay /all/ of the blame here at Broadcom's door. Oh, and the driver won't or can't do jumbo frames, so you are limited to about 45MB/s.
As much as I don't like Intel, at least they /kinda/ get the open source thing, and the driver(s) for their cards (the O/S versions) are very good these days.
As for the tg3 on Dell equipment, like I said, Google is your friend here - nobody likes the tg3 on Dell, and most people have had problems with them, and neither Dell nor Broadcom are going to do anything about it.
--
Carpe diem - Seize the day.
Carp in denim - There's a fish in my pants!
Jon Nelson <jnelson-suse@jamponi.net>
On 5/31/05, Jon Nelson <jnelson-suse@jamponi.net> wrote:
Google is your friend here.
As always :-)
The short version is that, while the cards themselves are actually pretty good, the company that owns the rights to the technology (Broadcom, formerly Altima), isn't considered exactly 'Linux friendly'. They sort-of made their driver, the bcm7500 or whatever, available but it's generally considered terrible. An Open Source effort, the tg3 ('Tigon III'), is available. It works OK but not great, with LOTS of caveats.
Certainly, but I need the cards to just work and stop locking up in the early hours of the morning. If the 5700 does that, then that's the one that goes in.
As for the tg3 on Dell equipment, like I said, Google is your friend here - nobody likes the tg3 on Dell, and most people have had problems with them, and neither Dell nor Broadcom are going to do anything about it.
Hell, personally I don't like Dell equipment, but you've got to work with what you've got.
Regards,
Ben
On Wed, 1 Jun 2005, Ben Higginbottom wrote:
On 5/31/05, Jon Nelson <jnelson-suse@jamponi.net> wrote:
Google is your friend here.
As always :-)
The short version is that, while the cards themselves are actually pretty good, the company that owns the rights to the technology (Broadcom, formerly Altima), isn't considered exactly 'Linux friendly'. They sort-of made their driver, the bcm7500 or whatever, available but it's generally considered terrible. An Open Source effort, the tg3 ('Tigon III'), is available. It works OK but not great, with LOTS of caveats.
Certainly, but I need the cards to just work and stop locking up in the early hours of the morning. If the 5700 does that, then that's the one that goes in.
I think you misunderstood me - use the tg3 driver, no doubt. If it locks up with that driver, contact the authors, they may be able to help you. I wouldn't use the bcm driver, no way.
--
Jon Nelson <jnelson-suse@jamponi.net>
On 6/1/05, Jon Nelson <jnelson-suse@jamponi.net> wrote:
I think you misunderstood me - use the tg3 driver, no doubt. If it locks up with that driver, contact the authors, they may be able to help you. I wouldn't use the bcm driver, no way.
It's way too early to escalate this to the devs. What I was trying to ascertain was whether the tg3 driver had improved significantly in later revisions, specifically whether the devs had managed to solve the issues with the Broadcom cards. Clearly they have. If they hadn't, I would have had to see whether the 5700 solved the problem, irrespective of poorer performance or any ideological concerns, as the bug is acting as a show stopper.
Regards,
Ben
On Tue, 31 May 2005 10:29 pm, Ben Higginbottom wrote:
Has anyone had any recent issues with the tg3 driver on either of these servers (all running various revisions of the 2.4 kernel), specifically with auto-negotiation and crashes caused by it? Or is the bcm5700 driver still the better option?
I've still got a cluster of 66 machines using the tg3 drivers without problems. All versions of Suse Pro from 8.1 on.
Alternatively, am I better off ripping out the cards and sticking in something that uses the e1000 driver?
On the PE2650s it's on the motherboard, but when Dell sold me Dual Intel cards for AU$150 each I happily walked away from the 2650 broadcoms, although they hadn't caused me any real grief.
I read about lots of people (mainly redhat users) having lots of grief on the Dell list, that seems to have died down though.
FWIW,
michaelj
--
Michael James michael.james@csiro.au
System Administrator voice: 02 6246 5040
CSIRO Bioinformatics Facility fax: 02 6246 5166
Internet Explorer is fine for downloading Firefox, but after that....
On 6/1/05, Michael James <Michael.James@csiro.au> wrote:
I've still got a cluster of 66 machines using the tg3 drivers without problems. All versions of Suse Pro from 8.1 on.
I take it that's with the last 2.4 kernel that they released for 8.1, or have you manually patched the kernel?
On the PE2650s it's on the motherboard.
Just makes it slightly more difficult :)
but when Dell sold me Dual Intel cards for AU$150 each I happily walked away from the 2650 broadcoms, although they hadn't caused me any real grief.
This would be my preferred option, but the sticking point is getting the bosses to spend the money and schedule downtime.
I read about lots of people (mainly redhat users) having lots of grief on the Dell list, that seems to have died down though.
Yes, unfortunately some of the servers run an un-updated Deadrat AS3, and really unfortunately for me they host the vital Oracle DBs. It suggests that a full update should cure a lot of the problems.
Cheers,
Ben
On Wed, 1 Jun 2005 09:56 pm, Ben Higginbottom wrote:
On 6/1/05, Michael James <Michael.James@csiro.au> wrote:
I've still got a cluster of 66 machines using the tg3 drivers without problems. All versions of Suse Pro from 8.1 on.
I take it that's with the last 2.4 kernel that they released for 8.1, or have you manually patched the kernel?
All stock Suse kernels and drivers.
cat /etc/SuSE-release /proc/version:
gives
SuSE Linux 8.1 (i386)
VERSION = 8.1
Linux version 2.4.21-251-smp
This machine is my obsolete cross to bear (no one else to blame). I just have to do the work of porting all services to SLES9. It's not on the internet... ;^|
and
SuSE Linux 8.2 (i586)
VERSION = 8.2
Linux version 2.4.20-64GB-SMP
and
SuSE Linux 9.1 (i586)
VERSION = 9.1
Linux version 2.6.5-7.108-bigsmp
(the nodes aren't patched much)
and
SuSE Linux 9.1 (i586)
VERSION = 9.1
Linux version 2.6.5-7.151-bigsmp
and
SuSE Linux 9.3 (i586)
VERSION = 9.3
Linux version 2.6.11.4-20a-bigsmp
(the new model node)
and
SUSE LINUX Enterprise Server 9 (i586)
VERSION = 9
Linux version 2.6.5-7.151-bigsmp
I read about lots of people (mainly redhat users) having lots of grief on the Dell list, that seems to have died down though.
Yes, unfortunately some of the servers run an un-updated Deadrat AS3, and really unfortunately for me they host the vital Oracle DBs. It suggests that a full update should cure a lot of the problems.
We've got oracle 10g running a treat on SLES9. I have the luxury of a development server, so I can run up the new server, load the app, test it, then swing it into production with very little downtime. As soon as the new server is accepted as stable, the old one gets re-cycled as the development machine. I think you have already answered your own question here:
This would be my preferred option, but the sticking point is getting the bosses to spend the money and schedule downtime.
Hit them with a report they can't ignore, new cards IS the easy cheap option.
michaelj
--
Michael James michael.james@csiro.au
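Michael's survey was done by reading /proc/version on each machine. A minimal sketch of pulling just the kernel release out of that text, so that 2.4 and 2.6 nodes running tg3 can be told apart at a glance; kernel_release and kernel_series are hypothetical helper names, and the sketch assumes the usual "Linux version <release> ..." layout shown in the listing above:

```shell
#!/bin/sh
# Print the kernel release (third field) from /proc/version-style text on stdin.
kernel_release() {
    awk '/^Linux version/ { print $3 }'
}

# Print the major.minor series (for example 2.4 or 2.6) of a release string in $1.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}
```

On a single machine this would be used as "kernel_series "$(kernel_release < /proc/version)"". Across a cluster the same functions could be fed from ssh, with host names that are specific to the site.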
You may wish to see if there is a module, tg3-new installed on your system. I can't remember what system I had seen it on, but I do know it was on a machine I was testing at work and it worked rather well on IBM xServer and eServer series hardware and Broadcom 10/100/1000 cards on-board and PCI cards.
RK Davies
On 6/2/05, Michael James <Michael.James@csiro.au> wrote:
cat /etc/SuSE-release /proc/version:
gives
SuSE Linux 8.1 (i386)
VERSION = 8.1
Linux version 2.4.21-251-smp
This machine is my obsolete cross to bear (no one else to blame) I just have to do the work of porting all services to SLES9. It's not on the internet... ;^|
Up until two months ago I was nursing a 7.3 server that was, and bosses that thought a firewall was a security device. But I am not in the bad place anymore :)
We've got oracle 10g running a treat on SLES9.
/me sends daggers of envy
I have a real mishmash, including my predecessor's personal distro rolled off the source of RH 4. If I have any more surprises I'm thinking of sending his .bash_history to his new employers :-)
I have the luxury of a development server, so I can run up the new server, load the app, test it, then swing it into production with very little downtime. As soon as the new server is accepted as stable, the old one gets re-cycled as the development machine.
I'm pushing for something like this, although personally I'm after a very high spec server that I can then stick vmware on and do destructive testing. At least I can dream.
Hit them with a report they can't ignore, new cards IS the easy cheap option.
You're right. Any recommendations for current hardware?
Cheers,
Ben
On Thu, 2 Jun 2005 09:49 pm, Ben Higginbottom wrote:
On 6/2/05, Michael James <Michael.James@csiro.au> wrote:
I have the luxury of a development server, so I can run up the new server, load the app, test it, then swing it into production with very little downtime. As soon as the new server is accepted as stable, the old one gets re-cycled as the development machine.
I'm pushing for something like this, although personally I'm after a very high spec server that I can then stick vmware on and do destructive testing. At least I can dream.
Hmmm interesting, the only time I've thought of using vmware in that context is playing with virus infection on a windows guest. Or do you just want to see how far rm -rf / can go? Andrew Tridgell told an amusing story at the Linux conf of a bad interaction of unicode and command line feeding a bare "/" into a string to be recursively deleted. Ran fine till it started to clean out /proc. Astute readers will note that "p" comes after "h". Ouch.
I like having multiple identical hardware. Enough machines to do the job and at least 1 with no user-visible services. Apart from being good for testing, it means that I can pull a working system out of anything but a total computer room meltdown. Yes we have maintenance but sometimes it takes longer than you can afford to be down.
I had a disk array fail, fortunately for me it happened just after I'd taken delivery of a couple of new servers. I swiped enough disks to put together a duplicate array, rsync nursed the home dirs across, and the installation was running normally when I rang Dell to sort out the problem hardware Monday morning. (big weekend)
As it turned out we lost the lot on the old array. Dell replaced everything, controller, cables, disks. I don't know (or care) how much was necessary, how much was apology. It enabled me to walk away from the problem.
Hit them with a report they can't ignore, new cards IS the easy cheap option.
You're right. Any recommendations for current hardware?
For network cards: Intel, always intel. (Pure prejudice but it's served me well)
Dual e1000 MT? Must be a server card. The e1000 workstation card is limited to 250 Meg by the narrow PCI bus.
To Work,
michaelj
--
Michael James michael.james@csiro.au
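The rm -rf anecdote in this message comes down to a path variable collapsing to a bare "/" before a recursive delete. A minimal sketch of a guard against that, not taken from the story itself; safe_rmtree is a hypothetical helper name, and the sketch assumes the target is a directory:

```shell
#!/bin/sh
# Recursively delete a directory, but refuse when the argument is empty,
# is "/" itself, or resolves to "/" once symlinks and ".." are followed.
safe_rmtree() {
    target=$1
    case "$target" in
        ""|/|/.|/..)
            echo "refusing to delete '$target'" >&2
            return 1
            ;;
    esac
    # Resolve the real path; this fails if the target is not a directory.
    resolved=$(cd "$target" 2>/dev/null && pwd -P) || return 1
    if [ "$resolved" = "/" ]; then
        echo "refusing to delete '/'" >&2
        return 1
    fi
    rm -rf -- "$resolved"
}
```

The check runs before rm is ever invoked, so a string that was mangled into "/" or into nothing fails early instead of getting as far as /proc.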
On 6/3/05, Michael James <Michael.James@csiro.au> wrote:
Hmmm interesting, the only time I've thought of using vmware in that context is playing with virus infection on a windows guest. Or do you just want to see how far rm -rf / can go? Andrew Tridgell told an amusing story at the Linux conf of a bad interaction of unicode and command line feeding a bare "/" into a string to be recursively deleted. Ran fine till it started to clean out /proc. Astute readers will note that "p" comes after "h". Ouch.
Oh I've done that in the past (deliberately, I want to make absolutely clear) on my laptop, which was at the time running 9.0. Curiosity finally got the better of me and I wanted to do a clean rebuild of 9.1; it crashed out when it tried to delete rm. Amazingly the system remained up, although I couldn't really do anything given that bash had bitten the dust :-)
VMware will be for a mixture of things, keeping the bosses happy by kinda running windows and being part of AD and so on. But my primary reason is to be able to simulate our production environment and have a rapid rollback with the snapshot facility if anything goes wrong. Autoyast and kickstart are fine, but they still require me to do something other than sitting down :-)
I like having multiple identical hardware. Enough machines to do the job and at least 1 with no user-visible services. Apart from being good for testing, it means that I can pull a working system out of anything but a total computer room meltdown. Yes we have maintenance but sometimes it takes longer than you can afford to be down. I had a disk array fail, fortunately for me it happened just after I'd taken delivery of a couple of new servers. I swiped enough disks to put together a duplicate array, rsync nursed the home dirs across, and the installation was running normally when I rang Dell to sort out the problem hardware Monday morning. (big weekend) As it turned out we lost the lot on the old array. Dell replaced everything, controller, cables, disks. I don't know (or care) how much was necessary, how much was apology. It enabled me to walk away from the problem.
That would be an ideal situation, unfortunately the app that's running on them needs to be compiled for each server due to the 'interesting' use of hostnames. Shades of solaris have gone dancing through my head in the last few weeks.
For network cards: Intel, always intel. (Pure prejudice but it's served me well)
The same I've been looking at :-) My previous supplier (DNUK) always used them, however my new job is a HP shop. It's nice kit, but I just don't think it has much of a future.
Thanks for the help.
Best,
Ben
participants (5): Ben Higginbottom, Jon Nelson, Michael James, Robert Davies, Sid Boyce