Hello SuSE folks,

Today I wasn't able to log in to my remote server over ssh, although the server was pingable. After about ten minutes I was able to log in, and I found these records in /var/log/warn:

Feb 18 16:13:50 box1 kernel: dpti0: Bus reset: SCSI Bus 0: tid: 9
Feb 18 16:13:50 box1 kernel: dpti0: Bus reset success.
Feb 18 16:19:00 box1 kernel: dpti0: Bus reset: SCSI Bus 0: tid: 9
Feb 18 16:19:00 box1 kernel: dpti0: Bus reset success.
Feb 18 16:23:21 box1 kernel: SFW2-OUT-ERROR IN= OUT=eth0 SRC=xxx.xxx.xxx.xxx DST=xxx.xxx.xxx.xxx LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=2147 DF PROTO=TCP SPT=22 DPT=44630 WINDOW=2088 RES=0x00 ACK FIN URGP=0 OPT (0101080A36F612A8927EDD69)

Up until now the server has run great under SuSE 9.2. It has an Adaptec I2O add-on SCSI RAID controller and six drives configured as RAID 5. Could somebody please tell me whether this is a sign of potential hardware failure, or whether it was caused by some network or firewall interruption (judging by the last record)?

Many thanks in advance,
Alex
On Saturday 19 February 2005 00:20, Alex Daniloff wrote:
Hello SuSE folks, Today I wasn't able to login on my remote server over ssh, although the server was pingable. [...]
Were there any interesting messages in /var/log/messages at this time or just before it? Unless you did something to change the configuration, this looks like a hardware problem. The firewall message is probably the server trying to respond to a RELATED packet on a broken connection that had been reset.

Jeff
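To check that reading of the firewall record, it can help to split the line into its fields. The following is only a minimal sketch: it assumes the usual netfilter LOG layout of KEY=VALUE tokens plus bare flag names, and the function name parse_fw_line is made up for illustration. The sample line is the one from the original post, with the wrapped "PREC=0x 00" rejoined.

```python
# Hypothetical helper for illustration; not part of SuSEfirewall2 or any library.
LINE = ("Feb 18 16:23:21 box1 kernel: SFW2-OUT-ERROR IN= OUT=eth0 "
        "SRC=xxx.xxx.xxx.xxx DST=xxx.xxx.xxx.xxx LEN=52 TOS=0x10 PREC=0x00 "
        "TTL=64 ID=2147 DF PROTO=TCP SPT=22 DPT=44630 WINDOW=2088 RES=0x00 "
        "ACK FIN URGP=0 OPT (0101080A36F612A8927EDD69)")

# TCP flag names as they appear as bare tokens in the log line.
TCP_FLAGS = {"SYN", "ACK", "FIN", "RST", "PSH", "URG"}


def parse_fw_line(line):
    """Return (log prefix, KEY=VALUE fields, TCP flags) for one log line."""
    payload = line.split("kernel: ", 1)[1]
    tokens = payload.split()
    fields = {}
    flags = []
    for tok in tokens[1:]:
        if "=" in tok:
            key, value = tok.split("=", 1)
            fields[key] = value
        elif tok in TCP_FLAGS:
            flags.append(tok)
    return tokens[0], fields, flags


prefix, fields, flags = parse_fw_line(LINE)
print(prefix, fields["OUT"], fields["SPT"], fields["DPT"], flags)
```

For this record the empty IN= and OUT=eth0 mark a locally generated outbound packet, SPT=22 is the ssh port, and the flags are ACK and FIN, which fits a server closing an ssh connection that the firewall no longer tracks.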
On 19 February 2005 17:56, Jeffrey Laramie wrote:
On Saturday 19 February 2005 00:20, Alex Daniloff wrote:
Hello SuSE folks, Today I wasn't able to login on my remote server over ssh, although the server was pingable. [...]
What release of SuSE do you use? What is your SCSI card?
I'm using SuSE 9.2. The SCSI controller is:

RAID bus controller: Adaptec (formerly DPT) SmartRAID V Controller (rev 01)
Vendor: Adaptec Model: 2000S FW:380E ADAPTEC RAID-5 Rev: 380EDPTM

On Saturday 19 February 2005 03:18 pm, Marc Collin wrote:
What release of SuSE do you use? What is your SCSI card?
On Sat, 19 Feb 2005 19:41:03 -0800, Alex Daniloff <alex@daniloff.com> wrote:
I'm using SuSE 9.2. The SCSI controller is:

RAID bus controller: Adaptec (formerly DPT) SmartRAID V Controller (rev 01)
Vendor: Adaptec Model: 2000S FW:380E ADAPTEC RAID-5 Rev: 380EDPTM
Are you using the aacraid module for this card? I've got a 2200S and I've got similar issues with 9.1. I've tried booting from the 9.2 disks and get the scsi resets and then a kernel panic. I even tried the beta for RHEL 4 and got the scsi reset and then a kernel panic.

Went to Adaptec's site. Adaptec states that the latest firmware and drivers from their site work, but I haven't had any success. After going round and round with their support guys, they tried to say they didn't support 2.6 kernels until I sent them the url on their site that stated otherwise. Then they said they only supported the kernel that shipped on the disks - no kernel updates supported.

I suspect that the aacraid module is having issues with the 2.6 kernels. I've googled it and have come across quite a few posts with people having problems with Adaptec RAID cards and various distros.

The only combo I found that was stable using my 2200S was to use a 2.4 kernel distro with aacraid module. All the problems went away, but how long can you stay on 2.4 kernels and get vendor support? At some point, you move or you're on your own. My boss doesn't like that option.

I've decided that I'm going to replace the Adaptec card with an LSI Logic MegaRAID card. I haven't had any problems with any of their cards yet (knock on wood). Guess I've got an expensive paper weight for my desk now.

John
Check out this link, http://i2o.shadowconnect.com/; these guys are keeping up the I2O stuff. Basically, these Adaptec/DPT cards were supported well until 9.0, where they worked but you could not recover the RAID. With SuSE 9.1 there was no support at all, and with SuSE 9.2, spotty support. I've got a dozen of these cards/machines, and I can get SuSE 9.2 to install with an out-of-the-box driver, but SuSE will freeze up after a little while. I've moved to the Adaptec 2120 and things work OK.

joe

On Mon, 2005-02-21 at 21:57 -0500, John Scott wrote:
Are you using the aacraid module for this card? I've got a 2200S and I've got similar issues with 9.1. I've tried booting from the 9.2 disks and get the scsi resets and then a kernel panic. I even tried the beta for RHEL 4 and got the scsi reset and then a kernel panic.
Went to Adaptec's site. Adaptec states that the latest firmware and drivers from their site work, but I haven't had any success. After going round and round with their support guys, they tried to say they didn't support 2.6 kernels until I sent them the url on their site that stated otherwise. Then they said they only supported the kernel that shipped on the disks - no kernel updates supported.
I suspect that the aacraid module is having issues with the 2.6 kernels. I've googled it and have come across quite a few posts with people having problems with Adaptec RAID cards and various distros.
The only combo I found that was stable using my 2200S was to use a 2.4 kernel distro with aacraid module. All the problems went away, but how long can you stay on 2.4 kernels and get vendor support? At some point, you move or you're on your own. My boss doesn't like that option.
I've decided that I'm going to replace the Adaptec card with an LSI Logic MegaRAID card. I haven't had any problems with any of their cards yet (knock on wood). Guess I've got an expensive paper weight for my desk now.
John
On Mon, 21 Feb 2005 21:57:07 -0500, John Scott wrote:
The only combo I found that was stable using my 2200S was to use a 2.4 kernel distro with aacraid module. All the problems went away, but how long can you stay on 2.4 kernels and get vendor support? At some point, you move or you're on your own. My boss doesn't like that option.
If that is your only driving issue, SLES 8 should be supported for 3 more years I think. I assume you can still buy it?

Greg
--
Greg Freemyer
I saw these in /var/log/messages:

Feb 19 13:45:23 box1 kernel: dpti0: Trying to Abort cmd=835603
Feb 19 13:45:23 box1 kernel: dpti0: Abort cmd not supported
Feb 19 13:45:23 box1 kernel: dpti0: Trying to reset device
Feb 19 13:45:23 box1 kernel: dpti0: Device reset not supported
Feb 19 13:45:23 box1 kernel: dpti0: Bus reset: SCSI Bus 0: tid: 9
Feb 19 13:45:23 box1 kernel: dpti0: Bus reset success.

On Saturday 19 February 2005 02:56 pm, Jeffrey Laramie wrote:
Were there any interesting messages in /var/log/messages at this time or just before it? Unless you did something to change the configuration, this looks like a hardware problem. The firewall message is probably the server trying to respond to a RELATED packet on a broken connection that had been reset.
Jeff
On Saturday 19 February 2005 22:53, Alex Daniloff wrote:
I saw these in /var/log/messages:

Feb 19 13:45:23 box1 kernel: dpti0: Trying to Abort cmd=835603
Feb 19 13:45:23 box1 kernel: dpti0: Abort cmd not supported
Feb 19 13:45:23 box1 kernel: dpti0: Trying to reset device
Feb 19 13:45:23 box1 kernel: dpti0: Device reset not supported
Feb 19 13:45:23 box1 kernel: dpti0: Bus reset: SCSI Bus 0: tid: 9
Feb 19 13:45:23 box1 kernel: dpti0: Bus reset success.
These are all kernel-level messages, which indicates either a kernel or a hardware issue. If you don't get any answers here, you might want to google the first error message you have and see what turns up, or maybe try a kernel mailing list.

Jeff
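Before searching on the error messages, it may be worth seeing how often the bus resets actually occur. The sketch below is only an illustration: the regular expression is written against the dpti0 lines quoted in this thread, the function names are hypothetical, and the year 2005 is assumed because syslog timestamps do not carry one. Month-name parsing also assumes an English or C locale.

```python
import re
from datetime import datetime

# Log lines copied from the excerpts posted in this thread.
SAMPLE = """\
Feb 18 16:13:50 box1 kernel: dpti0: Bus reset: SCSI Bus 0: tid: 9
Feb 18 16:13:50 box1 kernel: dpti0: Bus reset success.
Feb 18 16:19:00 box1 kernel: dpti0: Bus reset: SCSI Bus 0: tid: 9
Feb 18 16:19:00 box1 kernel: dpti0: Bus reset success.
Feb 19 13:45:23 box1 kernel: dpti0: Trying to Abort cmd=835603
Feb 19 13:45:23 box1 kernel: dpti0: Abort cmd not supported
Feb 19 13:45:23 box1 kernel: dpti0: Trying to reset device
Feb 19 13:45:23 box1 kernel: dpti0: Device reset not supported
Feb 19 13:45:23 box1 kernel: dpti0: Bus reset: SCSI Bus 0: tid: 9
Feb 19 13:45:23 box1 kernel: dpti0: Bus reset success.
"""

# Matches only the "Bus reset: ..." request lines, not "Bus reset success."
RESET_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: dpti\d+: Bus reset: "
)


def bus_reset_times(text, year=2005):
    """Return a datetime for every bus reset request found in the log text."""
    times = []
    for line in text.splitlines():
        match = RESET_RE.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"  # year is an assumption
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Return the seconds elapsed between consecutive reset requests."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


resets = bus_reset_times(SAMPLE)
print(len(resets), gaps_seconds(resets))
```

On a live system the same functions could be fed the contents of /var/log/warn or /var/log/messages instead of SAMPLE. Resets that cluster around the times the machine stopped responding would support the hardware or driver theory.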
participants (6)
- Alex Daniloff
- Greg Freemyer
- Jeffrey Laramie
- Joe Brockert
- John Scott
- Marc Collin