[opensuse] Failed disk? Which one?
Hello, openSUSE experts,

I recently got some unpleasant news from the logs:

end_device-0:10: mptsas: ioc0: removing sata device: fw_channel 0, fw_id 4, phy 3,sas_addr 0x922210b547a6049
[...]
end_device-0:63: mptsas: ioc0: removing sata device: fw_channel 0, fw_id 4, phy 3,sas_addr 0x922210b547a6049
phy-0:3: mptsas: ioc0: delete phy 3, phy-obj (0xffff880272981c00)
port-0:10: mptsas: ioc0: delete port 10, sas_addr (0x922210b547a6049)
[...]
port-0:63: mptsas: ioc0: delete port 63, sas_addr (0x922210b547a6049)
audit: audit_backlog=326 > audit_backlog_limit=320
audit: audit_backlog=327 > audit_backlog_limit=320
audit: audit_lost=189 audit_rate_limit=0 audit_backlog_limit=320
[...]
audit: audit_lost=423 audit_rate_limit=0 audit_backlog_limit=320
audit: backlog limit exceeded
audit_log_start: 32 callbacks suppressed
audit_log_start: 671 callbacks suppressed
mptbase: ioc0: PhysDisk is now failed, out of sync
mptbase: ioc0: PhysDisk is now missing, out of sync
mptbase: ioc0: PhysDisk is now online, out of sync
mptbase: ioc0: SMART data received, ASC/ASCQ = 5dh/10h
mptbase: ioc0: volume is now degraded, enabled
mptbase: ioc0: volume is now degraded, enabled, resync in progress
mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0 id=4
mptbase: ioc0: RAID STATUS CHANGE for VolumeID 0
mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x922210b547a6049
scsi 0:0:11:0: Attached scsi generic sg3 type 0
scsi 0:0:11:0: Direct-Access ATA ST3500320NS SN05 PQ: 0 ANSI: 5
[...]
scsi 0:0:64:0: Attached scsi generic sg3 type 0
scsi 0:0:64:0: Direct-Access ATA ST3500320NS SN05 PQ: 0 ANSI: 5
scsi target0:0:10: mptsas: ioc0: delete device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x922210b547a6049
[...]
scsi target0:0:63: mptsas: ioc0: delete device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x922210b547a6049

Once, I had to hard reset the machine because of a kernel panic during a RAID sync error.
It is an IBM X3200 M3 tower server with 4 x 500 GB hot-swap HDDs: http://www-03.ibm.com/systems/x/hardware/tower/x3200m3/ It serves as a LAMP and SFTP server. I used the built-in SAS utility to create a HW RAID 10 array, which openSUSE 12.2 sees as one disk. In case it matters, the machine has UEFI, but I use legacy BIOS mode (and GRUB 1).

I don't have much experience with such HW, so I would rather ask before taking action. It is still running. ;-) So am I correct that one disk is dying? How do I recognize which one? And can I really just open the case of the running machine, swap the HDD and wait a little bit? ;-)

Have a nice day,
Vojtěch
--
Vojtěch Zeisek
Komunita openSUSE GNU/Linuxu
Community of the openSUSE GNU/Linux
http://www.opensuse.org/
http://trapa.cz/
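One way to read the log in the original post: every attach/remove event names the same fw_id, phy and sas_addr, which suggests a single device is dropping off and coming back rather than several disks failing. A minimal Python sketch of that check, assuming the log text is available as a string. The regular expression and the function name count_device_events are illustrative only and belong to no tool mentioned in this thread. Note also that the phy number reported by the driver is not guaranteed to match the number printed on a drive bay.

```python
import re
from collections import Counter

# Matches the mptsas attach/remove lines quoted in the post, e.g.
# "mptsas: ioc0: removing sata device: fw_channel 0, fw_id 4, phy 3,sas_addr 0x..."
# (the log has both "phy 3,sas_addr" and "phy 3, sas_addr", hence the \s*).
MPTSAS_EVENT = re.compile(
    r"mptsas: (?P<ioc>ioc\d+): (?P<action>removing|attaching) sata device: "
    r"fw_channel (?P<channel>\d+), fw_id (?P<fw_id>\d+), "
    r"phy (?P<phy>\d+),\s*sas_addr (?P<sas_addr>0x[0-9a-fA-F]+)"
)


def count_device_events(log_text):
    """Count events per (ioc, fw_id, phy, sas_addr, action). Hypothetical helper."""
    counts = Counter()
    for match in MPTSAS_EVENT.finditer(log_text):
        key = (match["ioc"], int(match["fw_id"]), int(match["phy"]),
               match["sas_addr"], match["action"])
        counts[key] += 1
    return counts


# Three lines copied from the log in the original post.
SAMPLE = """\
end_device-0:10: mptsas: ioc0: removing sata device: fw_channel 0, fw_id 4, phy 3,sas_addr 0x922210b547a6049
end_device-0:63: mptsas: ioc0: removing sata device: fw_channel 0, fw_id 4, phy 3,sas_addr 0x922210b547a6049
mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 4, phy 3, sas_addr 0x922210b547a6049
"""

if __name__ == "__main__":
    for key, n in sorted(count_device_events(SAMPLE).items()):
        ioc, fw_id, phy, sas_addr, action = key
        print(f"{ioc} fw_id={fw_id} phy={phy} sas_addr={sas_addr} {action}: {n}")
```

If more than one (fw_id, phy, sas_addr) combination showed up in the counts, that would point to more than one affected device; here all events share one combination.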
On 01/04/13 11:37, Vojtěch Zeisek wrote:
Hello, openSUSE experts, recently I have got unpleasant news from logs: [...] mptbase: ioc0: SMART data received, ASC/ASCQ = 5dh/10h
OK, use smartctl to access the SMART data. Among other things it will display the hard disk's serial number, so you can figure out which of the disks is dying by inspecting the hardware and comparing serial numbers. If smartctl does not work with this controller, try the (usually awful) tools provided by the controller's manufacturer.

PS: I suggest immediate action ;P
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
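The serial-number comparison suggested in this reply could be sketched roughly as follows. This is a sketch under several assumptions: that smartctl can reach the physical disk at all (which may not hold behind this RAID controller), that /dev/sg3, the generic SCSI node named in the kernel log, is the right device, and that the output carries the usual "Device Model" and "Serial Number" labels. The helper names are hypothetical and the serial number in the sample is made up; only the model and firmware strings come from the log.

```python
import subprocess


def parse_identity(smartctl_output):
    """Return (model, serial) from `smartctl -i` text; missing fields are None."""
    fields = {}
    for line in smartctl_output.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields.get("device model"), fields.get("serial number")


def read_identity(device):
    """Run `smartctl -i <device>`; needs root and smartmontools installed."""
    result = subprocess.run(["smartctl", "-i", device],
                            capture_output=True, text=True, check=False)
    return parse_identity(result.stdout)


# Illustrative output only: the serial number is invented for this example.
SAMPLE_OUTPUT = """\
Device Model:     ST3500320NS
Serial Number:    EXAMPLE0001
Firmware Version: SN05
"""

if __name__ == "__main__":
    print(parse_identity(SAMPLE_OUTPUT))
    # On the real machine (assumption: the disk is reachable as /dev/sg3):
    # print(read_identity("/dev/sg3"))
```

The serial printed this way would then be compared with the labels on the drives themselves.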
On Mon, 1 April 2013 13:28:26, Cristian Rodríguez wrote:
OK, use smartctl to access the SMART data. Among other things it will display the hard disk's serial number, so you can figure out which of the disks is dying by inspecting the hardware and comparing serial numbers.
If smartctl does not work with this controller, try the (usually awful) tools provided by the controller's manufacturer.
Thanks, but SMART unfortunately doesn't work here. The controller does HW RAID 10, which openSUSE sees just as sda, so the operating system IMHO doesn't know much about the real physical disks... I hope the utility from IBM is not so terrible... ;-)
PS: I suggest immediate action ;P
Hehe... All the best, Vojtěch
On 01/04/13 13:39, Vojtěch Zeisek wrote:
Thanks, SMART unfortunately doesn't work here.
You may be wrong; smartctl can usually access the data from the controller... check out the documentation.
On Mon, 1 April 2013 13:43:28, Cristian Rodríguez wrote:
You may be wrong; smartctl can usually access the data from the controller... check out the documentation.
I tried, but I wasn't able to get it to work. I only found a note that it requires something very special from IBM, and I wasn't able to figure out what... Have a nice day, Vojtěch
On 01/04/13 13:49, Vojtěch Zeisek wrote:
I tried, but I wasn't able to get it to work. I only found a note that it requires something very special from IBM, and I wasn't able to figure out what... Have a nice day, Vojtěch
OK, then your controller probably has a sort of "firmware-based GUI" that can be accessed by pressing "some" key at boot. That will probably also show you the serial number of the disk that has gone swimming with the fishes.
On 01-04-13 16:37, Vojtěch Zeisek wrote:
Hello, ... I don't have much experience with such HW, so I would rather ask before taking action. It is still running. ;-) So am I correct that one disk is dying? How do I recognize which one? And can I really just open the case of the running machine, swap the HDD and wait a little bit? ;-) Have a nice day, Vojtěch
Hi, FWIW, do the drives have LEDs? My servers' drives (HP) have them, and they are supposed to indicate which drive has problems. Regards, Koenraad Lelong
Would lsscsi -v -l help? (More options on man page.)
On Tue, 2 April 2013 07:06:17, Michael Hamilton wrote:
Would lsscsi -v -l help? (More options on man page.)
No, it doesn't help. At least not now. It seems the disk hasn't totally died yet (these messages repeat periodically):

kernel: [186501.590977] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0 id=4
kernel: [186501.590983] mptbase: ioc0: PhysDisk is now missing, out of sync
kernel: [186501.569485] mptbase: ioc0: PhysDisk is now failed, out of sync
kernel: [186501.581134] mptbase: ioc0: volume is now degraded, enabled
kernel: [186504.479282] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0 id=4
kernel: [186504.479288] mptbase: ioc0: PhysDisk is now online, out of sync
kernel: [186504.979881] mptbase: ioc0: volume is now degraded, enabled, resync in progress

So I'll look in the controller utility tomorrow. Hopefully. And yes, it has LEDs to indicate the disks' state. I hope it works. :-)
Have a nice day,
Vojtěch
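The kernel lines quoted in this message are not in timestamp order, which makes the failed/missing/online cycle harder to follow. A minimal Python sketch that sorts them by the bracketed kernel timestamp and lists the PhysDisk states in sequence. It assumes the "kernel: [seconds.micros] message" layout seen in the message; the function names are hypothetical.

```python
import re

KERNEL_LINE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.+)")
PHYSDISK_STATE = re.compile(r"PhysDisk is now (?P<state>\w+)")


def sort_by_timestamp(lines):
    """Return (timestamp, message) pairs ordered by kernel timestamp."""
    events = []
    for line in lines:
        match = KERNEL_LINE.search(line)
        if match:
            events.append((float(match["ts"]), match["msg"].strip()))
    return sorted(events)


def physdisk_states(lines):
    """Return the reported PhysDisk states in chronological order."""
    states = []
    for _, message in sort_by_timestamp(lines):
        match = PHYSDISK_STATE.search(message)
        if match:
            states.append(match["state"])
    return states


# The seven lines copied from the message, in the order they were posted.
SAMPLE = """\
kernel: [186501.590977] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0 id=4
kernel: [186501.590983] mptbase: ioc0: PhysDisk is now missing, out of sync
kernel: [186501.569485] mptbase: ioc0: PhysDisk is now failed, out of sync
kernel: [186501.581134] mptbase: ioc0: volume is now degraded, enabled
kernel: [186504.479282] mptbase: ioc0: RAID STATUS CHANGE for PhysDisk 0 id=4
kernel: [186504.479288] mptbase: ioc0: PhysDisk is now online, out of sync
kernel: [186504.979881] mptbase: ioc0: volume is now degraded, enabled, resync in progress
""".splitlines()

if __name__ == "__main__":
    for timestamp, message in sort_by_timestamp(SAMPLE):
        print(f"{timestamp:.6f} {message}")
    print(physdisk_states(SAMPLE))
```

Read in order, the disk is reported failed, then missing, then online again about three seconds later, after which the volume starts resyncing.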
Participants (4):
- Cristian Rodríguez
- Koenraad Lelong
- Michael Hamilton
- Vojtěch Zeisek