
On 2010-07-22 I wrote:
There definitely seems to be some kind of problem on the new machine, most likely associated with a drive or motherboard port.
Difficult if you cannot access it anymore. I'm normally logging to a remote machine so I can always see syslog output.
That's a good idea. I'll set that up before I run my next experiment. Thanks. At the moment suse is on another partition on the drive that fails. I might move suse to a different drive to separate things.
I did reboot and found the same symptoms as before - one specific drive not responding. After a power cycle, it's OK again. I've swapped it with the drive next to it to see whether the problem follows the drive or stays with the bay/controller port.
So ... here I am again :(

You may remember I had an intermittent disk failure. It's been alright since then but it finally failed again and this time I managed to capture an error log.

I had had a problem on a particular disk, which is part of a RAID. I put the disk back in a different bay in the disk cage (I swapped it with one of the other disks in the RAID) in order to see whether it was the disk or the port that had the problem. I let it reintegrate the RAID and then I set it back to the task of loading all my data onto it. It's been doing that solidly for the past two plus weeks but has now crashed again.

I also set up a network log, as Pit suggested and that has come up trumps! The problem has moved with the disk. The last bit of the log is below; does it mean anything to anyone? I'll post more details tomorrow.

Cheers, Dave

Aug 7 02:36:13 scop4 kernel: [1341328.180960] ata4.00: exception Emask 0x0 SAct 0x1f SErr 0x0 action 0x6 frozen
Aug 7 02:36:13 scop4 kernel: [1341328.180980] ata4.00: cmd 61/08:00:2b:00:dd/00:00:16:00:00/40 tag 0 ncq 4096 out
Aug 7 02:36:13 scop4 kernel: [1341328.180982] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 7 02:36:13 scop4 kernel: [1341328.180994] ata4.00: status: { DRDY }
Aug 7 02:36:13 scop4 kernel: [1341328.181004] ata4.00: cmd 61/08:08:e3:4b:dd/00:00:16:00:00/40 tag 1 ncq 4096 out
Aug 7 02:36:13 scop4 kernel: [1341328.181006] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 7 02:36:13 scop4 kernel: [1341328.181017] ata4.00: status: { DRDY }
Aug 7 02:36:13 scop4 kernel: [1341328.181026] ata4.00: cmd 61/08:10:cb:c0:8e/00:00:48:00:00/40 tag 2 ncq 4096 out
Aug 7 02:36:13 scop4 kernel: [1341328.181028] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 7 02:36:13 scop4 kernel: [1341328.181039] ata4.00: status: { DRDY }
Aug 7 02:36:13 scop4 kernel: [1341328.181048] ata4.00: cmd 61/08:18:e7:08:85/00:00:00:00:00/40 tag 3 ncq 4096 out
Aug 7 02:36:13 scop4 kernel: [1341328.181050] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 7 02:36:13 scop4 kernel: [1341328.181061] ata4.00: status: { DRDY }
Aug 7 02:36:13 scop4 kernel: [1341328.181070] ata4.00: cmd 61/08:20:ef:08:85/00:00:00:00:00/40 tag 4 ncq 4096 out
Aug 7 02:36:13 scop4 kernel: [1341328.181072] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 7 02:36:13 scop4 kernel: [1341328.181083] ata4.00: status: { DRDY }
Aug 7 02:36:13 scop4 kernel: [1341328.181091] ata4: hard resetting link
Aug 7 02:36:19 scop4 kernel: [1341333.536388] ata4: link is slow to respond, please be patient (ready=0)
Aug 7 02:36:23 scop4 kernel: [1341338.228803] ata4: COMRESET failed (errno=-16)
Aug 7 02:36:23 scop4 kernel: [1341338.228817] ata4: hard resetting link
Aug 7 02:36:29 scop4 kernel: [1341343.583285] ata4: link is slow to respond, please be patient (ready=0)
Aug 7 02:36:33 scop4 kernel: [1341348.275690] ata4: COMRESET failed (errno=-16)
Aug 7 02:36:33 scop4 kernel: [1341348.275704] ata4: hard resetting link
Aug 7 02:36:39 scop4 kernel: [1341353.630304] ata4: link is slow to respond, please be patient (ready=0)
Aug 7 02:37:08 scop4 kernel: [1341383.316777] ata4: COMRESET failed (errno=-16)
Aug 7 02:37:08 scop4 kernel: [1341383.316791] ata4: limiting SATA link speed to 1.5 Gbps
Aug 7 02:37:08 scop4 kernel: [1341383.316798] ata4: hard resetting link
Aug 7 02:37:13 scop4 kernel: [1341388.366208] ata4: COMRESET failed (errno=-16)

-- 
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
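For anyone comparing remote syslog captures like the ata4 log above across several crashes, a rough tally of events per port can make it easier to see whether the failures stay with one port or move with the disk. The following is only a minimal sketch in Python, assuming the syslog line layout shown in the log above; the function name summarise_ata_events and the counter labels are hypothetical, chosen for illustration, and the pattern may need adjusting for other kernels or syslog formats.

```python
import re
from collections import Counter

# Assumed layout, taken from the log in this post:
# "Aug 7 02:36:13 scop4 kernel: [1341328.180960] ata4.00: exception Emask ..."
ATA_LINE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] (?P<port>ata\d+)(?:\.\d+)?: (?P<msg>.*)"
)

def summarise_ata_events(lines):
    """Count a few libata event types per port (ata4, ata5, ...)."""
    summary = {}
    for line in lines:
        m = ATA_LINE.search(line)
        if not m:
            continue  # e.g. the "res ..." continuation lines carry no port name
        counts = summary.setdefault(m.group("port"), Counter())
        msg = m.group("msg")
        if msg.startswith("exception"):
            counts["exception"] += 1
        elif msg.startswith("cmd "):
            counts["failed_cmd"] += 1
        elif "hard resetting link" in msg:
            counts["hard_reset"] += 1
        elif "COMRESET failed" in msg:
            counts["comreset_failed"] += 1
        elif "limiting SATA link speed" in msg:
            counts["speed_limited"] += 1
    return summary

# A few lines copied from the log above, used as sample input.
sample = [
    "Aug 7 02:36:13 scop4 kernel: [1341328.180960] ata4.00: exception Emask 0x0 SAct 0x1f SErr 0x0 action 0x6 frozen",
    "Aug 7 02:36:13 scop4 kernel: [1341328.180980] ata4.00: cmd 61/08:00:2b:00:dd/00:00:16:00:00/40 tag 0 ncq 4096 out",
    "Aug 7 02:36:13 scop4 kernel: [1341328.180982] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)",
    "Aug 7 02:36:13 scop4 kernel: [1341328.181091] ata4: hard resetting link",
    "Aug 7 02:36:23 scop4 kernel: [1341338.228803] ata4: COMRESET failed (errno=-16)",
]

if __name__ == "__main__":
    for port, counts in summarise_ata_events(sample).items():
        print(port, dict(counts))
```

Feeding it the full remote log file (for example via open(path) and iterating over the lines) would give one tally per ataN port. It only counts message types and does not attempt to diagnose the cause.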