[Bug 643392] New: software raid crash and go into read only mode when copying big files - regression from 11.1
https://bugzilla.novell.com/show_bug.cgi?id=643392
https://bugzilla.novell.com/show_bug.cgi?id=643392#c0

Summary: software raid crash and go into read only mode when copying big files - regression from 11.1
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: i686
OS/Version: openSUSE 11.3
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: tthidney@seznam.cz
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (compatible; Konqueror/4.5; Linux; X11) KHTML/4.5.1 (like Gecko) SUSE

I have an 11.1 i586 installation with root on /dev/sda2. There are two more disks, /dev/sdb and /dev/sdc, both 750G SAMSUNG HD753LJ. In 11.1 I created a software RAID from sdb and sdc. I then installed a fresh 11.3 (not an update) and mounted the RAID disk. Next I copied several big files (20G each), and after one or two files I got an error that the files can't be written in read-only mode. I was able to reproduce this several times. I followed /var/log/messages while copying.
At the beginning there were these messages:

Oct 3 11:07:55 linux-vr7z kernel: [ 322.139405] ata3.01: failed command: READ DMA EXT
Oct 3 11:07:55 linux-vr7z kernel: [ 322.139410] ata3.01: cmd 25/00:50:c1:2e:dd/00:00:30:00:00/f0 tag 0 dma 40960 in
Oct 3 11:07:55 linux-vr7z kernel: [ 322.139412] res 51/84:3f:d2:2e:dd/84:00:30:00:00/f0 Emask 0x30 (host bus error)
Oct 3 11:07:55 linux-vr7z kernel: [ 322.139414] ata3.01: status: { DRDY ERR }
Oct 3 11:07:55 linux-vr7z kernel: [ 322.139416] ata3.01: error: { ICRC ABRT }
Oct 3 11:07:55 linux-vr7z kernel: [ 322.139424] ata3.00: hard resetting link
Oct 3 11:07:56 linux-vr7z kernel: [ 322.444017] ata3.01: hard resetting link
Oct 3 11:07:56 linux-vr7z kernel: [ 322.900109] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 3 11:07:56 linux-vr7z kernel: [ 322.900121] ata3.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 3 11:07:56 linux-vr7z kernel: [ 322.928335] ata3.00: configured for UDMA/133
Oct 3 11:07:56 linux-vr7z kernel: [ 322.947241] ata3.01: configured for UDMA/133
Oct 3 11:07:56 linux-vr7z kernel: [ 322.947249] ata3: EH complete
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127607] ata3.01: limiting SATA link speed to 1.5 Gbps
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127613] ata3.01: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x0
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127616] ata3.01: SError: { UnrecovData 10B8B BadCRC }
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127620] ata3.01: failed command: READ DMA EXT
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127626] ata3.01: cmd 25/00:40:01:19:01/00:00:31:00:00/f0 tag 0 dma 32768 in
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127627] res 51/84:0f:32:19:01/84:00:31:00:00/f0 Emask 0x30 (host bus error)
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127630] ata3.01: status: { DRDY ERR }
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127632] ata3.01: error: { ICRC ABRT }
Oct 3 11:08:30 linux-vr7z kernel: [ 357.127640] ata3.00: hard resetting link
Oct 3 11:08:31 linux-vr7z kernel: [ 357.432013] ata3.01: hard resetting link
Oct 3 11:08:36 linux-vr7z kernel: [ 362.937027] ata3.00: link is slow to respond, please be patient (ready=0)
Oct 3 11:08:40 linux-vr7z kernel: [ 367.170027] ata3.00: SRST failed (errno=-16)
Oct 3 11:08:40 linux-vr7z kernel: [ 367.170033] ata3.00: hard resetting link
Oct 3 11:08:41 linux-vr7z kernel: [ 367.475013] ata3.01: hard resetting link
Oct 3 11:08:46 linux-vr7z kernel: [ 372.980029] ata3.00: link is slow to respond, please be patient (ready=0)
Oct 3 11:08:50 linux-vr7z kernel: [ 377.213008] ata3.00: SRST failed (errno=-16)
Oct 3 11:08:50 linux-vr7z kernel: [ 377.213015] ata3.00: hard resetting link
Oct 3 11:08:51 linux-vr7z kernel: [ 377.518011] ata3.01: hard resetting link
Oct 3 11:08:51 linux-vr7z kernel: [ 377.974053] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 3 11:08:51 linux-vr7z kernel: [ 377.974065] ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 3 11:08:51 linux-vr7z kernel: [ 378.003710] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x2)
Oct 3 11:08:51 linux-vr7z kernel: [ 378.003713] ata3.01: revalidation failed (errno=-5)
Oct 3 11:08:56 linux-vr7z kernel: [ 382.974012] ata3.00: hard resetting link
Oct 3 11:08:56 linux-vr7z kernel: [ 383.279011] ata3.01: hard resetting link
Oct 3 11:08:57 linux-vr7z kernel: [ 383.735052] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 3 11:08:57 linux-vr7z kernel: [ 383.735063] ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 3 11:08:57 linux-vr7z kernel: [ 383.764697] ata3.01: failed to IDENTIFY (I/O error, err_mask=0x2)
Oct 3 11:08:57 linux-vr7z kernel: [ 383.764699] ata3.01: revalidation failed (errno=-5)
Oct 3 11:09:02 linux-vr7z kernel: [ 388.735012] ata3.00: hard resetting link
Oct 3 11:09:02 linux-vr7z kernel: [ 389.040013] ata3.01: hard resetting link
Oct 3 11:09:03 linux-vr7z kernel: [ 389.496087] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 3 11:09:03 linux-vr7z kernel: [ 389.496099] ata3.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

These messages appeared roughly once every 20 s. After a while, the log filled up with:

Oct 3 11:09:08 linux-vr7z kernel: [ 395.266562] end_request: I/O error, dev sdb, sector 822155585
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266641] sd 2:0:1:0: [sdb] Unhandled error code
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266645] sd 2:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266649] sd 2:0:1:0: [sdb] CDB: Write(10): 2a 00 0e 31 03 41 00 04 00 00
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266658] end_request: I/O error, dev sdb, sector 238093121
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266661] Buffer I/O error on device md1, logical block 59519264
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266663] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266667] Buffer I/O error on device md1, logical block 59519265
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266669] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266672] Buffer I/O error on device md1, logical block 59519266
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266674] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266676] Buffer I/O error on device md1, logical block 59519267
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266678] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266681] Buffer I/O error on device md1, logical block 59519268
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266683] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266686] Buffer I/O error on device md1, logical block 59519269
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266688] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266690] Buffer I/O error on device md1, logical block 59519270
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266692] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266695] Buffer I/O error on device md1, logical block 59519271
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266697] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266700] Buffer I/O error on device md1, logical block 59519280
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266702] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266705] Buffer I/O error on device md1, logical block 59519281
Oct 3 11:09:08 linux-vr7z kernel: [ 395.266707] lost page write due to I/O error on md1
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288713] sd 2:0:1:0: [sdb] Unhandled error code
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288715] sd 2:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288718] sd 2:0:1:0: [sdb] CDB: Write(10): 2a 00 0e 31 07 41 00 01 60 00
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288726] end_request: I/O error, dev sdb, sector 238094145
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288801] sd 2:0:1:0: [sdb] Unhandled error code
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288803] sd 2:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288806] sd 2:0:1:0: [sdb] CDB: Write(10): 2a 00 0e 31 08 a9 00 04 00 00
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288814] end_request: I/O error, dev sdb, sector 238094505
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288959] sd 2:0:1:0: [sdb] Unhandled error code
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288961] sd 2:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288964] sd 2:0:1:0: [sdb] CDB: Write(10): 2a 00 0e 31 0c a9 00 04 00 00
Oct 3 11:09:08 linux-vr7z kernel: [ 395.288972] end_request: I/O error, dev sdb, sector 238095529
Oct 3 11:09:08 linux-vr7z kernel: [ 395.289095] sd 2:0:1:0: [sdb] Unhandled error code

After that, copying stopped with an error message (can't write to read-only file system) and the RAID was remounted read-only.

I am using the kernel from online update: kernel-default-2.6.34.7-0.3.1.i586

hwinfo disk:

34: IDE 501.0: 10600 Disk
  [Created at block.243]
  UDI: /org/freedesktop/Hal/devices/storage_serial_SATA_SAMSUNG_HD753LJ61291A161A0KDU
  Unique ID: _kuT.VSn9ddtxzU7
  Parent ID: w7Y8.je6JVuJ1710
  SysFS ID: /class/block/sdc
  SysFS BusID: 5:0:1:0
  SysFS Device Link: /devices/pci0000:00/0000:00:1f.2/host5/target5:0:1/5:0:1:0
  Hardware Class: disk
  Model: "SAMSUNG HD753LJ"
  Vendor: "SAMSUNG"
  Device: "HD753LJ"
  Revision: "1AA0"
  Serial ID: "61291A161A0KDU"
  Driver: "ata_piix", "sd"
  Driver Modules: "ata_piix"
  Device File: /dev/sdc
  Device Files: /dev/sdc, /dev/disk/by-id/scsi-SATA_SAMSUNG_HD753LJ61291A161A0KDU, /dev/disk/by-id/ata-SAMSUNG_HD753LJ_61291A161A0KDU, /dev/disk/by-path/pci-0000:00:1f.2-scsi-1:0:1:0, /dev/disk/by-id/edd-int13_dev82
  Device Number: block 8:32-8:47
  BIOS id: 0x82
  Geometry (Logical): CHS 91201/255/63
  Size: 1465149168 sectors a 512 bytes
  Geometry (BIOS EDD): CHS 1453521/16/63
  Size (BIOS EDD): 1465149168 sectors
  Geometry (BIOS Legacy): CHS 1024/255/63
  Config Status: cfg=no, avail=yes, need=no, active=unknown
  Attached to: #17 (IDE interface)

When I do the same operation on the 11.1 system, everything works and the data are copied correctly.

Reproducible: Always

Steps to Reproduce:
1. Boot 11.3.
2. Copy several big files (about 10 files of 20G each).
3. After 3-5 files, the error appears.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
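The hex values in the libata and SCSI lines of the log above can be decoded by hand. The following is a minimal Python sketch, not part of libata, hwinfo or any openSUSE tool. All function and table names are hypothetical. The bit tables are written from the ATA Status/Error and SATA SError/SStatus/SControl register layouts, with bit names chosen to match the labels libata prints. That naming is an assumption worth checking against the kernel source actually in use.

Applied to the values in this report, the sketch shows three things:
- The failed READ DMA EXT commands on ata3.01 carry interface CRC flags (ICRC, 10B8B, BadCRC) rather than media-error flags such as UNC.
- After the reset, the link came back at 1.5 Gbps instead of 3.0 Gbps.
- The sector in each end_request line that follows a WRITE(10) CDB is that CDB's starting LBA.

```python
"""Decode register values printed by libata and the SCSI layer.

Hypothetical helper for reading logs like the one in this report. The bit
tables follow the ATA Status/Error and SATA SError/SStatus/SControl register
layouts; the names are chosen to match libata's labels (an assumption).
"""

# ATA Status register bits that libata reports (BSY is handled separately).
ATA_STATUS_BITS = [(0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
# ATA Error register bits that libata reports.
ATA_ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]
# SATA SError bits, in ascending bit order.
SERROR_BITS = [
    (1 << 0, "RecovData"), (1 << 1, "RecovComm"), (1 << 8, "UnrecovData"),
    (1 << 9, "Persist"), (1 << 10, "Proto"), (1 << 11, "HostInt"),
    (1 << 16, "PHYRdyChg"), (1 << 17, "PHYInt"), (1 << 18, "CommWake"),
    (1 << 19, "10B8B"), (1 << 20, "Dispar"), (1 << 21, "BadCRC"),
    (1 << 22, "Handshk"), (1 << 23, "LinkSeq"), (1 << 24, "TrStaTrns"),
    (1 << 25, "UnrecFIS"), (1 << 26, "DevExch"),
]
# SPD field (bits 7:4) of SStatus/SControl: interface speed generation.
SPD_NAMES = {0: "not established", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def flags(value, table):
    """Return the names of the bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def decode_res(res):
    """Decode a libata 'res' dump such as '51/84:3f:...'.

    The first field is the Status register, the second the Error register.
    """
    status_hex, rest = res.split("/", 1)
    status, error = int(status_hex, 16), int(rest.split(":", 1)[0], 16)
    # With BSY set, the other status bits are not meaningful.
    status_flags = ["Busy"] if status & 0x80 else flags(status, ATA_STATUS_BITS)
    return status_flags, flags(error, ATA_ERROR_BITS)


def sstatus_speed(sstatus):
    """Negotiated link speed from the SPD field of SStatus (printed in hex)."""
    return SPD_NAMES.get((sstatus >> 4) & 0xF, "reserved")


def scontrol_speed_limit(scontrol):
    """Speed cap from the SPD field of SControl (0 means no limit)."""
    spd = (scontrol >> 4) & 0xF
    return "no limit" if spd == 0 else SPD_NAMES.get(spd, "reserved")


def decode_write10(cdb_hex):
    """Return (LBA, block count) from a SCSI WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    # Values copied from the ata3.01 / sdb lines of the log.
    print(decode_res("51/84:3f:d2:2e:dd/84:00:30:00:00/f0"))  # (['DRDY', 'ERR'], ['ICRC', 'ABRT'])
    print(flags(0x280100, SERROR_BITS))  # ['UnrecovData', '10B8B', 'BadCRC']
    print(sstatus_speed(0x123), sstatus_speed(0x113))  # 3.0 Gbps 1.5 Gbps
    print(scontrol_speed_limit(0x310))  # 1.5 Gbps
    print(decode_write10("2a 00 0e 31 03 41 00 04 00 00"))  # (238093121, 1024)
```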
https://bugzilla.novell.com/show_bug.cgi?id=643392#c1
Tony Jones
https://bugzilla.novell.com/show_bug.cgi?id=643392#c2
Tejun Heo
https://bugzilla.novell.com/show_bug.cgi?id=643392#c3
--- Comment #3 from Thidney Thidney
https://bugzilla.novell.com/show_bug.cgi?id=643392#c4
--- Comment #4 from Thidney Thidney
https://bugzilla.novell.com/show_bug.cgi?id=643392#c
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=643392#c5
Greg Kroah-Hartman