[Bug 840413] New: mke2fs with discard on MD Raid5 breaks RAID array
https://bugzilla.novell.com/show_bug.cgi?id=840413
https://bugzilla.novell.com/show_bug.cgi?id=840413#c0

Summary: mke2fs with discard on MD Raid5 breaks RAID array
Classification: openSUSE
Product: openSUSE 12.3
Version: Final
Platform: Other
OS/Version: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: martin.wilck@ts.fujitsu.com
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=557958)
 --> (http://bugzilla.novell.com/attachment.cgi?id=557958)
script session demonstrating the problem

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0

Create a MD RAID5 array and run mke2fs on it - breaks.
I am running in a virtual machine on virtio "disks" (not sure if that makes a difference).

$ uname -a
Linux osc.mittagstun.de 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May 31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
$ mdadm --version
mdadm - v3.2.6 - 25th October 2012
$ losetup -a
/dev/loop8: [64784]:42 (/var/tmp/mdtest8)
/dev/loop9: [64784]:43 (/var/tmp/mdtest9)
/dev/loop10: [64784]:32 (/var/tmp/mdtest10)
/dev/loop11: [64784]:33 (/var/tmp/mdtest11)
/dev/loop12: [64784]:34 (/var/tmp/mdtest12)
$ mdadm -C /dev/md/r5 -l5 -n5 /dev/loop{8,9,10,11,12} --auto=yes
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 loop12[5] loop11[3] loop10[2] loop9[1] loop8[0]
      260096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
$ mke2fs -j /dev/md/r5
mke2fs 1.42.6 (21-Sep-2012)
Discarding device blocks: failed - Input/output error
Warning: could not erase sector 2: Attempt to write block to filesystem resulted in short write
Allocating group tables: done
Warning: could not read block 0: Attempt to read block from filesystem resulted in short read
[...]
Warning: could not erase sector 0: Attempt to write block to filesystem resulted in short write
Writing inode tables: Could not write 5 blocks in inode table starting at 259: Attempt to write block to filesystem resulted in short write
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : active raid5 loop12[5](F) loop11[3](F) loop10[2](F) loop9[1](F) loop8[0](F)
      260096 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/0] [_____]

Reproducible: Always

Steps to Reproduce:
Same session as shown above.

Actual Results:
The RAID array is broken, any IO on it results in errors.
dmesg is full of messages like this:

[ 670.401472] sector=1f880 i=0 (null) (null) (null) (null) 1
[ 670.401472] ------------[ cut here ]------------
[ 670.401473] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.7.10/linux-3.7/drivers/md/raid5.c:352 get_active_stripe+0x656/0x750 [raid456]()
[ 670.401473] Hardware name: KVM
[ 670.401482] Modules linked in: raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx loop bnep bluetooth rfkill fuse af_packet snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device snd floppy virtio_net soundcore i2c_piix4 microcode virtio_balloon serio_raw snd_page_alloc pcspkr button autofs4 processor thermal_sys cirrus ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh virtio_pci ata_generic virtio_blk virtio virtio_ring ata_piix
[ 670.401482] Pid: 3809, comm: mke2fs Tainted: G W 3.7.10-1.16-desktop #1
[ 670.401482] Call Trace:
[ 670.401484] [<ffffffff81004818>] dump_trace+0x88/0x300
[ 670.401486] [<ffffffff8158af33>] dump_stack+0x69/0x6f
[ 670.401488] [<ffffffff81045249>] warn_slowpath_common+0x79/0xc0
[ 670.401490] [<ffffffffa02af016>] get_active_stripe+0x656/0x750 [raid456]
[ 670.401495] [<ffffffffa02b454c>] make_request+0x3bc/0x660 [raid456]
[ 670.401500] [<ffffffff8143e84b>] md_make_request+0xfb/0x240
[ 670.401502] [<ffffffff8129e3d2>] generic_make_request+0xb2/0x100
[ 670.401504] [<ffffffff8129f021>] submit_bio+0x61/0x130
[ 670.401505] [<ffffffff812a58ef>] blkdev_issue_discard+0x17f/0x250
[ 670.401507] [<ffffffff812a65ec>] blkdev_ioctl+0x39c/0x7d0
[ 670.401509] [<ffffffff811a551c>] block_ioctl+0x3c/0x50
[ 670.401511] [<ffffffff811805bf>] do_vfs_ioctl+0x8f/0x530
[ 670.401512] [<ffffffff81180b00>] sys_ioctl+0xa0/0xc0
[ 670.401514] [<ffffffff8159ecbf>] tracesys+0xe1/0xe6
[ 670.401516] [<00007ff1ec021f27>] 0x7ff1ec021f26
[ 670.401517] ---[ end trace dc0f7e009f4936db ]---

Expected Results:
No problem with mke2fs

strace of mke2fs indicates that the problem is related to BLKDISCARD ioctls:

ioctl(3, BLKDISCARD, {0, 0}) = 0
write(1, "Discarding device blocks: ", 26Discarding device blocks: ) = 26
write(1, " 1024/260096", 13 1024/260096) = 13
write(1, "\10\10\10\10\10\10\10\10\10\10\) = 13
ioctl(3, BLKDISCARD, {100000, 0}) = -1 EIO (Input/output error)
write(1, " ", 13 ) = 13
write(1, "\10\10\10\10\10\10\10\10\10\10\) = 13
write(1, "failed - ", 9failed - ) = 9
write(1, "Input/output error", 18Input/output error) = 18

Indeed, mke2fs -j -E nodiscard runs without problems.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
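The broken state in the report shows up in /proc/mdstat as members flagged "(F)". A minimal sketch of how that condition could be checked from a script follows; count_failed_members is a hypothetical helper name (not part of mdadm), and the sample line is copied from the report. It assumes a grep that supports -o.

```shell
#!/bin/sh
# Count members flagged "(F)" (faulty) in mdstat-formatted text read from stdin.
# count_failed_members is a hypothetical helper, not an mdadm tool.
count_failed_members() {
    # grep -o prints each "(F)" match on its own line; wc -l counts them.
    # tr strips the padding some wc implementations add.
    grep -o '(F)' | wc -l | tr -d ' '
}

# Sample line copied from the /proc/mdstat output in the report.
sample='md127 : active raid5 loop12[5](F) loop11[3](F) loop10[2](F) loop9[1](F) loop8[0](F)'

printf '%s\n' "$sample" | count_failed_members
```

On a live system the same function could read the real file with: count_failed_members < /proc/mdstat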
https://bugzilla.novell.com/show_bug.cgi?id=840413#c1
--- Comment #1 from Martin Wilck <martin.wilck@ts.fujitsu.com> 2013-09-16 19:38:25 UTC ---
I just reproduced the same problem on bare metal. The behavior is a bit different:

$ mdadm -C /dev/md/r5 -l5 -n5 /dev/loop{8,9,10,11,12}
$ mke2fs -j /dev/md/r5
=> works

$ mdadm -C /dev/md/r5 -l5 -n5 /dev/loop{8,9,10,11,12}
$ mke2fs -j /dev/md/r5 -E discard
=> Fails in just the same way as the test on the VM

So the main difference appears to be that mke2fs uses "discard" by default in the VM but not in the bare metal case; the main problem is that mke2fs with "discard" fails and ruins the array.
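One possible way to compare the two machines is to look at whether the block device advertises discard support at all. The sketch below reads the discard_max_bytes queue attribute from sysfs; a value of 0 means the device does not advertise discard. The helper name supports_discard and its optional sysfs-root argument are assumptions added for illustration, and md127 is simply the device name from the report. Whether this attribute explains the different mke2fs defaults seen here is not confirmed by the thread.

```shell
#!/bin/sh
# supports_discard DEV [SYSFS_ROOT]
# Returns 0 if DEV advertises discard (discard_max_bytes > 0),
# 1 if it does not, and 2 if the attribute cannot be read.
# Hypothetical helper; SYSFS_ROOT defaults to /sys and exists only
# so the function can be pointed at a different tree.
supports_discard() {
    dev=$1
    root=${2:-/sys}
    attr="$root/block/$dev/queue/discard_max_bytes"
    [ -r "$attr" ] || return 2
    max=$(cat "$attr")
    [ "$max" -gt 0 ]
}

# Example: md127 is the array name from the report.
if supports_discard md127; then
    echo "md127 advertises discard"
else
    echo "md127 does not advertise discard (or is not present)"
fi
```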
https://bugzilla.novell.com/show_bug.cgi?id=840413#c2
--- Comment #2 from Martin Wilck <martin.wilck@ts.fujitsu.com> 2013-09-23 20:14:56 UTC ---
Dan Williams posted the following hint on linux-raid:

This sounds like the issue addressed by:
66c28f9 [SCSI] sd: Update WRITE SAME heuristics
5026d7a md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place
https://bugzilla.novell.com/show_bug.cgi?id=840413#c
Jeff Mahoney <jeffm@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfbrown@suse.com
         AssignedTo|kernel-maintainers@forge.pr |spargaonkar@suse.com
                   |ovo.novell.com              |
https://bugzilla.novell.com/show_bug.cgi?id=840413#c3
--- Comment #3 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-10-02 12:07:45 UTC ---
Looking into recreating this and also into the patches listed in comment #2.
https://bugzilla.novell.com/show_bug.cgi?id=840413#c4
--- Comment #4 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-10-02 14:08:07 UTC ---
What does the command lsmod | grep -i raid return?
https://bugzilla.novell.com/show_bug.cgi?id=840413#c5
--- Comment #5 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-10-02 14:50:35 UTC ---
I loaded raid456.ko and attempted the mdadm command to create a software RAID device, and it failed. Trying to debug why it failed to write metadata.

$ sudo mdadm -C /dev/md0 -l5 -n5 /dev/loop{1,2,3,4,5} --auto=yes
mdadm: Defaulting to version 1.2 metadata
mdadm: Failed to write metadata to /dev/loop1

$ lsmod | grep raid
raid456                65757  0
async_memcpy           12529  1 raid456
async_raid6_recov      12795  1 raid456
async_pq               12912  1 raid456
raid6_pq               97812  2 async_pq,async_raid6_recov
async_xor              12855  2 async_pq,raid456
async_tx               13330  5 async_pq,raid456,async_xor,async_memcpy,async_raid6_recov

$ sudo losetup -a
/dev/loop0: [0805]:3444372 (/home/shirish/dd0)
/dev/loop1: [0805]:3444702 (/home/shirish/dd1)
/dev/loop2: [0805]:3444745 (/home/shirish/dd2)
/dev/loop3: [0805]:3444747 (/home/shirish/dd3)
/dev/loop4: [0805]:3439411 (/home/shirish/dd4)
/dev/loop5: [0805]:3444771 (/home/shirish/dd5)

Used this command to create dd1 through dd5:
$ sudo dd if=/dev/zero of=~/dd0 bs=1k count=1000
https://bugzilla.novell.com/show_bug.cgi?id=840413#c6
--- Comment #6 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-10-02 18:56:05 UTC ---
I can't recreate this problem. I could create a filesystem using the -E discard option on a raid5 device and mount it.

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid5 mybdrv5[5] mybdrv4[3] mybdrv3[2] mybdrv2[1] mybdrv1[0]
      1046528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

/dev/md0 on /mnt/mdmntpt type ext3 (rw)

These are all five ramdisks. Is there any error/data logged in either /var/log/messages or /var/log/syslog during mke2fs with -E discard?

Looking in the code around BLKDISCARD!
https://bugzilla.novell.com/show_bug.cgi?id=840413#c7
--- Comment #7 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-10-03 16:58:44 UTC ---
I tried it again, this time without ramdisk devices. Created five loop devices using commands like

dd of=~/dd[1-5] bs=1k seek=100000 count=0
losetup /dev/loop[1-5] ~/dd[1-5]
mdadm -C /dev/md0 -l5 -n5 /dev/loop{1,2,3,4,5} --auto=yes
mke2fs -j -E discard /dev/md0

No errors or warnings during execution of any of the above commands. Everything ran successfully to completion.

BLKDISCARD has something to do with the underlying block device. Need to figure out the relation between BLKDISCARD and virtio disks. I have no idea how to create virtio disks; working on figuring that out too.
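The dd invocation in comment #7 uses [1-5] as shorthand for five separate runs. A minimal sketch of the file-creation step, written out as a loop, might look like this. make_backing_files is a hypothetical helper name, the files are placed in a temporary directory rather than the home directory, and GNU dd behaviour (count=0 with seek= extends the file without writing data) is assumed. The losetup and mdadm steps need root and are left as comments.

```shell
#!/bin/sh
# make_backing_files DIR: create five sparse files dd1..dd5 in DIR.
# Hypothetical helper; the dd parameters follow comment #7.
make_backing_files() {
    dir=$1
    for i in 1 2 3 4 5; do
        # count=0 copies no data; seek=100000 with bs=1k makes the
        # file 100000 KiB long (sparse on most filesystems).
        dd of="$dir/dd$i" bs=1k seek=100000 count=0 2>/dev/null </dev/null
    done
}

workdir=$(mktemp -d)
make_backing_files "$workdir"
ls -l "$workdir"

# The remaining steps from comment #7 require root, for example:
#   losetup /dev/loop1 "$workdir/dd1"    (and likewise for 2..5)
#   mdadm -C /dev/md0 -l5 -n5 /dev/loop{1,2,3,4,5} --auto=yes
#   mke2fs -j -E discard /dev/md0
```

The temporary directory is not removed by the sketch; delete it once the loop devices are detached.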
https://bugzilla.novell.com/show_bug.cgi?id=840413#c8
--- Comment #8 from Martin Wilck <martin.wilck@ts.fujitsu.com> 2013-10-08 12:49:56 UTC ---
virtio is standard if you create a VM under openSUSE 12.3.
https://bugzilla.novell.com/show_bug.cgi?id=840413#c10
--- Comment #10 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-10-16 19:57:10 UTC ---
Working on assembling the setup. Will update this bug in a day or so.
https://bugzilla.novell.com/show_bug.cgi?id=840413#c11
--- Comment #11 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-11-18 23:43:14 UTC ---
I have a VM set up. Will create virtio disks and recreate this problem as a first step. Will update soon.
https://bugzilla.novell.com/show_bug.cgi?id=840413#c12
--- Comment #12 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-11-19 23:15:15 UTC ---
Not able to recreate. I guess I am doing something different. Can you please display the output of your fdisk command?
https://bugzilla.novell.com/show_bug.cgi?id=840413#c13
--- Comment #13 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-11-19 23:20:48 UTC ---
Just to clarify, on my VM setup, this command did not fail (as it does on your setup). So the output of fdisk -l </dev/vdx> will help me get an idea of the type of disks.

mke2fs -j /dev/md0
https://bugzilla.novell.com/show_bug.cgi?id=840413#c14
--- Comment #14 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-11-20 15:51:14 UTC ---
I have five partitions/disks (/dev/vda5-9) which are unformatted and unmounted, of Id 83, System Linux.
lsmod | grep virtio lists loaded modules as virtio_net, virtio_balloon, virtio_pci, virtio_blk, virtio, and virtio_ring.
https://bugzilla.novell.com/show_bug.cgi?id=840413#c17
--- Comment #17 from Martin Wilck <martin.wilck@ts.fujitsu.com> 2013-11-21 22:04:38 UTC ---
I just tried to reproduce this, but I couldn't. I don't know why; this used to be 100% reproducible over several boots. Anyway, I can't reproduce it now.

I am very sorry for having wasted your time. Thank you for your efforts.
https://bugzilla.novell.com/show_bug.cgi?id=840413#c18
--- Comment #18 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-11-22 15:02:33 UTC ---
No problem. This was a bug; it did get fixed, and the fix was posted in the openSUSE-12.3 branch.
https://bugzilla.novell.com/show_bug.cgi?id=840413#c19
Shirish Pargaonkar <spargaonkar@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #19 from Shirish Pargaonkar <spargaonkar@suse.com> 2013-12-05 15:17:52 UTC ---
As stated in comment #15