[Bug 559391] New: mdadm: unable to remove failed device
http://bugzilla.novell.com/show_bug.cgi?id=559391

http://bugzilla.novell.com/show_bug.cgi?id=559391#c0

           Summary: mdadm: unable to remove failed device
    Classification: openSUSE
           Product: openSUSE 11.2
           Version: Final
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Other
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: jnelson-suse@jamponi.net
         QAContact: qa@suse.de
          Found By: ---
           Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091103 SUSE/3.5.5-1.1.2 Firefox/3.5.5

A software RAID device, /dev/md1, experienced a component failure. The failed component was /dev/sdf, an external USB disk. The disk was removed and replaced with a new one, which appeared as /dev/sdg. However, mdadm now won't let me remove the faulty device, because /dev/sdf doesn't actually exist any more:

mdadm /dev/md1 --fail /dev/sdf

Yet mdadm claims it is still present (but faulty):

turnip:~ # mdadm -D /dev/md1
/dev/md1:
        Version : 1.01
  Creation Time : Thu Aug 27 17:19:56 2009
     Raid Level : raid1
     Array Size : 72612920 (69.25 GiB 74.36 GB)
  Used Dev Size : 72612920 (69.25 GiB 74.36 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Nov 30 20:02:42 2009
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           Name : turnip:1  (local to host turnip)
           UUID : da9317ab:e1991da4:864ac8f9:4d1412b3
         Events : 31265

    Number   Major   Minor   RaidDevice State
       0       9       12        0      active sync   /dev/md12
       1       0        0        1      removed

       2       8       80        -      faulty writemostly spare

8,80 corresponds to /dev/sdf, and I can't add /dev/sdg because /dev/md1 is still "busy":

turnip:~ # mdadm /dev/md1 --add /dev/sdg
mdadm: add new device failed for /dev/sdg as 3: Device or resource busy
turnip:~ #

So I stop the array and restart it (--assemble --scan).
I am now able to remove /dev/sde. However, whenever I now add /dev/sdg to the array, it immediately enters the failed state and stays there:

turnip:~ # mdadm -D /dev/md1
/dev/md1:
        Version : 1.01
  Creation Time : Thu Aug 27 17:19:56 2009
     Raid Level : raid1
     Array Size : 72612920 (69.25 GiB 74.36 GB)
  Used Dev Size : 72612920 (69.25 GiB 74.36 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Nov 30 20:13:03 2009
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : turnip:1  (local to host turnip)
           UUID : da9317ab:e1991da4:864ac8f9:4d1412b3
         Events : 31319

    Number   Major   Minor   RaidDevice State
       0       9       12        0      active sync   /dev/md12
       1       0        0        1      removed
turnip:~ #

Add it:

mdadm /dev/md1 --re-add /dev/sdg

(or mdadm /dev/md1 --add /dev/sdg)

turnip:~ # mdadm -D /dev/md1
/dev/md1:
        Version : 1.01
  Creation Time : Thu Aug 27 17:19:56 2009
     Raid Level : raid1
     Array Size : 72612920 (69.25 GiB 74.36 GB)
  Used Dev Size : 72612920 (69.25 GiB 74.36 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Nov 30 20:13:18 2009
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           Name : turnip:1  (local to host turnip)
           UUID : da9317ab:e1991da4:864ac8f9:4d1412b3
         Events : 31322

    Number   Major   Minor   RaidDevice State
       0       9       12        0      active sync   /dev/md12
       1       0        0        1      removed

       2       8       96        -      faulty spare   /dev/sdg
turnip:~ #

/var/log/messages:

Nov 30 20:13:03 turnip kernel: [131611.321367] md: unbind<sdg>
Nov 30 20:13:03 turnip kernel: [131611.332050] md: export_rdev(sdg)
Nov 30 20:13:18 turnip kernel: [131626.791407] md: bind<sdg>

The only way I'm able to add it is to zero the superblock first:

mdadm --zero-superblock /dev/sdg

after which --add works.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.
-- Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
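Once the USB disk is unplugged, the faulty row in the first `mdadm -D` table has no device name, so the reporter identifies it from its Major/Minor columns (8, 80). A minimal sketch of that mapping, assuming the classic static numbering for block major 8 (16 minors per disk: minor 0 is sda, 16 is sdb, up to 240 for sdp, with the remainder giving the partition number). `sd_name` is a hypothetical helper, not part of mdadm.

```shell
#!/bin/sh
# Hypothetical helper: map a (major, minor) pair from "mdadm -D" to an sd name.
# Assumes block major 8 with 16 minors per disk: minor 0 -> sda, 16 -> sdb,
# ..., 240 -> sdp. The remainder is the partition number (0 = whole disk).
# Disks beyond sdp use other majors and are not handled by this sketch.
sd_name() {
    major=$1
    minor=$2
    if [ "$major" -ne 8 ] || [ "$minor" -lt 0 ] || [ "$minor" -gt 255 ]; then
        echo "sd_name: unsupported device $major,$minor" >&2
        return 1
    fi
    index=$((minor / 16))   # which disk: 0 -> a, 1 -> b, ...
    part=$((minor % 16))    # partition number, 0 means the whole disk
    letter=$(printf '%s' abcdefghijklmnop | cut -c "$((index + 1))")
    if [ "$part" -eq 0 ]; then
        echo "/dev/sd$letter"
    else
        echo "/dev/sd$letter$part"
    fi
}

sd_name 8 80   # the nameless faulty member from the first mdadm -D output
sd_name 8 96   # the replacement disk shown later as /dev/sdg
```

The arithmetic only reflects the static sd numbering; on a live system with the device still attached, `ls -l /dev/sdf` would show the same "8, 80" pair directly.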
http://bugzilla.novell.com/show_bug.cgi?id=559391#c

Jon Nelson <jnelson-suse@jamponi.net> changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.provo.novell.com |nfbrown@novell.com
http://bugzilla.novell.com/show_bug.cgi?id=559391#c1

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #1 from Neil Brown <nfbrown@novell.com> 2009-12-03 00:05:36 UTC ---
1/ When a disk is removed it doesn't have a name, which makes it awkward to give a name to mdadm to remove it. So you can use

    mdadm /dev/mdX --remove failed

to remove all devices which have failed, and

    mdadm /dev/mdX --remove detached

to remove all devices which have been detached and don't have names any more.

2/ The fact that sdg always gets added as faulty is clearly wrong. I think I can imagine what is happening. I'll see if I can sort out exactly what and fix it.

Thanks.
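Neil's suggestion can be wrapped in a small script: since an unplugged member has no device node, the keywords `failed` and `detached` stand in for a path. This is a sketch only; `remove_stale_members` and the `run` dry-run wrapper are hypothetical helpers, and `/dev/md1` is taken from the report. With `DRY_RUN=1` (the default here) the commands are printed instead of executed.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command when DRY_RUN=1 (default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper; $1 is the degraded array, e.g. /dev/md1.
# 'failed' removes every member marked faulty; 'detached' removes members
# whose underlying device is no longer connected (so it has no name).
remove_stale_members() {
    run mdadm "$1" --remove failed
    run mdadm "$1" --remove detached
}

remove_stale_members /dev/md1
```

In the reported case the /dev/sdf member was both faulty and detached, so either keyword alone would presumably have been enough; running both simply covers the two situations Neil describes.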
http://bugzilla.novell.com/show_bug.cgi?id=559391#c2

Neil Brown <nfbrown@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |UPSTREAM

--- Comment #2 from Neil Brown <nfbrown@novell.com> 2009-12-08 06:30:10 UTC ---
I have fixed mdadm upstream so that it will not try to re-add devices that are marked as faulty. That is what was causing the problem.

As there is a trivial workaround, which you discovered (--zero-superblock first), I am not going to submit an update for 11.2. The next openSUSE release will get a new upstream release, so this will get into Factory and then openSUSE X+1 in due course.

So I'll resolve the bug as UPSTREAM.

Thanks for the report.
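The workaround from the report can be sketched the same way: wiping the stale md superblock on the replacement disk presumably stops the unfixed mdadm from treating it as a previously faulty member to re-add, so `--add` brings it in as a fresh device. `add_fresh_member` and `run` are hypothetical helpers, and the device names come from the report. Note that `--zero-superblock` erases the md metadata on the target, so the device name must be checked before running for real; the default `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command when DRY_RUN=1 (default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper implementing the reporter's workaround.
# DANGER: --zero-superblock destroys the md metadata on "$disk".
add_fresh_member() {
    array=$1   # the degraded array, e.g. /dev/md1
    disk=$2    # the replacement disk, e.g. /dev/sdg
    run mdadm --zero-superblock "$disk" || return 1
    run mdadm "$array" --add "$disk"
}

add_fresh_member /dev/md1 /dev/sdg
```

Stopping when the zeroing step fails avoids adding a disk whose old superblock is still in place, which is the state that triggered the "faulty spare" behaviour above.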
http://bugzilla.novell.com/show_bug.cgi?id=559391#c3

--- Comment #3 from Jon Nelson <jnelson-suse@jamponi.net> 2009-12-08 13:07:58 UTC ---
Hey, thanks! That looks like it will work just fine. I'll have to remember "failed" and "detached" as pseudo device names.

I wonder whether it would be worthwhile to have mdadm print a hint when somebody specifies a device that doesn't exist, something like "You can remove all failed and detached devices by using 'failed' or 'detached' as pseudo device names.", but that's probably too much.

Again, thanks!