[Bug 441852] New: SW RAID crashes every few weeks
https://bugzilla.novell.com/show_bug.cgi?id=441852

           Summary: SW RAID crashes every few weeks
           Product: openSUSE 11.1
           Version: Beta4
          Platform: x86-64
        OS/Version: Other
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Kernel
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: fred.blaise@gmail.com
         QAContact: qa@suse.de
          Found By: ---

I have 3 disks, with two RAID1 arrays built across the 3 disks and one RAID5 array with LVM on top. The 3 disks are 250GB Western Digital drives. One of them has 16M of cache, the other two have 8M, if that matters at all.

A couple of times now, my system has failed during the day (when there was nearly no activity) and I found it restarted with a RAID error such as the following, for both RAID1 and RAID5:

    RAID array is not clean -- starting background reconstruction
    RAID 1: RAID set md0 active with 3 out of 3 mirrors
    md0: bitmap file is out of date (8 < 9) -- forcing full recovery
    md0: bitmap file is out of date, doing full recovery
    md0: bitmap initialisation failed: -5
    md0: failed to create bitmap (-5)
    md: pers->run() failed ...
    mdadm: failed to RUN_ARRAY /dev/md0: input/output error

There is no way to re-assemble the array, not even by starting it degraded or by forcing assembly. The only way is to zero the RAID superblock on the member partitions and re-create the same array over the devices. mdadm is intelligent enough to see that there was already something there. This works fine for RAID1, but I haven't been so lucky with RAID5 (maybe I specified the wrong chunk size when re-creating over it?).

On the member partitions of the failing RAID1, I could manually mount the partition (/) from each disk. The filesystems hosted on the RAID all seem to be fine; only the RAID metadata seems to be corrupted.

All disks pass SMART tests just fine, and there is no suspicious noise, nothing. It just happens for no obvious reason.

I have re-installed Beta4 with the same disk setup, but this will surely happen again. I have no idea where it comes from.

What options can I enable to debug this? What direction should I be investigating?

Thank you for your guidance.

Cheers
fred

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
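A note for anyone triaging similar logs: the -5 in the bitmap messages is a negated errno value, and errno 5 is EIO, which matches the "input/output error" that mdadm reports. Below is a minimal Python sketch that pulls these values out of the log lines quoted in the report. The `summarize` helper and both regular expressions are illustrative names of my own, not part of mdadm or the kernel. Reading "(8 < 9)" as the event count recorded in the bitmap versus the one in the array superblock is my understanding of the md bitmap code, not something this report confirms.

```python
import errno
import os
import re

# Log lines copied from the report above.
LOG = """\
md0: bitmap file is out of date (8 < 9) -- forcing full recovery
md0: bitmap file is out of date, doing full recovery
md0: bitmap initialisation failed: -5
md0: failed to create bitmap (-5)
mdadm: failed to RUN_ARRAY /dev/md0: input/output error
"""

# "(8 < 9)": assumed to be bitmap event count vs. superblock event count.
STALE = re.compile(r"^(md\d+): bitmap file is out of date \((\d+) < (\d+)\)", re.M)
# A trailing negative number on a "failed" line is treated as a -errno value.
FAILED = re.compile(r"^(md\d+): .*failed.*?(-\d+)\)?$", re.M)


def summarize(log):
    """Return (stale_bitmaps, failures) found in a chunk of kernel log text."""
    stale = [(m.group(1), int(m.group(2)), int(m.group(3)))
             for m in STALE.finditer(log)]
    failures = []
    for m in FAILED.finditer(log):
        code = -int(m.group(2))  # "-5" -> 5
        failures.append((m.group(1),
                         errno.errorcode.get(code, "?"),
                         os.strerror(code)))
    return stale, failures


if __name__ == "__main__":
    stale, failures = summarize(LOG)
    for array, bitmap_events, array_events in stale:
        print(f"{array}: bitmap events {bitmap_events} behind array events {array_events}")
    for array, name, text in failures:
        print(f"{array}: {name} ({text})")
```

This only decodes what the log already says. It does not explain why the bitmap could not be read or written; that still needs someone who knows the md internals.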
Andreas Jaeger
User roland.kletzing@materna.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c2
roland kletzing
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c3
--- Comment #3 from Fred Blaise
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c4
--- Comment #4 from Fred Blaise
User roland.kletzing@materna.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c5
--- Comment #5 from roland kletzing
User roland.kletzing@materna.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c6
--- Comment #6 from roland kletzing
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c7
--- Comment #7 from Fred Blaise
Michal Marek
User roland.kletzing@materna.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c8
--- Comment #8 from roland kletzing
> I boot onto it from time to time. They haven't been affected by the crashes.

OK, I was just wondering whether booting Windows corrupts anything. As we know, Windows overwrites the MBR on installation, upgrades NTFS without asking, and touches disks/partitions automagically. I don't trust Windows, so please avoid booting it in between, just to make sure that there was NEVER a Windows boot between those crashes.

> However, until now, md0 is the one that never crashed yet :) md1 and md2 were the ones.

? But your initial posting says something different:

    md0: bitmap initialisation failed: -5
    md0: failed to create bitmap (-5)

> So the md0 component size is a common denominator, I guess that's why it is working fine?

I have to admit that I don't have a clue at the moment what is really going wrong here. OK, we have collected some data; now it is time for a real expert with deep knowledge of RAID/md internals.

One last thing that comes to mind is "silent data corruption" (faulty power supply, bus errors...), but I would be surprised if only the RAID were affected by that.
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c9
--- Comment #9 from Fred Blaise
What options can I enable to debug this?
User roland.kletzing@materna.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c10
--- Comment #10 from roland kletzing
User roland.kletzing@materna.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c11
--- Comment #11 from roland kletzing
User nfbrown@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c12
Neil Brown
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c13
Fred Blaise
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c14
--- Comment #14 from Fred Blaise
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c15
--- Comment #15 from Fred Blaise
User nfbrown@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c16
Neil Brown
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c17
Fred Blaise
User nfbrown@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c18
Neil Brown
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c19
--- Comment #19 from Fred Blaise
User fred.blaise@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=441852#c20
Fred Blaise