[Bug 497029] New: very long md RAID resync forced on boot
http://bugzilla.novell.com/show_bug.cgi?id=497029

           Summary: very long md RAID resync forced on boot
    Classification: openSUSE
           Product: openSUSE 11.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 11.1
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: pgnet.trash@gmail.com
         QAContact: qa@suse.de
          Found By: ---

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0

i've an openSUSE Dom0 & DomU, running,

  uname -ri
  2.6.27.21-29-xen x86_64

in the DomU, I've 2 RAID10 arrays,

  mdadm --detail --scan
  ARRAY /dev/md0 level=raid10 num-devices=4 metadata=1.02 name=nas:0 UUID=bf...
  ARRAY /dev/md1 level=raid10 num-devices=4 metadata=1.02 name=nas:1 UUID=37...

in the process of stress-testing the arrays, i note on DomU reboot,

  ...
  md: bind<sdb1>
  md: bind<sdc1>
  md: bind<sdd1>
  md: bind<sda1>
  md: md0: raid array is not clean -- starting background reconstruction
  raid10: raid set md0 active with 4 out of 4 devices
  md0: bitmap initialized from disk: read 24/24 pages, set 522694 bits
  created bitmap (373 pages) for device md0
  md: resync of RAID array md0
  md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
  md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
  md: using 128k window, over a total of 1562497024 blocks.
  md: md1 stopped.
  md: bind<sdb2>
  md: bind<sdc2>
  md: bind<sdd2>
  md: bind<sda2>
  md: md1: raid array is not clean -- starting background reconstruction
  raid10: raid set md1 active with 4 out of 4 devices
  md1: bitmap initialized from disk: read 12/12 pages, set 381470 bits
  created bitmap (187 pages) for device md1
  md: delaying resync of md1 until md0 has finished (they share one or more physical units)
  ...

where it sits for quite a few minutes (this time, ~ 15 mins), then -- eventually -- continues to successfully boot.
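For a sense of scale, the figures in the md0 log lines above put rough bounds on how long a full resync could take. This is only a sketch: it assumes the "blocks" in the md message are 1 KiB units and, as a simplification, treats both speed limits as applying to the array as a whole (the log states the minimum per disk). resync_minutes is a hypothetical helper name, not an md or mdadm tool. Both arrays also report a write-intent bitmap, which would normally let md skip regions not marked dirty, so a real resync may finish far sooner than these bounds.

```shell
#!/bin/sh
# Estimate a full resync duration in whole minutes.
# $1: array size in 1 KiB blocks (the "over a total of N blocks" figure)
# $2: sustained resync speed in KB/sec
resync_minutes() {
    echo $(( $1 / $2 / 60 ))
}

# md0 at the logged ceiling of 200000 KB/sec
resync_minutes 1562497024 200000   # prints 130 (a little over 2 hours)

# md0 at the logged floor of 1000 KB/sec
resync_minutes 1562497024 1000     # prints 26041 (about 18 days)
```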
i've found,

  "md: hung task timout during data check"
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=489608

which references the "delaying resync" message, has a workaround for the 'noise',

  > > If it bothers you, run the command
  > > echo 0 > /proc/sys/kernel/hung_task_timeout_secs

and, a committed fix,

  Version: 2.6.28-1
  fixed by commit 9744197c3d7b329590c2be33ad7b17409bd798fe which is
  shipped in 2.6.28 and will reach lenny through lenny+half.
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdif...

is this the same issue? if so, has this fix been applied/backported to the SL111 branch kernel as yet? or does it need to be? &/or is this workaround valid/appropriate?

Reproducible: Always

Steps to Reproduce:
1.
2.
3.

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
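Before applying the quoted workaround, it may help to see what the hung-task timeout is currently set to. A minimal sketch, assuming the kernel exposes /proc/sys/kernel/hung_task_timeout_secs at all (it may not, depending on how the kernel was built); hung_task_status is a hypothetical helper name, and the optional path argument exists only so the function can be tried against an ordinary file. A value of 0 means the check is disabled, which is the state the quoted echo command produces.

```shell
#!/bin/sh
# Report the state of the hung-task watchdog.
# $1: optional path to the sysctl file (defaults to the real /proc entry)
hung_task_status() {
    file=${1:-/proc/sys/kernel/hung_task_timeout_secs}
    if [ ! -r "$file" ]; then
        # The entry is missing or unreadable on this kernel.
        echo "unavailable"
        return 0
    fi
    secs=$(cat "$file")
    if [ "$secs" -eq 0 ]; then
        echo "disabled"
    else
        echo "enabled (${secs}s)"
    fi
}

hung_task_status
```

Note that the workaround only silences the warning; it does not change how long the resync itself takes.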
http://bugzilla.novell.com/show_bug.cgi?id=497029
User pgnet.trash@gmail.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=497029#c1
--- Comment #1 from pgnet _
No, the name is the same, but AFAICT it wasn't compiled in. You can check your kernel config:
it seems the req'd flag is not set in the kernel,

  zgrep CONFIG_DETECT_SOFTLOCKUP /proc/config.gz
  # CONFIG_DETECT_SOFTLOCKUP is not set
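The same check can be wrapped so it distinguishes an option that is not set from a config file that cannot be read at all. A minimal sketch, assuming the running kernel exports its configuration at /proc/config.gz as in the command above; config_state is a hypothetical helper name, and the optional second argument (a plain or gzip-compressed config file) is added here purely for illustration.

```shell
#!/bin/sh
# Print the value of a kernel config option ("y", "m", ...),
# "not set" if the option has no value, or "unknown" if the
# config file cannot be read.
# $1: option name, e.g. CONFIG_DETECT_SOFTLOCKUP
# $2: optional config file, plain or .gz (default /proc/config.gz)
config_state() {
    opt=$1
    file=${2:-/proc/config.gz}
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 0
    fi
    case "$file" in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    # Keep only the text after "OPTION=" on the matching line.
    val=$($reader "$file" | sed -n "s/^$opt=//p")
    if [ -n "$val" ]; then
        echo "$val"
    else
        echo "not set"
    fi
}

config_state CONFIG_DETECT_SOFTLOCKUP
```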
http://bugzilla.novell.com/show_bug.cgi?id=497029
Leon Wang
http://bugzilla.novell.com/show_bug.cgi?id=497029
Jeff Mahoney
http://bugzilla.novell.com/show_bug.cgi?id=497029
User nfbrown@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=497029#c2
Neil Brown
http://bugzilla.novell.com/show_bug.cgi?id=497029
User nfbrown@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=497029#c3
--- Comment #3 from Neil Brown
http://bugzilla.novell.com/show_bug.cgi?id=497029
User pgnet.dev@gmail.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=497029#c4
--- Comment #4 from pgnet Dev
Ping?
sorry.

re: this particular RAID-using scenario -- namely, a NAS-in-a-DomU with the RAID array attached via a h/w passed-thru pci card, i've "been away". i tried -- valiantly, no less ;-) -- and failed to deploy a "much easier than MD/LVM on Linux" ZFS on OpenSolaris/DomU on Linux Dom0.

suffice it to say, i'm "back". learned my lesson. so, now can start 'listening' again, here. although, many updates have been promoted -- kernel, mdadm, and other -- since.

i'll see what i can see, and report asap ...

thanks.
http://bugzilla.novell.com/show_bug.cgi?id=497029
User nfbrown@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=497029#c5
--- Comment #5 from Neil Brown
http://bugzilla.novell.com/show_bug.cgi?id=497029
User nfbrown@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=497029#c6
Neil Brown