https://bugzilla.novell.com/show_bug.cgi?id=402515

           Summary: Unable to boot with software-raid (background reconstruction fails)
           Product: openSUSE 11.0
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 11.0
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Kernel
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: wandinger.andreas@wres.de
         QAContact: qa@suse.de
          Found By: ---

I installed openSUSE 11.0 on a

    Fujitsu Siemens Scaleo T D1605
    AMD 64 3400+
    1 GiB RAM
    2 x 120 GiB Maxtor IDE (sda, sdb)

using the following partition setup:

    sda1, sdb1   containing md0 (raid 1)   /boot
    sda2, sdb2   each containing swap
    sda3, sdb3   extended partition
    sda5, sdb5   containing md1 (raid 1)   /
    sda6, sdb6   containing md2 (raid 1)   /home

Grub was installed in the MBR of each disk. This setup worked fine and without any boot problems until today.

This morning openSUSE refused to boot (complete freeze), producing this error (output of dmesg):

    md: md1 stopped.
    md: bind<sdb5>
    md: bind<sda5>
    md: md1: raid array is not clean -- starting background reconstruction
    raid1: raid set md1 active with 2 out of 2 mirrors
    md1: bitmap file is out of date (8 < 9) -- forcing full recovery
    md1: bitmap file is out of date, doing full recovery
    md1: bitmap initialisation failed: -5
    md1: failed to create bitmap (-5)
    md: pers->run() failed ...

Unfortunately, I don't know why the array became dirty; the system was shut down normally. So I cannot say exactly how to reproduce the error.

I then booted using the "recovery system" option of the install DVD. A check of both drives showed no errors at all. So I tried to assemble /dev/md1 manually and initiate a resync:

    $ mdadm -Af /dev/md1 /dev/sda5 /dev/sdb5
    mdadm: failed to RUN_ARRAY /dev/md1: Input/output error

The assemble step itself succeeded, but I was unable to run the array. Surprisingly, the same command worked fine for both /dev/md0 and /dev/md2.
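The "-5" in the bitmap lines of the log and the "Input/output error" reported by mdadm appear to be the same error seen from two sides: the kernel reports failures as negative errno values, and errno 5 is EIO on Linux. A minimal C sketch of that mapping follows. It uses only standard libc calls, and the helper names are made up for illustration; they are not part of mdadm or the kernel.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mdadm or the kernel): turn a
 * kernel-style negative return code, as printed in the md log lines,
 * into the text a user-space tool would show for the same error.
 */
const char *md_error_text(int kernel_code)
{
    /* The kernel reports errors as negative errno values, so -5 means errno 5. */
    int err = kernel_code < 0 ? -kernel_code : kernel_code;
    return strerror(err);
}

/* Returns 1 if the code is the I/O error (-EIO) seen in the report, else 0. */
int is_io_error(int kernel_code)
{
    return kernel_code == -EIO;
}
```

Called with -5, md_error_text() returns the libc description of EIO, which glibc renders as "Input/output error", the same text mdadm printed for RUN_ARRAY.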
I also tried several update options (resync, summaries), and I tried to use /dev/sda5 without /dev/sdb5 and vice versa, but nothing worked. Every attempt was rejected with an Input/output error.

TO SOLVE THE PROBLEM I DID THE FOLLOWING

I booted the "recovery system" from the openSUSE 10.3 DVD, which uses a different md driver. Its kernel did not complain at all and synchronized /dev/md1 automatically in the background without any user interaction. After the resync completed I did a normal boot into openSUSE 11.0, and this also worked as expected. /proc/mdstat showed that both disks were in sync and active. A check of the file system showed no errors.

MORE DETAILS

I found a difference in "drivers/md/bitmap.c" between 10.3 and 11.0, that is, between their two kernel versions, which I think is responsible for the problem. The new code (Kernel 2.6.25.4) (line 954)

    write_page(bitmap, page, 1);
    ret = -EIO;
    if (bitmap->flags & BITMAP_WRITE_ERROR) {
            /* release, page not in filemap yet */
            put_page(page);
            goto err;
    }

causes a jump to "err", while the old code (Kernel 2.6.22.5) (line 923)

    ret = write_page(bitmap, page, 1);
    if (ret) {
            /* release, page not in filemap yet */
            put_page(page);
            goto out;
    }

does not. I don't know to what extent this behaviour is intended, but given that my array was rebuilt automatically with the old kernel, this may indicate a bug.

I used the following versions:

    openSUSE 11.0 (Kernel 2.6.25.4)    openSUSE 10.3 (Kernel 2.6.22.5)
    mdadm 2.6.4                        mdadm 2.6.2

kind regards,
A.Wandinger
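The difference between the two bitmap.c excerpts quoted in the report can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it assumes, as the 2.6.25.4 excerpt suggests, that write_page signals failure solely through bitmap->flags, and every name below (sketch_bitmap, SKETCH_WRITE_ERROR, init_old, init_new) is hypothetical and not kernel code. It shows one way the two checks could diverge: a flag that was already set before the call makes the new-style check fail even when the write itself succeeds. Whether that is what happened on the reporter's system is not established by the report.

```c
#include <errno.h>

/* Stand-in for the flag named in the 2.6.25.4 excerpt; the value is arbitrary. */
#define SKETCH_WRITE_ERROR 0x1u

/* Hypothetical miniature of the bitmap state; not the kernel's struct bitmap. */
struct sketch_bitmap {
    unsigned int flags;
};

/* Old style (2.6.22.5 excerpt): the write reports failure via its return value. */
static int write_page_old(struct sketch_bitmap *b, int fail)
{
    (void)b;
    return fail ? -EIO : 0;
}

/* New style (2.6.25.4 excerpt): the write returns nothing and records failure in flags. */
static void write_page_new(struct sketch_bitmap *b, int fail)
{
    if (fail)
        b->flags |= SKETCH_WRITE_ERROR;
}

/* Models the old check: only the outcome of this particular write matters. */
int init_old(struct sketch_bitmap *b, int fail)
{
    int ret = write_page_old(b, fail);
    if (ret)
        return ret;   /* corresponds to "goto out" in the excerpt */
    return 0;
}

/* Models the new check: any error flag present after the write counts. */
int init_new(struct sketch_bitmap *b, int fail)
{
    write_page_new(b, fail);
    if (b->flags & SKETCH_WRITE_ERROR)
        return -EIO;  /* corresponds to "goto err" in the excerpt */
    return 0;
}
```

With a bitmap whose flags start at zero, both functions agree for successful and failing writes. With flags preset to SKETCH_WRITE_ERROR and a successful write, init_old returns 0 while init_new returns -EIO, which matches the error code in the reported log lines.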