[Bug 568319] New: DM lockup causes filesystem failure
http://bugzilla.novell.com/show_bug.cgi?id=568319

http://bugzilla.novell.com/show_bug.cgi?id=568319#c0

Summary: DM lockup causes filesystem failure
Classification: openSUSE
Product: openSUSE 11.2
Version: Final
Platform: i686
OS/Version: openSUSE 11.2
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: carlos.e.r@opensuse.org
QAContact: qa@suse.de
Found By: ---
Blocker: ---

I reported Bug 567912 upstream, to SGI:

http://oss.sgi.com/bugzilla/show_bug.cgi?id=860

They say it is not an XFS problem, but a DM lockup, a DM bug. Will you please have a look at it? This problem was reported in 2007 and has not been solved yet.

Older info: Bug 345039

Guess: it is caused by a suse/novell addition to the kernel. Per Comment 43 there, the vanilla kernel did not have this problem.

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=568319#c
Jeff Mahoney
http://bugzilla.novell.com/show_bug.cgi?id=568319#c1
--- Comment #1 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c
Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c2
Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c3
--- Comment #3 from Carlos Robinson
Does your setup have loop devices for files or disk/partition? If file-backed loop device, is it over XFS?
Yes and yes. I wrote the setup on a previous report, but I'll copy it again here.

/etc/crypttab:
crmm_dvd_f1x /mnt/TMPBig/imgs/crypta_f1_dvd.mm.xfs none noauto

/etc/fstab:
/dev/mapper/crmm_dvd_f1x /mnt/crypta.mm_dvd1.x xfs \
    noatime,noauto,nofail,nobarrier 1 5

This means that they are loop mounted. And /mnt/TMPBig is xfs, too:

bombadillo:~ # mount | grep /mnt/TMPBig
/dev/sda9 on /mnt/TMPBig type xfs (rw,nosuid,nodev,_netdev,noatime,nodiratime)
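For anyone reproducing this setup, here is a minimal sketch of how the "is the backing file on XFS?" question could be checked from a shell. It assumes a `stat` that supports `-f -c %T` (GNU coreutils does); the helper name `backing_fs_type` is hypothetical and not part of this report.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Prints the type of the filesystem that holds the given path,
# for example the image file that backs a loop device.
backing_fs_type() {
    # -f     : report on the filesystem instead of the file itself
    # -c %T  : print only the filesystem type in human-readable form
    stat -f -c %T "$1"
}
```

Called on the image file named in /etc/crypttab (`backing_fs_type /mnt/TMPBig/imgs/crypta_f1_dvd.mm.xfs`), it should print the type of the underlying filesystem. `losetup -a` can be used to see which file backs which loop device.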
Also, earlier vanilla kernels didn't support barriers in loop devices. Barrier support for the loop driver was added to the mainline kernel in ~2.6.29.
(remember that this problem appeared first on oS 10.3)
So can you test with vanilla kernel in 11.2?
Both vanilla and default are 2.6.31.8, so shouldn't default kernel have that support included? Ok, I'll try. Perhaps this weekend, earlier if I have time.
http://bugzilla.novell.com/show_bug.cgi?id=568319#c4
Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c5
--- Comment #5 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c6
--- Comment #6 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c7
Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c8
--- Comment #8 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c9
--- Comment #9 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c10
--- Comment #10 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c11
--- Comment #11 from Carlos Robinson
A kernel dump? I'm investigating how I could do that. For one thing, I only have an ADSL internet connection, upload speed is 300 bits per second.
Wow! Not 300 bits/s, but 300 kbit/s. I guess I'm not to be trusted too much when I'm tired O:-)

(In reply to comment #10)
Yes kernel dump would be huge and let us not pursue that route now.
If after all it is needed, I could burn it to a dvd and mail it.
If you can provide more info on the files being opened by mld_hash and par2, that could help my understanding a bit.
Video, avi files. Par2 is very cpu intensive, mld_hash not so. Previously (oS 10.3, 11.0, 11.1) I had this problem with simply copying files, any files. Now it is harder to reproduce, but I'll see if it does by simply copying files. That would be easier to track.
Especially whether they are working on files in the underlying xfs mount or the encrypted xfs mount. Just output of commands when you hit this deadlock,
The working should be on the encrypted xfs filesystem. They reside on files (of exactly DVD size), loop mounted, on an underlying xfs filesystem. I will get the output of the commands this weekend, right now the system is busy and I can't risk crashing it at this moment. I can also provide the exact procedure I used to create these filesystems (at the moment, it is in a mail in Spanish).
mount
dmsetup table
lsof
This weekend, time permitting :-)
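A minimal sketch of how the requested output could be captured in one go once the hang appears. The function name `collect_diag` and the `DIAG_TIMEOUT` variable are hypothetical, and `timeout` from coreutils is assumed to be installed. Note that `timeout` cannot end a process stuck in uninterruptible sleep, so the whole call is best run in the background, with the output file on a filesystem that is not part of the stuck stack.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Usage: collect_diag OUTFILE "command 1" "command 2" ...
# Appends a header and the output of each command to OUTFILE.
collect_diag() {
    out="$1"
    shift
    for cmd in "$@"; do
        printf '### %s\n' "$cmd" >> "$out"
        # $cmd is left unquoted on purpose so that a string such as
        # "dmsetup table" is split into a command and its argument.
        # timeout limits how long an interruptible command may run.
        timeout "${DIAG_TIMEOUT:-20}" $cmd >> "$out" 2>&1 \
            || printf '(exit status %s)\n' "$?" >> "$out"
    done
}
```

For the three commands asked for above, a call could look like `collect_diag /root/deadlock-diag.txt mount "dmsetup table" lsof &` (the output path is only an example).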
Also whether you can continue to read/write to the underlying xfs partition? Say create a new file on underlying xfs/encrypted xfs after the deadlock?
As far as I remember, no, it locks. I can't even umount them, I have to use: umount -l mountpoint & or it locks that xterm or console.
Thanks for taking the effort to report this issue and providing a good amount of info.
And thank you for trying to solve it, I appreciate it.
http://bugzilla.novell.com/show_bug.cgi?id=568319#c12
--- Comment #12 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c13
Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c14
--- Comment #14 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c15
--- Comment #15 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c16
--- Comment #16 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c17
--- Comment #17 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c18
--- Comment #18 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c19
Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c20
--- Comment #20 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c21
Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c22
--- Comment #22 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c23
--- Comment #23 from Carlos Robinson
Ah.. I am very very sorry. I hope you have not lost any unrecoverable data. Thanks a lot for testing this. At least we have identified the cause of the deadlock. Maybe the fix or my back-port has some bug that led to the crash. We should report this to SGI and work for a proper fix to xfs. Please attach the log file.
Once again sorry for the trouble.
I haven't lost valuable data, because that was one of my 3 test partitions (the final setup is not installed yet). A day lost reinstalling and reconfiguring, yes. As I was testing things, I hadn't made a backup.

I attach the log. The last line is not logged, my halt.local file has this code:

MESSAGE=`uptime`
# syslog may not be working
DATE=`date --rfc-3339=seconds`
echo "$DATE - Halting the system now - uptime: $MESSAGE" >> /var/log/messages

and that line is missing from the log.
http://bugzilla.novell.com/show_bug.cgi?id=568319#c24
--- Comment #24 from Carlos Robinson
http://bugzilla.novell.com/show_bug.cgi?id=568319#c25
--- Comment #25 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c26
--- Comment #26 from Nikanth K
http://bugzilla.novell.com/show_bug.cgi?id=568319#c29
--- Comment #29 from Carlos Robinson
commit 77d7a0c2eeb285c9069e15396703d0cb9690ac50 in the xfs-dev tree has been queued for .33
I have committed the patch to openSUSE-11.2 and should be available in the next maintenance update.
Thank you! However, before testing that one, I will have to create a full backup...
http://bugzilla.novell.com/show_bug.cgi?id=568319#c30
Swamp Workflow Management
http://bugzilla.novell.com/show_bug.cgi?id=568319#c31
--- Comment #31 from Nikanth K