https://bugzilla.novell.com/show_bug.cgi?id=633978
https://bugzilla.novell.com/show_bug.cgi?id=633978#c0
Summary: NFS: soft lockup - CPU#0 stuck caused by simultaneous read/write on same file
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: x86-64
OS/Version: openSUSE 11.3
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: rhg@marxmeier.com
QAContact: qa@suse.de
Found By: ---
Blocker: ---
Created an attachment (id=385013) --> (http://bugzilla.novell.com/attachment.cgi?id=385013) syslog messages after nfs soft lockup
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
I have a case where a soft lockup can always be reproduced.
System:
HP Compaq dc5800 Microtower, Intel Core2 Duo CPU E8500 @ 3.16GHz
openSUSE 11.3 x86_64, all patches applied
Linux rhg-lx 2.6.34-12-default #1 SMP 2010-06-29 02:39:08 +0200 x86_64 x86_64 x86_64 GNU/Linux
To reproduce, an NFS mount is necessary.
Reproducible: Always
Steps to Reproduce:
1) I cd to an NFS-mounted directory.
cd /some/nfs/mounted/directory
2) There, I do a recursive search for some text pattern with fgrep -r, such as:
fgrep -r 'Some Text' . >x1
As you see, by accident I write the fgrep result to a file which is located in the same directory where I do the recursive search. This is what causes the issue.
Note that the search pattern must match the text in at least some of the files located below the directory where I do the recursive search, so that something is written to the x1 file.
Almost immediately, the whole X session freezes, and after a minute a soft lockup message is written to the syslog (attached, see nfs-soft-lockup.txt file).
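Since the trigger described above is writing the result file into the tree being searched, one possible way to sidestep it on an affected kernel is to send the output somewhere outside that tree. The following is only a sketch: `search_outside` is a hypothetical helper name, it assumes the temporary directory is not located below the searched directory, and it has not been verified as a workaround on an affected system.

```shell
#!/bin/sh
# search_outside PATTERN DIR  (hypothetical helper, not from the report)
# Recursively searches DIR for the fixed string PATTERN and stores the
# matches in a temporary file outside DIR, then prints that file's path.
search_outside() {
    pattern=$1
    dir=$2
    # Assumption: ${TMPDIR:-/tmp} is not below "$dir".
    out=$(mktemp "${TMPDIR:-/tmp}/fgrep-result.XXXXXX") || return 1
    # grep -F is the same as fgrep; the redirection happens outside "$dir".
    (cd "$dir" && grep -rF -- "$pattern" .) >"$out"
    printf '%s\n' "$out"
}
```

It would be called as `search_outside 'Some Text' /some/nfs/mounted/directory`, and the printed path names the file holding the matches.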
https://bugzilla.novell.com/show_bug.cgi?id=633978#c
Suresh Jayaraman sjayaraman@novell.com changed:
What   |Removed |Added
----------------------------------------------------------------------------
Status |NEW     |ASSIGNED
https://bugzilla.novell.com/show_bug.cgi?id=633978#c2
Suresh Jayaraman sjayaraman@novell.com changed:
What       |Removed                                   |Added
----------------------------------------------------------------------------
AssignedTo |kernel-maintainers@forge.provo.novell.com |sjayaraman@novell.com
--- Comment #2 from Suresh Jayaraman sjayaraman@novell.com 2010-08-30 17:42:02 UTC ---
Created an attachment (id=386295) --> (http://bugzilla.novell.com/attachment.cgi?id=386295) Proposed patch
Does the attached patch fix the problem?
https://bugzilla.novell.com/show_bug.cgi?id=633978#c3
Jiri Slaby jslaby@novell.com changed:
What         |Removed  |Added
----------------------------------------------------------------------------
Status       |ASSIGNED |NEEDINFO
InfoProvider |         |rhg@marxmeier.com
--- Comment #3 from Jiri Slaby jslaby@novell.com 2010-08-31 09:44:01 UTC ---
(In reply to comment #2)
> Does the attached patch fix the problem?
If I'm looking correctly, the patch is part of 2.6.34.1. Roland, could you try a kernel from: http://download.opensuse.org/repositories/Kernel:/openSUSE-11.3/openSUSE_11.... ?
https://bugzilla.novell.com/show_bug.cgi?id=633978#c4
--- Comment #4 from Suresh Jayaraman sjayaraman@novell.com 2010-08-31 09:50:04 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > Does the attached patch fix the problem?
> If I'm looking correctly, the patch is part of 2.6.34.1. Roland, could you try a kernel from: http://download.opensuse.org/repositories/Kernel:/openSUSE-11.3/openSUSE_11.... ?
Yes, I seem to have missed the fact that this patch has already made it into the -stable kernel.
Please report if you are able to reproduce the problem with the recent 11.3 kernels from Comment #3.
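The version reasoning in comments #3 and #4 can be sketched as a rough check of whether a given kernel release is at least 2.6.34.1, the release Jiri believes contains the patch. `version_ge` is a hypothetical helper name; it assumes purely numeric dotted fields before the first `-`, and it cannot tell whether a distribution kernel carries the fix as a backport, so treat the result as a hint only.

```shell
#!/bin/sh
# version_ge A B  (hypothetical helper, not from the report)
# Succeeds if dotted version A is >= dotted version B.
# Anything from the first '-' on (e.g. "-12-default") is ignored.
version_ge() {
    a=${1%%-*}
    b=${2%%-*}
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}      # leading field of each version
        x=${x:-0};  y=${y:-0}       # a missing field counts as 0
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Drop the leading field and continue with the rest.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}
```

With the release strings from this report, `version_ge 2.6.34-12-default 2.6.34.1` fails and `version_ge 2.6.34.4-6-default 2.6.34.1` succeeds; on a live system the first argument would be `"$(uname -r)"`.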
https://bugzilla.novell.com/show_bug.cgi?id=633978#c5
--- Comment #5 from Roland Genske rhg@marxmeier.com 2010-08-31 09:57:47 UTC ---
Thank you for your info, I will do this later today or tomorrow.
https://bugzilla.novell.com/show_bug.cgi?id=633978#c6
Roland Genske rhg@marxmeier.com changed:
What         |Removed           |Added
----------------------------------------------------------------------------
Status       |NEEDINFO          |ASSIGNED
InfoProvider |rhg@marxmeier.com |
--- Comment #6 from Roland Genske rhg@marxmeier.com 2010-08-31 15:09:41 UTC ---
The updated kernel version appears to solve the problem. I could no longer reproduce it.
uname -a
Linux rhg-lx 2.6.34.4-6-default #1 SMP 2010-08-20 19:21:29 +0200 x86_64 x86_64 x86_64 GNU/Linux
Thank you very much for your quick help.
Questions: Is this kernel version safe to use in production? Is it supported, i.e., are security updates provided through YAST Online Update, given that I have included this repository in YAST?
https://bugzilla.novell.com/show_bug.cgi?id=633978#c7
--- Comment #7 from Jiri Slaby jslaby@novell.com 2010-08-31 15:13:28 UTC ---
(In reply to comment #6)
> Questions: Is this kernel version safe to use in production? Is it supported, i.e., are security updates provided through YAST Online Update, given that I have included this repository in YAST?
I wouldn't use this repo on production systems. These are daily snapshots which have not been through QA.
The fix was already submitted to the 11.3-test update repository, so it should appear in the official updates soon.
https://bugzilla.novell.com/show_bug.cgi?id=633978#c8
Suresh Jayaraman sjayaraman@novell.com changed:
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |ASSIGNED |RESOLVED
Resolution |         |FIXED
--- Comment #8 from Suresh Jayaraman sjayaraman@novell.com 2010-09-01 05:03:29 UTC ---
Marking this FIXED as the fix has already been submitted to the repo. Please reopen if the problem reappears in the upcoming update. Thanks!