[Bug 630422] New: system freeze during remote file transfer using kcryptd
http://bugzilla.novell.com/show_bug.cgi?id=630422
http://bugzilla.novell.com/show_bug.cgi?id=630422#c0

Summary: system freeze during remote file transfer using kcryptd
Classification: openSUSE
Product: openSUSE 11.3
Version: Final
Platform: i686
OS/Version: openSUSE 11.3
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Other
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: carlos.lange@ualberta.ca
QAContact: qa@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.11) Gecko/20100714 SUSE/3.5.11-0.1.1 Firefox/3.5.11

I installed 11.3 on my Lenovo X60 Tablet with an encrypted 300 GB user home. During file transfer from my desktop with rsync over ssh, the system freezes after a short while. There are no particular messages in /var/log/messages or from NetworkManager before the freeze. Tested with KDE and LXDE. During the transfer, top shows rsync, sshd and kcryptd alternating at the top of the list, but at freeze time kcryptd is not listed among the top 30 processes, so I assume it crashed and caused the freeze.

Reproducible: Always

Steps to Reproduce:
1. Create a large encrypted home (300 GB).
2. Transfer a large number of files with rsync over ssh.
3. The system freezes after a few minutes.

I just ran it in CLI mode (init 3), and the following error message was written above top while kcryptd was still the top job:

[ 210.054005] Uhhuh. NMI received for unknown reason a1 on CPU0.
[ 210.054005] You have some hardware problem, likely on the PCI bus.
[ 210.054005] Dazed and confused, but trying to continue.

The system was running until yesterday with a similar encrypted home under 11.1 without any indication of a hardware problem.

--
Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
http://bugzilla.novell.com/show_bug.cgi?id=630422
http://bugzilla.novell.com/show_bug.cgi?id=630422#c1

Carlos Lange <carlos.lange@ualberta.ca> changed:

What       |Removed                        |Added
----------------------------------------------------------------------------
Summary    |system freeze during remote    |system freeze during remote
           |file transfer using kcryptd    |file transfer

--- Comment #1 from Carlos Lange <carlos.lange@ualberta.ca> 2010-08-11 17:42:19 UTC ---
The same NMI happens during file transfer to a new user without an encrypted home. So I am changing the summary, since the problem is not related to kcryptd. Is there any way I can test the PCI bus?
http://bugzilla.novell.com/show_bug.cgi?id=630422
http://bugzilla.novell.com/show_bug.cgi?id=630422#c

yang xiaoyu <xyyang@novell.com> changed:

What       |Removed                                    |Added
----------------------------------------------------------------------------
CC         |                                           |xyyang@novell.com
AssignedTo |bnc-team-screening@forge.provo.novell.com  |anicka@novell.com
http://bugzilla.novell.com/show_bug.cgi?id=630422
http://bugzilla.novell.com/show_bug.cgi?id=630422#c

Anna Bernathova <anicka@novell.com> changed:

What       |Removed                |Added
----------------------------------------------------------------------------
AssignedTo |anicka@novell.com      |kernel-maintainers@forge.provo.novell.com
http://bugzilla.novell.com/show_bug.cgi?id=630422
http://bugzilla.novell.com/show_bug.cgi?id=630422#c2

--- Comment #2 from Carlos Lange <carlos.lange@ualberta.ca> 2010-08-15 18:00:09 UTC ---
I installed 11.1 again on a new partition. There were no errors during installation and updates, and I was able to transfer 15 GB of data without incident or PCI bus errors. So I think 11.3 is reporting a hardware error incorrectly.
https://bugzilla.novell.com/show_bug.cgi?id=630422
https://bugzilla.novell.com/show_bug.cgi?id=630422#c3

--- Comment #3 from Jiri Slaby <jslaby@novell.com> 2010-08-17 18:02:52 UTC ---
It might be that something tried to write to some weird address on the bus (e.g. incorrectly driven hardware), so the PCI bus raised an exception. Such a bug could even have been introduced by changes made in 11.3.
https://bugzilla.novell.com/show_bug.cgi?id=630422
https://bugzilla.novell.com/show_bug.cgi?id=630422#c4

--- Comment #4 from Carlos Lange <carlos.lange@ualberta.ca> 2010-08-17 18:21:45 UTC ---
More information that points to a software problem: I reinstalled 11.3 from scratch on my laptop and kept my home unencrypted. When I copy with scp from the remote machine to my laptop (starting the transfer on the remote machine), I get the NMI error every time after approximately 2.4 GB has been transferred. This time the reported reason is "b0 on CPU0". But when I run scp on the laptop to pull the data from the remote machine, I can transfer without any problems. The largest folder I copied was 4 GB, and it was the same folder, copied to the same place, where the copy started from the remote machine had failed. Definitely not a hard disk problem. How can I provide more diagnostic information?
https://bugzilla.novell.com/show_bug.cgi?id=630422
https://bugzilla.novell.com/show_bug.cgi?id=630422#c5

--- Comment #5 from Carlos Lange <carlos.lange@ualberta.ca> 2010-08-18 17:06:34 UTC ---
Created an attachment (id=383902) --> (http://bugzilla.novell.com/attachment.cgi?id=383902)
screen capture of fatal error message
https://bugzilla.novell.com/show_bug.cgi?id=630422
https://bugzilla.novell.com/show_bug.cgi?id=630422#c6

--- Comment #6 from Carlos Lange <carlos.lange@ualberta.ca> 2010-08-18 17:07:43 UTC ---
Sorry, the error is not as predictable as I thought in Comment #4. It also happened twice when starting the copy from the local laptop. The last time I got a much longer error message, which mentions a CPU synchronization error and states "This is not a software problem!" Since the error message does not get saved to a file, I took a picture and am sending it as an attachment. The message says in part:

Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check on current CPU
Pid: 0, comm: swapper Tainted: G M 2.6.34-12-desktop #1

I went back to the 11.1 installation and stress-tested the machine by copying my remote data over several times, and I did not get any problems (no kernel-related errors in /var/log/messages). The 11.1 installation uses kernel version 2.6.27.48-0.2. Is the new kernel seeing a real hardware problem that the old kernel does not see? Is there any boot flag I can set to make the new kernel more fault tolerant?
https://bugzilla.novell.com/show_bug.cgi?id=630422
https://bugzilla.novell.com/show_bug.cgi?id=630422#c7

--- Comment #7 from Carlos Lange <carlos.lange@ualberta.ca> 2010-09-02 14:33:15 UTC ---
I updated the kernel to kernel-desktop-2.6.36-rc2 from Kernel:HEAD/openSUSE_11.3 and was able to transfer 20 GB without problems. However, KVM in this kernel does not recognize the tablet screen of my X60 Tablet, so I will test different flavours of the 2.6.34 kernel and see if one of them is stable.
https://bugzilla.novell.com/show_bug.cgi?id=630422
https://bugzilla.novell.com/show_bug.cgi?id=630422#c8

Carlos Lange <carlos.lange@ualberta.ca> changed:

What       |Removed    |Added
----------------------------------------------------------------------------
Component  |Other      |Kernel

--- Comment #8 from Carlos Lange <carlos.lange@ualberta.ca> 2010-09-02 17:17:57 UTC ---
I noticed that I had tried the kernel-desktop flavour of 2.6.36, so I have now also tested the desktop flavour of 2.6.34 and compared it with the default flavour. A kernel error happened after 5 GB transferred with the "default" flavour. No kernel error happened after more than 45 GB transferred with the "desktop" flavour of the 2.6.34-12 kernel. Both .34 flavours support my tablet well, so I am sticking with the desktop flavour and declaring this a kernel bug. I am willing to help find the culprit in the default kernel if you want me to. Otherwise, we can declare the issue solved.
https://bugzilla.novell.com/show_bug.cgi?id=630422
https://bugzilla.novell.com/show_bug.cgi?id=630422#c9

Carlos Lange <carlos.lange@ualberta.ca> changed:

What       |Removed    |Added
----------------------------------------------------------------------------
Status     |NEW        |RESOLVED
Resolution |           |WORKSFORME

--- Comment #9 from Carlos Lange <carlos.lange@ualberta.ca> 2011-02-26 22:55:35 UTC ---
Works with the "desktop" flavour of the 2.6.34-12 kernel, as described above.