[Bug 448007] New: 11.1 Beta 5.2 hard-locks.
https://bugzilla.novell.com/show_bug.cgi?id=448007
Summary: 11.1 Beta 5.2 hard-locks.
Product: openSUSE 11.1
Version: Factory
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: robin.listas@telefonica.net
QAContact: qa@suse.de
Found By: ---

The machine had been up for about an hour (doing nothing for half an hour while I made tea and beagle ran in the background). Gnome, gkrellm, 4 terms, thunderbird, firefox. I was going to report on some bugzillas that are waiting for my feedback when I was hit by bug 444780 (no firefox master password). So, instead, I started reading and then reporting on that one.

Lock! Full system lock. No mouse, no clock update, and the keyboard caps-lock key does not light its LED. The router LED for this machine does not blink (i.e., no network activity). No logs. Nothing.

Root is reiserfs. This machine is rock solid under 11.0.

For me, this blocks any further Factory testing until it is solved. Sorry. This is also my production machine and I can't risk data loss.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=448007
User aj@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c1
Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c2
--- Comment #2 from Carlos Robinson
> Do you have any additional information about this? Please read the information at http://en.opensuse.org/Bugs/Kernel on how to give it.
Notice that the machine is rock solid under 11.0; it is Factory that crashes. Nothing there (in the Soft lockups and Hard lockups paragraphs) says how I can generate more info for a lock. It is locked! There is no way to obtain info that I know of. The machine is entirely frozen.
> With the information given I fear we cannot help at all.
If you tell me exactly what I can do, I'll try. For example, do you think the kernel can be made to log everything automatically to a serial console, even if it crashes (or at least up to the crash)? That is not documented at your link above; it only talks about oopses.

Also: was the problem that froze 11.0 systems with reiserfs after beagle activity entirely solved?
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c3
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c4
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c5
Carlos Robinson
> Are you able to reproduce this using RC2?
Er... how do I know if I have RC2? AFAIK what I have is RC1 plus "zypper up" updates. I have "kernel-pae-2.6.27.7-4.4", which reports itself as built on Fri Nov 28 18:16:56 2008.

NOT_nimrodel:~ # cat /etc/SuSE-release /etc/SuSE-brand
openSUSE 11.1 (i586)
VERSION = 11.1
openSUSE
VERSION = 11.1

I can try with what I have, but I still do not know how to get useful info, an oops or whatever, from a locked machine. I have another machine here that I could connect via the serial port, but I do not know how to tell the kernel to dump all klogd info there, if that is what is needed. I need help there.

I did a kernel recompile (of the previous kernel), changing an option so that in case of locks the kernel logs extra info. But how can I read it? It locks! Do I configure inittab to accept a serial port connection, log in, and then run... what do I run? When I programmed on MS-DOS, we had a remote debugger via the serial port. Does anything similar exist for the kernel that I could use?

The only info I can currently produce is whether it locks within a reasonable time of usage. Even if it doesn't lock within three hours, that is no proof.
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c6
--- Comment #6 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User aj@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c7
Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c8
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c9
--- Comment #9 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c10
--- Comment #10 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User moby@pcsn.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c11
Mobeen Azhar
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c12
--- Comment #12 from Carlos Robinson
> If you wouldn't mind capturing it once. I know it seems obvious because it keeps happening to you, but I've been trying to reproduce this unsuccessfully for months. I converted my workstation back to using all reiserfs filesystems, with Beagle, etc running and I couldn't reproduce it. Some users hit it multiple times a day, others not at all.
> My main reason for asking for a trace is to ensure this actually *is* the bug that I'm already chasing instead of a different regression.
Ok, I will have a go tomorrow, time permitting. It is 3:43 AM here, and I'm too sleepy to fiddle with serial cables in the back of the computers, plus serial port settings in the console. O:-) Hope I find the cable... it has been ages since I did this. It will be interesting :-)

Question: you say: "hook up the serial link and boot with console=tty0 console=ttyS0,<speed>". Where do I type that, the grub kernel line perhaps?
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c13
--- Comment #13 from Carlos Robinson
> Hope I find the cable... it has been ages since I did this. It will be interesting :-)
I have the cable connected, both sides with minicom, and talking. So I'm ready. But I have another problem:
> Question:
> You say: "hook up the serial link and boot with console=tty0 console=ttyS0,<speed>".
> Where do I type that, grub kernel line perhaps?
The problem is that I get no grub menu, so I can't pass any option to the kernel. Some time ago I hibernated the machine, hibernation crashed on restore (bugzilla somenumber, solved), and now grub is set to auto-boot the default kernel entry or something, without even displaying the grub menu:

http://lists.opensuse.org/opensuse/2008-12/msg01626.html
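A minimal sketch, assuming GRUB legacy as shipped with openSUSE 11.1: since the menu is not shown, the parameters can be appended directly to the end of the `kernel` line of the default entry in /boot/grub/menu.lst, for example `console=tty0 console=ttyS0,115200` (115200 is an assumed speed; it must match the minicom settings on the other machine). With two console= arguments the kernel writes its messages to both devices, and the last one listed becomes /dev/console. After rebooting, the hypothetical helper `has_serial_console` below checks, via /proc/cmdline, that the argument really reached the running kernel.

```sh
#!/bin/sh
# has_serial_console CMDLINE
# Exit status 0 if CMDLINE contains a console=ttyS* argument, 1 otherwise.
has_serial_console() {
    # Rely on word splitting: kernel arguments are separated by spaces.
    for arg in $1; do
        case $arg in
            console=ttyS*) return 0 ;;
        esac
    done
    return 1
}

# /proc/cmdline holds the arguments the running kernel was booted with.
if has_serial_console "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "kernel messages are also sent to a serial port"
else
    echo "no console=ttyS* on the running kernel's command line"
fi
```

Checking /proc/cmdline rather than menu.lst confirms what the kernel actually received, which matters here because the boot entry is chosen without the menu being displayed.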
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c14
--- Comment #14 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c15
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c16
--- Comment #16 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User auxsvr@yahoo.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c17
John McManaman
https://bugzilla.novell.com/show_bug.cgi?id=448007
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User msa@gaugusch.at added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c18
Markus Gaugusch
https://bugzilla.novell.com/show_bug.cgi?id=448007
User wied@x42.info added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c19
Markus Wied
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c20
--- Comment #20 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c21
--- Comment #21 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User klaus@vink-slott.dk added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c22
--- Comment #22 from Klaus Slott
https://bugzilla.novell.com/show_bug.cgi?id=448007
User klaus@vink-slott.dk added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c23
Klaus Slott
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c24
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c25
--- Comment #25 from Carlos Robinson
> Klaus, your problem is the same one as reported in bug #399966.
> Carlos, if it's just hard hanging like that, it's probably the same problem as reported in comment #18.
I don't get kernel oopses. It simply freezes.
> I have a fix for that already in the git repository, and it will be included in the next update (soon). Please remember to do "klogconsole r0 -l8"
I did that.
> when using the serial console. I don't think your minicom log contained all of the kernel messages.
That log is all that came out. You can see the firewall messages still getting logged, an indication that the kernel is still somewhat alive, _after_ the lock-crash.
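For reference, a sketch of how the console log level looks from userspace. As we understand it, `klogconsole -l8` raises the console log level to 8 so that messages of every priority reach the console; the generic equivalents are `dmesg -n 8` or writing the value into the first field of /proc/sys/kernel/printk (both need root). The helper name `console_loglevel` is hypothetical, and the sketch only reads the current level; it changes nothing.

```sh
#!/bin/sh
# console_loglevel PRINTK_LINE
# Print the first field of a /proc/sys/kernel/printk line. The four fields
# are: console level, default message level, minimum console level,
# default console level.
console_loglevel() {
    set -- $1
    echo "$1"
}

level=$(console_loglevel "$(cat /proc/sys/kernel/printk 2>/dev/null)")

# Messages are printed on the console only if their priority number is
# lower than the console level, so 8 lets through everything (0..7).
if [ "${level:-0}" -ge 8 ]; then
    echo "console log level is $level: all kernel messages reach the console"
else
    echo "console log level is ${level:-unknown}: 'dmesg -n 8' (as root) would raise it"
fi
```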
> In the interim, could both of you try reproducing with kernel-debug from http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/ ?
> This bug has been reported several times in different products and a comment in one of the other reports leads me to believe that memory corruption is the problem with the BUG at fs/reiserfs/journal.c:10xx reports. The -debug kernel enables SLAB debugging. The kotd kernel also contains the fix for comment #18.
> If you're using external modules, you'll run into some symbol dependency problems that will need to be resolved. That can't be helped.
I tried:

zypper ar "http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/i586/" "KOD_head"

but it doesn't work:

NOT_nimrodel:~ # zypper se --sort-by-repo --details kernel-debug
Error building the cache:
[KOD_head|http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/i586/] Repository type can't be determined.
Warning: Disabling repository 'KOD_head' because of the above error.
Loading repository data...
Reading installed packages...

S | Name               | Type    | Version      | Arch | Repository
--+--------------------+---------+--------------+------+-----------
  | kernel-debug       | package | 2.6.27.7-9.1 | i586 | Main-OSS
  | kernel-debug-base  | package | 2.6.27.7-9.1 | i586 | Main-OSS
  | kernel-debug-extra | package | 2.6.27.7-9.1 | i586 | Main-OSS

which means I will have to download the needed packages and install them manually with rpm. What are the exact packages I need to install? This is what I have installed:

NOT_nimrodel:~ # rpm -qa | grep -i kernel
kernel-pae-base-2.6.27.7-9.1
linux-kernel-headers-2.6.27-2.28
kernel-pae-extra-2.6.27.7-9.1
kernel-pae-2.6.27.7-9.1
kernel-source-2.6.27.7-9.1
kerneloops-0.12-29.11
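A sketch for working out the names, assuming the -debug flavor is split into packages the same way as the installed -pae flavor (the Main-OSS listing above shows kernel-debug, kernel-debug-base and kernel-debug-extra). It only maps the flavor packages; kernel-source, linux-kernel-headers and kerneloops are passed over. The helper name `debug_flavor_names` is hypothetical, and the matching rpm files from the kotd directory would still have to be fetched and installed by hand.

```sh
#!/bin/sh
# debug_flavor_names
# Read package names (one per line, no version) on stdin and print the
# kernel-debug* counterpart of every kernel-pae* package.
debug_flavor_names() {
    grep '^kernel-pae' | sed 's/^kernel-pae/kernel-debug/'
}

# On the machine itself: list installed package names only, then map them.
rpm -qa --qf '%{NAME}\n' 2>/dev/null | debug_flavor_names
```

With the list above this yields kernel-debug-base, kernel-debug-extra and kernel-debug.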
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c26
--- Comment #26 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c27
--- Comment #27 from Carlos Robinson
> The ftp site isn't an actual repository.
> http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_Factory/ contains a repository, but it's synced daily instead of after each commit.
Ok, I can install from http://ftp.suse.com/pub/projects/kernel/kotd/HEAD/, but please, someone tell me the list of rpms I have to install. I simply do not know, because I don't see there the exact names of the packages I currently have installed. I do not know what you want me to install.
> If it's just locking up, it could still be the same thing as comment #18. That describes a recursive spinlock. I'm not sure why it wouldn't output the oops, but the symptom is the same.
My symptom is an apparently complete freeze: remote sessions freeze, local sessions freeze. The only thing that keeps working is the firewall.
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c28
--- Comment #28 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c29
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c30
--- Comment #30 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c31
--- Comment #31 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User aj@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c32
--- Comment #32 from Andreas Jaeger
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c33
--- Comment #33 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c34
--- Comment #34 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c35
--- Comment #35 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c36
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c37
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c38
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c39
--- Comment #39 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c40
--- Comment #40 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c41
--- Comment #41 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c42
--- Comment #42 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c43
--- Comment #43 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c44
--- Comment #44 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c45
Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c46
--- Comment #46 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c47
--- Comment #47 from Carlos Robinson
> Ok, that confirms that you were running into the problem in comment #18, that has since been fixed. I suggest you run a reiserfsck on that partition.
> [1919251317 1634026030 0x656c67 UNKNOWN] ... is very wrong.
You mean that I have an error in the filesystem, and that was triggering the fault? It is possible that, after so many crashes, there are errors there, but every time I return to 11.0 it runs an fsck on that partition (I see it flashing past). I admit, though, that a manual reiserfsck detects more errors than the automatic on-boot fsck.

And... do you really want me to correct the error? I ask because once it is corrected, I might not be able to reproduce this bug and check whether it is really solved. I don't mind holding off the fsck for some time; this is a test partition.
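An observation that is not made in the thread itself: read as 32-bit little-endian values, as on this i586 machine, the first two numbers of the bogus key quoted above are ASCII text, and so are the three bytes of the offset field. The sketch below prints them; the result is "user.Beagle", which looks like the start of a Beagle extended-attribute name. That would fit the memory-corruption theory mentioned earlier in the thread better than on-disk damage, but it is an interpretation, not a diagnosis. The helper name `le_bytes` is hypothetical.

```sh
#!/bin/sh
# le_bytes VALUE COUNT
# Print the low COUNT bytes of integer VALUE, least significant byte first,
# as raw characters (i.e. read the value as little-endian text).
le_bytes() {
    n=$1
    i=0
    while [ "$i" -lt "$2" ]; do
        b=$(( (n >> (8 * i)) & 255 ))
        # A \ooo escape in a printf format emits the byte with that octal value.
        printf "\\$(printf '%03o' "$b")"
        i=$((i + 1))
    done
}

# The three numeric fields of the key as printed in the reiserfs message.
le_bytes 1919251317 4
le_bytes 1634026030 4
le_bytes $((0x656c67)) 3
echo
```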
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c48
Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c49
--- Comment #49 from Carlos Robinson
https://bugzilla.novell.com/show_bug.cgi?id=448007
User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c50
--- Comment #50 from Carlos Robinson
> Ok, that confirms that you were running into the problem in comment #18, that has since been fixed. I suggest you run a reiserfsck on that partition.
> [1919251317 1634026030 0x656c67 UNKNOWN] ... is very wrong.
It appears there is no corruption in hdd14:

nimrodel:~ # time nice reiserfsck /dev/hdd14
reiserfsck 3.6.19 (2003 www.namesys.com)
..
Will read-only check consistency of the filesystem on /dev/hdd14
Will put log info to 'stdout'
Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Tue Feb 3 23:52:58 2009
###########
Replaying journal: Done.
Reiserfs journal '/dev/hdd14' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree: finished
No corruptions found
There are on the filesystem:
    Leaves 41984
    Internal nodes 287
    Directories 26308
    Other files 239580
    Data block pointers 2078827 (3168 of them are zero)
    Safe links 0
###########
reiserfsck finished at Wed Feb 4 00:00:36 2009
###########

real    7m40.966s
user    0m28.842s
sys     0m19.257s
nimrodel:~ #
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c51
--- Comment #51 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User Ulrich.Windl@rz.uni-regensburg.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c52
Ulrich Windl
https://bugzilla.novell.com/show_bug.cgi?id=448007
User alvinga@gmail.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c53
alvin george
https://bugzilla.novell.com/show_bug.cgi?id=448007
User lpechacek@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c54
Libor Pecháček
https://bugzilla.novell.com/show_bug.cgi?id=448007
User lpechacek@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c55
Libor Pecháček
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c56
--- Comment #56 from Jeff Mahoney
https://bugzilla.novell.com/show_bug.cgi?id=448007
User Ulrich.Windl@rz.uni-regensburg.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c57
Ulrich Windl
https://bugzilla.novell.com/show_bug.cgi?id=448007
User Ulrich.Windl@rz.uni-regensburg.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c58
--- Comment #58 from Ulrich Windl
EIP; d893a38b   <=====
EBX; 00209d50
Trace; d893cdbb
EIP; d893a38b   <=====
3 warnings issued. Results may not be reliable.
https://bugzilla.novell.com/show_bug.cgi?id=448007
User Ulrich.Windl@rz.uni-regensburg.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c59
--- Comment #59 from Ulrich Windl
https://bugzilla.novell.com/show_bug.cgi?id=448007
User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=448007#c60
Jeff Mahoney