https://bugzilla.novell.com/show_bug.cgi?id=478482
Summary: total freeze
Classification: openSUSE
Product: openSUSE 11.1
Version: Final
Platform: 32bit
OS/Version: openSUSE 11.1
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: gh55@heinzel.name
QAContact: qa@suse.de
Found By: ---
After updating from 10.3 to 11.1, two totally different and independent computers freeze after some hours. My desktop computer (which used to run 24/7 on openSUSE 10.3) freezes at about 5 to 6 am, with no trace in the log files. The time is different every day and no cron job is running at that time. The laptop freezes erratically after a few hours with the Caps Lock LED blinking. Both computers were running fine with 10.3. After a freeze, powering off is the only remedy. HELP!
User gh55@heinzel.name added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c1
--- Comment #1 from Gerhard Heinzel gh55@heinzel.name 2009-02-23 00:27:28 MST --- Created an attachment (id=274553) --> (https://bugzilla.novell.com/attachment.cgi?id=274553) hwinfo and process list when freeze occurs
included are
- hwinfo_desktop of desktop computer
- hwinfo_laptop of laptop
- dump contains the process list when the freeze occurs; these are the last 3 entries from the output of the following script

while date; do { date >>dump; uptime >>dump; top -b -n 1 >>dump; sleep 60; }; done
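A logging loop like the one above can lose its last entries if the machine freezes before buffered data reaches the disk. The following is only a sketch of one way to reduce that risk, not something used in the original report: the helper name `log_snapshot` and the separator line are assumptions made for illustration, and `sync` is added so each snapshot is flushed before the next sleep.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): append one
# snapshot to a log file, then flush it so it can survive a hard freeze.
log_snapshot() {
    out=$1
    {
        date
        # uptime and top may be absent on minimal systems, so guard them.
        if command -v uptime >/dev/null 2>&1; then uptime || true; fi
        if command -v top >/dev/null 2>&1; then top -b -n 1 || true; fi
        echo "----"
    } >>"$out" 2>&1
    # Ask the kernel to write dirty buffers to disk now.
    sync || true
}

# Usage (runs until interrupted):
#   while :; do log_snapshot dump; sleep 60; done
```

Flushing once a minute adds some disk activity of its own, which may matter if the freeze is related to I/O load.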
User Ulrich.Windl@rz.uni-regensburg.de added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c2
Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de changed:
What     |Removed    |Added
----------------------------------------------------------------------------
Priority |P5 - None  |P0 - Crit Sit
CC       |           |Ulrich.Windl@rz.uni-regensburg.de
Severity |Critical   |Blocker
--- Comment #2 from Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de 2009-02-23 01:04:26 MST --- I see the same problem: Four freezes within one hour. The system ran fine with 11.0 before. Three freezes happened while I was typing into a textarea of this bugzilla. There was disk activity in the background while I was typing, so I suspect it could be related to ReiserFS (which I'm using here). My home is mounted via /dev/mapper/cryptotab_loop0 (just in case it matters). To be able to type this report I've shut down cron and beagle. Raising the priority of this issue to max.
User stbinner@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c3
Stephan Binner stbinner@novell.com changed:
What     |Removed        |Added
----------------------------------------------------------------------------
Priority |P0 - Crit Sit  |P5 - None
Severity |Blocker        |Major
--- Comment #3 from Stephan Binner stbinner@novell.com 2009-02-23 02:05:41 MST --- Please read http://en.opensuse.org/Bugs/Definitions
User Ulrich.Windl@rz.uni-regensburg.de added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c4
--- Comment #4 from Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de 2009-02-23 03:05:54 MST --- On my comment #2: The measures actually seem to have extended the uptime (no freeze since then). On comment #3: If the system freezes so frequently that you cannot even send a bug report before it freezes, it's a blocker (IMHO). BTW: Reports on the Internet give the impression that openSUSE 11.1 is completely unstable. This may even affect the upcoming SLE 11.
User gh55@heinzel.name added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c5
--- Comment #5 from Gerhard Heinzel gh55@heinzel.name 2009-02-26 00:16:08 MST --- Stopping the beagle daemon doesn't help. Still freezing every night...
User lpechacek@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c6
Libor Pecháček lpechacek@novell.com changed:
What          |Removed  |Added
----------------------------------------------------------------------------
Status        |NEW      |NEEDINFO
CC            |         |lpechacek@novell.com
Info Provider |         |gh55@heinzel.name
Severity      |Major    |Critical
--- Comment #6 from Libor Pecháček lpechacek@novell.com 2009-02-26 01:52:40 MST --- Gerhard, Ulrich, would you be so kind as to attach your /var/log/messages to this bug? I'm looking for a kernel oops there. Gerhard, if you switch to console 10, where kernel messages are logged, with Ctrl-Alt-F10 *before* the machine freezes, is there anything interesting on the screen after the freeze?
You can also take a look at http://en.opensuse.org/Bugs/Kernel for tips about hunting kernel bugs. TIA
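Before attaching a large /var/log/messages, it can help to check whether it contains anything resembling a kernel oops. The following is a rough sketch only: the helper name `find_kernel_trouble` is hypothetical, and the pattern list is an assumption covering a few common crash markers, not an exhaustive set.

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) log lines that commonly
# accompany a kernel crash. The pattern list is an assumption and is
# certainly not complete.
find_kernel_trouble() {
    grep -n -E 'Oops|BUG:|Call Trace|Kernel panic|soft lockup' "$1"
}

# Usage:
#   find_kernel_trouble /var/log/messages
```

An empty result does not prove that no oops happened: on a hard freeze the message may never reach the disk, which is why watching console 10 is still useful.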
User Ulrich.Windl@rz.uni-regensburg.de added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c7
--- Comment #7 from Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de 2009-02-26 05:53:12 MST --- In reply to comment #6: I'm not seeing any messages related to the freeze; specifically no OOPS. Also, a "freeze" means that I cannot even move the mouse any more, let alone switch consoles. On comment #3 and comment #4: Meanwhile it works even with cron re-enabled, and yesterday I even ran some file sharing. So far it works.
So I suspect this: On a uniprocessor (my machine) with ReiserFS (and encrypted partitions?) the kernel freezes while having disk activity (and maybe is short of memory). The problem could be interrupts disabled permanently, or simply some bad code (race condition).
User lpechacek@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c8
--- Comment #8 from Libor Pecháček lpechacek@novell.com 2009-02-26 06:42:01 MST --- Ulrich, I understand that you cannot switch to console 10 after your machine freezes; that is why I asked you to switch there before the freeze, so that the kernel messages remain visible.
@Ulrich, would you be able to set up a serial console or netconsole to capture the kernel messages, please?
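For readers unfamiliar with netconsole, here is a minimal sketch of how the module parameter is put together. The parameter layout follows the kernel's netconsole documentation; the helper name `netconsole_param`, the interface name, and all addresses are placeholders invented for this example, and the ports are the documented defaults (6665 source, 6666 target).

```shell
#!/bin/sh
# Hypothetical helper: build the netconsole module parameter.
# Layout per the kernel's netconsole documentation:
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    src_ip=$1
    dev=$2
    tgt_ip=$3
    tgt_mac=$4
    printf 'netconsole=6665@%s/%s,6666@%s/%s\n' \
        "$src_ip" "$dev" "$tgt_ip" "$tgt_mac"
}

# Usage on the freezing machine (as root; every address is a placeholder):
#   modprobe netconsole "$(netconsole_param 192.168.1.10 eth0 192.168.1.20 00:11:22:33:44:55)"
# On the receiving machine, listen for UDP on port 6666, for example with
# netcat (option syntax differs between netcat variants):
#   nc -u -l -p 6666
```

Unlike console 10, this keeps the last kernel messages on a second machine even if the frozen one shows a black screen.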
User gh55@heinzel.name added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c9
Gerhard Heinzel gh55@heinzel.name changed:
What          |Removed            |Added
----------------------------------------------------------------------------
Status        |NEEDINFO           |NEW
Info Provider |gh55@heinzel.name  |
--- Comment #9 from Gerhard Heinzel gh55@heinzel.name 2009-02-27 00:28:27 MST --- Created an attachment (id=275923) --> (https://bugzilla.novell.com/attachment.cgi?id=275923) see text
In response to comment 6:
I attach:
- /var/log/messages (from the last reboot to the freeze)
- hwinfo
- process list in the last minute before the freeze
- xorg.conf
I had switched to the kernel message console with Ctrl-Alt-F10 last evening, but when I came in this morning the screen was black and the machine did not react to mouse, keyboard or anything else, so I had to power down, same as every morning in the last weeks.
User Ulrich.Windl@rz.uni-regensburg.de added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c10
--- Comment #10 from Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de 2009-02-27 00:44:22 MST --- I have some news on this issue: Yesterday I installed the new kernel, and things look better now. In the boot messages I noticed a difference regarding "reiserfs: using flush barriers". After a little while (when beagle became active?), I noticed these messages in syslog:

kernel: REISERFS warning (device dm-2): jdm-20002 reiserfs_xattr_get: Invalid hash for xattr (user.Beagle) associated with [1919251317 1634026030 0x656c67 UNKNOWN]
kernel: REISERFS warning (device dm-2): jdm-20002 reiserfs_xattr_get: Invalid hash for xattr (user.Beagle) associated with [1919251317 1634026030 0x656c67 UNKNOWN]
[...]

So I conclude that there is some type of data corruption on ReiserFS mounted via cryptotab_loop0. That could mean:
1) The data got corrupted, but undetected in openSUSE 11.0 (unlikely IMHO)
2) The data got corrupted, causing a freeze in openSUSE 11.1 (which would be bad)
3) The data got corrupted as a consequence of the freeze (which would be also bad)
4) The old kernel did not detect (or report) this type of corruption, but froze instead

I don't know if the problem was really fixed in the new kernel, but obviously the problem is related to beagle-index, reiserfs, and cryptotab_loop.
My /home is mounted as: /dev/mapper/cryptotab_loop0 on /home type reiserfs (rw,acl,user_xattr)
In reply to comment #8: I'll have to see whether the current kernel (2.6.27.19-3.2-pae #1 SMP 2009-02-25 15:40:44 +0100 i686 i686 i386 GNU/Linux) still freezes.
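To see how widespread the ReiserFS warnings quoted in this comment are, the syslog can be summarised per device. This is a sketch under the assumption that the messages keep the "REISERFS warning (device NAME)" form shown here; the helper name `count_reiserfs_warnings` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and print
# "<count> <device>" for each device named in a REISERFS warning.
# Assumes the message form "REISERFS warning (device NAME): ...".
count_reiserfs_warnings() {
    sed -n 's/.*REISERFS warning (device \([^)]*\)).*/\1/p' |
        sort | uniq -c | awk '{print $1, $2}'
}

# Usage:
#   count_reiserfs_warnings </var/log/messages
```

A count that keeps growing after a filesystem check would suggest the warnings are not just leftovers from an earlier freeze.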
User gh55@heinzel.name added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c11
--- Comment #11 from Gerhard Heinzel gh55@heinzel.name 2009-02-27 02:34:31 MST --- In reply to Comment #10: In my case, I have no encrypted file system, and stopping the beagle daemon didn't help (comment #5; I had checked that ps -A|grep beagle gives nothing).
User lpechacek@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c12
--- Comment #12 from Libor Pecháček lpechacek@novell.com 2009-02-27 03:20:10 MST --- Many thanks for the update Ulrich!
Gerhard, I can see the Beagle daemon running on your system too (PID 4445 in the log attached in comment #9, probably run from /etc/cron.daily). I can also see that some ReiserFS filesystem is most probably mounted. I am also interested in the output of the mount(8) command on your host to see which filesystems are mounted and where.
User gh55@heinzel.name added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c13
--- Comment #13 from Gerhard Heinzel gh55@heinzel.name 2009-02-27 03:23:23 MST --- Yes, I had stopped beagle for one night but restarted it after that didn't prevent the freeze.
ghh@HWS125:~> mount
/dev/sda4 on / type reiserfs (rw,acl,user_xattr)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sda2 on /tmp type reiserfs (rw,acl,user_xattr)
/dev/sda3 on /var type reiserfs (rw,acl,user_xattr)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
/proc on /var/lib/ntp/proc type proc (ro)
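Since the suspicion in this thread falls on ReiserFS together with extended attributes, the relevant mounts can be picked out of the mount(8) output mechanically. The following is a small sketch, assuming the "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" line format shown in this comment; the helper name `reiserfs_xattr_mounts` is hypothetical, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read mount(8) output on stdin and print
# "<device> <mountpoint>" for ReiserFS mounts that use user_xattr.
# Assumes lines of the form: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
reiserfs_xattr_mounts() {
    awk '$2 == "on" && $4 == "type" && $5 == "reiserfs" && $6 ~ /user_xattr/ {
        print $1, $3
    }'
}

# Usage:
#   mount | reiserfs_xattr_mounts
```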
User lpechacek@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c14
Libor Pecháček lpechacek@novell.com changed:
What          |Removed  |Added
----------------------------------------------------------------------------
Status        |NEW      |NEEDINFO
Info Provider |         |gh55@heinzel.name
--- Comment #14 from Libor Pecháček lpechacek@novell.com 2009-02-27 05:47:51 MST --- Sounds much like ReiserFS+xattr issue. Ulrich, could your problem be related to bug 448007?
Gerhard, what is your kernel package version and release, please?
User gh55@heinzel.name added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c15
Gerhard Heinzel gh55@heinzel.name changed:
What          |Removed            |Added
----------------------------------------------------------------------------
Status        |NEEDINFO           |NEW
Info Provider |gh55@heinzel.name  |
--- Comment #15 from Gerhard Heinzel gh55@heinzel.name 2009-02-27 09:15:43 MST --- I upgraded this morning to
uname -a
Linux HWS125 2.6.27.19-3.2-pae #1 SMP 2009-02-25 15:40:44 +0100 i686 i686 i386 GNU/Linux
User gh55@heinzel.name added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c16
--- Comment #16 from Gerhard Heinzel gh55@heinzel.name 2009-02-27 09:57:37 MST --- The laptop, which froze twice today, was not updated and is running
Linux ghhnote 2.6.27.7-9-pae #1 SMP 2008-12-04 18:10:04 +0100 i686 i686 i386 GNU/Linux
I am pretty sure that this is what was installed on the desktop as well until this morning.
User Ulrich.Windl@rz.uni-regensburg.de added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c17
--- Comment #17 from Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de 2009-03-02 00:44:30 MST --- In reply to comment #14: Yes, the problem described in bug 448007 sounds a lot like this one.
User lpechacek@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=478482#c18
Libor Pecháček lpechacek@novell.com changed:
What       |Removed  |Added
----------------------------------------------------------------------------
Status     |NEW      |RESOLVED
Resolution |         |DUPLICATE
--- Comment #18 from Libor Pecháček lpechacek@novell.com 2009-03-02 02:09:27 MST --- OK, I'll close this bug as a duplicate then. Feel free to reopen bug 448007 if you observe similar freezes with the latest kernel.
Please open a new report for the problem described in comment 10, if it cannot be fixed with reiserfsck.
Thanks!
*** This bug has been marked as a duplicate of bug 448007 *** https://bugzilla.novell.com/show_bug.cgi?id=448007