[Bug 917411] New: with kernel 3.16.7-7.1 system freezes
http://bugzilla.opensuse.org/show_bug.cgi?id=917411

Bug ID: 917411
Summary: with kernel 3.16.7-7.1 system freezes
Classification: openSUSE
Product: openSUSE Distribution
Version: 13.2
Hardware: Other
OS: Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: bluedzins@wp.pl
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

After upgrading to 3.16.7-7.1 (64-bit) from the default version (shipped with OS 13.2) I encounter total freezes. The first one I wrote off as an accident (it happens), but a second one within a week is too much for an accident.

Programs running at the moment of the freeze: Firefox, VirtualBox, Geany, Konsole (KDE3), maybe Kmail (KDE3), VLC, with KDE3.5 as desktop. The first freeze occurred when I was working with VBox, the second with Firefox.

The report might be related to: https://bugzilla.opensuse.org/show_bug.cgi?id=913590

-- You are receiving this mail because: You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #3 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #4 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #6 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #7 from Takashi Iwai
I wrote "freeze", not "crash" -- those are 2 different things.
Then you have to write more clearly. Does the system respond to anything, e.g. remote login or response to network ping?
Where can I find the kernel trace?
In the kernel messages.
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
macias -
I wrote "freeze", not "crash" -- those are 2 different things.
Then you have to write more clearly.
Please give me an example, because obviously "freeze" for denoting "freeze" was not clear enough. I am closing this report, I don't know how to perform the task you ask for, besides after 3 freezes I am not risking another one.
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #9 from Takashi Iwai
I wrote "freeze", not "crash" -- those are 2 different things.
Then you have to write more clearly.
Please give me an example, because obviously "freeze" for denoting "freeze" was not clear enough.
What's needed is what kind of freeze you got. For example, if the machine doesn't respond to network ping, it's likely a complete freeze. Meanwhile, often a desktop problem is just a graphics freeze, and the remote login still works. Or, at least, if ping returns, it implies that the kernel is still alive but the desktop environment is frozen for some reason.

In the case of complete freeze, usually it's a kernel panic, and the kernel always tries to leave some dying messages. If you are lucky, they'll be recorded and left intact in the kernel messages on disk. If you have no luck, you'll have to catch the messages remotely, at best via a serial console, or via netconsole, etc.

If you have no such capability, at least try to enable the magic sysrq in sysctl (via /usr/lib/sysctl.d/*). Setting kernel.sysrq=1 enables all sysrq controls. Then, at a freeze, press alt-sysrq-s to sync and alt-sysrq-t to get the stack trace of all running tasks. You can also try alt-sysrq-k, alt-sysrq-u, and finally alt-sysrq-b to reboot. This might leave something better on disk, again, if you are lucky.
I am closing this report, I don't know how to perform the task you ask for, besides after 3 freezes I am not risking another one.
OK, feel free to reopen if you face the same problem and have some more information.
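The sysrq setup described in this comment could be sketched roughly as follows. This is only an illustration: the file name 90-sysrq.conf is made up for the example, and /etc/sysctl.d is assumed as the place for local settings (the comment itself points at /usr/lib/sysctl.d/*).

```sh
# Sketch: persist kernel.sysrq=1 through a sysctl drop-in file.
# Assumption: the file name 90-sysrq.conf is hypothetical.

# write_sysrq_dropin DIR
# Creates DIR/90-sysrq.conf enabling all Magic SysRq functions.
write_sysrq_dropin() {
    dir=${1:-/etc/sysctl.d}
    printf 'kernel.sysrq = 1\n' > "$dir/90-sysrq.conf"
}

# current_sysrq
# Prints the value the running kernel uses, or "unknown" if unreadable.
current_sysrq() {
    if [ -r /proc/sys/kernel/sysrq ]; then
        cat /proc/sys/kernel/sysrq
    else
        echo unknown
    fi
}
```

Run as root, write_sysrq_dropin followed by "sysctl --system" (or a reboot) should apply the setting, and current_sysrq shows what the kernel is actually using at the moment.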
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
macias -
For example, if the machine doesn't respond to network ping, it's likely a complete freeze.
OK, I will do it next time.
and the remote login still works.
Errm, I would not like to open remote login, because the freezes are not that frequent.
In the case of complete freeze, usually it's a kernel panic, and the kernel always tries to leave some dying messages.
Where? journalctl does not show anything related.
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #11 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
Takashi Iwai
I am sorry, but I have to reopen it, because the system keeps freezing. What I found out so far: * the old kernel also freezes
OK, this is very good to know at least. Did you have a working kernel beforehand?
* journalctl does not show any error or trace of a problem; there are simply no messages during the "freeze" period (until the hard reset; the reboot is recorded, but that message could come from the next boot)
As mentioned, here we need to know how "hard" the freeze is.
For example, if the machine doesn't respond to network ping, it's likely a complete freeze.
OK, I will do it next time.
and the remote login still works.
Errm, I would not like to open remote login, because the freezes are not that frequent.
I didn't ask to keep opening the remote session. Instead, just try remote login when the machine hangs.
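A rough sketch of that check, to be run from a second machine while the first one hangs. The host argument and the result wording are assumptions made for illustration; the interpretation follows the ping / remote login distinction drawn earlier in this thread.

```sh
# describe_freeze PING_RC SSH_RC
# Maps the exit codes of a ping and an ssh attempt to a short description.
describe_freeze() {
    if [ "$1" -ne 0 ]; then
        echo "no ping reply: likely a complete freeze"
    elif [ "$2" -ne 0 ]; then
        echo "ping replies, remote login does not: kernel still alive"
    else
        echo "remote login works: probably only the desktop is frozen"
    fi
}

# probe_host HOST
# Pings HOST, tries a non-interactive ssh login, and prints the description.
probe_host() {
    host=$1
    ping_rc=0
    ping -c 3 -W 2 "$host" >/dev/null 2>&1 || ping_rc=$?
    ssh_rc=0
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1 || ssh_rc=$?
    describe_freeze "$ping_rc" "$ssh_rc"
}
```

Note that probe_host only works if sshd is running on the frozen machine and key-based login is set up; otherwise the ssh step fails even on a healthy system.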
In the case of complete freeze, usually it's a kernel panic, and the kernel always tries to leave some dying messages.
Where? journalctl does not show anything related.
It's usually in the journal. In a better case, you could get it via a serial console or, sometimes, even via netconsole.
(In reply to macias - from comment #11)
PS. When you write about SysRq, do you have the console in mind? Because I tested those shortcuts without a freeze, and they work in the console, but NOT in the GUI desktop. SysRq is commented out by default, so there is no such key at all:
https://github.com/xkbcommon/libxkbcommon/blob/master/test/data/keycodes/evdev
And when system freezes I cannot switch to console.
I guess you misunderstood SysRq. What's here referred to is the "Magic SysRq" keys. The key combo Alt+SysRq+something (on a laptop, it's often with Alt+Fn+Print+something) is captured directly by kernel for special tasks. So it must work no matter which desktop GUI is used. Note that not all sysrq key combos are enabled as default, as already mentioned. This has to be enabled via sysctl.conf stuff.

BTW, are you using BTRFS? If so, is the disk space still really free? A frequently seen problem with btrfs is that snapshots exhaust the free disk space without anyone noticing. This leads to a system crash, and you won't always see the log because nothing more can be written. Try removing old snapshots in the case of btrfs, to be sure.

Anyway, please clear NEEDINFO once you get more information.
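To see which key combos a given kernel.sysrq value allows, the bitmask can be decoded with a small helper like the one below. The bit values are those listed in the kernel's sysrq documentation; the function name and the short labels are made up for this sketch.

```sh
# decode_sysrq MASK
# Prints the SysRq function groups enabled by a kernel.sysrq value.
decode_sysrq() {
    mask=$1
    case $mask in
        0) echo "disabled"; return ;;
        1) echo "all functions enabled"; return ;;
    esac
    for entry in \
        "2:console log level" \
        "4:keyboard (SAK, unraw)" \
        "8:debugging dumps" \
        "16:sync" \
        "32:remount read-only" \
        "64:signal processes" \
        "128:reboot/poweroff" \
        "256:nice RT tasks"
    do
        bit=${entry%%:*}      # number before the colon
        name=${entry#*:}      # label after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            echo "$name"
        fi
    done
}
```

For example, decode_sysrq "$(cat /proc/sys/kernel/sysrq)" lists what the running system permits; the task dump (alt-sysrq-t) falls under "debugging dumps" and alt-sysrq-s under "sync".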
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #13 from macias -
Did you have a working kernel beforehand?
Yes, the same machine on the previous installation. I installed OS 13.2 as soon as it was launched, no problem for ~4 months, then I reinstalled it, and give or take a few packages, the only major difference is that I now use an encrypted LVM partition for root. Before the reinstall it worked flawlessly, now it freezes.
I guess you misunderstood SysRq. What's here referred to is the "Magic SysRq" keys. The key combo Alt+SysRq+something (on a laptop, it's often with Alt+Fn+Print+something) is captured directly by kernel for special tasks. So it must work no matter which desktop GUI is used.
Should it work when the computer is working normally? Because it does not work, I tested it. I get some output only in the console; in the GUI it was as if I hadn't pressed it at all.
Note that not all sysrq key combos are enabled as default, as already mentioned. This has to be enabled via sysctl.conf stuff.
Already done.
BTW, are you using BTRFS?
No, no. Encryption, LVM, ext3/ext4, that's all.
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #14 from Takashi Iwai
Did you have a working kernel beforehand?
Yes, the same machine on the previous installation. I installed OS 13.2 as soon as it was launched, no problem for ~4 months, then I reinstalled it, and give or take a few packages, the only major difference is that I now use an encrypted LVM partition for root. Before the reinstall it worked flawlessly, now it freezes.
The question is what triggered the problem. What's the exact difference from the previous installation?
I guess you misunderstood SysRq. What's here referred to is the "Magic SysRq" keys. The key combo Alt+SysRq+something (on a laptop, it's often with Alt+Fn+Print+something) is captured directly by kernel for special tasks. So it must work no matter which desktop GUI is used.
Should it work when the computer is working normally? Because it does not work, I tested it. I get some output only in the console; in the GUI it was as if I hadn't pressed it at all.
Yes, magic sysrq works always. If not, you did something wrong. It doesn't always *show* on console, of course. The kernel log is written to its ring buffer and the daemon logs it to a file and/or shows it on the console. But the message itself must be recorded in the ring buffer.
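One way to check whether a SysRq key press reached the ring buffer is to filter the kernel log for it. A minimal sketch; the helper name is invented here, and the previous-boot query assumes the journal is stored persistently.

```sh
# filter_sysrq
# Reads kernel log text on stdin and prints only lines mentioning SysRq.
filter_sysrq() {
    grep -i 'sysrq'
}

# Typical use (not executed here):
#   dmesg | filter_sysrq                  # ring buffer of the running kernel
#   journalctl -k -b | filter_sysrq       # kernel messages of the current boot
#   journalctl -k -b -1 | filter_sysrq    # previous boot, if the journal is persistent
```

If the first command prints nothing right after pressing the key combo, the kernel most likely never saw it.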
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #15 from macias -
The question is what triggered the problem. What's the exact difference from the previous installation?
See attachment.
Yes, magic sysrq works always. If not, you did something wrong.
What did I do wrong? I just enabled it to level 1, it works in console.
It doesn't always *show* on console, of course. The kernel log is written to its ring buffer and the daemon logs it to a file
Is it shown using journalctl? If yes, I have nothing in it (I use alt+sysrq+t to dump the trace).
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #16 from Takashi Iwai
Created attachment 623670 [details] old/new packages diff
The question is what triggered the problem. What's the exact difference from the previous installation?
See attachment.
Is this the only difference? Is the disk setup identical?
Yes, magic sysrq works always. If not, you did something wrong.
What did I do wrong? I just enabled it to level 1, it works in console.
No idea. I don't have your machine :)
It doesn't always *show* on console, of course. The kernel log is written to its ring buffer and the daemon logs it to a file
Is it shown using journalctl? If yes, I have nothing in it (I use alt+sysrq+t to dump the trace).
Doesn't it appear in dmesg, too? Also, what about alt-sysrq-s?
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #17 from macias -
See attachment.
Is this the only difference? Is the disk setup identical?
As I wrote, except for the root partition it is identical. The root partition has the same size and placement; it was simply reformatted from ext3 to encrypted+LVM+ext3. That's the only difference. If the logs are missing something please let me know, I can attach a screenshot from the partitioner or anything else helpful for you.
Also, what about alt-sysrq-s?
Finally something here. The alt here is the physical alt key, not a mapped one. Both dmesg and journalctl show the info. OK, so I am prepared for a freeze now :-)
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #19 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #20 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #21 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #22 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #23 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #24 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #25 from macias -
Run "dmesg -n 8" once.
What was the default level of dmesg? I googled but cannot find it, and I would like to get back to the previous mode (since nothing useful is logged anyway).
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #26 from Takashi Iwai
Another freeze, nothing in logs. This freeze was w/o radiotray running, however with mplayer.
I will continue not using radiotray and minimize multimedia players use (I just deleted flash player).
If it happens during media playback, a suspected culprit is the video driver, of course. What video driver are you using? If it's i915, there has been a fix for i915 freeze recently. Did you ever test with a newer upstream kernel (3.18 / 3.19)? Also, what about the newer openSUSE-13.2 kernel in OBS Kernel:openSUSE-13.2 repo?
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #27 from macias -
If it happens during media playback,
Never when watching a movie, only (if at all) when listening to some music. So no video display was involved.
a suspected culprit is the video driver, of course. What video driver are you using?
How do I check this?
# lspci
00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03)
00:1a.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 82801JI (ICH10 Family) HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1
00:1c.3 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 4
00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5
00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6
00:1d.0 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:00.0 VGA compatible controller: NVIDIA Corporation G96 [GeForce 9400 GT] (rev a1)
02:00.0 Ethernet controller: Qualcomm Atheros AR8121/AR8113/AR8114 Gigabit or Fast Ethernet (rev b0)
03:00.0 SATA controller: JMicron Technology Corp. JMB361 AHCI/IDE (rev 02)
03:00.1 IDE interface: JMicron Technology Corp. JMB361 AHCI/IDE (rev 02)
04:00.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6315 Series Firewire Controller
06:00.0 Ethernet controller: Qualcomm Atheros AR5212/AR5213 Wireless Network Adapter (rev 01)
Did you ever test with a newer upstream kernel (3.18 / 3.19)?
As I wrote, yes, those last 2-3 freezes were on 3.19 in debug mode.
Also, what about the newer openSUSE-13.2 kernel in OBS Kernel:openSUSE-13.2 repo?
What do you mean, there is something even newer than 3.19? I can only see 3.16 there.
http://download.opensuse.org/repositories/Kernel:/openSUSE-13.2/standard/x86...
How do I turn dmesg back to its default level? I.e. I know how to set it, I don't know what level of messages was the default on installation.
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #28 from Takashi Iwai
If it happens during media playback,
Never when watching a movie, only (if at all) when listening to some music. So no video display was involved.
The video is always involved as long as the device exists, no matter whether you see it or not. But checking whether it's really an audio issue is simpler. Try blacklisting the audio driver (snd-hda-intel) and test without audio for a while to see whether you get the same problem.
a suspected culprit is the video driver, of course. What video driver are you using?
How do I check this?
Check for "VGA".
01:00.0 VGA compatible controller: NVIDIA Corporation G96 [GeForce 9400 GT] (rev a1)
So it's Nvidia. Are you using the nouveau driver, or the nvidia binary-only driver? Either way, it's bad news for debugging. It's almost impossible to debug these (especially without logs).
Did you ever test with a newer upstream kernel (3.18 / 3.19)?
As I wrote, yes, those last 2-3 freezes were on 3.19 in debug mode.
OK.
Also, what about the newer openSUSE-13.2 kernel in OBS Kernel:openSUSE-13.2 repo?
What do you mean, there is something even newer than 3.19? I can only see 3.16 there.
http://download.opensuse.org/repositories/Kernel:/openSUSE-13.2/standard/x86_64/
No, I asked only because I couldn't find any mention that you were testing with 3.19 for the final results.
How do I turn dmesg back to its default level? I.e. I know how to set it, I don't know what level of messages was the default on installation.
Just drop the dmesg command you added.
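For the question of which video driver is in use: "lspci -k" prints a "Kernel driver in use:" line under each device, so the VGA entry can be picked out with a short filter. A sketch only; the helper name is invented for this example.

```sh
# vga_driver
# Reads `lspci -k` output on stdin and prints the kernel driver bound to
# each "VGA compatible controller" entry.
vga_driver() {
    awk '
        # A device header line starts with the slot address (hex digits).
        /^[0-9a-f]/ { in_vga = ($0 ~ /VGA compatible controller/) }
        # Indented detail lines belong to the most recent header.
        in_vga && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}

# Typical use (not executed here):
#   lspci -k | vga_driver
```

On a machine like the one in this thread it would be expected to print either nouveau or nvidia, depending on which driver is bound to the card.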
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #29 from macias -
Try blacklisting the audio driver (snd-hda-intel) and test without audio for a while to see whether you get the same problem.
I can do that only on weekends, but sure, I will. What should I execute to disable it, and then to re-enable it?
01:00.0 VGA compatible controller: NVIDIA Corporation G96 [GeForce 9400 GT] (rev a1)
So it's Nvidia. Are you using the nouveau driver, or the nvidia binary-only driver? Either way, it's bad news for debugging. It's almost impossible to debug these (especially without logs).
OK, but we are forgetting that I didn't change the hardware or the video setup. For several months this video card worked fine with this driver (the audio card didn't change either, but its setup might have). I was and am using the nouveau driver; nvidia is nowhere on the list.
How do I turn dmesg back to its default level? I.e. I know how to set it, I don't know what level of messages was the default on installation.
Just drop the dmesg command you added.
Wait, there is something wrong here. You asked me to 'Run "dmesg -n 8" once'. So I ran dmesg once and restarted the computer. I didn't add it anywhere.
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #30 from Takashi Iwai
How do I turn dmesg back to its default level? I.e. I know how to set it, I don't know what level of messages was the default on installation.
Just drop the dmesg command you added.
Wait, there is something wrong here. You asked me to 'Run "dmesg -n 8" once'. So I ran dmesg once and restarted the computer. I didn't add it anywhere.
I meant to run the command once per boot. The setting doesn't persist across reboots.
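The console log level that "dmesg -n" changes can be inspected through /proc/sys/kernel/printk, which holds four numbers: the current console level, the default message level, the minimum console level and the boot-time default console level. A small sketch that labels them (the function name is made up; the actual numbers depend on the distribution and boot options):

```sh
# describe_printk "C D M B"
# Labels the four fields of /proc/sys/kernel/printk.
describe_printk() {
    set -- $1    # split the single argument into four positional fields
    echo "current console loglevel: $1"
    echo "default message loglevel: $2"
    echo "minimum console loglevel: $3"
    echo "boot-time default console loglevel: $4"
}

# Typical use (not executed here):
#   describe_printk "$(cat /proc/sys/kernel/printk)"
```

After "dmesg -n 8" only the first field changes, so the last field shows the value to pass to "dmesg -n" to get back to the default without rebooting.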
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #31 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #32 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #33 from Takashi Iwai
OK, the weekend is coming, how do I disable and then re-enable audio?
Blacklisting the sound driver modules should suffice for disabling (at the next boot), e.g. put "blacklist snd-hda-intel" somewhere in /etc/modprobe.d/*.conf. Remove the blacklist entry for re-enabling (at the next boot).
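As a sketch, the blacklist entry could be added and removed like this. The file name 50-blacklist-audio.conf is an arbitrary choice for the example; any *.conf file in /etc/modprobe.d should do.

```sh
# write_audio_blacklist DIR
# Creates DIR/50-blacklist-audio.conf so snd-hda-intel is not auto-loaded.
write_audio_blacklist() {
    dir=${1:-/etc/modprobe.d}
    printf 'blacklist snd-hda-intel\n' > "$dir/50-blacklist-audio.conf"
}

# remove_audio_blacklist DIR
# Deletes the file again so the driver loads at the next boot.
remove_audio_blacklist() {
    dir=${1:-/etc/modprobe.d}
    rm -f "$dir/50-blacklist-audio.conf"
}
```

Both need root when used on /etc/modprobe.d, and both take effect only at the next boot, as noted above.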
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #34 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #35 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #36 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #37 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #38 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #39 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #40 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #41 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #42 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #43 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #44 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #45 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #47 from macias -
- Try the newer upstream kernel, available in OBS Kernel:stable repo. You have a better chance there, as often the bug has been already fixed in the upstream.
I already tried that kernel (3.19), but OK.
- Enable kdump to get the crash vmcore and dmesg. Install yast2-kdump, and enable kdump. It might be that yast2 kdump module can't set the right kernel parameter. In that case, edit /etc/default/grub manually and append a boot option like "crashkernel=256M".
Yast did everything automatically -- I have in total: showopts apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe crashkernel=512M-:256M
After the crash, you'll get dmesg.txt in the crash dump directory (if you're lucky).
Well, it is not a crash, but we will see... I keep my fingers crossed ;-).
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #49 from Takashi Iwai
Yast did everything automatically -- I have in total: showopts apm=off noresume edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe crashkernel=512M-:256M
This is for the failsafe mode. Do the normal boot options also have the crashkernel option? Just to be sure. If crashkernel is set but nothing is recorded, and you have no other way to access the machine at the crash, and there is no log recorded mentioning the crash... there is no way to analyze more, unfortunately. If so, I'd suggest to do hardware tests and a clean installation from the scratch.
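Whether the kernel that is actually running was booted with crashkernel can be read from /proc/cmdline rather than from the boot menu entries. A minimal sketch with an invented helper name:

```sh
# crashkernel_value "CMDLINE"
# Prints the value of the crashkernel= option found in a kernel command
# line; returns 1 if the option is absent.
crashkernel_value() {
    for word in $1; do
        case $word in
            crashkernel=*)
                echo "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Typical use (not executed here):
#   crashkernel_value "$(cat /proc/cmdline)"
#   cat /sys/kernel/kexec_crash_loaded    # 1 means a crash kernel is loaded
```

If the option is present but /sys/kernel/kexec_crash_loaded reads 0, kdump is probably not armed and no dump can be expected.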
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #50 from macias -
This is for the failsafe mode. Do the normal boot options also have the crashkernel option? Just to be sure.
Oh, sorry: splash=verbose quiet showopts crashkernel=512M-:256M
If so, I'd suggest to do hardware tests
What tests do you have in mind?
and a clean installation from the scratch.
It was a clean install, not an update. But I thought about the same :-(
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #51 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
--- Comment #52 from macias -
http://bugzilla.opensuse.org/show_bug.cgi?id=917411
macias -