[Bug 984955] New: System freezes & reboots frequently after applying official patches with upgrades from kernel-default-4.1.13-5.1 to newer kernel
http://bugzilla.opensuse.org/show_bug.cgi?id=984955

Bug ID: 984955
Summary: System freezes & reboots frequently after applying official patches with upgrades from kernel-default-4.1.13-5.1 to newer kernel
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.1
Hardware: x86-64
OS: openSUSE 42.1
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: michaelof@rocketmail.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Additional information:
- Using KDE/Plasma as GUI
- The current kernel (again leading to freezes) is 4.1.26-4.g294632f-default from the OBS Kernel:openSUSE-42.1 repo

I was asked to create a new bug; this may be a duplicate of bug 971695. The symptoms look at least very similar:

- The system was rock solid up to and including kernel-default-4.1.13-5.1.
- After upgrading to newer kernels, offered by KDE's "Software Updates" applet/plasmoid, the system freezes at random intervals, from a couple of times a day to once in over a week.
- After being frozen for a minute or so, the system reboots on its own.
- There are no signs of an upcoming freeze; everything just stops.

Symptoms of the freeze: the screen keeps showing the same image. I didn't see anything interesting in the system journal; it just ends abruptly. I have also noticed that my system rebooted on occasion when nobody was using it or even logged in.

NOT reproducible, but my strong "feeling": freezes happen when there is a lot of background disk activity on an external USB 3.0 HDD together with user mouse activity in the foreground (USB mouse). These phases of high external USB 3.0 HDD activity are, e.g., the daily "Back in time" backups. The external USB 3.0 HDD is formatted as NTFS.

UNFORTUNATELY these freezes nearly always lead to a CORRUPTED NTFS file system. The corruption is NOT detected by any Linux tool I'm aware of, but all following "Back in time" backups fail for various reasons. MS Windows based CHKDSK /F detects hundreds, sometimes thousands of errors. This is on a nearly new HDD with no bad blocks, which is ALWAYS checked and repaired after these freezes, in two cycles, to be sure that all corruptions are fixed. As these are my backups, this bug is more than SERIOUS for me. To be on the safe side I'm making full "dd" based image backups of the whole internal drive (SSD) after booting openSUSE 13.2 from a live stick.

When initially searching for a solution I found the mentioned bug 971695. I added some "me too" comments, installed YaST's kernel-kdump and was able to get a crash dump for kernel 4.1.15-8-default. Not sure if this is of interest.

I got the hint to upgrade to kernel 4.1.25 (https://bugzilla.opensuse.org/show_bug.cgi?id=971695#c19). After doing so, I had no more freezes with 4.1.25 and, as far as I know, with the two following kernels from the OBS Kernel:openSUSE-42.1 repo.

The current kernel is now 4.1.26-4.g294632f-default, as kernels are offered very frequently from the OBS Kernel:openSUSE-42.1 repo. And 4.1.26-4 reintroduces the freezes :-( It froze twice yesterday, and I've reactivated the kernel-kdump option, which I had deactivated after the problem seemed to be solved.

Please advise how to proceed.

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=984955
Michael from Offenbach Germany
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c1
Andreas Stieger
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c2
Michael from Offenbach Germany
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c3
--- Comment #3 from Michael from Offenbach Germany
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c9
--- Comment #9 from Michael from Offenbach Germany
> Some Intel Haswell CPUs are known to have issues with the package C-state PC6, which temporarily disables PCIe, USB and memory controllers, i.e. everything that Intel calls the "uncore". Please try the kernel boot parameter "intel_idle.max_cstate=1" and check if the system becomes stable this way.

My system is a small ZBOX nano with an Atom Celeron (N2930, http://ark.intel.com/de/products/81073/Intel-Celeron-Processor-N2930-2M-Cach...). As far as I know these CPUs are "Bay Trail", not "Haswell". Currently I'm working with kernel 4.1.26-1.g8a776b0-default, as advised by Oliver Neukum. No freezes since then; it seems to be stable again.
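
Whether the C-state limit suggested above is actually in effect can be checked from the running system. The following is only a minimal Python sketch, not something posted in the original thread: it looks for the "intel_idle.max_cstate" parameter on the kernel command line in /proc/cmdline. The helper names (max_cstate_from_cmdline, read_cmdline) are made up for illustration, and the sketch assumes that the last occurrence of a repeated parameter is the one that takes effect.

```python
from typing import Optional

# Parameter name as quoted in the thread.
PARAM = "intel_idle.max_cstate"


def max_cstate_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the intel_idle.max_cstate value found on a kernel
    command line, or None if the parameter is absent or malformed."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == PARAM and sep and raw.isdigit():
            # Assumption: a later occurrence overrides an earlier one.
            value = int(raw)
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read()


if __name__ == "__main__":
    try:
        limit = max_cstate_from_cmdline(read_cmdline())
    except OSError as exc:
        print(f"could not read kernel command line: {exc}")
    else:
        if limit is None:
            print(f"{PARAM} is not set; C-states are not limited via the command line")
        else:
            print(f"{PARAM}={limit} is set on the kernel command line")
```

On openSUSE the parameter is normally added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by regenerating the configuration with "grub2-mkconfig -o /boot/grub2/grub.cfg" and a reboot.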
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c10
--- Comment #10 from Oliver Neukum
> Currently I'm working with kernel 4.1.26-1.g8a776b0-default, as advised by Oliver Neukum. No freezes since then, seems to be stable now/again.

With the C-states limited or not?
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c11
--- Comment #11 from Michael from Offenbach Germany
> With the C-states limited or not?

No, I haven't changed any kernel parameters. I thought it would be more productive to do this only on your advice, especially because I'm not sure whether the HASWELL issue also affects BAY TRAIL. So I'm currently using the intermediate kernel you pointed me to (4.1.26-1.g8a776b0-default). And, nice to see, I haven't had any crashes with this kernel. Are the changes you've made to this test kernel 4.1.26-1.g8a776b0-default going to be applied to the mainline/trunk of Leap 42.1's maintenance?
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c12
--- Comment #12 from Oliver Neukum
> Are the changes you've made to this test kernel 4.1.26-1.g8a776b0-default going to be applied to the mainline/trunk of Leap 42.1's maintenance?

No. The change doesn't fix the crash. It just makes it less likely by a factor of 10000 or so. Could you please test the current distro kernel with the C-state limit?
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c13
--- Comment #13 from Michael from Offenbach Germany
> Could you please test the current distro kernel with the cstate limit?

I've used the distro kernel 4.1.26-21-default for 10 days now WITHOUT the C-state limit. Originally I decided to try it without the limit until the first freeze. But, I love it, not one freeze with 4.1.26-21-default. Tested with high USB HDD I/O: no freeze. The only thing that changed is the file system on my USB HDD: I switched from NTFS to ext4, as the kernel crashes always corrupted NTFS. I'm now going to 4.1.27-24, offered as a security update. I have the strong feeling that my bug was fixed somewhere on the way to 4.1.26-21, let's see...
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c14
--- Comment #14 from Michael from Offenbach Germany
http://bugzilla.opensuse.org/show_bug.cgi?id=984955#c15
Oliver Neukum
> This bug has disappeared with the later kernel updates for 42.1.
> Just upgraded to 42.2.
> Bug re-appeared with 42.2, again freezes, this time without an automatic reboot. I've had to hard switch off / power off my machine.
> Please advise how to proceed: stay in this bug or create a new one for 42.2.

Please create a new bug and close this one. Add a reference to this bug. We need to check for missed stable patches.