[Bug 1034357] New: Kernel 4.10.9: Computer Intermittently Reboots
http://bugzilla.opensuse.org/show_bug.cgi?id=1034357

            Bug ID: 1034357
           Summary: Kernel 4.10.9: Computer Intermittently Reboots
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: Other
            Status: NEW
          Severity: Major
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: jshand2013@gmail.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 721379
  --> http://bugzilla.opensuse.org/attachment.cgi?id=721379&action=edit
DMESG with errors in certain areas

I was doing git downloads, watching YouTube, chatting on Skype and a few other things when the computer suddenly rebooted by itself, and I found these errors afterwards. I have very little understanding of the problem itself, but I have a few error codes.

mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: b600000000070f0f
mce: [Hardware Error]: TSC 0 ADDR fea00040
mce: [Hardware Error]: PROCESSOR 2:610f31 TIME 1492398286 SOCKET 0 APIC 0 microcode 6001119

I also checked mcelog and got:

jshand@linux-zkok:~> sudo mcelog --client
Memory errors
SOCKET 0 CHANNEL 0 DIMM 0
DMI_NAME "Node0_Dimm1"
DMI_LOCATION "Node0_Bank0"
corrected memory errors:
        0 total
        0 in 24h
uncorrected memory errors:
        0 total
        0 in 24h

The dmesg output is attached as a file. I hope this helps.

--
You are receiving this mail because:
You are on the CC list for the bug.

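For readers who want to pull the numbers out of the three `mce:` lines above, here is a minimal sketch in Python. The function name `parse_mce` and the regular expressions are illustrative assumptions, not part of any kernel or mcelog tooling; the input is simply the text quoted in the report.

```python
import re
from datetime import datetime, timezone

# The three kernel log lines quoted in the report.
MCE_LINES = """\
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: b600000000070f0f
mce: [Hardware Error]: TSC 0 ADDR fea00040
mce: [Hardware Error]: PROCESSOR 2:610f31 TIME 1492398286 SOCKET 0 APIC 0 microcode 6001119
"""


def parse_mce(text):
    """Extract bank, status, address and timestamp from raw mce lines.

    Hypothetical helper: the patterns only cover the fields shown above.
    """
    fields = {}
    bank = re.search(r"Bank (\d+): ([0-9a-f]+)", text)
    if bank:
        fields["bank"] = int(bank.group(1))
        fields["status"] = int(bank.group(2), 16)
    addr = re.search(r"ADDR ([0-9a-f]+)", text)
    if addr:
        fields["addr"] = int(addr.group(1), 16)
    when = re.search(r"TIME (\d+)", text)
    if when:
        # TIME is seconds since the Unix epoch; convert it to UTC.
        fields["time_utc"] = datetime.fromtimestamp(int(when.group(1)), tz=timezone.utc)
    return fields


record = parse_mce(MCE_LINES)
print(record["bank"], hex(record["status"]), hex(record["addr"]))
print(record["time_utc"].isoformat())
```

The timestamp converts to 03:04:46 UTC on 17 April 2017. mcelog in comment #4 prints the same event as 15:04:46, which is presumably the reporter's local time.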
http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c1

--- Comment #1 from John Shand <jshand2013@gmail.com> ---
Created attachment 721382
  --> http://bugzilla.opensuse.org/attachment.cgi?id=721382&action=edit
Hardware Information

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c2

--- Comment #2 from John Shand <jshand2013@gmail.com> ---
Let me know if there is any other information you need.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c3

--- Comment #3 from John Shand <jshand2013@gmail.com> ---
Created attachment 721384
  --> http://bugzilla.opensuse.org/attachment.cgi?id=721384&action=edit
journalctl -b information

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c4

--- Comment #4 from John Shand <jshand2013@gmail.com> ---
Hardware event. This is not a software error.
mcelog[1020]: MCE 0
mcelog[1020]: CPU 0 BANK 4
mcelog[1020]: ADDR fea00040
mcelog[1020]: TIME 1492398286 Mon Apr 17 15:04:46 2017
mcelog[1020]: MC4 Error: Watchdog timeout due to lack of progress.
mcelog[1020]: cache level: generic, mem/io: generic, mem-tx: generic error, part-proc: generic participation (request timed out)
mcelog[1020]: STATUS b600000000070f0f MCGSTATUS 0
mcelog[1020]: MCGCAP 107 APICID 0 SOCKETID 0
mcelog[1020]: CPUID Vendor AMD Family 21 Model 3

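The STATUS value in the mcelog output above can be unpacked by hand. The following Python sketch decodes only the architectural flag bits in the top byte and the low 16-bit error code. Treating bits 16 to 20 as an extended error code is an assumption about this AMD family, and `decode_status` is a hypothetical helper name; the mapping to the "Watchdog timeout" text is mcelog's own and is not reproduced here.

```python
# Architectural flag bits in the upper part of a machine check status value.
FLAGS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds additional information
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_status(status):
    """Split a 64-bit status value into flags and error code fields."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    decoded["error_code"] = status & 0xFFFF
    # Assumption: bits 16-20 carry the extended error code on this CPU family.
    decoded["ext_error_code"] = (status >> 16) & 0x1F
    return decoded


info = decode_status(0xB600000000070F0F)
for name, value in info.items():
    print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

For this value the sketch reports UC and PCC set, which fits an uncorrected error, and ADDRV set, which fits the ADDR fea00040 line being present.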
http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c6

Borislav Petkov <bpetkov@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jshand2013@gmail.com
              Flags|                            |needinfo?(jshand2013@gmail.com)

--- Comment #6 from Borislav Petkov <bpetkov@suse.com> ---
Can you disable the wlan card in your BIOS (if possible) and use the machine *without* the wifi, i.e., use eth0, and see if you can reproduce the issue?

Can you even reproduce it reliably?

Also, please send /proc/iomem.

Thanks.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c7

--- Comment #7 from John Shand <jshand2013@gmail.com> ---
Not too sure if you were asking me or Takashi.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c9

--- Comment #9 from John Shand <jshand2013@gmail.com> ---
iomem contents:

00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : Video ROM
00000000-00000000 : reserved
00000000-00000000 : System ROM
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : ACPI Tables
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : ACPI Non-volatile Storage
00000000-00000000 : System RAM
00000000-00000000 : reserved
00000000-00000000 : System RAM
00000000-00000000 : RAM buffer
00000000-00000000 : pnp 00:01
00000000-00000000 : PCI Bus 0000:00
00000000-00000000 : 0000:00:01.0
00000000-00000000 : PCI Bus 0000:01
00000000-00000000 : 0000:01:00.0
00000000-00000000 : r8169
00000000-00000000 : 0000:01:00.0
00000000-00000000 : r8169
00000000-00000000 : PCI MMCONFIG 0000 [bus 00-ff]
00000000-00000000 : pnp 00:00
00000000-00000000 : PCI Bus 0000:02
00000000-00000000 : 0000:02:00.0
00000000-00000000 : rtl_pci
00000000-00000000 : 0000:00:01.0
00000000-00000000 : 0000:00:14.2
00000000-00000000 : ICH HD audio
00000000-00000000 : 0000:00:01.1
00000000-00000000 : ICH HD audio
00000000-00000000 : 0000:00:16.2
00000000-00000000 : ehci_hcd
00000000-00000000 : 0000:00:16.0
00000000-00000000 : ohci_hcd
00000000-00000000 : 0000:00:14.5
00000000-00000000 : ohci_hcd
00000000-00000000 : 0000:00:13.2
00000000-00000000 : ehci_hcd
00000000-00000000 : 0000:00:13.0
00000000-00000000 : ohci_hcd
00000000-00000000 : 0000:00:12.2
00000000-00000000 : ehci_hcd
00000000-00000000 : 0000:00:12.0
00000000-00000000 : ohci_hcd
00000000-00000000 : 0000:00:11.0
00000000-00000000 : ahci
00000000-00000000 : amd_iommu
00000000-00000000 : reserved
00000000-00000000 : IOAPIC 0
00000000-00000000 : reserved
00000000-00000000 : pnp 00:03
00000000-00000000 : reserved
00000000-00000000 : HPET 0
00000000-00000000 : PNP0103:00
00000000-00000000 : pnp 00:03
00000000-00000000 : reserved
00000000-00000000 : pnp 00:03
00000000-00000000 : Local APIC
00000000-00000000 : pnp 00:03
00000000-00000000 : reserved
00000000-00000000 : pnp 00:03
00000000-00000000 : System RAM
00000000-00000000 : Kernel code
00000000-00000000 : Kernel data
00000000-00000000 : Kernel bss
00000000-00000000 : RAM buffer

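Every range in the listing above reads 00000000-00000000. On recent kernels that is typically what an unprivileged read of /proc/iomem returns, so the file probably needs to be read as root to be useful. The Python sketch below parses such a listing and flags the all-zero case. The names `parse_iomem` and `looks_masked` are hypothetical, and the indentation in the sample is an assumption, since the pasted copy lost its nesting.

```python
import re

# Matches an entry such as "00000000-00000000 : rtl_pci"; indentation is ignored.
ENTRY = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def parse_iomem(text):
    """Return a list of (start, end, name) tuples from /proc/iomem-style text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            entries.append((start, end, match.group(3).strip()))
    return entries


def looks_masked(entries):
    """True when every range is zero, i.e. the addresses appear to be hidden."""
    return bool(entries) and all(start == 0 and end == 0 for start, end, _ in entries)


# Three entries taken from the listing above (nesting assumed for illustration).
SAMPLE = """\
00000000-00000000 : PCI Bus 0000:02
  00000000-00000000 : 0000:02:00.0
    00000000-00000000 : rtl_pci
"""

entries = parse_iomem(SAMPLE)
print(len(entries), looks_masked(entries))

# On a live system one would read the real file with sufficient privileges, e.g.:
#   with open("/proc/iomem") as handle:
#       entries = parse_iomem(handle.read())
```

If `looks_masked` returns true for the real file, rerunning the read with root privileges is the first thing to try before attaching the output.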
http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c10

--- Comment #10 from John Shand <jshand2013@gmail.com> ---
I have tried eth0 and it was fine. The problem seems to be wireless; however, I have been unable to reproduce it for some time.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c12

--- Comment #12 from John Shand <jshand2013@gmail.com> ---
How do I go about getting that information for you?

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c13

--- Comment #13 from John Shand <jshand2013@gmail.com> ---
Created attachment 722212
  --> http://bugzilla.opensuse.org/attachment.cgi?id=722212&action=edit
iomem contents

I hope this helps.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c15

--- Comment #15 from John Shand <jshand2013@gmail.com> ---
Can you forward this information to Larry, as it may fix an intermittent issue?

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c16

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Larry.Finger@lwfinger.net
              Flags|                            |needinfo?(Larry.Finger@lwfinger.net)

--- Comment #16 from Takashi Iwai <tiwai@suse.com> ---
Larry, any clue about this issue?

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c17

--- Comment #17 from John Shand <jshand2013@gmail.com> ---
Thanks mate.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c19

Larry Finger <Larry.Finger@lwfinger.net> changed:

           What    |Removed                              |Added
----------------------------------------------------------------------------
              Flags|needinfo?(Larry.Finger@lwfinger.net) |

--- Comment #19 from Larry Finger <Larry.Finger@lwfinger.net> ---
(In reply to Takashi Iwai from comment #16)
> Larry, any clue about this issue?

No. No such crashes have been reported to me. In addition, the basic PCI setup has not been changed since the first inclusion of driver rtl8192ce.

If this error is causing the reboot, then it is not always fatal. In the attached dmesg output, three such errors were logged 0.25 sec after the clock was started. That is far earlier than the PCI bus scan. The Realtek PCI driver was loaded at 20 sec. The next mce event is at 316 sec. Even then, the machine is still running, with the firewall logging a packet drop 83 seconds later.

Each processor on my fastest system is rated at 5800 BogoMIPS, so John's is faster at 7400, but I do not expect that to have any effect.

Larry

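The timing argument in the comment above (mce events logged before the wireless driver was even loaded) can be checked mechanically against a dmesg capture. The Python sketch below is one way to do it. The helper names are hypothetical, and the sample lines are placeholders that only mirror the times Larry quotes; they are not taken from the attached dmesg.

```python
import re

# A dmesg line with the usual "[seconds.micros] message" prefix.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_dmesg(text):
    """Return (timestamp, message) pairs for every timestamped line."""
    events = []
    for line in text.splitlines():
        match = STAMP.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def mce_before(events, marker):
    """Timestamps of mce lines logged before the first line containing marker."""
    load_time = next((t for t, msg in events if marker in msg), None)
    if load_time is None:
        return []
    return [t for t, msg in events if msg.startswith("mce:") and t < load_time]


# Placeholder lines for illustration only; message texts are invented.
SAMPLE = """\
[    0.250000] mce: [Hardware Error]: placeholder early event
[   20.000000] rtl8192ce: placeholder driver load message
[  316.000000] mce: [Hardware Error]: placeholder later event
"""

events = parse_dmesg(SAMPLE)
print(mce_before(events, "rtl8192ce"))
```

Any mce timestamp returned here predates the driver load, which is the observation Larry uses to argue that the driver did not trigger those early errors.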
http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c21

--- Comment #21 from John Shand <jshand2013@gmail.com> ---
(In reply to Borislav Petkov from comment #18)
> You keep avoiding answering the question: how often did this happen and can
> you reproduce it?

The funny thing about this issue is that a few months ago it used to happen at least 3 times a week, then it went away until the last update. Since the last update it has only happened twice, without any reason I can think of.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c23

--- Comment #23 from John Shand <jshand2013@gmail.com> ---
(In reply to Borislav Petkov from comment #22)
> (In reply to John Shand from comment #21)
> > The funny thing about this issue is that a few months ago it used to happen
> > at least 3 times a week, then it went away until the last update.
>
> By "update" you mean kernel update? What exactly did get updated, making the
> issue go away?
>
> > Since the last update it has only happened twice without any reason i can
> > think of
>
> Ok, so it was a good thing I kept insisting on that question: this is an
> important piece of information.
>
> Btw, your board has a newer BETA BIOS:
>
> http://www.gigabyte.com/Motherboard/GA-F2A55M-DS2-rev-10#support-dl
>
> Description says it updates AGESA which is the CPU part of the BIOS support.
> So while it doesn't say it fixes some wifi chip issues, it would still be
> worth trying...

Yeah, I did the kernel updates as per normal until kernel 4.10.9, then I had the issue mentioned.

Yeah, I double-checked my motherboard and I have revision 1.2. I am unsure how to update the BIOS myself.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c25

--- Comment #25 from John Shand <jshand2013@gmail.com> ---
(In reply to Borislav Petkov from comment #24)
> (In reply to John Shand from comment #23)
> > yeah i did the kernel updates as per normal until kernel 4.10.9, then i had
> > the issue mentioned.
>
> You said:
>
> > The funny thing about this issue is that a few months ago it used to happen
> > at least 3 times a week,

Yes, I did. I can't remember the kernel version.

> With which kernel did you get it 3 times a week?

No. More like a different one when updates were available.

> > then it went away until the last update.
>
> What exactly did you update/change to make the issue go away? Change/update
> to what version?
>
> > Since the last update it has only happened twice without any reason i can
> > think of
>
> And that is with 4.10.9, correct?

Yes, that's correct.

> Basically, I'd like to find out what you did to cause the issue to happen and
> what you did to make it go away.

All I did was update when stable updates were available, which went very smoothly. Kernel 4.10.8 was the last version that didn't have this issue that I'm aware of.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c26

--- Comment #26 from John Shand <jshand2013@gmail.com> ---
The issue is still current with kernel 4.10.10 and has happened more often than with kernel 4.10.9.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c27

--- Comment #27 from Borislav Petkov <bpetkov@suse.com> ---
Well, the only thing I could think of is to try updating your BIOS. Then, I guess I'm all out of ideas and I'd look in Larry's direction.

Ok, maybe one practical idea: if the wifi card is one you can replace and it was cheap, I'd go get a different one which doesn't have the issue.

HTH.

http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c28

--- Comment #28 from John Shand <jshand2013@gmail.com> ---
The new kernel 4.10.12 seems to have fixed this problem. I have not had to change any hardware as a result.

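Comments #25 to #28 give four data points about which kernels showed the reboots. The Python sketch below orders them numerically and reports the window in which the problem was observed. It only summarizes what the reporter stated; because the reboots were intermittent, a "good" entry is weak evidence, and versions not mentioned in the thread (such as 4.10.11) are simply absent. The names `OBSERVED` and `version_key` are illustrative.

```python
# Observations reported in the thread: True means the reboots were seen.
OBSERVED = {
    "4.10.8": False,   # last version without the issue, per comment #25
    "4.10.9": True,    # issue reported
    "4.10.10": True,   # issue reported more often, per comment #26
    "4.10.12": False,  # seems fixed, per comment #28
}


def version_key(version):
    """Sort key that compares version components as integers."""
    return tuple(int(part) for part in version.split("."))


# A plain string sort would place "4.10.10" before "4.10.8".
ordered = sorted(OBSERVED, key=version_key)
bad = [v for v in ordered if OBSERVED[v]]
good = [v for v in ordered if not OBSERVED[v]]

first_bad, last_bad = bad[0], bad[-1]
last_good_before = [v for v in good if version_key(v) < version_key(first_bad)][-1]
first_good_after = [v for v in good if version_key(v) > version_key(last_bad)][0]

print("last good before:", last_good_before)
print("observed bad:", first_bad, "to", last_bad)
print("first good after:", first_good_after)
```

A window like this is the usual starting point for comparing changelogs between stable releases, although nothing in the thread identifies which change was responsible.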
http://bugzilla.opensuse.org/show_bug.cgi?id=1034357#c29

John Shand <jshand2013@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #29 from John Shand <jshand2013@gmail.com> ---
This issue seems to have been resolved.