http://bugzilla.suse.com/show_bug.cgi?id=970762
http://bugzilla.suse.com/show_bug.cgi?id=970762#c17

Borislav Petkov <bpetkov@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |needinfo?(novell@tower-net.
                   |                            |de)

--- Comment #17 from Borislav Petkov <bpetkov@suse.com> ---
Looks like your GPU is causing a cache error in some of the fifos:

[ 9855.144800] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002

and shortly afterwards the bus queue stalls, which is reported with an MCE:

Hardware event. This is not a software error.
CPU 1 BANK 0 TSC 53a5c1336f6
TIME 1459340183 Wed Mar 30 14:16:23 2016
MCG status:MCIP
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS error: -1 1 Level-0 Local-CPU-originated-request Generic Memory-access Request-did-not-timeout
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE

BQ_DCU_READ_TYPE is a Bus Queue read error, BQ_ERR_HARD_TYPE is probably saying that it is a hard error in the sense, not recoverable.

timeout BINIT (ROB timeout). No micro-instruction retired for some time

and this says your CPU has had a timeout retiring instructions, basically some sort of a livelock or so.

STATUS f200004000000800 MCGSTATUS 4
CPUID Vendor Intel Family 6 Model 23
SOCKET 0 APIC 1 microcode 60f
Run the above through 'mcelog --ascii'
Hardware event. This is not a software error.

The following MCEs simply say that some internal timer is expiring without any forward progress.

So, to make a long story short, it is either nouveau misprogramming the GPU or the GPU memory starting to go bad and thus causing cache errors in the fifo or... something in that area.
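For reference, the flags mcelog printed can be cross-checked by hand against the raw STATUS and MCGSTATUS words. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS / IA32_MCG_STATUS bit layout and the compound "bus and interconnect" error code format (0000 1PPT RRRR IILL) as documented in the Intel SDM. The helper names are made up for illustration and this is not how mcelog itself does it; the model-specific bits (the BQ_* and ROB timeout lines) are deliberately left undecoded.

```python
# Hypothetical helpers, not part of mcelog. Bit positions are assumed from
# the architectural machine-check register layout in the Intel SDM.

MCI_FLAGS = [
    (63, "Valid"),
    (62, "Error overflow"),
    (61, "Uncorrected error"),
    (60, "Error enabled"),
    (59, "MISC register valid"),
    (58, "ADDR register valid"),
    (57, "Processor context corrupt"),
]

MCG_FLAGS = [(0, "RIPV"), (1, "EIPV"), (2, "MCIP")]


def decode_flags(value, table):
    """Return the names of all flag bits from `table` that are set in `value`."""
    return [name for bit, name in table if (value >> bit) & 1]


def decode_bus_error(status):
    """Split the low 16 bits of MCi_STATUS as a bus/interconnect error code.

    Returns None if the code does not match the 0000 1PPT RRRR IILL pattern.
    """
    code = status & 0xFFFF
    if (code & 0xF800) != 0x0800:
        return None
    return {
        "level": code & 0x3,                  # LL: 0 means Level-0
        "ii": (code >> 2) & 0x3,              # II: 0 means memory access
        "rrrr": (code >> 4) & 0xF,            # RRRR: 0 means generic request
        "timeout": bool((code >> 8) & 1),     # T: request timed out?
        "pp": (code >> 9) & 0x3,              # PP: 0 means local CPU originated
    }


if __name__ == "__main__":
    status = 0xF200004000000800   # STATUS value from the MCE above
    mcgstatus = 0x4               # MCGSTATUS value from the MCE above
    print(decode_flags(mcgstatus, MCG_FLAGS))
    print(decode_flags(status, MCI_FLAGS))
    print(decode_bus_error(status))
```

Run on the values above, the set bits should line up with what mcelog reported: MCIP in the global status, and overflow, uncorrected, enabled and context-corrupt in the bank status, with all bus error subfields zero.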
From looking at nv04_fifo_cache_error() in the nouveau code, it doesn't really tell me much: "mthd" is some cache method, "data" is the data read, "ch" is channel...
Btw, I see a bunch of fixes to the fifo handling in nouveau since 4.5, so the only thing I could think of for you to try is to check whether your MCE triggers with the upstream kernel:

http://kernel.opensuse.org/packages/stable

That's 4.8.2 and already has the fixes AFAICT. Please install that kernel and try to reproduce the issue with it too.

HTH.

-- 
You are receiving this mail because:
You are on the CC list for the bug.