[Bug 1056535] New: DATA CORRUPTION on Lenovo ThinkPad P70 with socket use
http://bugzilla.opensuse.org/show_bug.cgi?id=1056535

Bug ID: 1056535
Summary: DATA CORRUPTION on Lenovo ThinkPad P70 with socket use
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.3
Hardware: x86-64
OS: openSUSE 42.3
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: rlk@alum.mit.edu
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 738932 --> http://bugzilla.opensuse.org/attachment.cgi?id=738932&action=edit
List of miscompared bytes in comparison of 9 files, each 1-2 GB in size

On a Lenovo ThinkPad P70 (Xeon E3-1505Mv5, nVidia Quadro M4000M, 32 GB), I get sporadic data corruption with any form of file transfer that uses sockets (IP or UNIX domain). This includes rsync (local, local->remote, or remote->local), scp, ftp, and tar over ssh. It does not occur with file transfers that don't use sockets: specifically, cp -r and tar cf - | tar xf - both complete with no errors.

Rsync detects errors in the data stream; in some cases it retries the transfer, and in others the error results in a protocol error that aborts the transfer. The local->remote rsync shows the same types of error (the remote machine has otherwise been reliable). I note that ssh itself does not report any failures, as it would be expected to if the encrypted data stream had been corrupted in transit. There are no messages in the log files indicating errors of any kind.

One pattern I've observed in the miscompares is that the address of the erroneous data always ends in the six bits 011111 (byte 31 of a 64-byte chunk). There appears to be some loose clustering (see the attachment).

I have run a full pass of memtest86, the full Lenovo diagnostics, and 7 hours (to date) of prime95 with no errors. I have tried each of the two DIMMs alone (using only 16 GB), in different slots, with no change in behavior. I also put the two DIMMs in the other pair of slots, again with no change in behavior.
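The six-bit pattern noted in the report (every bad byte sitting at position 31, binary 011111, of a 64-byte chunk) can be checked mechanically against an original file and its transferred copy. The following is a minimal sketch, not the reporter's actual tooling; the function names `miscompare_offsets`, `summarize`, and `report` are hypothetical, and it assumes both files should be the same length.

```python
from collections import Counter


def miscompare_offsets(path_a, path_b, chunk_size=1 << 20):
    """Yield the byte offset of every position where the two files differ.

    Hypothetical helper. Assumes the files should be identical in length;
    bytes past the end of the shorter file are not reported.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                break
            if a != b:  # fast path: skip chunks that match entirely
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield offset + i
            offset += len(a)


def summarize(offsets):
    """Histogram of each offset's position within its 64-byte chunk (offset % 64)."""
    return Counter(off % 64 for off in offsets)


def report(path_a, path_b):
    """Print each bad offset with its low six bits, then the histogram."""
    bad = list(miscompare_offsets(path_a, path_b))
    for off in bad:
        print(f"{off:#x}  low 6 bits = {off % 64:06b}")
    print(f"{len(bad)} miscompared bytes; position-in-chunk histogram:",
          dict(summarize(bad)))
    return bad
```

Calling `report("original.bin", "copy.bin")` (placeholder file names) on a pair exhibiting the pattern described above would print `011111` for every offset and a histogram with a single key, 31. Bytes are compared individually only inside chunks that differ, so identical multi-gigabyte regions are skipped at memory-comparison speed.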
I have used two different SSDs (SanDisk X400 512GB and Crucial MX300 1TB) in separate M.2 slots with no change in behavior. I have not seen any evidence of such issues under Windows 10, nor with Knoppix 7.7.1 (kernel 4.7.9). The problem occurs with both the stock 4.4.79-default kernel (I first noticed it when the download of the update RPMs had errors) and the 4.12.9-1.gf2ab6ba-default kernel. It also occurs with the 4.12.9-1.gf2ab6ba-vanilla kernel. I haven't tried a vanilla 4.4 kernel, but my understanding is that it does not properly support Skylake processors.

The one combination I have found that does not get these errors is the vanilla kernel with the xf86-video-nouveau package removed (the presence or absence of the nv and proprietary nVidia packages does not appear to have any effect). This isn't a viable combination for me, since I still don't have the proprietary nVidia package working for reasons TBD (I have not determined whether this is a bug or user error). However, the corruption occurs even at runlevel 3, with the X server never started. It also doesn't matter whether I set the BIOS to use only the nVidia graphics or the hybrid (Optimus) setup.

I have a few more weeks before I have to decide whether to return the laptop (and eat the 15% restocking fee), so I'm under some time pressure to find a resolution.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1056535#c1

--- Comment #1 from Robert Krawitz <rlk@alum.mit.edu> ---
Also, the BIOS is up to date (including the Skylake/Kaby Lake microcode patch), and I have run with hyperthreading both enabled and disabled, with no change in behavior.
http://bugzilla.opensuse.org/show_bug.cgi?id=1056535#c2

--- Comment #2 from Robert Krawitz <rlk@alum.mit.edu> ---
I cannot be positive that using the vanilla kernel in fact improved matters. I only ran two passes of the rsync comparison. This morning I tried turning off swap and also got a couple of good passes, but I later tried again and got the same errors.
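The caution in comment #2 is warranted: with an intermittent fault, a couple of clean passes proves little. If each pass independently had a 50% chance of failing, two consecutive clean passes would still occur 25% of the time ((1 - 0.5)^2). A minimal sketch of a repeated-pass harness follows, assuming Python's standard `socket.socketpair()` as a stand-in for the rsync/scp/ftp transfers in the report; the names `socket_pass` and `run_passes` are hypothetical, and nothing here establishes that this synthetic load triggers the fault the reporter saw.

```python
import os
import socket
import threading


def socket_pass(payload: bytes, bufsize: int = 65536) -> bool:
    """Send payload through a connected local socket pair (AF_UNIX on Linux)
    and return True if it arrives byte-for-byte intact. Hypothetical helper."""
    left, right = socket.socketpair()

    def sender():
        # Write everything, then signal end-of-stream to the reader.
        with left:
            left.sendall(payload)
            left.shutdown(socket.SHUT_WR)

    t = threading.Thread(target=sender)
    t.start()
    received = bytearray()
    with right:
        while True:
            chunk = right.recv(bufsize)
            if not chunk:  # sender closed its end
                break
            received += chunk
    t.join()
    return bytes(received) == payload


def run_passes(n_passes: int, size: int) -> int:
    """Run n_passes transfers of `size` fresh random bytes; return how many failed."""
    failures = 0
    for _ in range(n_passes):
        if not socket_pass(os.urandom(size)):
            failures += 1
    return failures
```

Running something like `run_passes(100, 1 << 30)` (an arbitrary choice of pass count and size) gives a failure rate rather than a single yes/no answer, which makes before/after comparisons such as "vanilla kernel" or "swap off" more meaningful.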
http://bugzilla.opensuse.org/show_bug.cgi?id=1056535#c3

--- Comment #3 from Robert Krawitz <rlk@alum.mit.edu> ---
Reproduced with kernel-default-4.4.79-19.1.x86_64.rpm
http://bugzilla.opensuse.org/show_bug.cgi?id=1056535#c4

Robert Krawitz <rlk@alum.mit.edu> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
             Status|NEW         |RESOLVED
         Resolution|---         |INVALID

--- Comment #4 from Robert Krawitz <rlk@alum.mit.edu> ---
Have been able to reproduce on Windows, albeit with more difficulty. That indicates that this is not a problem with openSUSE; it's almost surely a hardware glitch with the laptop. Hence closing.