[opensuse-support] DELL XPS13 freezing

Hi *,

for a long time I got messages like

  i915 0000:00:02.0: [drm] *ERROR* CPU pipe A FIFO underrun
  i915 0000:00:02.0: [drm] *ERROR* Atomic update failure on pipe B (start=315366 end=315367) time 312 us, min 1431, max 1439, scanline start 1417, end 1444

from time to time, but this didn't seem to have any further impact on the system.

But starting this week, about three times a day the machine freezes completely for no apparent reason, apart from more of the above-mentioned messages. No other error messages are shown in the journal or any other logs. This does not happen with Windows 10 running on the same machine, but from an external Thunderbolt SSD instead of the internal M.2 SSD.

Machine is a Dell Inc. XPS 13 9370/0F6P3V, BIOS 1.13.1 07/08/2020.

Does anyone else face this problem? Any idea what to check or do about it?

TIA. Bye. Michael.
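To judge whether the i915 messages really increase before a freeze, it can help to count them per kind. The following is only a minimal sketch in Python: the regular expression is derived from the two messages quoted above, and the function name `summarize_i915_errors` is hypothetical, not part of any existing tool.

```python
import re
from collections import Counter

# Pattern derived from the two log lines quoted in the post (an assumption:
# other i915 messages may be formatted differently).
I915_ERROR = re.compile(r"i915 (?P<dev>[0-9a-f:.]+): \[drm\] \*ERROR\* (?P<msg>.+)")


def summarize_i915_errors(lines):
    """Count i915 DRM error lines, grouped by kind of error."""
    counts = Counter()
    for line in lines:
        match = I915_ERROR.search(line)
        if match:
            # Cut off variable details such as "(start=... end=...)" so that
            # repeated errors of the same kind are counted together.
            kind = match.group("msg").split(" (")[0].strip()
            counts[kind] += 1
    return counts


sample = [
    "i915 0000:00:02.0: [drm] *ERROR* CPU pipe A FIFO underrun",
    "i915 0000:00:02.0: [drm] *ERROR* Atomic update failure on pipe B "
    "(start=315366 end=315367) time 312 us, min 1431, max 1439, "
    "scanline start 1417, end 1444",
    "some unrelated kernel message",
]

for kind, count in summarize_i915_errors(sample).items():
    print(f"{count:5d}  {kind}")
```

On a real system the input lines could come from the kernel log of the previous boot, for example the output of `journalctl -k -b -1`, assuming the journal is stored persistently.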

On 14/11/2020 21.36, mh@mike.franken.de wrote:
Ah, ok. Still I think you should verify the "disk". If it fails suddenly there would be no logs about the problem. If the machine responds a bit, maybe console 10 might have some information. -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)

On Saturday, 14 November 2020 21:47:28 CET Carlos E. R. wrote:
Yes, that was my first idea, too. But neither fscks for the filesystems nor a DELL HW check found anything 8-< And no, no console shows any information - they are all pitch black 8-( At the moment, I suspect RAM to be the culprit. I added the memtest parameter to the kernel boot options and sometimes, but not on every boot, I get error messages for certain memory blocks. Despite that, all HW checks, including DELL's own one, refuse to show any errors 8-/ Thx! bye. Michael.

On 15/11/2020 10.03, mh@mike.franken.de wrote:
On Saturday, 14 November 2020 21:47:28 CET Carlos E. R. wrote:
Don't forget smartctl.
And no, no console shows any information - they are all pitch black 8-(
Tsk :-(
There is a memtest tool that is usually accessible from the boot menu of the install disk. You have to leave it running for a day or so. Others are more knowledgeable on the tool than me. If that tool signals an error in the RAM, then you do have a problem with the RAM no matter what Dell says. -- Cheers / Saludos, Carlos E. R. (from 15.1 x86_64 at Telcontar)

On 15/11/2020 at 13:04, Carlos E. R. wrote:
*any error* means the RAM chip is fit for the trash :-( If the faulty chip is high in memory, the errors may appear randomly. jdd -- http://dodin.org

On Sunday, 15 November 2020 13:04:06 CET Carlos E. R. wrote: [...]
If you mean memtest86, that no longer exists on the install disks; AFAIK, being a 16-bit program, it doesn't work on EFI machines. But the memtest kernel parameter should do almost the same. You can tell it how many passes to run with memtest=n. Thx and bye. Michael.
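As a small illustration of the memtest=n parameter: the sketch below (Python, with a hypothetical helper name `add_memtest`) rewrites a kernel command line string so that it contains exactly one memtest option. It assumes the boot options are kept in the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub, which is the usual place on openSUSE, and that the running kernel is built with the memtest feature enabled.

```python
import re


def add_memtest(cmdline: str, passes: int) -> str:
    """Return the command line with memtest=<passes> set exactly once."""
    # Drop any existing "memtest" or "memtest=N" option first.
    args = [a for a in cmdline.split() if not re.fullmatch(r"memtest(=\d+)?", a)]
    args.append(f"memtest={passes}")
    return " ".join(args)


# Example value for GRUB_CMDLINE_LINUX_DEFAULT (made up for illustration).
print(add_memtest("splash=silent quiet memtest=1", 4))
# prints: splash=silent quiet memtest=4
```

After changing the variable by hand, the GRUB configuration has to be regenerated (on openSUSE typically with `grub2-mkconfig -o /boot/grub2/grub.cfg`) before the new option takes effect on the next boot.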

Michael composed on 2020-11-15 21:24 (UTC+0100):
On Sunday, 15 November 2020 13:04:06 CET Carlos E. R. wrote:
My UEFI systems include this Grub stanza:

  menuentry "memtest86 8.3 EFI" {
    search --no-floppy --label --set=root <VOLUMELABEL>
    chainloader /mt83x64.efi
  }

I get the binary from https://www.memtest86.com/. Do not confuse non-FOSS memtest86 with FOSS memtest86+.
-- Evolution as taught in public schools, like religion, is based on faith, not on science. Team OS/2 ** Reg. Linux User #211409 ** a11y rocks! Felix Miata *** http://fm.no-ip.com/

On Sunday, 15 November 2020 21:49:00 CET Felix Miata wrote:
Michael composed on 2020-11-15 21:24 (UTC+0100):
[...]
Yep, but this isn't free software AFAIK. Perhaps this is the reason why this version of memtest is also missing from the distro images. Thx and bye. Michael.

On Mon, 16 Nov 2020 10:37:04 +0100 mh@mike.franken.de wrote:
There are details of the development history and a bit about the licensing at https://www.memtest86.com/memtest86.html

TL;DR - it's free but not open source.

There's also memtest86+ (http://memtest.org/), which is GPL, and the source is available from that page. Last update this year, not abandonware AFAICT.

On Saturday, November 14, 2020 1:00:25 PM CST mh@mike.franken.de wrote:
You don't say if this is Leap or Tumbleweed. You could remove xf86-video-intel and just use the modesetting driver. I don't think the intel drivers are seeing much love. Mark

On Saturday, 14 November 2020 22:14:53 CET Mark Petersen wrote:
Sorry, it is Tumbleweed - latest snapshot.
You could remove the xf86-video-intel and just use the modesetting driver. I don't think the intel drivers are seeing much love.
I never had problems with the Intel driver, but a few with the modesetting driver. Especially the problem in this case seemed to occur out of nothing, i.e. without an update or config change before.
Mark
Thx and bye. Michael.

Moin, On Sun, 15 Nov 2020, 10:09:35 +0100, mh@mike.franken.de wrote:
I see this from time to time on one of my systems, too. This is an HTPC with an Intel i5-8400 CPU @ 2.80GHz. As the PC is dedicated to playing back all sorts of multimedia files, it has only happened while running Kodi. Sometimes audio gets stuck and replays a small audio frame in an endless loop, but video is completely stuck, too; the system cannot be reached over the network, and switching console ttys does not work either. Similar to what you have observed, it doesn't happen for a few days (sometimes even weeks), but then it suddenly happens several times a day. It makes no difference which kernel is used (I see this behaviour on Leap 15.1 and 15.2, using either the Leap kernel or the one from Kernel:stable); the xf86-video-intel driver gave me more problems than modesetting, but the lockups happened with both drivers. And, yes, it doesn't occur here on Win 10 running Kodi, either. Do you see a particular use case when it happens for you?
Thx and bye. Michael.
Cheers. l8er manfred

On Sunday, 15 November 2020 12:40:59 CET Manfred Hollstein wrote:
No, it happens during completely different actions. On those bad days it usually starts with problems resuming from hibernation. After a reboot (or a power-down and power-up cycle) the system seems to be usable as expected. After a while you start Firefox, open a website, check mail with KMail or do anything else, and it freezes.
Bye. Michael.

Hi again, finally the root cause revealed itself - the CPU has died 8-< Yesterday the machine didn't boot at all; instead the battery LED flashed yellow two times, followed by a white flash. According to DELL this indicates a CPU failure. Support will replace the mainboard the day after tomorrow. Bye. Michael.

participants (7)
- Carlos E. R.
- Dave Howorth
- Felix Miata
- jdd@dodin.org
- Manfred Hollstein
- Mark Petersen
- mh@mike.franken.de