PC hard-crashes regularly: need some help to pinpoint the problem
Hello,

For a week or two I've had severe stability problems with my desktop PC (Dell XPS 8940). For some reason the PC hard-crashes several times a day. Sometimes it takes 4 hours before the system crashes, but there are times (like today) when the PC crashes 3 times within 2 hours. One crash occurred when I woke the PC up; the other two happened when I opened a program (Zotero, Firefox). Most of the time I had only Amarok, LibreOffice, Firefox and Zotero running.

By "hard crash" I mean that the system completely stops: no keyboard response, and I can't access the PC remotely. If music was playing at the time of the crash, I hear an endless loop of the last second or so. I need to press the power button to stop the PC.

The log files don't say much. Now and then I see a message that the GPU has fallen off the bus:

NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.

Xid error 79 can indicate a hardware problem but also a driver problem.

It's a dual-boot system with Windows 10 and Tumbleweed. Strangely enough, so far I haven't encountered the crash problem under W10. I suppose if it were a hardware problem, it should happen under W10 too, shouldn't it? Any thoughts about this?

Specs of the Dell XPS 8940:

- Operating System: openSUSE Tumbleweed 20231005
- KDE Plasma Version: 5.27.8
- KDE Frameworks Version: 5.110.0
- Qt Version: 5.15.10
- Kernel Version: 6.5.4-1-default (64-bit)
- Graphics Platform: X11
- Processors: 16 × 11th Gen Intel® Core™ i7-11700 @ 2.50GHz
- Memory: 15.3 GiB of RAM
- Graphics Processor: NVIDIA GeForce RTX 3060 Ti/PCIe/SSE2
- Driver: NVIDIA-SMI 535.113.01
- Manufacturer: Dell Inc.
- Product Name: XPS 8940

Disks (dual-boot Windows 10 / Tumbleweed):

- NVMe SK hynix 512 GB (Windows 10 only)
- sda: 1.8 TiB Seagate Barracuda
  - sda1 /boot/efi (127.7 MB)
  - sda2 /mnt/data (886.3 GB) NTFS
  - sda3 / (100.2 GB) ext4
  - sda4 /home (859.4 GB) ext4

Regards,
Martin

-- 
Atari FTP-site: ftp://kurobox.serveftp.net:3021
Running openSUSE Tumbleweed / KDE 5.27.8
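As a minimal sketch (not something posted in this thread), the following Python scans kernel log text for NVRM Xid lines like the one quoted above and extracts the PCI address and Xid code, which makes it easier to check whether Xid 79 appears right before each crash. It assumes the log lines have the same shape as the quoted one. `scan_previous_boot` is a hypothetical helper that runs `journalctl -k -b -1` (kernel messages from the previous boot); it only finds anything if the journal is persistent and the message reached the disk before the lockup.

```python
import re
import subprocess

# Matches NVIDIA driver lines such as:
# NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+),")


def find_xid_errors(lines):
    """Return (pci_address, xid_code, line) for every NVRM Xid line in `lines`."""
    hits = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            hits.append((match.group("pci"), int(match.group("code")), line.strip()))
    return hits


def scan_previous_boot():
    """Hypothetical helper: scan kernel messages from the previous boot.

    Assumes a persistent journal; after a hard lockup the last messages
    may never have been written to disk.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return find_xid_errors(result.stdout.splitlines())
```

For example, `for pci, code, line in scan_previous_boot(): print(pci, code)` lists each Xid event from the boot that ended in a crash; the parsing is kept separate from `journalctl` so it can also be fed saved log files.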
On Oct 7, 2023, at 5:32 PM, Martin /Nightowl/ Byttebier <nightowl@telenet.be> wrote:
I need to press the power button to stop the PC. The log files don't say much. Now and then I see a message that the GPU has fallen off the bus.
Thoroughly test the memory. If you have multiple memory sticks, try removing them one at a time to isolate the problem. I had this issue with my PC and found a bad memory stick. I replaced it and the lockups went away.

Ken
On Sat, 7 Oct 2023 21:56:31 +0000, kschneider bout-tyme.net <kschneider@bout-tyme.net> wrote:
Thoroughly test the memory. If you have multiple memory sticks, try removing them one at a time to isolate the problem. I had this issue with my PC and found a bad memory stick. I replaced it and the lockups went away.
I ran Memtest86+ for almost 6 hours (5 passes) and no bad memory was found, so I think I can rule out memory problems.

CU,
Martin
On Sat, 7 Oct 2023 at 22:32, Martin /Nightowl/ Byttebier <nightowl@telenet.be> wrote:
It's a dual-boot system with Windows 10 and Tumbleweed. Strangely enough, so far I haven't encountered the crash problem under W10. I suppose if it were a hardware problem, it should happen under W10 too, shouldn't it?
That suggests to me that it is not a RAM or CPU issue, which in turn points to a firmware problem. I suggest you use Windows to check that your firmware is current and, if not, update it. From a *very* quick Google search, the latest seems to be:

Version - 2.0.13, 2.0.13
Release date - 08 Jun 2021
https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=gyg3...

-- 
Liam Proven ~ Profile: https://about.me/liamproven
Email: lproven@cix.co.uk ~ gMail/gTalk/FB: lproven@gmail.com
Twitter/LinkedIn: lproven ~ Skype: liamproven
IoM: (+44) 7624 277612: UK: (+44) 7939-087884
Czech [+ WhatsApp/Telegram/Signal]: (+420) 702-829-053
On Sun, 8 Oct 2023 13:11:58 +0100, Liam Proven <lproven@gmail.com> wrote:
That suggests to me that it is not a RAM or CPU issue, which in turn points to a firmware problem. I suggest you use Windows to check that your firmware is current and, if not, update it.
From a *very* quick Google search, the latest seems to be:
Version - 2.0.13, 2.0.13
Release date - 08 Jun 2021
The BIOS is up to date, and so is the Dell firmware:

    # dmidecode 3.5
    Getting SMBIOS data from sysfs.
    SMBIOS 3.2.0 present.
    Table at 0x000E0000.

    Handle 0x0000, DMI type 0, 26 bytes
    BIOS Information
        Vendor: Dell Inc.
        Version: 2.14.0
        Release Date: 08/08/2023

Thanks,
Martin
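The two version numbers in this exchange (2.0.13 from the quick search, 2.14.0 from `dmidecode`) are easy to misread, because dotted versions have to be compared field by field as numbers. A small sketch, assuming the input is `dmidecode -t bios` output like the block above and assuming both numbers belong to the same BIOS series (the thread does not confirm what the 2.0.13 download actually is). The function names are illustrative only.

```python
import re


def parse_bios_info(dmidecode_text):
    """Pull 'Version' and 'Release Date' out of `dmidecode -t bios` output.

    Only the first occurrence of each key is kept, which assumes the
    BIOS Information section (DMI type 0) comes first.
    """
    info = {}
    for line in dmidecode_text.splitlines():
        match = re.match(r"\s*(Version|Release Date):\s*(.+?)\s*$", line)
        if match and match.group(1) not in info:
            info[match.group(1)] = match.group(2)
    return info


def version_tuple(version):
    """'2.14.0' -> (2, 14, 0), so that 2.14.0 sorts after 2.0.13."""
    return tuple(int(part) for part in version.split("."))


# Sample input copied from the dmidecode output above.
SAMPLE = """\
# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
\tVendor: Dell Inc.
\tVersion: 2.14.0
\tRelease Date: 08/08/2023
"""
```

With this sample, `version_tuple(parse_bios_info(SAMPLE)["Version"]) > version_tuple("2.0.13")` is true, consistent with Martin's installed BIOS being newer than the release found in the quick search.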
On 07.10.23 at 23:32, Martin /Nightowl/ Byttebier wrote:
By "hard crash" I mean that the system completely stops: no keyboard response, and I can't access the PC remotely. If music was playing at the time of the crash, I hear an endless loop of the last second or so.
...
Temperature problem when the load increases? I once had a similar problem. After some experiments I opened the case and found that one of the processor cooler's clamps was broken. The processor got too hot and shut down. Replacing the clamp solved the problem.
cu Peter
On 2023-10-08 14:46, Peter McD wrote:
... Temperature problem when the load increases? I once had a similar problem. After some experiments I opened the case and found that one of the processor cooler's clamps was broken. The processor got too hot and shut down. Replacing the clamp solved the problem.
Yes, but in this case the computer apparently locks up; it doesn't shut down.

-- 
Cheers / Saludos,
Carlos E. R.
(from openSUSE 15.5 (Laicolasse))
On 08.10.23 at 14:49, Carlos E. R. wrote:
Yes, but in this case the computer apparently locks up; it doesn't shut down.
Well, usual procedure: throw a solution at a problem and hope it disappears.

What about the zenstates? A solution to that from my local archive below the line:

cu
Peter

-------------
Subject: Re: AMD ryzen 1700 zenstates C6
From: Peter McD <peter.posts@gmx.net>
Date: 05.07.23, 14:39
To: users@lists.opensuse.org

On 05.07.23 at 12:47, Masaru Nomiya wrote:

In the Message;
 Subject    : Re: AMD ryzen 1700 zenstates C6
 Message-ID : <4f9479d4-dc5e-0713-d1f4-36929b77d2bb@gmx.net>
 Date & Time: Wed, 5 Jul 2023 12:08:19 +0200
[PM] == Peter McD <peter.posts@gmx.net> has written:

[...]
PM> disabling cstate 6. I think I should change with Yast the present
PM> boot-loader settings from:
PM> splash=verbose noresume security=apparmor
PM> to:
PM> splash=verbose noresume security=apparmor processor.max_cstate=5
PM> Would that be the right location?

This is ChatGPT's answer:

To set processor.max_cstate=5 on Linux, you can add the following line to the /etc/default/grub file:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash processor.max_cstate=5"

After that, you need to run the following command:

$ sudo update-grub

This will update the grub configuration file and apply the changes.

Test with

less /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="splash=verbose noresume security=apparmor processor.max_cstate=5 mitigations=auto"
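The quoted answer ends with `update-grub`, which is a Debian/Ubuntu helper; on openSUSE the generated configuration is normally rebuilt with `grub2-mkconfig -o /boot/grub2/grub.cfg`, or by saving the kernel parameters in YaST's boot loader module as Peter originally planned. A minimal sketch that works on file contents rather than editing `/etc/default/grub` in place: `add_kernel_param` appends a parameter to `GRUB_CMDLINE_LINUX_DEFAULT` only if that exact token is missing, and `param_active` checks `/proc/cmdline` text after a reboot to confirm the parameter actually reached the kernel. Both names are hypothetical, not part of any openSUSE tool.

```python
import re

# The GRUB_CMDLINE_LINUX_DEFAULT="..." assignment in /etc/default/grub.
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_default_text, param):
    """Return the text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT.

    Only an exact token match counts as "already present", so an existing
    processor.max_cstate=1 would NOT stop processor.max_cstate=5 being added.
    """
    def _append(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = CMDLINE_RE.subn(_append, grub_default_text, count=1)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX_DEFAULT line found")
    return new_text


def param_active(cmdline_text, name):
    """True if /proc/cmdline text contains `name` or `name=<value>`."""
    return any(tok == name or tok.startswith(name + "=") for tok in cmdline_text.split())
```

After rebooting, `param_active(open("/proc/cmdline").read(), "processor.max_cstate")` should return True if the setting took effect; checking this first avoids concluding that a parameter "didn't help" when it was never applied.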
On Sun, 8 Oct 2023 16:01:57 +0200, Peter McD <peter.posts@gmx.net> wrote:
Well, usual procedure: throw a solution at a problem and hope it disappears.
Sorry for the late reply. I tried the processor.max_cstate=5 setting, but alas, it didn't help. I decided to do a complete reinstall without the NVIDIA drivers. So far no crash. I'll never know what the culprit was, but I'm glad the PC is running again.

TTFN,
Martin
On Sun, 8 Oct 2023 14:49:59 +0200, "Carlos E. R." <robin.listas@telefonica.net> wrote:
Yes, but in this case the computer apparently locks up; it doesn't shut down.
Indeed, the computer locks up. Anyway, it's not a cooling problem.

Thanks,
Martin
On 10/8/23 07:46, Peter McD wrote:
Temperature problem when the load increases?
Ahh, the "Rampaging Dust Bunny" attack :) Well taken. Check the air inlet screens, etc., and if this is a desktop, check that the cooling fins on the heat sink are not clogged with dust. That can trigger an overtemperature condition, and often the result is not a graceful shutdown.

It depends on the age of the box (yours sounds newer), but motherboards that are 5+, and definitely 10+, years old can suffer from failing capacitors. This results in random lockups. Really random: it can take 5 hours or 5 weeks for the spurious voltage to be just right to trigger the problem. Capacitor quality has improved over the past decade, but it is not out of the realm of possibility.

-- 
David C. Rankin, J.D.,P.E.
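To check for overheating before opening the case, the kernel exposes temperature sensors under `/sys/class/thermal/thermal_zone*/`, each with a `type` file and a `temp` file in millidegrees Celsius. A minimal sketch that reads them; which zones exist depends on the hardware and drivers, and the GPU temperature under the proprietary NVIDIA driver is normally read with `nvidia-smi` rather than from these zones. The function name is illustrative.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"thermal_zoneN (type)": degrees_celsius} for each zone under `base`.

    The kernel reports `temp` in millidegrees Celsius; zones whose files
    are missing or unreadable are skipped.
    """
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue
        readings[f"{zone.name} ({zone_type})"] = millidegrees / 1000
    return readings
```

Calling `read_thermal_zones()` every few seconds and appending the result to a log file would show whether temperatures climb before a lockup, assuming the last lines get flushed to disk before the machine freezes.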
On 10/7/23 16:32, Martin /Nightowl/ Byttebier wrote:
It's a dual-boot system with Windows 10 and Tumbleweed. Strangely enough, so far I haven't encountered the crash problem under W10. I suppose if it were a hardware problem, it should happen under W10 too, shouldn't it?
Many years ago I had similar problems, which I concluded were caused by some of the hardware maintaining state across power cycles. My hypothesis was that Windows remembered the state, while Linux assumed the hardware always started up in the same state. Cutting way down on switching between operating systems reduced the problem. This was an IBM ThinkPad T41. The Linux WiFi driver was not fully debugged, and sometimes I had to boot into Windows XP to get work done that couldn't wait until I was home and could plug into a wired connection to the Internet.

HTH,
Jeffrey
On Tue, 24 Oct 2023 22:15:32 -0500, Jeffrey Taylor via openSUSE Users <users@lists.opensuse.org> wrote:
Many years ago I had similar problems, which I concluded were caused by some of the hardware maintaining state across power cycles. My hypothesis was that Windows remembered the state, while Linux assumed the hardware always started up in the same state. Cutting way down on switching between operating systems reduced the problem. This was an IBM ThinkPad T41. The Linux WiFi driver was not fully debugged, and sometimes I had to boot into Windows XP to get work done that couldn't wait until I was home and could plug into a wired connection to the Internet.
I did a complete reinstall of Tumbleweed, but this time with the Nouveau driver instead of the NVIDIA driver. So far, no crashes at all. I think the stability problem was caused by the NVIDIA driver.

TTFN,
Martin
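To double-check which GPU kernel driver is actually loaded after such a reinstall, module names can be read from `/proc/modules` (the same data `lsmod` shows), where the first field of each line is the module name. A minimal sketch, assuming only the `nvidia` and `nouveau` core modules are of interest; the proprietary driver also loads helpers such as `nvidia_drm`, which this exact-name match deliberately ignores. The function name is hypothetical.

```python
def loaded_gpu_modules(proc_modules_text, candidates=("nvidia", "nouveau")):
    """Return the sorted names from `candidates` that appear in /proc/modules text.

    Each /proc/modules line starts with the module name, followed by its
    size, use count and other fields; only exact name matches are reported.
    """
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] in candidates:
            loaded.add(fields[0])
    return sorted(loaded)
```

Usage: `loaded_gpu_modules(open("/proc/modules").read())` should return `["nouveau"]` on the reinstalled system if the proprietary driver is really gone.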
On 10/25/23 05:25, Martin /Nightowl/ Byttebier wrote:
I did a complete reinstall of Tumbleweed, but this time with the Nouveau driver instead of the NVIDIA driver. So far, no crashes at all. I think the stability problem was caused by the NVIDIA driver.
See: [What is a tainted Linux kernel?](https://unix.stackexchange.com/questions/118116/what-is-a-tainted-linux-kern...) and [Tainted kernels - The Linux Kernel documentation](https://docs.kernel.org/admin-guide/tainted-kernels.html) -- David C. Rankin, J.D.,P.E.
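The pages linked above describe the bitmask in `/proc/sys/kernel/tainted`. As a sketch, the decoder below maps set bits to the flag letters from the kernel's tainted-kernels documentation; the table covers bits 0 to 18 as listed there, and newer kernels may define more (unknown bits are reported as "?"). With the proprietary NVIDIA driver loaded one would typically expect at least P (proprietary module) and O (out-of-tree module); after switching to Nouveau those should disappear unless another module sets them.

```python
# Taint bits as listed in the kernel's "Tainted kernels" admin guide
# (bits 0-18); newer kernels may define additional bits.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "kernel running on an out-of-specification system"),
    3: ("R", "module was force unloaded"),
    4: ("M", "processor reported a Machine Check Exception"),
    5: ("B", "bad page referenced or unexpected page flags"),
    6: ("U", "taint requested by a userspace application"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for a platform firmware bug applied"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel has been live patched"),
    16: ("X", "auxiliary taint, defined for and used by distros"),
    17: ("T", "kernel was built with the struct randomization plugin"),
    18: ("N", "an in-kernel test has been run"),
}


def decode_taint(value):
    """Return (bit, letter, description) for every bit set in `value`."""
    if value < 0:
        raise ValueError("taint value must be non-negative")
    flags = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            letter, description = TAINT_FLAGS.get(bit, ("?", "unknown taint bit"))
            flags.append((bit, letter, description))
        bit += 1
    return flags
```

Usage: `decode_taint(int(open("/proc/sys/kernel/tainted").read()))`; for example, a value of 4097 (bits 0 and 12) decodes to P and O.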
participants (8)
- Carlos E. R.
- David C. Rankin
- David C. Rankin
- Jeffrey Taylor
- kschneider bout-tyme.net
- Liam Proven
- Martin /Nightowl/ Byttebier
- Peter McD