[opensuse] System lockup on 3D operations (mostly games) - hardware error?
Hello,

When I play Savage 2 or any other demanding 3D game, I get a system lockup after about 15 minutes, and only a hard reboot helps. Is it a motherboard, graphics, CPU or software problem? What can I try to resolve it?

My system (laptop) is:
Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
4GB RAM
GeForce 8600M GT

Sometimes (very rarely) it happens even when I'm not playing.

Running:
kernel 2.6.27.7-9-default
X.Org X Server 1.5.2
NVIDIA 180.22 driver

I have found the following in my logs:

/var/log/messages (last lines before reboot):
Jan 24 06:02:14 saily kernel: Xorg[2956]: segfault at 8 ip 00007f14475202f3 sp 00007fff4f730630 error 4 in ld-2.9.so[7f1447513000+1e000]
Jan 24 06:02:14 saily kernel: NVRM: Xid (0001:00): 6, PE007f
Jan 24 06:02:14 saily kernel: NVRM: Xid (0001:00): 6, PE007f
Jan 24 06:02:14 saily kernel: NVRM: Xid (0001:00): 7, Ch 0000007f M 00001ffc D ffffffff intr ffffffff
Jan 24 06:02:14 saily kernel: NVRM: Xid (0001:00): 26, Ch 0000007f M 00001ffc D ffffffff intr ffffffff
Jan 24 06:02:14 saily kernel: NVRM: Xid (0001:00): 4, Ch 0000007f acquireValue ffffffff dmaPut ffffffff dmaGetffffffff
Jan 24 06:02:14 saily kernel: NVRM: Xid (0001:00): 4, Ch 0000007f SC 00000007 M 00001ffc Data ffffffff
Jan 24 06:02:14 saily kernel: Uhhuh. NMI received for unknown reason b1.
Jan 24 06:02:14 saily kernel: You have some hardware problem, likely on the PCI bus.
Jan 24 06:02:14 saily kernel: Dazed and confused, but trying to continue

/var/log/mcelog (part):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 3 TSC 5def5ab6344
STATUS 902000010220100e MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 3 TSC 15a9e543c894 ADDR 13e705d40
STATUS 942000410001010a MCGSTATUS 0

MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 3 TSC 15cb35ec8eec
STATUS 902000010120100e MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0
STATUS f200084000000800 MCGSTATUS 0

MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5
STATUS f200001014000e0f MCGSTATUS 0

MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 5
STATUS f200001030000e0f MCGSTATUS 0

MCE 3
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 3 TSC 9755348808 ADDR 12d605d40
STATUS 942000410001010a MCGSTATUS 0

MCE 4
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 3 TSC eaf13b85c8
STATUS 902000010120100e MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0
STATUS f200084000000800 MCGSTATUS 0

MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5
STATUS f200001014000e0f MCGSTATUS 0

MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 5
STATUS f200001030000e0f MCGSTATUS 0

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 THERMAL EVENT TSC 9cf9e42c4
Processor core below trip temperature. Throttling disabled
STATUS 881e01c0 MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout). No micro-instruction retired for some time
failure that caused IERR
STATUS f200084000000800 MCGSTATUS 0

MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Generic Generic Generic Other-transaction Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_AERR2_TYPE BQ_ERR_AERR2_TYPE
received parity error on response transaction
MCE driven
MCE is observed
STATUS f200001034000e0f MCGSTATUS 0

MCE 2
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 5
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Generic Generic Generic Other-transaction Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
received parity error on response transaction
MCE driven
STATUS f200001010000e0f MCGSTATUS 0

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 THERMAL EVENT TSC 9fef682fc
Processor core below trip temperature. Throttling disabled
STATUS 882301c0 MCGSTATUS 0
......

Best regards
Gryffus
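Each STATUS word in the mcelog excerpt packs several flags into one 64-bit number. A minimal Python sketch for decoding the architectural flag bits is shown below, assuming the IA32_MCi_STATUS layout documented in the machine-check chapter of Intel's Software Developer's Manual. The function name `decode_mci_status` is hypothetical, and the model-specific fields are returned raw rather than interpreted.

```python
"""Decode the architectural fields of an IA32_MCi_STATUS value printed by mcelog.

Assumes the bit layout from the Intel SDM, Vol. 3, "Machine-Check Architecture".
Model-specific bits are returned as raw numbers, not interpreted.
"""

# Flag bits in the upper part of MCi_STATUS, highest bit first.
FLAG_BITS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # overflow: an earlier error was lost
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC register holds extra information
    58: "ADDRV",  # MCi_ADDR register holds the error address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status_hex: str) -> dict:
    """Return the set flags plus the raw MCA and model-specific error codes."""
    value = int(status_hex, 16)
    return {
        "flags": [name for bit, name in FLAG_BITS.items() if (value >> bit) & 1],
        "mca_error_code": value & 0xFFFF,               # bits 15:0
        "model_specific_code": (value >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # STATUS values copied from the BANK lines of the mcelog excerpt.
    for status in ("902000010220100e", "942000410001010a",
                   "f200084000000800", "f200001014000e0f"):
        d = decode_mci_status(status)
        print(status, " ".join(d["flags"]),
              f"mca=0x{d['mca_error_code']:04x}",
              f"model=0x{d['model_specific_code']:04x}")
```

Decoded this way, the BANK 3 entries (902000..., 942000...) have UC and PCC clear, so they were corrected errors, while the BANK 0 and BANK 5 entries (f2000...) carry VAL, OVER, UC, EN and PCC, which matches mcelog's own "Uncorrected error ... Processor context corrupt" text. The THERMAL EVENT lines appear to report a thermal status register rather than MCi_STATUS, so their STATUS values should not be fed to this function.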
On Saturday 24 January 2009 12:06:39 am Gryffus wrote:
HARDWARE ERROR. This is *NOT* a software problem! Please contact your hardware vendor CPU 1 THERMAL EVENT TSC 9fef682fc Processor core below trip temperature. Throttling disabled STATUS 882301c0 MCGSTATUS 0
With this (despite my confusion at the word "below"), I have to wonder whether your CPU fan is working correctly. If it is not working at full efficiency, it could be that whenever you work the CPU hard enough (as many of those 3D games will), it overheats. If this is the case, you're lucky it continues to run.

I would open up the side panel and watch the fan as I turn it on. Make sure it spins smoothly. If it does, see whether your BIOS shows a message about fan speed as it boots, and post the result here.

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
This is a laptop... and the fan is working correctly. The core temperature doesn't go higher than 80 °C, which is not too much IMHO... The GPU temperature is the same... I found only one fan in the laptop case and cleaned it... but the lockups still appear... :-(

Regards
Gfs
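To back up a figure like "no higher than 80 °C" under the actual workload, one option is to log temperatures while the game runs right up to the lockup. Because the machine hard-locks, readings kept only in memory would be lost, so the minimal Python sketch below appends each sample to a file and forces it to disk. It assumes the kernel exposes thermal zones as /sys/class/thermal/thermal_zone*/temp in millidegrees Celsius; the helper names and the log path are hypothetical. The NVIDIA binary driver typically does not report through this interface, so the GPU temperature still has to come from NVIDIA's own tools such as nvidia-settings.

```python
import glob
import os
import time


def millidegrees_to_celsius(raw: str) -> float:
    """sysfs thermal zones report integer millidegrees Celsius, e.g. "80000"."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(pattern: str = "/sys/class/thermal/thermal_zone*/temp") -> dict:
    """Return {zone_name: degrees_celsius} for every readable zone matching pattern."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        zone = os.path.basename(os.path.dirname(path))  # e.g. "thermal_zone0"
        try:
            with open(path) as f:
                readings[zone] = millidegrees_to_celsius(f.read())
        except (OSError, ValueError):
            # Some zones refuse reads or return non-numeric data; skip them.
            continue
    return readings


def log_temperatures(log_path: str, duration_s: float, interval_s: float = 2.0) -> None:
    """Append one line of readings every interval_s seconds for duration_s seconds."""
    deadline = time.monotonic() + duration_s
    with open(log_path, "a") as log:
        while time.monotonic() < deadline:
            readings = read_thermal_zones()
            fields = " ".join(f"{zone}={temp:.1f}" for zone, temp in readings.items())
            log.write(time.strftime("%H:%M:%S") + " " + fields + "\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so the last lines survive a hard lockup
            time.sleep(interval_s)
```

For example, calling log_temperatures("temps.log", duration_s=3600) in a terminal before starting Savage 2 would leave, after the forced reboot, a log whose last lines show the temperatures recorded just before the lockup.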
On Sunday 25 January 2009 3:19:31 am Gryffus wrote:
This is a laptop... and the fan is working correctly. The core temperature doesn't go higher than 80 °C, which is not too much IMHO... The GPU temperature is the same... I found only one fan in the laptop case and cleaned it... but the lockups still appear... :-(

Regards
Gfs
Okay, second try... These games generally use a lot of memory, more than is usually in use. Depending on how much RAM you have, the lockups might be caused by bad RAM or by bad sectors on the hard disk. Try running memtest86 to test the RAM, and if that passes, boot from a rescue disk (you can use the openSUSE install disk) and run fsck on the partition where your swap is.
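Bad sectors on the swap area only come into play once pages are actually written there, so it may be worth first checking whether a game session pushes this 4GB laptop into swap at all. The minimal Python sketch below reads the MemFree, SwapTotal and SwapFree fields of /proc/meminfo (values in kB); the helper names are hypothetical, and the same numbers are also shown by the free command.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_used_kb(info: dict) -> int:
    """Swap currently in use, in kB."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"MemFree: {info.get('MemFree', 0)} kB, swap in use: {swap_used_kb(info)} kB")
```

Running it once before starting the game and again after a while in-game would show whether swap usage grows; if it stays near zero, the RAM test is the more relevant of the two checks.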
participants (2)
- Constantinos Maltezos
- Gryffus