Joop Beris wrote:
Hello listmates,
I hope some of you can help me by shedding some light on the current situation. In short, I'd like to know if my machine is dying or if there is a software issue. The machine in question is about 7 years old and has recently started experiencing lock-ups, kernel errors and kernel oopses.
In /var/log/messages, I find the following things:
----------------------
Oct 23 00:00:04 magrathea kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000f60
Oct 23 00:00:04 magrathea kernel: printing eip:
Oct 23 00:00:04 magrathea kernel: c017d37f
Oct 23 00:00:04 magrathea kernel: *pde = 00000000
Oct 23 00:00:04 magrathea kernel: Oops: 0000 [#5]
Oct 23 00:00:04 magrathea kernel: SMP
Oct 23 00:00:04 magrathea kernel: last sysfs file: /devices/pci0000:00/0000:00:00.0/class
Oct 23 00:00:04 magrathea kernel: Modules linked in: xt_tcpudp xt_pkttype ipt_LOG xt_limit vboxdrv nfsd exportfs lockd nfs_acl sunrpc w83627hf hwmon_vid hwmon af_packet snd_pcm_oss snd_mixer_oss snd_seq_midi snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_seq ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6_tables x_tables apparmor nls_iso8859_1 nls_cp437 vfat fat fuse loop dm_mod nvidia(P) snd_emu10k1 firmware_class snd_ac97_codec snd_usb_audio snd_usb_lib ac97_bus snd_pcm button snd_timer pwc compat_ioctl32 snd_rawmidi snd_page_alloc usb_storage snd_seq_device snd_util_mem rtc_cmos videodev parport_pc via_ircc rtc_core snd_hwdep rtc_lib sr_mod snd ide_core irda crc_ccitt cdrom parport i2c_viapro ns558 shpchp emu10k1_gp v4l2_common via_agp v4l1_compat agpgart soundcore 3c59x pci_hotplug gameport i2c_core mii sg sd_mod uhci_hcd ehci_hcd ohci_hcd usbcore edd ext3 mbcache jbd fan pata_via libata aic7xx
Oct 23 00:00:04 magrathea kernel: x scsi_transport_spi scsi_mod thermal processor
Oct 23 00:00:04 magrathea kernel: CPU: 0
Oct 23 00:00:04 magrathea kernel: EIP: 0060:[<c017d37f>] Tainted: P N VLI
Oct 23 00:00:04 magrathea kernel: EFLAGS: 00010202 (2.6.22.18-0.2-default #1)
Oct 23 00:00:04 magrathea kernel: EIP is at lock_get_status+0x172/0x239
Oct 23 00:00:04 magrathea kernel: eax: 00000006 ebx: dc99901a ecx: 00000000 edx: d73bdeec
Oct 23 00:00:04 magrathea kernel: esi: df7f03fc edi: 00000ec4 ebp: dc999000 esp: d73bdee4
Oct 23 00:00:04 magrathea kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Oct 23 00:00:04 magrathea kernel: Process lsof (pid: 2977, ti=d73bc000 task=d180d570 task.ti=d73bc000)
Oct 23 00:00:04 magrathea kernel: Stack: dc999014 c0313c65 c031b8cc c03191ab d180d570 c0158205 df7f03fc df7f0400
Oct 23 00:00:04 magrathea kernel: 00000000 c017e2b9 c03191ab 00000000 d73bdf64 00000010 00000001 00000000
Oct 23 00:00:04 magrathea kernel: dc999000 00000400 00000400 00000400 dc999000 c01a2a00 00000400 c01a29f0
Oct 23 00:00:04 magrathea kernel: Call Trace:
Oct 23 00:00:04 magrathea kernel: [<c0158205>] __alloc_pages+0x60/0x2d6
Oct 23 00:00:04 magrathea kernel: [<c017e2b9>] get_locks_status+0x50/0xf0
Oct 23 00:00:04 magrathea kernel: [<c01a2a00>] locks_read_proc+0x10/0x25
Oct 23 00:00:04 magrathea kernel: [<c01a29f0>] locks_read_proc+0x0/0x25
Oct 23 00:00:04 magrathea kernel: [<c01a134f>] proc_file_read+0x10b/0x245
Oct 23 00:00:04 magrathea kernel: [<c01a1244>] proc_file_read+0x0/0x245
Oct 23 00:00:04 magrathea kernel: [<c017150c>] vfs_read+0xa6/0x12e
Oct 23 00:00:04 magrathea kernel: [<c01718ec>] sys_read+0x41/0x67
Oct 23 00:00:04 magrathea kernel: [<c0104e22>] sysenter_past_esp+0x6b/0xa9
Oct 23 00:00:04 magrathea kernel: =======================
Oct 23 00:00:04 magrathea kernel: Code: b8 31 c0 a8 01 75 05 ba cc b8 31 c0 89 d0 89 1c 24 89 44 24 08 c7 44 24 04 65 3c 31 c0 e8 a7 3d 05 00 8b 4e 18 01 c3 85 ff 74 38 <8b> 87 9c 00 00 00 8b 50 08 8b 47 20 89 4c 24 08 89 1c 24 c7 44
Oct 23 00:00:04 magrathea kernel: EIP: [<c017d37f>] lock_get_status+0x172/0x239 SS:ESP 0068:d73bdee4
-------------------------------------------
Oct 27 00:50:01 magrathea kernel: NVRM: Xid (0001:00): 12, COCOD 00000000 01011900 00000019 00000400 d73e09ff
-------------------------------------------
Oct 27 00:53:46 magrathea kernel: NVRM: Xid (0001:00): 12, COCOD 00000000 01011900 00000019 00000400 00000000
-------------------------------------------
The machine has an NVidia card, so the message above might be caused by the NVidia proprietary driver. I need this driver for serious projects (okay, Simcity 4 Rush Hour :-) )
As far as I can trace back, this started happening somewhere around the second week of October.
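One way to pin down when the errors began is to tally the kernel error lines in /var/log/messages by date. The following is only a minimal Python sketch, assuming the traditional syslog line layout seen in the excerpts above. The names tally_kernel_events and report, and the three patterns, are illustrative choices drawn from those excerpts, not part of any existing tool.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts in this thread; extend as needed.
PATTERNS = {
    "oops": re.compile(r"kernel: Oops:"),
    "null_deref": re.compile(r"kernel: BUG: unable to handle kernel NULL pointer"),
    "nvidia_xid": re.compile(r"kernel: NVRM: Xid"),
}

# Traditional syslog prefix, e.g. "Oct 23 00:00:04 hostname ..."
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")


def tally_kernel_events(lines):
    """Count matching kernel events, keyed by (day, kind)."""
    counts = Counter()
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a syslog timestamp
        day = f"{m.group(1)} {int(m.group(2))}"
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(day, kind)] += 1
    return counts


def report(path="/var/log/messages"):
    """Print the tally in the order events first appear in the log."""
    # Reading the system log normally requires root privileges.
    with open(path, errors="replace") as fh:
        counts = tally_kernel_events(fh)
    for (day, kind), n in counts.items():
        print(f"{day:<7} {kind:<12} {n}")
```

Calling report() as root prints one line per day and event kind, so the first day on which oopses or Xid messages appear is easy to spot. Rotated logs would need to be passed in separately by path.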
I'd be happy if someone can help me find out if this is a software problem, a hardware issue or perhaps both. I should mention that this machine is running openSUSE 10.3, with all the latest updates. It has been running flawlessly for several years, without any lock-ups or anything.
Kind regards,
Joop
Joop,
I've been through this twice in 8 years. Both times it was hardware: once the RAM and once the motherboard. To help diagnose, load mcelog. It is on your install DVD and will help identify any machine check exceptions (MCEs) you are dealing with. If it catches any, you are 99.9% assured your issue is hardware. (There are very rare instances where code can trigger an MCE. Possible, but so is McCain becoming the 44th president.)
Additionally, I don't know where I got it, but I have a SuSE document, "Machine Check Handling on Linux" by Andi Kleen (2004), that helps explain this a little further. You're welcome to it at: http://www.3111skyline.com/download/linux/kernel/mce.pdf
--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com