Carlos E. R. wrote:
On Friday, 2008-11-28 at 07:56 +0200, Dave Plater wrote:
There is a Yast module to set it up and it's supposed to trigger on an oops.
I saw http://en.opensuse.org/YaST/Modules/Kdump where it says:
] Proposal of new YaST module for configuration kdump
so I understood that the module didn't exist yet. Somebody forgot to update the wiki, then.
There is also an alt/sysrq/c keyboard shortcut to activate it.
which will not work, system freezes. But I'll try. I'll have to write this on paper :-}
What I would do first with a problem such as yours is downgrade to a previous kernel,
I'm not sure it will work, though. I think there was a previous freeze a month ago, but I dismissed it as chance. It doesn't freeze immediately, anyway. It runs for an hour or so.
reboot, save boot.msg and confirm that it is a kernel bug. I tried to get a crash dump for an oops I was getting when starting suspend, but the problem went away once kdump was activated, so I haven't actually seen it in action. It is simple to install and set up with YaST, though.
I'll have to try... provided it doesn't crash while I run yast.
Just out of interest, what is the last line in /var/log/messages at the time your system freezes?
Nothing important. Let me see...
Nov 23 18:56:14 minas-morgul smartd[3727]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 36
Nov 23 18:56:14 minas-morgul smartd[3727]: Device: /dev/hdd, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 59
Nov 23 19:22:27 minas-morgul su: (to root) cer on /dev/pts/3
Nov 23 19:22:55 minas-morgul kernel: kjournald starting. Commit interval 5 seconds
Nov 23 19:22:55 minas-morgul kernel: EXT3 FS on hdd6, internal journal
Nov 23 19:22:55 minas-morgul kernel: EXT3-fs: mounted filesystem with ordered data mode.
here I had mounted the "stable" root partition
Nov 23 19:22:57 minas-morgul kernel: SGI XFS with ACLs, security attributes, realtime, large block numbers, dmapi support, no debug enabled
Nov 23 19:22:57 minas-morgul kernel: SGI XFS Quota Management subsystem
Nov 23 19:22:57 minas-morgul kernel: XFS mounting filesystem hda11
Nov 23 19:22:57 minas-morgul kernel: Ending clean XFS mount for filesystem: hda11
and here the stable (i.e., main) home partition, to read some things there.
Nov 23 19:25:40 minas-morgul syslog-ng[2139]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=233', processed='center(received)=189', processed='destination(newsnotice)=0', processed='destination(acpid)=3', processed='destination(firewall)=15', processed='destination(null)=3', processed='destination(mail)=2', processed='destination(mailinfo)=2', processed='destination(console)=11', processed='destination(newserr)=0', processed='destination(newscrit)=0', processed='destination(messages)=166', processed='destination(mailwarn)=0', processed='destination(localmessages)=0', processed='destination(netmgm)=0', processed='destination(mailerr)=0', processed='destination(xconsole)=11', processed='destination(warn)=20', processed='source(src)=189'
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 107 to 104
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 62 to 55
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hdb, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 57 to 59
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 27 to 28
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35
But the firewall log contains much later entries at Nov 23 19:51:34, so nothing above is related to the freeze.
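For anyone repeating this comparison across several log files, here is a minimal Python sketch that finds the latest syslog timestamp in each. It assumes the classic "Mon DD HH:MM:SS" prefix seen above and an English locale for month names. The function name last_timestamp and the year argument are made up for illustration, since syslog lines of this kind carry no year. The firewall line is a placeholder: only its timestamp comes from this thread.

```python
from datetime import datetime

def last_timestamp(lines, year=2008):
    """Return the latest syslog timestamp found in lines, or None."""
    latest = None
    for line in lines:
        stamp = line[:15]  # e.g. "Nov 23 19:26:15"
        try:
            # Prepend the assumed year so the date is unambiguous.
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a syslog timestamp
        if latest is None or when > latest:
            latest = when
    return latest

# Two of the /var/log/messages entries quoted above (first one shortened).
messages = [
    "Nov 23 19:25:40 minas-morgul syslog-ng[2139]: Log statistics; ...",
    "Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hdd, "
    "SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35",
]
# Placeholder: only the timestamp is taken from the thread.
firewall = ["Nov 23 19:51:34 (firewall entry, contents not shown)"]

gap = last_timestamp(firewall) - last_timestamp(messages)
print("firewall log runs", gap, "past the last messages entry")
```

In practice the lists would be read from the real files, for example with open("/var/log/messages").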
-- Cheers, Carlos E. R.
There's a program called mcelog that logs machine check exceptions; you may be able to get some more specific information from that.
Read /usr/src/linux/Documentation/kernel-parameters.txt to find the various boot parameters for the kernel. There is one called "debug" which apparently increases log verbosity to 10, which may help. I assume you have run a memtest as well?
If the problem disappears after downgrading the kernel, you can diff the kernel source .configs after make cloneconfig. You may then find boot options that turn off whichever new kernel options came with the problem kernel.
Lastly, you may get better help on the opensuse-kernel list.
Regards
Dave P
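The .config comparison mentioned above can be done with plain diff, but a small script that reports only options whose values differ is easier to read. This is a rough Python sketch, not anything from the kernel tree. The helper names parse_config and changed_options are hypothetical, and the two config fragments are invented sample data rather than the configs of the kernels discussed here.

```python
NOT_SET = " is not set"

def parse_config(text):
    """Map option name to value; '# CONFIG_X is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            options[line[2:-len(NOT_SET)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

def changed_options(old_text, new_text):
    """Return {option: (old value, new value)} for options that differ."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }

# Invented sample fragments, for illustration only.
old_config = "CONFIG_KEXEC=y\n# CONFIG_CRASH_DUMP is not set\nCONFIG_HZ=250\n"
new_config = "CONFIG_KEXEC=y\nCONFIG_CRASH_DUMP=y\nCONFIG_HZ=250\nCONFIG_NO_HZ=y\n"

for name, (before, after) in changed_options(old_config, new_config).items():
    print(f"{name}: {before} -> {after}")
```

A value of None means the option does not appear in that config at all, which is typical of options introduced by the newer kernel.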