[opensuse-factory] Full system freeze - how can be investigated?
Hi,

My factory beta 5.2 freezes completely, at random. I assume it is the kernel. There is nothing in the logs. I opened a Bugzilla report, but they ask for more info. I have none, and they don't say what I can do. Beagle was running, and the main partition is reiser, but they haven't commented on this possibility; 11.0 crashed because of this when it was released - is this problem still around?

How can I obtain more info? Forget about the typical memory, disk, whatever hardware issues: the system is solid under 11.0; it is 11.1 which locks up, thus it is a software issue of 11.1, supposedly the kernel (disk and network lights stopped, key-cap LED stuck).

They pointed me to "http://en.opensuse.org/Bugs/Kernel", but there is nothing there I can use, AFAIK.

I assume I could connect the machine via serial port to another one and make the kernel log everything there. But how?

I'm currently recompiling the kernel to enable "CONFIG_DETECT_SOFTLOCKUP=y", but this will only print to the screen, and which screen? As X will be locked, there is no way I can look anywhere else but the place where I was when it locked...

--
Cheers, Carlos Robinson

--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-factory+help@opensuse.org
Carlos E. R. wrote:
My factory beta 5.2 freezes completely, randomly. I assume it is the kernel. There is nothing in logs.
I assume I could connect the machine via serial port to another, and make kernel log everything there. But how?
I'd suggest that you do here what we often have to do for installation problems:

1. On another networked machine, set up a writeable share using NFS (or maybe Samba if you like). I assume you know how to do this! Permissions can be fiddly; I usually end up with 777. For installations, you will need to create a directory YaST2 too.

2. On the target machine, if installing, set it to start a shell before running YaST. If installation is already complete, you could perhaps add a shell to /etc/init.d/boot.local or - if you can get to a prompt before problems appear - just do it manually.

3. Then in that shell mount the NFS share exported above over the appropriate section of /var/log (or maybe over all of it, though you may need to create an empty directory structure first). You may also need option "-o nolock", especially during installation. Again, I assume you know how to do all this.

   It may be worth testing before exiting the shell, e.g. by touching a new file and/or appending something to /var/log/messages using cat or echo. See note above regarding permissions.

   A more sophisticated approach might be to mount the share elsewhere and tee specific files to it.

4. Quit the shell, and watch for the log files to appear!

5. Try to unmount it nicely if you can...

Hope this is useful

--
Cheers
Richard (MQ)
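Step 3 might look roughly like the sketch below. It only prints the commands instead of running them, so nothing gets mounted by accident. The server name loghost, the export path /srv/logs and the helper name nfs_log_commands are made-up placeholders; -o nolock is the option mentioned in step 3.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands for steps 3 and 5.
# "loghost" and "/srv/logs" are placeholder names, not from the thread.
nfs_log_commands() {
    server=$1               # machine exporting the writeable share
    export_dir=$2           # exported directory on that machine
    target=${3:-/var/log}   # where the share is mounted on the target
    printf '%s\n' "mount -t nfs -o nolock ${server}:${export_dir} ${target}"
    printf '%s\n' "touch ${target}/nfs-write-test"
    printf '%s\n' "umount ${target}"
}

nfs_log_commands loghost /srv/logs
```

The printed lines can be reviewed and then typed as root in the shell from step 2.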
On Thursday, 2008-11-27 at 08:02 -0000, Richard (MQ) wrote:
Carlos E. R. wrote:
My factory beta 5.2 freezes completely, randomly. I assume it is the kernel. There is nothing in logs.
I assume I could connect the machine via serial port to another, and make kernel log everything there. But how?
I'd suggest that you do here what we often have to do for installation problems:
1. On another networked machine set up a writeable share using NFS (or maybe Samba if you like). I assume you know how to do this! Permissions can be fiddly, I usually end up with 777. For installations, you will need to create a directory YaST2 too.
No. Absolutely not! This method is useless for kernel crashes. The network is down! If the kernel were able to write to a remote filesystem, it would also be able to write to a local one - and it is not; there is nothing in the logs.

The only chance is the serial port, because it is a very low-level system. It is the method recommended by the kernel people... but the page I have been pointed at in the Bugzilla doesn't explain it.

--
Cheers, Carlos E. R.
Carlos E. R. wrote:
The only chance is the serial port, because it is a very low-level system. It is the method recommended by the kernel people... but the page I have been pointed at in the Bugzilla doesn't explain it.
Actually it does:

http://en.opensuse.org/Bugs/Kernel#Using_the_serial_console

--
Best Regards / S pozdravom,

Pavol RUSNAK, Package Maintainer
SUSE LINUX, s.r.o
Lihovarska 1060/12, 19000 Praha 9, CR
PGP 0xA6917144
prusnak[at]suse.cz
http://www.suse.cz
On Thursday, 2008-11-27 at 14:14 +0100, Pavol Rusnak wrote:
Carlos E. R. wrote:
The only chance is the serial port, because it is a very low-level system. It is the method recommended by the kernel people... but the page I have been pointed at in the Bugzilla doesn't explain it.
Actually it does:
No, it doesn't. It says:

] Once you've hooked the cable up you should add 'console=ttyS0,115200
] console=tty0' to the command-line of the debuggee. This causes all
] console message to be sent to ttyS0 as well as well as the standard
] console. The last console= parameter determines where the console input
] should be handled from; so if you want to use the serial console to
] accept input also you'll have to exchange those parameters.

AFAIK, that will dump tty0 to the serial port and the other machine. What needs to be dumped is all kernel messages, not console number zero. I'm not working on tty0; I'm on X, anyway. I need to know if there are kernel messages, and those will not go to tty0. I need everything that klogd gets, as a continuous dmesg.

--
Cheers, Carlos E. R.
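For reference, the kernel's console= option decides where the kernel itself prints its messages (the same printk output that klogd reads), limited by the console log level, so a serial console should normally receive an oops or panic even while X is on screen. Below is a minimal sketch of the wiki's instruction, assuming a GRUB legacy kernel line and the first serial port. The sample kernel line and the helper name add_serial_console are illustrative only.

```shell
#!/bin/sh
# Sketch: append the wiki's console= options to an existing kernel line.
# The sample kernel line used below is a shortened, hypothetical one.
add_serial_console() {
    kernel_line=$1
    port=${2:-ttyS0}      # first serial port
    speed=${3:-115200}    # must match the receiving terminal program
    # Per the wiki, the last console= also handles input, so with this
    # order the local tty0 keeps the keyboard.
    printf '%s\n' "${kernel_line} console=${port},${speed} console=tty0"
}

add_serial_console "kernel /vmlinuz root=/dev/sda1"

# On the receiving machine, a terminal program such as
#   screen /dev/ttyS0 115200
# (or minicom) would display the messages. This is an assumption about
# the second machine, not something the wiki page specifies.
```

If lower-priority messages are wanted as well, the console log level can be raised, for example with the debug boot parameter.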
On Thursday 27 November 2008 10:02:03 Richard (MQ) wrote:
Carlos E. R. wrote:
My factory beta 5.2 freezes completely, randomly. I assume it is the kernel. There is nothing in logs.
I assume I could connect the machine via serial port to another, and make kernel log everything there. But how?
I'd suggest that you do here what we often have to do for installation problems:
1. On another networked machine set up a writeable share using NFS (or maybe Samba if you like). I assume you know how to do this! Permissions can be fiddly, I usually end up with 777. For installations, you will need to create a directory YaST2 too.
2. On the target machine, if installing set it to start a shell before running YaST. If installation is already complete, you could perhaps add a shell to /etc/init.d/boot.local or - if you can get to a prompt before problems appear - just do it manually.
3. Then in that shell mount the NFS share exported above over the appropriate section of /var/log (or maybe over all of it, though you may need to create an empty directory structure first). You may also need option "-o nolock" especially during installation. Again, I assume you know how to do all this.
It may be worth testing before exiting the shell e.g. by touching a new file and / or appending something to /var/log/messages using cat or echo. See note above regarding permissions.
A more sophisticated approach might be to mount the share elsewhere and tee specific files to it.
4. Quit the shell, and watch for the log files to appear!
5. Try to unmount it nicely if you can...
That is such a baroque improvisation for remote logging! This wheel has been invented before, and it's a little rounder than this :-)

syslogd has an option that allows it to receive logs from a remote system. See the man page of syslog and /etc/sysconfig/syslog

The thing is that openSUSE has used syslog-ng for a few versions now. I don't know how to do remote logging with syslog-ng; I never needed it. You either study that or simply replace syslog-ng with classic syslog and get the job done.

Yes, I know that it doesn't use real authentication and such, but it's simple and quick on a test system.
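With classic syslogd, remote logging could be set up roughly as in the sketch below. The host name loghost and the helper name forward_rule are placeholders, and the rule is written to a scratch file rather than to the real /etc/syslog.conf.

```shell
#!/bin/sh
# Sketch for classic syslogd (sysklogd), not syslog-ng.
# "loghost" is a placeholder for the receiving machine.
forward_rule() {
    # The selector *.* matches every facility and priority;
    # "@host" tells syslogd to forward matching messages to that host.
    printf '%s\t@%s\n' '*.*' "$1"
}

# Write the rule to a scratch file instead of the real configuration.
scratch=$(mktemp)
forward_rule loghost > "$scratch"
cat "$scratch"
rm -f "$scratch"

# Receiving side: syslogd has to be started with -r to accept remote
# messages, e.g. through SYSLOGD_PARAMS="-r" in /etc/sysconfig/syslog
# (variable name assumed from openSUSE's sysconfig layout).
```

This only forwards what syslogd manages to process, so it depends on the network and userspace still running at the time of the failure.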
On Thursday, 2008-11-27 at 15:17 +0200, Silviu Marin-Caea wrote:
That is such a baroque improvisation for remote logging! This wheel has been invented before, and it's a little rounder than this :-)
That implementation is used when you want to get the direct file logs that YaST creates during installation; as the installation system runs from a DVD, the logs are written to memory, and they are copied to the final installed system on disk only if installation succeeds. An alternative is to log those to a USB stick, I think. But it is of no use for kernel debugging.
syslogd has an option that allows it to receive logs from a remote system. See the man page of syslog and /etc/sysconfig/syslog
Correct. I do receive syslog messages from my router in this system, as a matter of fact. However, during a kernel failure the network is not guaranteed to work... and it doesn't.

--
Cheers, Carlos E. R.
Carlos E. R. wrote:
On Thursday, 2008-11-27 at 15:17 +0200, Silviu Marin-Caea wrote:
That is such a baroque improvisation for remote logging! This wheel has been invented before, and it's a little rounder than this :-)
That implementation is used when you want to get the direct file logs that YaST creates during installation; as the installation system runs from a DVD, the logs are written to memory, and they are copied to the final installed system on disk only if installation succeeds. An alternative is to log those to a USB stick, I think.
As you say - /var/log is on ram-disc during installation and readily lost if things go wrong. This "baroque improvisation" is actually quite easy to implement, and it does work. I can't see how syslog-ng might be used here, the YaST messages aren't readily routed to the network are they?
But it is of no use for kernel debugging.
syslogd has an option that allows it to receive logs from a remote system. See the man page of syslog and /etc/sysconfig/syslog
Correct. I do receive syslog messages from my router in this system, as a matter of fact. However, during a kernel failure the network is not guaranteed to work... and it doesn't.
I hadn't realised that you wanted some tool to allow you to talk to the machine _after_ the kernel crash. Indeed my suggestion is quite useless for that. I wasn't aware that such a thing was even possible (and also surprised that it might be of any use, after the event).

--
Cheers
Richard (MQ)
On Thursday, 2008-11-27 at 15:39 -0000, Richard (MQ) wrote:
Carlos E. R. wrote:
As you say - /var/log is on ram-disc during installation and readily lost if things go wrong. This "baroque improvisation" is actually quite easy to implement, and it does work. I can't see how syslog-ng might be used here, the YaST messages aren't readily routed to the network are they?
No, they aren't.
But it is of no use for kernel debugging.
syslogd has an option that allows it to receive logs from a remote system. See the man page of syslog and /etc/sysconfig/syslog
Correct. I do receive syslog messages from my router in this system, as a matter of fact. However, during a kernel failure the network is not guaranteed to work... and it doesn't.
I hadn't realised that you wanted some tool to allow you to talk to the machine _after_ the kernel crash. Indeed my suggestion is quite useless for that. I wasn't aware that such a thing was even possible (and also surprised that it might be of any use, after the event).
No, I do not want to talk to the machine afterwards. It is sometimes possible, but not this time. What I want is to catch the kernel panic message or the oops. These messages are usually printed to tty10 and syslog, but as the system has crashed, they are not recorded. They can be caught, though, if you are looking at console 10 at the precise moment it crashes, or if you have that console remotely through a serial port - not a network connection.

Please have a look here:

http://en.opensuse.org/Bugs/Kernel#Using_the_serial_console

That document falls very short of explaining all that is needed for someone who has never done it, but it is the only document they have pointed me to.

My only hope is that 11.1 final starts crashing on people when it gets released and thousands of people use it: then I'll get help. Too late. :-(

--
Cheers, Carlos E. R.
I think that kdump is what you want. It doesn't work with suspend, but it should work in your case. There's a wiki entry on it as well; it works with kexec to load a new kernel when the running one crashes. If that fails you'll need to use a debug kernel and kdb.

Regards
Dave P
Dave Plater wrote:
I think that kdump is what you want, it doesn't work with suspend but it should work in your case. There's a wiki entry on it as well, it works with kexec to load a new kernel when the running one crashes. If that fails you'll need to use a debug kernel and kdb.
Or what about netcat?

firsthost# tail -f /var/log/messages | netcat otherhost 2222
otherhost$ netcat -l -p 2222

--
Cheers
Richard (MQ)
On Thursday, 2008-11-27 at 18:28 -0000, Richard (MQ) wrote:
Or what about netcat?
firsthost# tail -f /var/log/messages | netcat otherhost 2222
otherhost$ netcat -l -p 2222
That will not work on a crashed system. You are piping what is already written to the filesystem to another computer. If something were written to that file, I would see it after rebooting anyway... and there is nothing.

--
Cheers, Carlos E. R.
On Thursday, 2008-11-27 at 18:49 +0200, Dave Plater wrote:
I think that kdump is what you want, it doesn't work with suspend but it should work in your case. There's a wiki entry on it as well, it works with kexec to load a new kernel when the running one crashes. If that fails you'll need to use a debug kernel and kdb.
I don't understand.

I have read the article, if it is http://en.opensuse.org/Kdump

It says:

1. Add the crashkernel=size@offset option to your bootloader configuration. See the table below for recommended options for the different architectures. Then reboot.
2. Load the panic kernel with kexec -p vmlinuz --append="command_line" --initrd="initrd". The command line should include the root file system (root=), the irqpoll and reset_devices options and should end with the runlevel you would like to have in the kdump environment (preferably 1).
3. Now crash the kernel, for testing you can use Sysrq-c.
4. In the kdump environment, copy /proc/vmcore away.
5. Reboot.

Step 1.

Where exactly do I add the "crashkernel=64M@16M" option? Here, inside the kernel line?

###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 11.1 Beta 5.2 - 2.6.27.7-3
    root (hd0,7)
    kernel /vmlinuz-2.6.27.7-3-pae root=/dev/disk/by-label/160_test resume=/dev/disk/by-label/320_swap splash=silent showopts vga=0x317
    initrd /initrd-2.6.27.7-3-pae

Step 2.

Does it mean that I have to manually boot the panic kernel? In a text console, or how? What exact command do I have to type? Where?

I don't understand that paragraph; to me it is like reading Chinese.

Step 4.

When the kernel freezes, how do I go to the kdump environment? The kernel is frozen, nothing works. No keyboard. How do I go to that kernel? What commands do I have available? How do I "copy /proc/vmcore away"? To where? The system is frozen. That kernel only has 64 MB; I assume I will not have a "dd" command.

I'm sorry, all that kdump thing looks very nice, but it is only usable by people that already know how to make use of it or are kernel devs. I don't know how to use all that. I just want to provide the Bugzilla kernel people the information they request; I'm not a dev. And they don't help at all :-(

As it is, I'll have to wait till some big somebody has a problem and steps in. Meanwhile, I'll have to wait till 11.2.

--
Cheers, Carlos E. R.
Carlos E. R. wrote:
On Thursday, 2008-11-27 at 18:49 +0200, Dave Plater wrote:
I think that kdump is what you want, it doesn't work with suspend but it should work in your case. There's a wiki entry on it as well, it works with kexec to load a new kernel when the running one crashes. If that fails you'll need to use a debug kernel and kdb.
I don't understand.
I have read the article, if it is http://en.opensuse.org/Kdump
It says:
1. Add the crashkernel=size@offset option to your bootloader configuration. See the table below for recommended options for the different architectures. Then reboot. 2. Load the panic kernel with kexec -p vmlinuz --append="command_line" --initrd="initrd". The command line should include the root file system (root=), the irqpoll and reset_devices options and should end with the runlevel you would like to have in the kdump environment (preferably 1). 3. Now crash the kernel, for testing you can use Sysrq-c. 4. In the kdump environment, copy /proc/vmcore away. 5. Reboot.
yast2-kdump module sets everything up.
Step 1.
Where exactly do I add the "crashkernel=64M@16M" option? Here, inside the kernel line?
AFAIK it's the memory space reserved for the panic kernel.
###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 11.1 Beta 5.2 - 2.6.27.7-3
    root (hd0,7)
    kernel /vmlinuz-2.6.27.7-3-pae root=/dev/disk/by-label/160_test resume=/dev/disk/by-label/320_swap splash=silent showopts vga=0x317
    initrd /initrd-2.6.27.7-3-pae
Step 2.
Does it mean that I have to manually boot the panic kernel? In a text console, or how? What exact command do I have to type? Where?
I don't understand that paragraph; to me it is like reading Chinese.
yast kdump sets everything up.
Step 4.
When the kernel freezes, how do I go to the kdump environment? The kernel is frozen, nothing works. No keyboard. How do I go to that kernel? What commands do I have available? How do I "copy /proc/vmcore away"? To where? The system is frozen. That kernel only has 64 MB; I assume I will not have a "dd" command.
In the yast module, you can set the path for the dump. Using the yast module helps a bit to understand kdump's mechanisms.
I'm sorry, all that kdump thing looks very nice, but it is only usable by people that already know how to make use of it or are kernel devs. I don't know how to use all that. I just want to provide the Bugzilla kernel people the information they request; I'm not a dev. And they don't help at all :-(
As it is, I'll have to wait till some big somebody has a problem and steps in. Meanwhile, I'll have to wait till 11.2.
-- Cheers, Carlos E. R.
There is a Yast module to set it up and it's supposed to trigger on an oops. There is also an alt/sysrq/c keyboard short cut to activate it. What I would do first with a problem such as yours is downgrade to a previous kernel, reboot, save boot.msg and confirm that it is a kernel bug. I tried to get a crash dump with an oops I was getting when starting suspend, but the problem went away when kdump was activated, so I haven't actually seen it in action; but it's simple to install and set up with yast.

Just out of interest, what is the last line in /var/log/messages at the time your system freezes?

Regards
Dave p
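For anyone who wants to see what the YaST module does behind the scenes, the manual steps from the wiki could be sketched as follows. The functions only print what would be added or run. The helper names are hypothetical, and /boot as the location of the kernel and initrd is a guess, since the menu.lst entry quoted earlier gives paths relative to its own partition. Once a panic kernel has been loaded with kexec -p, it is started automatically when the running kernel panics.

```shell
#!/bin/sh
# Sketch of the wiki's manual kdump steps; yast2-kdump automates them.
# Paths are assumptions based on the menu.lst entry quoted in the thread.
KERNEL=/boot/vmlinuz-2.6.27.7-3-pae
INITRD=/boot/initrd-2.6.27.7-3-pae
ROOT=/dev/disk/by-label/160_test

# Step 1: the option goes at the end of the "kernel" line in menu.lst.
with_crashkernel() {
    printf '%s\n' "$1 crashkernel=${2:-64M@16M}"
}

# Step 2: build the kexec command that preloads the panic kernel.
# The trailing "1" is the runlevel for the kdump environment.
panic_kernel_cmd() {
    printf '%s\n' "kexec -p $1 --initrd=$2 --append=\"root=$3 irqpoll reset_devices 1\""
}

with_crashkernel "kernel /vmlinuz-2.6.27.7-3-pae root=$ROOT"
panic_kernel_cmd "$KERNEL" "$INITRD" "$ROOT"
```

The kexec command would be run as root after rebooting with the crashkernel= option in place. A freeze that never reaches a panic would not start the panic kernel by itself.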
On Friday, 2008-11-28 at 07:56 +0200, Dave Plater wrote:
There is a Yast module to set it up and it's supposed to trigger on an oops.
I saw http://en.opensuse.org/YaST/Modules/Kdump where it says:

] Proposal of new YaST module for configuration kdump

so I understood that the module didn't exist yet. Somebody forgot to update the wiki, then.
There is also an alt/sysrq/c keyboard short cut to activate it.
which will not work, system freezes. But I'll try. I'll have to write this on paper :-}
What I would do first with a problem such as yours is downgrade to a previous kernel,
I'm not sure it will work, though. I think there was a previous freeze a month ago, but I dismissed it as chance. It doesn't freeze immediately, anyway. It ran for an hour or so.
reboot, save boot.msg and confirm that it is a kernel bug. I tried to get a crash dump with an oops I was getting when starting suspend but the problem went away when kdump was activated so I haven't actually seen it in action but it's simple to install and setup with yast.
I'll have to try... provided it doesn't crash while I run yast.
Just out of interest, what is the last line in /var/log/messages at the time your system freezes?
Nothing important. Let me see...

Nov 23 18:56:14 minas-morgul smartd[3727]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 36
Nov 23 18:56:14 minas-morgul smartd[3727]: Device: /dev/hdd, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 59
Nov 23 19:22:27 minas-morgul su: (to root) cer on /dev/pts/3
Nov 23 19:22:55 minas-morgul kernel: kjournald starting. Commit interval 5 seconds
Nov 23 19:22:55 minas-morgul kernel: EXT3 FS on hdd6, internal journal
Nov 23 19:22:55 minas-morgul kernel: EXT3-fs: mounted filesystem with ordered data mode.

here I had mounted the "stable" root partition

Nov 23 19:22:57 minas-morgul kernel: SGI XFS with ACLs, security attributes, realtime, large block numbers, dmapi support, no debug enabled
Nov 23 19:22:57 minas-morgul kernel: SGI XFS Quota Management subsystem
Nov 23 19:22:57 minas-morgul kernel: XFS mounting filesystem hda11
Nov 23 19:22:57 minas-morgul kernel: Ending clean XFS mount for filesystem: hda11

and here the stable (i.e., main) home partition, to read some things there.

Nov 23 19:25:40 minas-morgul syslog-ng[2139]: Log statistics; dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', processed='center(queued)=233', processed='center(received)=189', processed='destination(newsnotice)=0', processed='destination(acpid)=3', processed='destination(firewall)=15', processed='destination(null)=3', processed='destination(mail)=2', processed='destination(mailinfo)=2', processed='destination(console)=11', processed='destination(newserr)=0', processed='destination(newscrit)=0', processed='destination(messages)=166', processed='destination(mailwarn)=0', processed='destination(localmessages)=0', processed='destination(netmgm)=0', processed='destination(mailerr)=0', processed='destination(xconsole)=11', processed='destination(warn)=20', processed='source(src)=189'
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 107 to 104
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 62 to 55
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hdb, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 57 to 59
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 27 to 28
Nov 23 19:26:15 minas-morgul smartd[3727]: Device: /dev/hdd, SMART Usage Attribute: 194 Temperature_Celsius changed from 36 to 35

But the firewall log contains much later entries at Nov 23 19:51:34, so nothing above is related to the freeze.

--
Cheers, Carlos E. R.
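Before relying on the alt/sysrq/c shortcut, it may be worth checking that the magic SysRq key is enabled at all. A small sketch, assuming the usual /proc/sys/kernel/sysrq interface; the helper name sysrq_state is made up for this example:

```shell
#!/bin/sh
# Sketch: report whether the magic SysRq key is enabled.
# 0 = disabled, 1 = all functions, other values = bitmask of functions.
sysrq_state() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "partly enabled (bitmask $1)" ;;
    esac
}

# Only query the running kernel if the file is there and readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    sysrq_state "$(cat /proc/sys/kernel/sysrq)"
fi

# As root, "echo 1 > /proc/sys/kernel/sysrq" enables every function, and
# "echo c > /proc/sysrq-trigger" has the same effect as Alt+SysRq+c.
# Do not run the second command on a machine that must stay up.
```

If the keyboard is already dead when the freeze happens, the key combination may of course not be seen by the kernel at all.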
There's a program called mcelog that logs machine check exceptions; you may be able to get some more specific information from that. Read /usr/src/linux/Documentation/kernel-parameters.txt to find various boot parameters for the kernel; there is one called "debug" which apparently increases log verbosity to 10, which may help. I assume you have done a memtest as well? If the problem disappears after downgrading the kernel, you can try a diff between the kernel source .configs after make cloneconfig, and you may find boot options to disable any new kernel options that came with the problem kernel. Lastly, you may get better help on the opensuse-kernel list.

Regards
Dave P
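The .config comparison could be done along these lines. This is a sketch with a hypothetical helper name, config_diff, and it compares only the options that are set, so lines of the form "# CONFIG_X is not set" are ignored.

```shell
#!/bin/sh
# Sketch: list CONFIG_ options that differ between two kernel configs.
# Which files to pass is up to the caller; on openSUSE the configs are
# usually found as /boot/config-<version> (an assumption here).
config_diff() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # Keep only options that are set, in a stable order.
    grep '^CONFIG_' "$1" | sort > "$old_sorted"
    grep '^CONFIG_' "$2" | sort > "$new_sorted"
    # "<" lines exist only in the old config, ">" lines only in the new.
    # diff and grep exit non-zero in normal cases, so mask the status.
    diff "$old_sorted" "$new_sorted" | grep '^[<>]' || true
    rm -f "$old_sorted" "$new_sorted"
}
```

It would be called as config_diff OLD NEW, with the config of the working kernel first and that of the freezing kernel second.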
On Thursday 27 November 2008 08:16:05 am Carlos E. R. wrote:
On Thursday, 2008-11-27 at 15:17 +0200, Silviu Marin-Caea wrote:
That is such a baroque improvisation for remote logging! This wheel has been invented before, and it's a little rounder than this :-)
That implementation is used when you want to get the direct file logs that YaST creates during installation; as the installation system runs from a DVD, the logs are written to memory, and they are copied to the final installed system on disk only if installation succeeds. An alternative is to log those to a USB stick, I think.
But it is of no use for kernel debugging.
Not for a kernel that breaks very early, but if you mount /var as an NFS share, or on a USB stick, then the logs will survive an installation crash.

--
Regards, Rajko
On Thursday, 2008-11-27 at 09:55 -0600, Rajko M. wrote:
But it is of no use for kernel debugging.
Not for kernel that breaks very early, but if you mount /var as NFS share, or USB stick, then logs will survive installation crash.
Yes, if it managed to write anything, and the network is working after the oops. And it doesn't, afaik.

--
Cheers, Carlos E. R.
participants (7)
- Carlos E. R.
- Carlos E. R.
- Dave Plater
- Pavol Rusnak
- Rajko M.
- Richard (MQ)
- Silviu Marin-Caea