I have a system that seems to work properly but crashes after some time if the disks are in continual use. I don't know how to isolate the problem and am looking for any pointers people can give me.

The system has an Intel mobo with an 875P chipset, which includes an ICH5 controller, and three 120GB disks - a ST3120022A (PATA) and two ST3120026AS (SATA). SuSE 9.1 is installed on hda1 (10GB). There are large data partitions on hda, sda and sdb. There are also 10GB swap partitions on hda and sda. The BIOS is in enhanced mode. The kernel is 2.6.4-52-smp (the later kernel doesn't work). There's a single 2.8 GHz P4.

I've left it running rsync copying from a large data source and after some hours the machine just hangs (I have to press the hardware reset button to get it to reboot - power-off doesn't work). After the reboot, reiserfs replays the logs and everything seems to be OK. I've repeated this with rsync saving the files to /dev/hda and to /dev/sda. I've also seen the same thing happen using wget instead of rsync. I also saw something similar with SuSE 9.0 running the disks in legacy mode. There's nothing in /var/log/messages.

Seems to me this could be a hardware fault on the mobo or perhaps a disk; it could be a hardware design fault; it could be a kernel, libata or reiserfs bug. Any other possibilities? Does anybody have any thoughts on how to track down the problem?

Thanks and regards, Dave
The Tuesday 2004-09-14 at 11:42 +0100, Dave Howorth wrote:
Does anybody have any thoughts on how to track down the problem?
No... You could try to watch the kernel log file, but it might not get written, if there is some filesystem problem, as you think. You might leave your PC rsyncing on terminal F10 till it hangs - hopefully you will see something there. Perhaps add this line to '/etc/syslog.conf':

    kern.*    /dev/tty11

to see all kernel messages in F11 instead.

I also think it is possible to log to a serial port and watch on another computer: that is uncharted territory for me.

-- Cheers, Carlos Robinson
Carlos E. R. wrote:
I also think it is possible to log to a serial port and watch on another computer: that is unmapped territory for me.
Of course, as soon as I started watching, nothing happened for several days. Now the system has crashed again and I'm trying to set up a serial line from the problem box to another. I have a couple of questions :)

(1) It's been a while since I played with serial ports on unix. I've got the link kind of working but with an odd symptom that I'd like to fix. I've connected a serial cable between the problem box (SuSE 9.1) and another box (Debian woody, sorry :) Using terminal windows, on both boxes I've typed:

    stty raw < /dev/ttyS0

Then on one machine I type

    cat > /dev/ttyS0

and on the other

    cat < /dev/ttyS0

So far so good. When I type a line of text on the > box, I see it (including a ^J) on the < box, repeated lots of times. Then it settles down and anything more that I type is faithfully reproduced on the other box. It doesn't matter which box I use as source. Does anybody know what causes this?

(2) The next thing I need to do is configure kernel messages to be sent to the serial line. Does anybody know whether it is as simple as adding a line in /etc/syslog.conf:

    kern.*;*.err    /dev/ttyS0

and then

    kill -HUP `cat /var/run/syslogd.pid`

Cheers, Dave
On Thu, 2004-09-16 at 09:41, Dave Howorth wrote:
Carlos E. R. wrote:
(2) The next thing I need to do is configure kernel messages to be sent to the serial line. Does anybody know whether it is as simple as adding a line in /etc/syslog.conf:
kern.*;*.err /dev/ttyS0
and then kill -HUP `cat /var/run/syslogd.pid`
Cheers, Dave
This should work but you will also need to set up syslog on the receiving end to accept messages from other machines. Also, rcsyslog reload will tell syslog to reread its config file.

Another way to send the messages, if the two machines are networked (from the man page):

    Remote Machine
    This syslogd(8) provides full remote logging, i.e. is able to send messages to a remote host running syslogd(8) and to receive messages from remote hosts. The remote host won't forward the message again, it will just log them locally. To forward messages to another host, prepend the hostname with the at sign ("@"). Using this feature you're able to control all syslog messages on one host, if all other machines will log remotely to that. This tears down administration needs.

-- Ken Schneider
unix user since 1989, linux user since 1994, SuSE user since 1998 (5.2)
* PLEASE only reply to the list *
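For reference, the three syslog.conf destinations mentioned so far in the thread (a virtual console, a serial port, a remote host) might look like the sketch below. It only writes the candidate lines to a scratch file and prints them; it does not touch /etc/syslog.conf. The host name `loghost` and the use of a scratch file are assumptions made for illustration, not anything taken from the posters' machines.

```shell
#!/bin/sh
# Sketch only: collect candidate syslog.conf lines in a scratch file.
# "loghost" is a hypothetical receiving machine, not one from the thread.
scratch=$(mktemp)

cat > "$scratch" <<'EOF'
# kernel messages to virtual console 11
kern.*          /dev/tty11
# kernel messages, plus anything at err or above, to the first serial port
kern.*;*.err    /dev/ttyS0
# forward kernel messages to a remote syslogd over the network
kern.*          @loghost
EOF

# Show the result for review
cat "$scratch"

# After merging the chosen line into /etc/syslog.conf as root, make
# syslogd reread its configuration with either of (both from the thread):
#   rcsyslog reload
#   kill -HUP `cat /var/run/syslogd.pid`
```

Only the `@loghost` form needs anything special on the receiving machine; with the classic sysklogd that means starting the remote syslogd so that it accepts network messages (its `-r` option). The serial and console forms simply write to a device, so a plain reader on the other end of the cable is enough.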
Ken Schneider wrote:
(2) The next thing I need to do is configure kernel messages to be sent to the serial line. Does anybody know whether it is as simple as adding a line in /etc/syslog.conf:
kern.*;*.err /dev/ttyS0 and then kill -HUP `cat /var/run/syslogd.pid`
This should work but you will also need to setup syslog on the receiving end to accept messages from other machines.
Well I was just planning to cat < /dev/ttyS0 :) KISS
Also, rcsyslog reload will tell syslog to reread its config file.
Thanks, that's useful.
Another way to send the messages if the two machines are networked
I wouldn't be as confident that the last message will make it through the network stack while the kernel is crashing as I am that it will make it down a serial cable :) Cheers, Dave
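If the receiving end is just a plain `cat < /dev/ttyS0`, it can help to timestamp each line as it arrives, so the last message before a hang can be matched against the time of the crash. A minimal sketch follows; the function name `stamp_lines` and the log file name in the comment are made up for illustration.

```shell
#!/bin/sh
# Sketch: prefix every line read on stdin with the local date and time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Intended use on the receiving box (not run here):
#   stamp_lines < /dev/ttyS0 | tee -a serial-capture.log

# Self-contained demonstration with canned input
printf 'first message\nsecond message\n' | stamp_lines
```

The timestamps come from the receiving machine's clock, so they stay meaningful even if the crashing box stops updating its own.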
The 2004-09-16 at 14:41 +0100, Dave Howorth wrote:
Of course, as soon as I started watching, nothing happened for several days. Now the system has crashed again and I'm trying to set up a serial line from the problem box to another. I have a couple of questions :)
Murphy's rules! ;-)
(1) It's been a while since I played with serial ports on unix. I've got the link kind of working but with an odd symptom that I'd like to fix. I've connected a serial cable between the problem box (SuSE 9.1) and another box (Debian woody, sorry :) Using terminal windows, on both boxes I've typed:

    stty raw < /dev/ttyS0

Then on one machine I type

    cat > /dev/ttyS0

and on the other

    cat < /dev/ttyS0
Mmmm... I have to confess. I have never tried anything like that: my serial ports experiments were done with PCs running msdos, breadboards and chips, cp/m type machines, 68000 processors on vme bus... but never on unix or with cats or dogs ;-)
So far so good. When I type a line of text on the > box, I see it (including a ^J) on the < box, repeated lots of times.
Handshaking faulty, I'd guess. It's got to be bidirectional to work.
Then it settles down and anything more that I type is faithfully reproduced on the other box. It doesn't matter which box I use as source. Does anybody know what causes this?
Nope, not sure. But I would simply use "minicom" at the receiving end. It can display and log to disk (I forgot how), and that is handy.
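One possible cause of the repeated lines, not confirmed anywhere in the thread, is terminal echo: with GNU stty, `raw` does not switch off `echo`, so each port may send back whatever it receives and the two ends bounce the text between them for a while. A sketch of a port setup that disables echo is below. The function name `setup_port` and the 9600 baud rate are assumptions; both ends must use the same speed.

```shell
#!/bin/sh
# Sketch: put a serial port into raw mode with echo off.
# Usage (as a user allowed to open the port):  setup_port /dev/ttyS0
setup_port() {
    # Refuse anything that is not a character device
    if [ ! -c "$1" ]; then
        echo "not a serial device: $1" >&2
        return 1
    fi
    # raw:   no line editing or character translation
    # -echo: do not send received characters back down the wire
    # 9600:  assumed line speed, must match on both machines
    stty raw -echo 9600 < "$1"
}

# Nothing is configured until the function is called explicitly, e.g.
#   setup_port /dev/ttyS0 && cat < /dev/ttyS0
```

Running the same function on both boxes before starting the `cat` commands keeps the two ends symmetrical, which matters here because the symptom appeared whichever box was the source.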
(2) The next thing I need to do is configure kernel messages to be sent to the serial line. Does anybody know whether it is as simple as adding a line in /etc/syslog.conf:
kern.*;*.err /dev/ttyS0
That would certainly work, as long as syslog lasts a few seconds more than the kernel. I think there is a low level thing, kernel level, I mean... unfortunately, today I'm at an old machine, pentium 120 with SuSE 7.3, I can't look at a recent kernel. Look at the "kernel debugging", the last item in menuconfig. I think I saw it there. If Murphy's looking the other way, you will not need to recompile.
and then kill -HUP `cat /var/run/syslogd.pid`
Kill syslog? What for? I got lost here.

[...]

I suggested the serial port thing hoping somebody would know and explain to us. However, I'm finding references in the kernel documentation. For example, look at this paragraph (oops-tracing.txt):

    (2) Boot with a serial console (see Documentation/serial-console.txt), run a null modem to a second machine and capture the output there using your favourite communication program. Minicom works well.

That looks promising. Then, serial-console.txt says:

    To use a serial port as console you need to compile the support into your kernel - by default it is not compiled in. For PC style serial ports it's the config option next to "Standard/generic (dumb) serial support". You must compile serial support into the kernel and not as a module.

If it is compiled, a boot option enables and configures it. Read that file, it explains a lot. I haven't searched to see if SuSE compiled that option into their kernels.

Also, there is klogd (Kernel Log Daemon). It can dump to a file, and that file could be a serial port, I suppose. That use is not mentioned in the man page.

Well... I'm closing for the day, i.e., I'm going to sleep. I hope one of these ideas is good enough to help you, or to suggest another. In any case, I'm interested in learning the outcome. :-)

Good hunting :-)

-- Cheers, Carlos Robinson
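The boot option that serial-console.txt describes has the form `console=ttyS0,9600n8`, and it can be given alongside `console=tty0` so that messages go to both the screen and the serial line. The sketch below checks whether a kernel command line asks for a serial console. The function name `has_serial_console` and the sample boot line are illustrative only, and 9600n8 is just the example speed used in the kernel documentation.

```shell
#!/bin/sh
# Sketch: report whether a kernel command line requests a serial console.
# Reads /proc/cmdline by default; pass another file to check a candidate.
has_serial_console() {
    grep -Eq '(^| )console=ttyS[0-9]+' "${1:-/proc/cmdline}"
}

# Sample boot line in the form serial-console.txt describes.
# Messages go to every console listed; the last one named is the device
# used when /dev/console is opened.
sample=$(mktemp)
echo 'root=/dev/hda1 console=tty0 console=ttyS0,9600n8' > "$sample"

if has_serial_console "$sample"; then
    echo "serial console requested"
else
    echo "no serial console on this command line"
fi
```

Called with no argument on the problem box after a reboot, the function shows whether the option actually reached the running kernel.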
Carlos E. R. wrote:
The 2004-09-16 at 14:41 +0100, Dave Howorth wrote:
[snip]
I suggested the serial port thing hoping somebody would know and explain to us. However, I'm finding references in the kernel documentation. For example, look at this paragraph (oops-tracing.txt):
(2) Boot with a serial console (see Documentation/serial-console.txt), run a null modem to a second machine and capture the output there using your favourite communication program. Minicom works well.
That looks promising. Then, serial-console.txt says:
To use a serial port as console you need to compile the support into your kernel - by default it is not compiled in. For PC style serial ports it's the config option next to "Standard/generic (dumb) serial support". You must compile serial support into the kernel and not as a module.
While we are at it.. lsmod gives a list of installed modules. Is there something to see the *compiled-in* modules? Maybe inside System.map? [snip] Bye, Ermanno Polli
The 2004-09-17 at 11:56 +0200, Ermanno Polli wrote:
While we are at it.. lsmod gives a list of installed modules. Is there something to see the *compiled-in* modules?
Er... all compiled modules should be installed somewhere under /lib/modules/kernel-version/*

If you need to differentiate between those made as modules and those statically linked, I don't know how.

Notice that the main purpose of changing the "extraversion" variable at compile time is to create a different tree under a different subdirectory, so that the original modules (from the binary rpm from SuSE) and yours do not mix. If you did not take that precaution, the tree might contain modules from different sources, mixed, and thus incorrect.

If this answer is insufficient, you should post your own new question under a subject line that lets people who know about it spot your question at a glance.

-- Cheers, Carlos Robinson
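One way to tell built-in features from modules is the kernel's build configuration, where `=y` means compiled in and `=m` means built as a module. Where that file lives is an assumption about the installation: commonly /boot/config-`uname -r`, or /proc/config.gz if the kernel was built to export it. The sketch below looks up a single option; the function name `config_state` and the tiny demo config are invented for illustration. `CONFIG_SERIAL_8250_CONSOLE` is the 2.6 name of the serial console option discussed above.

```shell
#!/bin/sh
# Sketch: look up how one option was built in a kernel config file.
# Prints "y" (compiled in), "m" (module) or "unset".
config_state() {
    symbol=$1
    file=$2
    # /proc/config.gz is compressed; files under /boot are plain text
    case $file in
        *.gz) reader='gzip -dc' ;;
        *)    reader='cat' ;;
    esac
    value=$($reader "$file" | sed -n "s/^$symbol=//p")
    echo "${value:-unset}"
}

# Demonstration on a tiny invented config file
demo=$(mktemp)
printf 'CONFIG_SERIAL_8250=y\nCONFIG_REISERFS_FS=m\n' > "$demo"
config_state CONFIG_SERIAL_8250 "$demo"
config_state CONFIG_REISERFS_FS "$demo"
config_state CONFIG_SERIAL_8250_CONSOLE "$demo"

# On a real system, something like (paths are assumptions):
#   config_state CONFIG_SERIAL_8250_CONSOLE /boot/config-`uname -r`
#   config_state CONFIG_SERIAL_8250_CONSOLE /proc/config.gz
```

Listing everything that is compiled in is then a matter of searching the same file for lines ending in `=y`.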
participants (4): Carlos E. R., Dave Howorth, Ermanno Polli, Ken Schneider