RAID (?) Installation problem
Suggestions needed: I have just installed SUSE 7.3 on two identical servers. RAID is used (details below). Both systems are 'as installed', with no users and no new processes. I just install, wait and . . .

After about 1/2 hour the 'bell' starts ringing (nowadays a whistle from the speaker). It's as if I were holding down a key on the keyboard and the keyboard buffer were overflowing. For about a minute after the whistle starts I can still use the system. There are no new messages in /var/log/messages from the preceding 20 minutes. After about 1 minute of whistling the entire machine locks up: the display seems to lose signal, there's no response from the keyboard and no disk activity. I've left it like this for more than 1/2 hour and the whistle just carries on.

On a 'firm' reboot (using the RESTART button on the case) the whistle continues, even though the machine and then Linux reboot OK. Only a full power cycle (power off, wait 30 secs, power on) puts the hardware back into a normal state.

When it reboots, the RAID system is reported to be out of sync, so it spends 15 minutes re-syncing the (two) RAID partitions. If I reboot immediately after the RAID sync is complete, the machine restarts OK with the RAID still in sync. But if I wait until the whistle starts again (5 minutes or so after the RAID sync has finished) and do a controlled reboot at that point (before the hardware locks up), the RAID is out of sync again when it reboots.

Let me say again, this is happening on TWO identical machines. Both were previously running RAID OK under Red Hat 7.0, and one of them then had SUSE 7.2 on it with no problems (though not with RAID). The installation of SUSE 7.3 was complete: all disks were re-partitioned and re-formatted. The package load was 'minimum' + java2, iptables and netdate, and none of these three has been executed at all. I simply installed and left the machine idle, and the whistle started.
It feels as if the 'software problem' is that something starts sending characters to the hardware's keyboard buffer until it overflows, and that all the other physical symptoms are the result of the hardware reacting. But what's causing the initial problem, 5 minutes or so after the RAID resync is complete? I've been round the 'start, RAID resync, whistle, swear, reboot' cycle about 20 times now, on two machines which worked perfectly before this.

Of course, it could be nothing to do with RAID; it might be something else that takes 20 minutes or so to cut in. The onset of the whistle does not seem to be related to any 'cron' activity and I can see no logged activity around the time the problem starts. None of the files in /var/log have a 'last update' timestamp around that time, and there's nothing in /var/log/messages which looks unusual to me.

FSTAB is (as generated by YAST2):

    /dev/md1      /              reiserfs  defaults              1 1
    /dev/hda1     /boot          ext3      defaults              1 2
    /dev/cdrom    /media/cdrom   auto      ro,noauto,user,exec   0 0
    devpts        /dev/pts       devpts    defaults              0 0
    /dev/fd0      /media/floppy  auto      noauto,user,sync      0 0
    proc          /proc          proc      defaults              0 0
    usbdevfs      /proc/bus/usb  usbdevfs  defaults,noauto       0 0
    /dev/vg0/var  /var           reiserfs  defaults              1 2
    /dev/hda2     swap           swap      pri=42

/dev/vg0 is a Logical Volume built on a pair of RAID 1 partitions.
/dev/md1 is built on a (different) pair of RAID 1 partitions.

Make any sense to anyone?

TIA

Chris Haynes