Hardware: 2 PERC 4 DC controllers (LSI), running the megaraid2 driver v2.00.8 (Release Date: Wed Aug 27 18:50:49 EDT 2003)
Software:

> It would be really helpful if you can provide kernel messages. You can hook up a null modem cable to the serial port and collect all messages.
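For reference, kernel messages are sent to a serial port with the kernel's `console=ttyS<n>[,options]` boot parameter, where the options take the form baud rate, parity (`n`, `o` or `e`) and data bits, e.g. `9600n8`. The kernel writes its messages to every `console=` device listed, and the last one listed becomes `/dev/console`. The sketch below only builds that boot-line string. It assumes a kernel built with serial console support, and the helper names (`serial_console_arg`, `add_serial_console`) and the 9600 baud default are hypothetical choices for illustration, not part of any SuSE or Oracle tooling.

```python
def serial_console_arg(port=0, baud=9600, parity="n", bits=8):
    """Build a 'console=' boot parameter for serial port ttyS<port> (hypothetical helper)."""
    if parity not in ("n", "o", "e"):
        raise ValueError("parity must be 'n', 'o' or 'e'")
    return f"console=ttyS{port},{baud}{parity}{bits}"


def add_serial_console(cmdline, port=0, baud=9600):
    """Return a kernel command line that logs to both the VGA console and a serial port."""
    # Drop any console= settings already present so the ordering below is predictable.
    kept = [tok for tok in cmdline.split() if not tok.startswith("console=")]
    # Messages go to every console= device; the LAST one becomes /dev/console,
    # so the serial port is listed after tty0.
    kept += ["console=tty0", serial_console_arg(port, baud)]
    return " ".join(kept)


if __name__ == "__main__":
    print(add_serial_console("root=/dev/sda2 console=tty0"))
```

The resulting string goes on the boot loader's kernel line, which is why enabling it requires a reboot. A second machine on the other end of the null modem cable then logs whatever arrives at the same baud rate.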
Doing that tonight. We'll have to reboot to enable output to ttyS0, so we probably won't see another hang for 3-5 days.

Any idea when the latest megaraid2 driver will be "certified" by SuSE? We've been tempted a couple of times to install it, but we are caught between a rock and a hard place: installing it would mean moving away from the official "unbreakable" Oracle configuration.

These machines (identical hardware) have a history of random hangs, and we've slowly been eliminating possible causes. Putting in Intel Pro 100 cards cured our hangs under heavy network load. Now the machines hang under heavy disk IO, but only after several days.

The disk IO hangs have been of two kinds. In the first, an entire mounted file system hangs completely: if you do an ls on it, your process hangs and cannot be killed. In the second, some processes still work on the file system, but simple commands like cp and mv hang and cannot be killed.

Our file systems are 146 GB ReiserFS volumes, each made up of 3 logical volumes controlled by LVM. The three logical volumes are actually RAID arrays, each made up of one third of a mirrored pair of 146 GB drives. It sounds strange to split a drive into three pieces and then put it back together, but the Dell techs said this would give us enough IO queues per drive to overcome the Linux hardware RAID bottleneck.

I'll post again on the next hang.

Thanks,
Andy