On Wed 20 June 2007 10:32, G T Smith wrote:
LLLActive@GMX.Net wrote:
On Wed 20 June 2007 00:56, Darryl Gregorash wrote:
On 06/19/2007 03:11 PM, LLLActive@GMX.Net wrote:
Hi all,
I must be quick, before the system locks down again.
I recalled this problem was reported last month -- thread "File system becomes read-only", to which you contributed. No solution was posted to the list, no one mentioned anything about a bug report, and I cannot find any bug report summaries that look even remotely close to the problem.
Yes, you're right. I have no solution yet, and the problem occurs on three very different systems. I mentioned them in that thread as well. It is becoming alarming now.
<snip>
You'll need to give us a lot more information about your system hardware (including the modules that are loaded for hard drive I/O),
Here are some HW details:
1. A 5-year-old system that had Win2K on it for 3 years and since then SuSE 9.3 and all the following releases. Here the problem occurs with OpenSUSE 10.2. It has an ASUS mobo with an ATI Radeon graphics card.
2. A 2-year-old system that only had SuSE 10.0, which had the problem, and now has WinXP without any problems. It's a Gigabyte mobo with an ATI Radeon graphics card. I noticed that intensive file access by Evolution caused a system lockup many times.
3. The latest, 1-year-old system showed the problem mainly with SUSE 10.0 and now with OpenSUSE 10.2. An identical system has SuSE 10.1, where the problem has not occurred so far. It is a Gigabyte GA-K8N-SLi mobo with an nVidia GeForce 7600 GS.
The difference between the lockups of the SUSE 10.0 and OpenSUSE 10.2 is that with 10.0 it did not allow any access to the system at all; a complete lockdown - dead - only reset got it unlocked. The OpenSUSE 10.2 reports RO FS problems by all applications. The system can be rebooted or shut down normally.
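As a side note on spotting the read-only state: a minimal sketch, assuming a Linux system where /proc/mounts is readable. It lists every mount whose options include `ro`. The function name `read_only_mounts` is made up for illustration and is not something used in this thread.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point, fstype) for each read-only mount.

    mounts_text is expected to look like the contents of /proc/mounts:
    one mount per line, fields separated by whitespace, with the
    comma-separated mount options in the fourth field.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fstype, options = fields[:4]
        # Compare whole options so that e.g. "errors=remount-ro" does not match.
        if "ro" in options.split(","):
            found.append((device, mount_point, fstype))
    return found
```

On a live system one could call `read_only_mounts(open("/proc/mounts").read())`. Some mounts (a CD-ROM, for instance) are read-only on purpose, so only an unexpected entry such as the root partition would point at the problem described here.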
plus
information from /var/log/messages about what is happening when the filesystem goes RO
I noticed this sort of message in /var/log/messages on two different systems:
Jun 2 22:15:03 kakalapap kernel: hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
Jun 2 22:15:03 kakalapap kernel: hda: task_no_data_intr: error=0x04 { DriveStatusError }
Jun 2 22:15:03 kakalapap kernel: ide: failed opcode was: 0xef
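For anyone wanting to collect these entries from a log, here is a rough sketch that finds the `status=0x..` lines and decodes the bits. The bit names follow the ATA status register layout as the old IDE driver prints them, and the helper names `decode_status` and `ide_status_errors` are invented for this example. As far as I know, opcode 0xef is the ATA SET FEATURES command and error=0x04 is the abort bit. So these lines may show a rejected feature setting rather than a failed read or write, but that is an interpretation and nothing in this thread confirms it.

```python
import re

# ATA status register bits, named as the old Linux IDE driver prints them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Matches e.g. "hda: task_no_data_intr: status=0x51"
STATUS_RE = re.compile(r"(\w+): \w+: status=0x([0-9a-fA-F]{2})")


def decode_status(value):
    """Return the names of all status bits set in value, highest bit first."""
    return [name for bit, name in STATUS_BITS.items() if value & bit]


def ide_status_errors(log_text):
    """Return (drive, [bit names]) for each status line with the Error bit set."""
    results = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if not match:
            continue
        value = int(match.group(2), 16)
        if value & 0x01:  # Error bit
            results.append((match.group(1), decode_status(value)))
    return results
```

Feeding it the three lines above should report one entry for `hda`, with the same bit names the kernel printed.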
For now, I need stability. I will follow Carl's suggestion and get all partitions onto one FS type. I'm a little apprehensive about using ext3 or XFS. Need I be? Patrick seems happy with ext3. Not sure what Carl uses.
I have had an on and off experience of this particular issue...
There seem to be three common elements for me...
i) Before the problem kicks in, one starts seeing network connections going into the CLOSE_WAIT state...
ii) famd starts becoming a CPU hog. Files start being reported as being in use when they are not.
iii) Commands and applications start reporting segmentation faults.
The last two are detectable if one is lucky enough to have an active terminal session available. These are probably symptoms of something at a lower level doing something it should not, but because of the I/O lockdown it is nearly impossible to monitor system status to identify what is at fault.
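The first symptom can be watched for without touching the disk, because /proc/net/tcp is served from kernel memory. A minimal sketch, assuming the usual Linux layout of that file (state code in the fourth column, 08 meaning CLOSE_WAIT). The function name `count_close_wait` is hypothetical, and the script would have to be running already before the lockdown starts.

```python
CLOSE_WAIT = 0x08  # state code the Linux kernel uses for CLOSE_WAIT in /proc/net/tcp


def count_close_wait(proc_net_tcp_text):
    """Count sockets in CLOSE_WAIT in text formatted like /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "0:"; the header does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        if int(fields[3], 16) == CLOSE_WAIT:
            count += 1
    return count
```

Polling `count_close_wait(open("/proc/net/tcp").read())` every few seconds and noting when the number starts to climb might give an early warning. Whether it really precedes the lockup on other machines is untested.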
Oddly, these conditions were most recently associated with a couple of server-side IMAP mail folders hitting about 7000 messages (laziness on my part). Thunderbird started timing out. Restarting the IMAP server cleared the problem for a short while, but after two or three restarts the system became unstable with the above symptoms. Even after a full reboot it was only a matter of time before the whole thing locked up.
Since keeping folder sizes under control (i.e. under 7000 or so) this has not happened (looking for forest to touch :-)), which also proves nothing BTW (except possibly that it is a failure of something to recover from an error condition).
The problem first occurred soon after upgrading to 10.2 and initially seemed to be associated with a dud tape drive. After disconnecting the tape drive I got stability for about 2-3 weeks before the problem returned.
Some other things suggested to me that the tape drive may not have been at fault initially, and at some point I intend to run some tests on the drive to determine whether it is really faulty (I don't have a suitable test config at the moment).
The problem occurs only on a dual Opteron box with a 64-bit OS, not on my 32-bit machine. There is no SCSI on the box...
With me it is both x86_64 AMD and a normal AMD Duron XP2000
BTW, this is a ReiserFS-only box.
Wow, it is all a bit much, with no commonality, BUT as you mention IMAP: with me, too, a number of POP accounts hold a lot of mail (also this list, btw), around 6K-7K mails in total. It seemed to be linked to both Evolution and Kontact. I actually switched to Kontact because I suspected Evolution's activity of causing the problem. I will close down both and see what happens.

My problem is that I work a lot with Gimp, editing large *.JPG & *.PNG files (100 MB), and it also seemed to cause the problem as soon as I loaded a file to edit. Gimp then immediately reports that it cannot save temp files and will not budge an inch further ... In a terminal window I notice the system jumps to /var/log, and a tail -n30 /var/log/messages reports a read-only file system. After closing the tail, a message reporting the RO FS problem keeps appearing at the command prompt.

Is the problem associated with file system access, e.g. many or large files causing high activity?

Luckily my servers are on SLES 10 and SLES 9, with one DRBD cluster running (don't laugh) flawlessly on two SATA RAID 5 Novell SUSE 10.0 systems since 2005.01.01.

What else, other than reducing disk access, can be done? :-(

Al