Re: [opensuse] Help! RO File System Lock down OpenSUSE 10.2

20 Jun 2007

      On Wed 20 June 2007 10:32, G T Smith wrote:
...
LLLActive@GMX.Net wrote:
...
On Wed 20 June 2007 00:56, Darryl Gregorash wrote:
...
On 06/19/2007 03:11 PM, LLLActive@GMX.Net wrote:
...
Hi all,
I must be quick, before the system locks down again.
I recalled this problem was reported last month -- thread "File system
becomes read-only", to which you contributed. No solution was posted to
the list, no one mentioned anything about a bug report, and I cannot
find any bug report summaries that look even remotely close to the
problem.
Yes, you're right. I have no solution yet, and the problem occurs on
three very different systems. I mentioned them in that thread as well.
It is becoming alarming now.
<snip>
...
...
You'll need to give us a lot more information about your system hardware
(including the modules that are loaded for hard drive i/o).
Here are some HW details:
1. A 5 year old system that had win2K on it for 3 years and since then
SuSE 9.3 and all the following. Here the problem occurs with OpenSUSE
10.2. It has an ASUS mobo with an ATI Radeon graphics card.
2. A 2 year old system that only had SuSE 10.0, which had the problem,
and now has WinXP without any problems. It's a Gigabyte mobo with an ATI
Radeon graphics card. I noticed that intensive file access by Evolution
caused a system lockup many times.
3. The latest 1 year old system showed the problem mainly with SUSE 10.0
and now with OpenSUSE 10.2. An identical system has SuSE 10.1, where the
problem has not occurred so far. It is a Gigabyte GA-K8N-SLi mobo with
nVidia GeForce 7600 GS.
The difference between the lockups of the SUSE 10.0 and OpenSUSE 10.2 is
that with 10.0 it did not allow any access to the system at all; a
complete lockdown - dead - only reset got it unlocked. The OpenSUSE 10.2
reports RO FS problems by all applications. The system can be rebooted or
shut down normally.
plus
...
information from /var/log/messages about what is happening when the
filesystem goes RO
I noticed this sort of message on two different systems
in /var/log/messages:
Jun  2 22:15:03 kakalapap kernel: hda: task_no_data_intr: status=0x51
{ DriveReady SeekComplete Error }
Jun  2 22:15:03 kakalapap kernel: hda: task_no_data_intr: error=0x04
{ DriveStatusError }
Jun  2 22:15:03 kakalapap kernel: ide: failed opcode was: 0xef
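
For anyone wanting to read those status bytes by hand, here is a minimal
sketch (not from the original posts) that decodes the status value into the
same labels the kernel prints in braces. The bit table is my understanding
of what the old Linux IDE driver uses, so treat it as an assumption. As far
as I know, opcode 0xef is the ATA SET FEATURES command and error bit 0x04
means the drive aborted that command, which by itself does not have to
indicate a failing disk.

```python
# Status register bits, named the way the old Linux IDE driver labels them
# in its log output (assumed mapping, check against your kernel source).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of all status bits set in value, highest bit first."""
    return [name for mask, name in STATUS_BITS if value & mask]


# 0x51 is the status value from the log lines quoted above.
print(decode_status(0x51))
```

Run on 0x51 this gives DriveReady, SeekComplete and Error, which matches the
text in braces in the log.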
For now, I need stability. I will follow Carl's suggestion and get all
partitions onto one FS type. I'm a little apprehensive about using ext3
or XFS. Need I be? Patrick seems happy with ext3. Not sure what Carl uses.
I have had an on and off experience of this particular issue...
There seem to be three common elements for me...
i) Before the problem kicks in, network connections start going into
CLOSE_WAIT states...
ii) famd starts becoming a CPU hog. Files start being reported as being
in use when they are not.
iii) Commands and applications start reporting segmentation faults.
The last two are detectable if one is lucky enough to have an active
terminal session available. These are probably symptoms of something at
a lower level doing something it should not, but because of the I/O
lockdown it is nearly impossible to monitor system status to identify
what is at fault.
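
Since CLOSE_WAIT connections seem to show up before the lockup, it may be
worth logging their count while the system is still usable. A rough sketch,
not something from this thread: it counts CLOSE_WAIT sockets in text taken
from /proc/net/tcp, where state code 08 stands for CLOSE_WAIT. The SAMPLE
text is made up for illustration only.

```python
CLOSE_WAIT = "08"  # hex state code for CLOSE_WAIT in /proc/net/tcp


def count_close_wait(proc_net_tcp_text):
    """Count sockets in CLOSE_WAIT given the contents of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Field 3 is the "st" (state) column.
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


# Hypothetical sample in the /proc/net/tcp layout (not real data).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1234
   1: 0100007F:008F 0100007F:C350 08 00000000:00000000 00:00000000 00000000     0        0 5678
"""

print(count_close_wait(SAMPLE))
```

On a live box one would pass in open("/proc/net/tcp").read() instead of
SAMPLE, for example from a cron job that appends the count to a file on a
partition that is not affected.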
Oddly, these conditions were most recently associated with a situation
where a couple of server end IMAP mail folders hit about 7000
messages (laziness on my part). Thunderbird started timing out.
Restarting the IMAP server cleared the problem for a short while, but
after two or three restarts the system became unstable with the above
symptoms. Even after a full reboot it was only a matter of time before
the whole thing locked up.
Since keeping folder sizes under control (i.e. under 7000 or so) this
has not happened (looking for forest to touch :-)), which also proves
nothing BTW (except possibly that it is a failure of something to recover
from an error condition).
The problem first occurred soon after upgrading to 10.2 and seemed to be
initially associated with a dud tape drive. After disconnecting tape
drive I got stability for about 2-3 weeks before problem returned.
Some other things suggested to me that it may have not been the tape
drive initially at fault, and at some point I intend to conduct some
tests on the drive to determine whether it is really faulty (I don't have
a suitable test config at moment).
The problem occurs only on a dual opteron box with 64bit OS, not on my
32bit machine. There is no SCSI on box...
With me it is both x86_64 AMD and a normal AMD Duron XP2000
...
BTW This is a ReiserFS only box.
Wow, it is all a bit much, with no commonality, BUT you mention IMAP, and 
I also have a number of POP accounts with a lot of mail in them (this list 
too, btw), around 6K-7K mails in total. It seemed to be linked to both 
Evolution and Kontact. I actually switched to Kontact because I suspected 
Evolution's activity of causing the problem. I will close down both and see 
what happens. 

My problem is that I work a lot with Gimp and large *.JPG & *.PNG files to 
edit (100 MB), and it seemed to also cause the problem once I loaded a file 
to edit. Then Gimp immediately reports it cannot save temp files and will not 
budge an inch further ... In a terminal window I notice the system jumps 
to /var/log and a tail -n30 /var/log/messages reports a RO File System. 
Closing the tail, a message reporting the ro fs problem comes constantly to 
the command prompt.
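
To confirm which partition actually got remounted read-only when this
happens, the mount table can be checked directly. A minimal sketch (my own
addition, not from the thread): it takes text in the /proc/mounts format and
lists the mount points that carry the "ro" option. The SAMPLE table and its
device names are invented for illustration.

```python
def read_only_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs for entries mounted read-only."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Options are comma separated; match "ro" exactly, not as a substring.
        if "ro" in options.split(","):
            result.append((mount_point, fs_type))
    return result


# Hypothetical mount table in /proc/mounts layout (not real data).
SAMPLE = """\
/dev/hda2 / reiserfs ro 0 0
/dev/hda3 /home ext3 rw,noatime 0 0
proc /proc proc rw 0 0
"""

print(read_only_mounts(SAMPLE))
```

On the affected machine the input would be open("/proc/mounts").read().
Reading /proc does not need a writable disk, so this should still work
after the lockdown as long as a shell and Python are already running.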

Is the problem associated with file system access, e.g. many or large files 
which cause high activity? Luckily my Servers are on SLES 10 and SLES 9, one 
DRBD cluster running (don't laugh) on two SATA RAID 5 Novell SUSE 10.0 
flawlessly since 2005.01.01.

What can be done other than reducing disk access?

:-(
Al
