[opensuse] Filesystem becomes read-only
I am using Suse 10.2 on two different machines. One is a new Dell PowerEdge 2970, the other is an older (2002) "no-name" server. Both have experienced a strange and severe problem since I installed Suse.

Basically the file system is put into a read-only state. While the OS and all apps are still running and I can log in and view files/logs etc., nothing can write to the disks. Obviously this causes all kinds of problems. The only way to get the filesystem into a more interactive state is a hard reboot.

The first two times this occurred, there were messages in the /var/log/messages file relating to megasas; however, as soon as the system was restarted, these warnings disappeared. I believe the file was not actually written to the disk when the system freaked out, so once logging resumed those entries were lost. The messages are somewhat similar to these that I found via a Google search:

sd 1:2:0:0: megasas: RESET -286287 cmd=8a
megasas: [ 0]waiting for 12 commands to complete
megasas: [ 5]waiting for 12 commands to complete
megasas: [10]waiting for 12 commands to complete
megasas: [15]waiting for 12 commands to complete
...
megasas: [170]waiting for 12 commands to complete
megasas: [175]waiting for 12 commands to complete
megasas: failed to do reset
sd 1:2:0:0: megasas: RESET -286287 cmd=8a
megasas: cannot recover from previous reset failures
sd 1:2:0:0: megasas: RESET -286287 cmd=8a
megasas: cannot recover from previous reset failures
sd 1:2:0:0: scsi: Device offlined - not ready after error recovery

Saturday the problem happened on the Dell server, although this time there is no mention of megasas in the log files. A search of this list did not find any reference to megasas and all too many unrelated results for a read-only filesystem :).

System Specs:

Dell PowerEdge 2970
Opteron 2.0GHz 64-bit
4 GB memory
3x 73GB Hard Drive (RAID 5)
PERC 5i

"No-name server"
Dual P3 1GHz
1GB memory
Dual 36GB hard drives (no RAID)

I appreciate any help. Let me know if more information is needed.

Thanks
Tim Donnelly
Systems/Network Administrator
Colorado Alliance of Research Libraries
(303)759-3399 x106
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse+help@opensuse.org
* Tim Donnelly <tim@coalliance.org> [05-29-07 12:26]:
I am using Suse 10.2 on two different machines. One is a new Dell Poweredge 2970, the other is an older (2002) "no-name" server. Both have experienced a strange and severe problem since I installed Suse.
Basically the file system is put into a read-only state. While the OS and all apps are still running and I can log in and view files/logs etc., nothing can write to the disks. Obviously this causes all kinds of problems. The only way to get the filesystem into a more interactive state is a hard reboot.
The first two times this occurred, there were messages in the /var/log/messages file relating to megasas, however as soon as the system was restarted, these warnings disappeared. I believe the file was not actually written to the disk when the system freaked out so once logging resumed those entries were lost. The messages are somewhat similar to these that I found via a Google search:
sd 1:2:0:0: megasas: RESET -286287 cmd=8a
megasas: [ 0]waiting for 12 commands to complete
megasas: [ 5]waiting for 12 commands to complete
megasas: [10]waiting for 12 commands to complete
megasas: [15]waiting for 12 commands to complete
...
megasas: [170]waiting for 12 commands to complete
megasas: [175]waiting for 12 commands to complete
megasas: failed to do reset
sd 1:2:0:0: megasas: RESET -286287 cmd=8a
megasas: cannot recover from previous reset failures
sd 1:2:0:0: megasas: RESET -286287 cmd=8a
megasas: cannot recover from previous reset failures
sd 1:2:0:0: scsi: Device offlined - not ready after error recovery
see the 2nd hit searching google for megasas: http://lists.us.dell.com/pipermail/linux-poweredge/2007-March/029992.html
Tim Donnelly Systems/Network Administrator Colorado Alliance of Research Libraries (303)759-3399 x106
I would think that the *first* thing a "systems/network administrator" would do is a quick google search :^)
--
Patrick Shanahan Plainfield, Indiana, USA HOG # US1244711
http://wahoo.no-ip.org Photo Album: http://wahoo.no-ip.org/gallery2
OpenSUSE Linux http://en.opensuse.org/
Registered Linux User #207535 @ http://counter.li.org
On Tue, 2007-05-29 at 13:23 -0400, Patrick Shanahan wrote:
see the 2nd hit searching google for megasas:
http://lists.us.dell.com/pipermail/linux-poweredge/2007-March/029992.html
In this mail it mentions:

echo 120 > /sys/block/sda/device/timeout

It seems to have stabilised the system. I noticed recently that my USB devices like my mobile phone mount as SCSI device sda:

Jul 11 23:27:53 dodo kernel: usb 6-2: new full speed USB device using uhci_hcd and address 6
Jul 11 23:27:54 dodo kernel: usb 6-2: new device found, idVendor=0fce, idProduct=d016
Jul 11 23:27:54 dodo kernel: usb 6-2: new device strings: Mfr=1, Product=2, SerialNumber=3
Jul 11 23:27:54 dodo kernel: usb 6-2: Product: Sony Ericsson D750
Jul 11 23:27:54 dodo kernel: usb 6-2: Manufacturer: Sony Ericsson
Jul 11 23:27:54 dodo kernel: usb 6-2: SerialNumber: 9849819488923838_0
Jul 11 23:27:54 dodo kernel: usb 6-2: configuration #1 chosen from 1 choice
Jul 11 23:27:54 dodo kernel: cdc_acm 6-2:1.1: ttyACM0: USB ACM device
Jul 11 23:27:54 dodo kernel: cdc_acm 6-2:1.3: ttyACM1: USB ACM device
Jul 11 23:27:54 dodo kernel: scsi4 : SCSI emulation for USB Mass Storage devices
Jul 11 23:27:54 dodo kernel: usb-storage: device found at 6
Jul 11 23:27:54 dodo kernel: usb-storage: waiting for device to settle before scanning
Jul 11 23:27:55 dodo kernel: Vendor: Sony Eri Model: Memory Stick Rev: 0000
Jul 11 23:27:55 dodo kernel: Type: Direct-Access ANSI SCSI revision: 00
Jul 11 23:27:55 dodo kernel: SCSI device sda: 1951744 512-byte hdwr sectors (999 MB)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 11 23:27:55 dodo kernel: sda: Write Protect is off
Jul 11 23:27:55 dodo kernel: sda: Mode Sense: 00 6a 00 00
Jul 11 23:27:55 dodo kernel: sda: assuming drive cache: write through
Jul 11 23:27:55 dodo kernel: SCSI device sda: 1951744 512-byte hdwr sectors (999 MB)
Jul 11 23:27:55 dodo kernel: sda: Write Protect is off
Jul 11 23:27:55 dodo kernel: sda: Mode Sense: 00 6a 00 00
Jul 11 23:27:55 dodo kernel: sda: assuming drive cache: write through
Jul 11 23:27:55 dodo kernel: sda: sda1
Jul 11 23:27:55 dodo kernel: sd 4:0:0:0: Attached scsi removable disk sda
Jul 11 23:27:55 dodo kernel: sd 4:0:0:0: Attached scsi generic sg0 type 0
Jul 11 23:27:55 dodo kernel: usb-storage: device scan complete

What is the best way to set the timeout to 120 permanently at boot-up, as is done manually with

echo 120 > /sys/block/sda/device/timeout

:-) Al
* LLLActive@GMX.Net <LLLActive@GMX.Net> [07-11-07 17:41]:
What is the best way to set the timeout to 120 permanently at boot-up as is done with echo 120 > /sys/block/sda/device/timeout manually?
??? add the line "echo 120 > /sys/block/sda/device/timeout" to /etc/init.d/boot.local ???
--
Patrick Shanahan
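A minimal sketch of what such a boot.local entry could look like, assuming the device is named sda and is already present when boot.local runs. The helper name set_scsi_timeout and its optional third argument (an alternative sysfs root, useful for trying the function on a scratch directory) are illustrative and not part of openSUSE.

```shell
#!/bin/sh
# Sketch for /etc/init.d/boot.local: write a SCSI command timeout only when
# the device's sysfs attribute actually exists.
# set_scsi_timeout is a hypothetical helper name used for illustration.
set_scsi_timeout() {
    dev="$1"                  # block device name, e.g. sda
    seconds="$2"              # timeout in seconds, e.g. 120
    sysfs_root="${3:-/sys}"   # assumption: overridable root, defaults to /sys
    attr="$sysfs_root/block/$dev/device/timeout"

    if [ -w "$attr" ]; then
        echo "$seconds" > "$attr"
    else
        # Device not present (or attribute not writable): report and skip.
        echo "set_scsi_timeout: $attr not available, skipping" >&2
        return 1
    fi
}

# In boot.local one would then call, for example:
#   set_scsi_timeout sda 120
```

The guard only avoids a failed write when sda does not exist yet; it does not help with devices that appear after boot.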
On Wednesday 11 July 2007 21:12, Patrick Shanahan wrote:
* LLLActive@GMX.Net <LLLActive@GMX.Net> [07-11-07 17:41]:
What is the best way to set the timeout to 120 permanently at boot-up as is done with echo 120 > /sys/block/sda/device/timeout manually?
??? add the line "echo 120 > /sys/block/sda/device/timeout" to /etc/init.d/boot.local ???
The problem that I have with this line of logic is that I have no "sda" in /sys/block until AFTER I plug in my flash drive, therefore adding a line in boot.local that references "sda" won't do anything except generate an error message (or nothing at all) if there is no flash drive present at boot, which is the case here.

I would think that the same process that detects the flash drive and adds the "sda" directory should also add the proper timeout value.

Fred
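One way to picture the point above: instead of naming sda at boot, a script could apply the timeout to whatever sd* devices exist at the moment it runs, and be re-run after a device has been plugged in. This is only a sketch; the thread does not say which mechanism on openSUSE 10.2 should trigger it, and the function name set_all_scsi_timeouts and the overridable sysfs root are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: set the timeout on every sd* block device that is present right now.
# set_all_scsi_timeouts is a hypothetical helper name.
set_all_scsi_timeouts() {
    seconds="$1"              # timeout in seconds, e.g. 120
    sysfs_root="${2:-/sys}"   # assumption: overridable root, defaults to /sys

    for attr in "$sysfs_root"/block/sd*/device/timeout; do
        # With no sd* device the glob stays unexpanded, so skip anything
        # that is not a writable file.
        [ -w "$attr" ] || continue
        echo "$seconds" > "$attr"

        # Print the device name that was updated, e.g. "sda".
        dev=${attr#"$sysfs_root"/block/}
        echo "${dev%%/*}"
    done
}

# Example call after a device has been detected:
#   set_all_scsi_timeouts 120
```

Devices that do not show up as sd* (an IDE drive at hdb, for instance) are left untouched by this loop.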
On Wed, 2007-07-11 at 22:09 -0500, Stevens wrote:
On Wednesday 11 July 2007 21:12, Patrick Shanahan wrote:
* LLLActive@GMX.Net <LLLActive@GMX.Net> [07-11-07 17:41]:
What is the best way to set the timeout to 120 permanently at boot-up as is done with echo 120 > /sys/block/sda/device/timeout manually?
??? add the line "echo 120 > /sys/block/sda/device/timeout" to /etc/init.d/boot.local ???
The problem that I have with this line of logic is that I have no "sda" in /sys/block until AFTER I plug in my flash drive, therefore adding a line in boot.local that references "sda" won't do anything except generate an error message (or nothing at all) if there is no flash drive present at boot, which is the case here.
I would think that the same process that detects the flash drive and adds the "sda" directory should also add the proper timeout value.
Fred
Makes sense. Could you help out with which process could be used? I know that USB at plugin calls klauncher. How can klauncher's start be enhanced to add the timeout value? Is there a process that starts before klauncher that detects USB?

:-) Al
On Thu, 2007-07-12 at 09:05 +0200, LLLActive@GMX.Net wrote:
On Wed, 2007-07-11 at 22:09 -0500, Stevens wrote:
On Wednesday 11 July 2007 21:12, Patrick Shanahan wrote:
* LLLActive@GMX.Net <LLLActive@GMX.Net> [07-11-07 17:41]:
What is the best way to set the timeout to 120 permanently at boot-up as is done with echo 120 > /sys/block/sda/device/timeout manually?
??? add the line "echo 120 > /sys/block/sda/device/timeout" to /etc/init.d/boot.local ???
The problem that I have with this line of logic is that I have no "sda" in /sys/block until AFTER I plug in my flash drive, therefore adding a line in boot.local that references "sda" won't do anything except generate an error message (or nothing at all) if there is no flash drive present at boot, which is the case here.
I would think that the same process that detects the flash drive and adds the "sda" directory should also add the proper timeout value.
Fred
Makes sense. Could you help out with which process could be used? I know that USB at plugin calls klauncher. How can klauncher's start be enhanced to add the timeout value. Is there a process that starts before klauncher that detects USB?
:-) Al
I have fortunately got a USB mouse at bootup time. Then Patrick's suggestion does seem to work for me now, perhaps because the SCSI part is loaded then.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

BUT, alas, the one newer system froze up again a few times this week. I could not find any evidence of a RO FS, but KDE was completely locked. The /sys/block/sda/device/timeout is 120, but the problem is with hdb mounted as a DVD Burner with a burnt DVD ISO from OpenSUSE (10.2 x86_64).

I logged in per ssh on the frozen system and had a look at the system messages, disk space and mount points:

########################################
SAW1:~ #tail -f -n50 /var/logs/messages
...
Jul 19 08:27:01 SAW1 kernel: hdb: packet command error: status=0x51 { DriveReady SeekComplete Error }
Jul 19 08:27:01 SAW1 kernel: hdb: packet command error: error=0x54 { AbortedCommand LastFailedSense=0x05 }
Jul 19 08:27:01 SAW1 kernel: ide: failed opcode was: unknown
Jul 19 08:27:01 SAW1 kernel: ATAPI device hdb:
Jul 19 08:27:01 SAW1 kernel: Error: Illegal request -- (Sense key=0x05)
Jul 19 08:27:01 SAW1 kernel: Cannot read medium - incompatible format -- (asc=0x30, ascq=0x02)
Jul 19 08:27:01 SAW1 kernel: The failed "Read Subchannel" packet command was:
Jul 19 08:27:01 SAW1 kernel: "42 02 40 01 00 00 00 00 10 00 00 00 00 00 00 00 "
....
Jul 19 08:40:13 SAW1 sshd[28885]: Accepted keyboard-interactive/pam for root from 192.178.111.75 port 15326 ssh2
...

SAW1:~ # mount
/dev/sdc5 on / type ext3 (rw,acl,user_xattr)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
/dev/sdb3 on /home type reiserfs (rw)
/dev/sdc6 on /home/rls/Data/Data1 type reiserfs (rw)
/dev/sdb4 on /home/rls/Data/Data2 type reiserfs (rw)
/dev/sda4 on /home/rls/Data/Data3 type reiserfs (rw)
/dev/sda3 on /home/rls/Old-OSs/SL10.1 type xfs (rw)
securityfs on /sys/kernel/security type securityfs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/hdb on /media/SU1020.001 type iso9660 (ro,nosuid,nodev,noatime,uid=1000,utf8)

SAW1:~ # df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sdc5    15G  9.9G   4.2G   71%  /
udev        1.5G  156K   1.5G    1%  /dev
/dev/sdb3    31G   20G    11G   67%  /home
/dev/sdc6   264G  149G   115G   57%  /home/rls/Data/Data1
/dev/sdb4   202G  135G    68G   67%  /home/rls/Data/Data2
/dev/sda4   211G  153G    58G   73%  /home/rls/Data/Data3
/dev/sda3    20G   16G   4.1G   80%  /home/rls/Old-OSs/SL10.1
/dev/hdb    3.7G  3.7G      0  100%  /media/SU1020.001
SAW1:~ #
#######################################

The DVD is the current openSUSE 10.2 GM DVD x86_64 burnt ISO, in /dev/hdb (ATAPI device). Is it a DVD disk error or a DVD drive error? It is the original installation DVD I used to install 10.2 on this machine.

Why does it lock down KDE? I did not access the DVD at all; I wanted to save an OpenOffice 2.2.1 (Build 2.2.0.2) Writer file to HD (/dev/sdc6 - /home/rls/Data/Data1). It also happened with OOo-Writer version 2.2.0 yesterday.

Any ideas?

:-) Al
On Tue, 2007-05-29 at 10:24 -0600, Tim Donnelly wrote:
I am using Suse 10.2 on two different machines. One is a new Dell Poweredge 2970, the other is an older (2002) "no-name" server.
I have the same problem with two machines, one a dual core x86_64 and a AMD XP2000 ... self built together and working 2 and 5 years respectively.
Both have experienced a strange and severe problem since I installed Suse.
i had it with SuSE 9.3, 10.0, 10.1 (Boxed Novell purchased version) and now with OpenSUSE 10.2
Basically the file system is put into a read-only state.
This I noticed only now with OpenSUSE 10.2, the others just locked up solid.
While the OS and all apps are still running and I can log in and view files/logs etc., nothing can write to the disks.
I could only log in ssh from another machine.
Obviously this causes all kinds of problems. The only way to get the filesystem into a more interactive state is a hard reboot.
Same here
The first two times this occurred, there were messages in the /var/log/messages file relating to megasas,
I did not see it at all. The x86_64 system has SATA, and the other IDE
however as soon as the system was restarted, these warnings disappeared.
Until some days later perhaps. I have excessive problems with some versions of Evolution and the newest OpenOffice 2.2 running high disk access, then no access to the system is possible; the disks are in constant access (LED permanently on). Just the M$ reset trick works.

:-( Al
LLLActive@GMX.Net wrote:
On Tue, 2007-05-29 at 10:24 -0600, Tim Donnelly wrote:
I am using Suse 10.2 on two different machines. One is a new Dell Poweredge 2970, the other is an older (2002) "no-name" server.
I have the same problem with two machines, one a dual core x86_64 and a AMD XP2000 ... self built together and working 2 and 5 years respectively.
Both have experienced a strange and severe problem since I installed Suse.
i had it with SuSE 9.3, 10.0, 10.1 (Boxed Novell purchased version) and now with OpenSUSE 10.2
Basically the file system is put into a read-only state.
This I noticed only now with OpenSUSE 10.2, the others just locked up solid.
While the OS and all apps are still running and I can log in and view files/logs etc., nothing can write to the disks.
I could only log in ssh from another machine.
Obviously this causes all kinds of problems. The only way to get the filesystem into a more interactive state is a hard reboot.
Same here
The first two times this occurred, there were messages in the /var/log/messages file relating to megasas,
I did not see it at all. The x86_64 system has SATA, and the other IDE
however as soon as the system was restarted, these warnings disappeared.
Until some days later perhaps. I have excessive problems with some versions of Evolution and the newest OpenOffice 2.2 running high disk access, then no access to the system is possible; the disks are in constant access (LED permanently on). Just the M$ reset trick works.
:-( Al
I have been experiencing something similar; the last occurrence involved FAM writing several hundred messages complaining about too many files being open. On a previous occasion I noted fam at 100% utilisation just before everything locked up (no log message ... the log was then read only).

I suspect this is a symptom, not a fault. In my case there seem to be indications of a memory leak of some sort, but I am not in a position to pin it down to anything in particular yet.

What I do not know is whether, if the kernel detects an internal problem, it sets the file system to read only to avoid possible file corruption.
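On the last question: ext3, for one, accepts the mount option errors=remount-ro, which makes the kernel remount the filesystem read-only when it detects a filesystem error. Whether that is what happened on any of the machines in this thread is not established here. A quick way to see which filesystems are currently read-only is to look at the options column of /proc/mounts; the sketch below does that, with list_readonly_mounts being an illustrative helper name and the optional file argument an assumption added so the function can be tried on a saved copy.

```shell
#!/bin/sh
# Sketch: print the mount point and type of every filesystem whose mount
# options contain "ro" as a whole option.
# list_readonly_mounts is a hypothetical helper name.
list_readonly_mounts() {
    mounts_file="${1:-/proc/mounts}"   # assumption: overridable input file

    # Fields in /proc/mounts: device, mount point, type, options, dump, pass.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") {
                print $2 " (" $3 ")"
                break
            }
        }
    }' "$mounts_file"
}

# Example:
#   list_readonly_mounts
```

A root or data filesystem showing up in this list when it is normally mounted rw would match the symptom described at the start of the thread; a mounted DVD (iso9660) is expected to be listed.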
On Wed, 2007-05-30 at 10:43 +0200, LLLActive@GMX.Net wrote:
On Tue, 2007-05-29 at 10:24 -0600, Tim Donnelly wrote:
I am using Suse 10.2 on two different machines. One is a new Dell Poweredge 2970, the other is an older (2002) "no-name" server.
I have the same problem with two machines, one a dual core x86_64 and a AMD XP2000 ... self built together and working 2 and 5 years respectively.
Both have experienced a strange and severe problem since I installed Suse.
i had it with SuSE 9.3, 10.0, 10.1 (Boxed Novell purchased version) and now with OpenSUSE 10.2
Basically the file system is put into a read-only state.
This I noticed only now with OpenSUSE 10.2, the others just locked up solid.
I wonder if you have considered that the problem is hardware related and the timing of installing a new OS is coincidental? I say that from my own experience wherein a Dell server box was poorly ventilated and on really hot days (no airconditioning) the server filesystem would suddenly become either 'read-only' or I would have a complete hard freeze that required cycling the power button. Eventually I saw the connection with box temperature and the bad behaviour. Turned out that soon after the scsi drive in that box terminally failed so the software issues were just an early warning sign. My problems didn't end there, because after replacing the scsi drive the LSI controller card soon failed and refused to recognise/initialise the new drive.

Moral of the story, you could have bad drive(s), controllers or ram that could cause the symptoms you see, and not software at all.

Gavin
On Thu, 2007-05-31 at 04:42 +0800, Gavin Chester wrote:
On Wed, 2007-05-30 at 10:43 +0200, LLLActive@GMX.Net wrote:
On Tue, 2007-05-29 at 10:24 -0600, Tim Donnelly wrote:
I am using Suse 10.2 on two different machines. One is a new Dell Poweredge 2970, the other is an older (2002) "no-name" server.
I have the same problem with two machines, one a dual core x86_64 and a AMD XP2000 ... self built together and working 2 and 5 years respectively.
Both have experienced a strange and severe problem since I installed Suse.
i had it with SuSE 9.3, 10.0, 10.1 (Boxed Novell purchased version) and now with OpenSUSE 10.2
Basically the file system is put into a read-only state.
This I noticed only now with OpenSUSE 10.2, the others just locked up solid.
I wonder if you have considered that the problem is hardware related and the timing of installing a new OS is coincidental? I say that from my own experience wherein a Dell server box was poorly ventilated and on really hot days (no airconditioning) the server filesystem would suddenly become either 'read-only' or I would have a complete hard freeze that required cycling the power button. Eventually I saw the connection with box temperature and the bad behaviour. Turned out that soon after the scsi drive in that box terminally failed so the software issues were just an early warning sign. My problems didn't end there, because after replacing the scsi drive the LSI controller card soon failed and refused to recognise/initialise the new drive. Moral of the story, you could have bad drive(s), controllers or ram that could cause the symptoms you see, and not software at all.
Gavin
Hi Gavin,

it has been happening on three completely different systems.

1. A 5 year old system that had win2K on it for 3 years and since then SuSE 9.3 and all the following. Here the problem occurs with OpenSUSE 10.2. It has an ASUS mobo with ATI Radeon graphic card.

2. A 2 year old system only had SuSE 10.0 that had the problem and now has WinXP without any problems. It's a Gigabyte mobo with ATI Radeon graphic card. I noticed that intensive file access by Evolution caused a system lockup many times.

3. The latest 1 year old system showed the problem mainly with SUSE 10.0 and now with OpenSUSE 10.2. An identical system has SuSE 10.1, where the problem has till now not occurred. It is a Gigabyte GA-K8N-SLi mobo with nVidia GeForce 7600 GS.

I can't say for sure, but SUSE 10.1 Boxed Novell purchased version seemed to be the most stable. It makes no sense, because the OpenSUSE 10.1 should be identical; or am I mistaken? I rule out HW however.

:-) Al
On Thu, 2007-05-31 at 10:05 +0200, LLLActive@GMX.Net wrote:
On Thu, 2007-05-31 at 04:42 +0800, Gavin Chester wrote:
On Wed, 2007-05-30 at 10:43 +0200, LLLActive@GMX.Net wrote:
On Tue, 2007-05-29 at 10:24 -0600, Tim Donnelly wrote:
-snip-
I can't say for sure, but SUSE 10.1 Boxed Novell purchased version seemed to be the most stable. It makes no sense, because the OpenSUSE 10.1 should be identical; or am I mistaken? I rule out HW however.
:-) Al
It is indeed very curious, but I thought no harm in sharing my own experience in case it shed some light in a direction not thought about before. Seems very unlikely in your case :-)

Gavin.
participants (6)
- G T Smith
- Gavin Chester
- LLLActive@GMX.Net
- Patrick Shanahan
- Stevens
- Tim Donnelly