On Wednesday 20 June 2007 00:56:00 Darryl Gregorash wrote:
You'll need to give us a lot more information about your system hardware (including the modules that are loaded for hard drive i/o), plus information from /var/log/messages about what is happening when the filesystem goes RO.
OK. I will give as much information as I can, so this mail is a bit long... I have partly solved the problem by keeping to one filesystem per drive, as suggested by Carl Hartung. Thanks, Carl.

On Tuesday 19 June 2007 23:47:43 Carl Hartung wrote:
On Tue June 19 2007 17:11, LLLActive@GMX.Net wrote: <snip>
... Can using different filesystems in one system cause such problems?
Theoretically, no, but in actual fact there are circumstances where conflicts *can* arise.
In my case... with this specific chipset and corresponding kernel IDE controller module... cache buffering is enabled or disabled on a per drive basis. Running disparate filesystem types in adjacent partitions on the same drive (i.e. reiserfs + ext3) triggered errors comparable to those you're experiencing now.
I ultimately coaxed those errors away permanently by standardizing my installations to using only one journaling filesystem type per drive.
The system is much more stable.

################################

Last night, however, it happened again! I had connected my mobile phone to the USB port and left it there to charge its battery, without thinking anything of it. Only when I accessed it did the files disappear right after being listed. The USB device was then detached automatically from the USB hub. Again without thinking much of it, I attached it directly to a USB port on the mobo. I then wanted to install from a DVD mounted as /dev/hdd, and after a lot of disk access the system went to a read-only FS again. I went to bed...

On Wednesday 20 June 2007 00:56:00 Darryl Gregorash wrote:
I tend to doubt that the specific filesystem(s) in use have anything at all to do with this, but the high disk access probably does. There is a thread on Dell about problems with the MegaRAID SAS driver (module name megasas) -- http://lists.us.dell.com/pipermail/linux-poweredge/2007-March/029974.html -- but you have not given enough information for anyone to know if this is relevant to your problem. Grep /var/log/messages for "megasas".
sudo more /var/log/messages | grep "megasys" reports nothing.

Looking at the logs again this morning, I noticed these entries for the SCSI device sda (partition /dev/sda1):

sico@sico:~> sudo more /var/log/messages | grep "sda"
Jun 30 22:43:28 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:43:28 sico kernel: sda: Write Protect is off
Jun 30 22:43:28 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:43:28 sico kernel: sda: assuming drive cache: write through
Jun 30 22:43:28 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:43:28 sico kernel: sda: Write Protect is off
Jun 30 22:43:28 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:43:28 sico kernel: sda: assuming drive cache: write through
Jun 30 22:43:28 sico kernel: sda: sda1
Jun 30 22:43:28 sico kernel: sd 0:0:0:0: Attached scsi removable disk sda
Jun 30 22:43:30 sico hald: mounted /dev/sda1 on behalf of uid 1000
Jun 30 22:47:57 sico kernel: sda: Current: sense key: No Sense
... (repeated many times) ...
Jun 30 22:47:58 sico kernel: end_request: I/O error, dev sda, sector 14464
Jun 30 22:47:59 sico hald: unmounted /dev/sda1 from '/media/disk' on behalf of uid 0
Jun 30 22:47:59 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:47:59 sico kernel: sda: Write Protect is off
Jun 30 22:47:59 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:47:59 sico kernel: sda: assuming drive cache: write through
Jun 30 22:47:59 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:47:59 sico kernel: sda: Write Protect is off
Jun 30 22:47:59 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:47:59 sico kernel: sda: assuming drive cache: write through
Jun 30 22:47:59 sico kernel: sda: sda1
Jun 30 22:47:59 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:47:59 sico kernel: sda: Write Protect is off
Jun 30 22:47:59 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:47:59 sico kernel: sda: assuming drive cache: write through
Jun 30 22:47:59 sico kernel: sda: sda1
Jun 30 22:48:01 sico hald: mounted /dev/sda1 on behalf of uid 1000
Jun 30 22:48:26 sico kernel: sda: Current: sense key: No Sense
Jun 30 22:48:27 sico kernel: end_request: I/O error, dev sda, sector 9152
Jun 30 22:48:27 sico kernel: end_request: I/O error, dev sda, sector 9152
... (repeated many times) ...
Jun 30 22:48:27 sico kernel: end_request: I/O error, dev sda, sector 19328
Jun 30 22:48:27 sico hald: unmounted /dev/sda1 from '/media/disk' on behalf of uid 0
Jun 30 22:48:29 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:29 sico kernel: sda: Write Protect is off
Jun 30 22:48:29 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:29 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:29 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:29 sico kernel: sda: Write Protect is off
Jun 30 22:48:29 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:29 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:29 sico kernel: sda: sda1
Jun 30 22:48:31 sico hald: mounted /dev/sda1 on behalf of uid 1000
Jun 30 22:48:37 sico kernel: sda: Current: sense key: No Sense
Jun 30 22:48:37 sico kernel: end_request: I/O error, dev sda, sector 40320
... (repeated many times) ...
Jun 30 22:48:38 sico kernel: end_request: I/O error, dev sda, sector 46144
Jun 30 22:48:38 sico hald: unmounted /dev/sda1 from '/media/disk' on behalf of uid 0
Jun 30 22:48:40 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:40 sico kernel: sda: Write Protect is off
Jun 30 22:48:40 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:40 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:40 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:40 sico kernel: sda: Write Protect is off
Jun 30 22:48:40 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:40 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:40 sico kernel: sda: sda1
Jun 30 22:48:41 sico hald: mounted /dev/sda1 on behalf of uid 1000
Jun 30 22:48:45 sico kernel: sda: Current: sense key: No Sense
Jun 30 22:48:45 sico kernel: end_request: I/O error, dev sda, sector 41280
Jun 30 22:48:45 sico kernel: end_request: I/O error, dev sda, sector 41280
... (repeated many times) ...
Jun 30 22:48:46 sico kernel: end_request: I/O error, dev sda, sector 73344
Jun 30 22:48:47 sico hald: unmounted /dev/sda1 from '/media/disk' on behalf of uid 0
Jun 30 22:48:47 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:47 sico kernel: sda: Write Protect is off
Jun 30 22:48:47 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:47 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:47 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:47 sico kernel: sda: Write Protect is off
Jun 30 22:48:47 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:47 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:47 sico kernel: sda: sda1
Jun 30 22:48:48 sico hald: mounted /dev/sda1 on behalf of uid 1000
Jun 30 22:48:50 sico kernel: sda: Current: sense key: No Sense
Jun 30 22:48:50 sico kernel: end_request: I/O error, dev sda, sector 1985
Jun 30 22:48:50 sico kernel: end_request: I/O error, dev sda, sector 41088
... (repeated many times) ...
Jun 30 22:48:51 sico kernel: end_request: I/O error, dev sda, sector 50624
Jun 30 22:48:51 sico hald: unmounted /dev/sda1 from '/media/disk' on behalf of uid 0
Jun 30 22:48:53 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:53 sico kernel: sda: Write Protect is off
Jun 30 22:48:53 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:53 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:53 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jun 30 22:48:53 sico kernel: sda: Write Protect is off
Jun 30 22:48:53 sico kernel: sda: Mode Sense: 00 6a 00 00
Jun 30 22:48:53 sico kernel: sda: assuming drive cache: write through
Jun 30 22:48:53 sico kernel: sda: sda1
Jun 30 22:48:54 sico hald: mounted /dev/sda1 on behalf of uid 1000
Jun 30 22:48:59 sico kernel: sda: Current: sense key: No Sense
Jun 30 22:48:59 sico kernel: end_request: I/O error, dev sda, sector 41088
... (repeated many times) ...
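To see at a glance which sectors the kernel reported as failing, and how often, the repeated end_request lines can be condensed into a per-sector count. This is only a sketch, assuming the log format shown above (the sector number is the last field of each end_request line); the function name io_error_summary is made up for illustration. grep can also read the log file directly, so piping it through more is not needed.

```shell
#!/bin/sh
# Hypothetical helper: count "end_request: I/O error" reports per sector
# for one device in a syslog-style file.
# Arguments: $1 = log file (e.g. /var/log/messages), $2 = device (e.g. sda)
io_error_summary() {
    logfile=$1
    dev=$2
    # Select only the I/O error lines for this device; the trailing comma
    # in the pattern keeps "sda" from matching longer device names.
    grep "end_request: I/O error, dev $dev, sector" "$logfile" |
        awk '{ print $NF }' |   # the sector number is the last field
        sort -n |               # numeric order so uniq sees duplicates together
        uniq -c                 # prefix each sector with its repeat count
}
```

As root (the log is normally readable only by root), after defining the function: io_error_summary /var/log/messages sda. Widely scattered sectors would point more towards the transfer or the USB connection than towards a few bad blocks on the medium, though that is only a rule of thumb.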
One writer in that thread (on Dell) writes "the problem is that the Linux kernel's SCSI layer insists on a single timeout for all SCSI requests, and doesn't tolerate high variances in command completion times. If any single command times out, it resets the whole bus, even if there is still significant activity." This suggests that the problem is more widespread than just a RAID issue. This is that writer's message -- http://lists.us.dell.com/pipermail/linux-poweredge/2007-March/029982.html -- and it contains a suggestion that may be of use to you.
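The timeout mentioned there is exposed per device in sysfs, in seconds, as /sys/block/sdX/device/timeout. Writing it needs root rights for the write itself: with "sudo echo 120 > file" the > redirection is carried out by the calling, non-root shell before sudo even runs, so it fails; piping the value into "sudo tee" lets a root process do the write. A minimal sketch of that idea, using the hypothetical helper names show_scsi_timeouts and set_scsi_timeout; SYSFS_ROOT and SUDO are hypothetical overrides added only so the functions can be tried against a scratch directory. As far as I know, a value written to sysfs is not kept across a reboot, and probably not across unplugging the device either, so it would have to be reapplied.

```shell
#!/bin/sh
# Hypothetical helpers for the per-device SCSI command timeout (seconds).
# SYSFS_ROOT (default /sys) and SUDO (default sudo) are overrides that exist
# only so the functions can be exercised against a scratch directory.

# Print "device timeout" for every sd* disk that exposes the attribute.
show_scsi_timeouts() {
    for f in "${SYSFS_ROOT:-/sys}"/block/sd*/device/timeout; do
        [ -e "$f" ] || continue          # the glob matched nothing
        dev=${f%/device/timeout}         # strip the attribute path
        echo "${dev##*/} $(cat "$f")"    # keep only the device name
    done
}

# Write a new timeout for one device ($1 = sda, $2 = seconds).
# Only tee runs under sudo, so the file is opened by a root process.
set_scsi_timeout() {
    f="${SYSFS_ROOT:-/sys}/block/$1/device/timeout"
    if [ ! -e "$f" ]; then
        echo "set_scsi_timeout: $f not found (device not attached?)" >&2
        return 1
    fi
    echo "$2" | ${SUDO-sudo} tee "$f" > /dev/null
}
```

From a normal user account this would be: show_scsi_timeouts, then set_scsi_timeout sda 120 (sudo asks for the password).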
I found the mail from Joe Malicki (http://lists.us.dell.com/pipermail/linux-poweredge/2007-March/029982.html) on this topic and changed the SCSI timeout:

sico@sico:~> more /sys/block/sda/device/timeout
60
sico@sico:~> sudo echo 120 > /sys/block/sda/device/timeout
bash: /sys/block/sda/device/timeout: Permission denied
sico@sico:~> su -
Password:
sico:~ # echo 120 > /sys/block/sda/device/timeout

Current state:

...
Jul 1 14:47:35 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jul 1 14:47:35 sico kernel: sda: Write Protect is off
Jul 1 14:47:35 sico kernel: sda: Mode Sense: 00 6a 00 00
Jul 1 14:47:35 sico kernel: sda: assuming drive cache: write through
Jul 1 14:47:35 sico kernel: SCSI device sda: 3903488 512-byte hdwr sectors (1999 MB)
Jul 1 14:47:35 sico kernel: sda: Write Protect is off
Jul 1 14:47:35 sico kernel: sda: Mode Sense: 00 6a 00 00
Jul 1 14:47:35 sico kernel: sda: assuming drive cache: write through
Jul 1 14:47:35 sico kernel: sda: sda1
Jul 1 14:47:51 sico hald: mounted /dev/sda1 on behalf of uid 1000

The lines

Jun 30 22:48:59 sico kernel: sda: Current: sense key: No Sense
Jun 30 22:48:59 sico kernel: end_request: I/O error, dev sda, sector 41088
... (repeated many times) ...

no longer seem to appear, even after the same kind of extensive disk access as before.

################################

I am not sure what to make of the read-only (RO) remarks in the last lines of messages. Could they simply be reporting that the DVD is read-only?

Jul 1 18:18:29 sico sudo: sico : TTY=pts/1 ; PWD=/home/sico ; USER=root ; COMMAND=/bin/more /var/log/messages
Jul 1 18:20:29 sico kernel: ISO 9660 Extensions: Microsoft Joliet Level 3
Jul 1 18:20:29 sico kernel: ISO 9660 Extensions: RRIP_1991A
Jul 1 18:20:29 sico hald: mounted /dev/hdd on behalf of uid 1000
Jul 1 18:21:53 sico gconfd (sico-5635): GConf server is not in use, shutting down.
Jul 1 18:21:53 sico gconfd (sico-5635): Exiting
Jul 1 18:26:43 sico gconfd (sico-18750): starting (version 2.14.0), pid 18750 user 'sico'
Jul 1 18:26:43 sico gconfd (sico-18750): Resolved address "xml:readonly:/etc/opt/gnome/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Jul 1 18:26:43 sico gconfd (sico-18750): Resolved address "xml:readwrite:/home/sico/.gconf" to a writable configuration source at position 1
Jul 1 18:26:43 sico gconfd (sico-18750): Resolved address "xml:readonly:/etc/opt/gnome/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Jul 1 18:26:43 sico gconfd (sico-18750): Resolved address "xml:readonly:/etc/opt/gnome/gconf/gconf.xml.schemas" to a read-only configuration source at position 3
Jul 1 18:27:13 sico gconfd (sico-18750): GConf server is not in use, shutting down.

################################

Is it normal for USB storage to go through the SCSI layer? Can the SCSI layer be avoided? Could the device be handled as IDE instead, like /dev/hde? :-)

Al