[opensuse-kernel] FS again fell back to R/O

hi,

just now again, the filesystem changed back to Read-Only on my 11.1 install. dmesg shows this output:

ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:14:68:01:15/00:00:09:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.00: configured for UDMA/133
end_request: I/O error, dev sda, sector 1029928
ata3: EH complete
sd 2:0:0:0: [sda] 488397168 512-byte hardware sectors: (250GB/232GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
REISERFS abort (device dm-0): Journal write error in flush_commit_list
REISERFS warning (device dm-0): clm-6006 reiserfs_dirty_inode: writing inode 135862 on readonly FS

The last line is now being repeated for many other inodes.

Googling suggests that a lot of users (many of them on Ubuntu Intrepid) have this issue on their machines, but mainly in combination with bigger drives (1.5TB)... I have a 2nd drive of 1TB in that box, and an external USB drive of 1TB is also attached... just in case that makes any difference.

Some users reported that downgrading their kernels (ubuntu forums.. sorry) helped them in this situation.

So my question to all you mighty Kernel Hackers: Do you know of anything like this happening?

Interesting links (at least they were for me:)
http://lkml.org/lkml/2008/9/22/147
http://lkml.org/lkml/2008/9/29/41

If there is ANYTHING I can provide (even if we end up with a bugzilla entry, as it might be a bug in the kernel), that is fine with me. Playing too much with different kernels is not that easy on this machine, as it's in production.

Dominique

-- 
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-kernel+help@opensuse.org
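
For anyone who wants to check their own logs for the same sequence, here is a minimal Python sketch. It only looks for the three message types quoted above (the ATA exception, the block-layer I/O error and the REISERFS abort). The function name `scan_kernel_log` and the regular expressions are illustrative assumptions, not part of any existing tool, and the exact message wording may differ between kernel versions.

```python
import re

# Patterns modelled on the dmesg lines quoted in this message.
# They are assumptions for illustration; wording varies between kernels.
PATTERNS = {
    "ata_exception": re.compile(r"(ata\d+(?:\.\d+)?): exception Emask"),
    "io_error": re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)"),
    "fs_abort": re.compile(r"REISERFS abort \(device ([\w-]+)\)"),
}


def scan_kernel_log(text):
    """Return a list of (kind, detail) tuples, one per matching log line."""
    events = []
    for line in text.splitlines():
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Join the captured groups (port, or device and sector).
                events.append((kind, " ".join(match.groups())))
                break  # at most one event per line
    return events


# Three lines copied from the log above.
SAMPLE = """\
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
end_request: I/O error, dev sda, sector 1029928
REISERFS abort (device dm-0): Journal write error in flush_commit_list
"""

for kind, detail in scan_kernel_log(SAMPLE):
    print(kind, detail)
```

On a live system the same function could be fed the saved output of `dmesg` instead of `SAMPLE`. Seeing all three kinds of event in that order would match the pattern in this report, but it does not by itself say whether the disk, the controller or the driver is at fault.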

On Mon, Mar 09, 2009 at 12:57:03PM +0100, Dominique Leuenberger wrote:
hi,
just now again, the filesystem changed back to Read-Only on my 11.1 install.
Please file a bug, but note that our reiserfs person is on vacation for a while, so it will be a bit before he can get to stuff like this.

In the mean time, please try the Kernel-of-the-day for SLE11, it has some reiserfs fixes in it that missed the last 11.1 update kernel.

thanks,

greg k-h

On 3/10/2009 at 4:41, Greg KH <gregkh@suse.de> wrote:
On Mon, Mar 09, 2009 at 12:57:03PM +0100, Dominique Leuenberger wrote:
hi,
just now again, the filesystem changed back to Read-Only on my 11.1 install.
Please file a bug, but note that our reiserfs person is on vacation for a while, so it will be a bit before he can get to stuff like this.
In the mean time, please try the Kernel-of-the-day for SLE11, it has some reiserfs fixes in it that missed the last 11.1 update kernel.
Thanks Greg,

Actually, I already had the same problem before I re-installed, and the disk was formatted with EXT3 then. Besides the fact that it was much trickier to recover (or fsck did a worse job; I once ended up with my entire /etc in /lost+found), I re-installed on reiserfs to see if the problem disappears.

There is actually a very lengthy bug report on the Red Hat tracker with the same error message but different hardware (there it boiled down to a race condition in the marv driver).

Just yesterday I installed Kernel of the Day (Kernel:HEAD), 2.6.29-rc7, and this machine is now running on that kernel.

Anyhow, as advised, I'll create a bnc entry with all this information. If this happens more often, it might cause some serious file loss (I have it all backed up.. but I think we all know how some people are with their backups).

Dominique

Greg KH wrote:
On Mon, Mar 09, 2009 at 12:57:03PM +0100, Dominique Leuenberger wrote:
hi,
just now again, the filesystem changed back to Read-Only on my 11.1 install.
Please file a bug, but note that our reiserfs person is on vacation for a while, so it will be a bit before he can get to stuff like this.
In the mean time, please try the Kernel-of-the-day for SLE11, it has some reiserfs fixes in it that missed the last 11.1 update kernel.
This isn't a reiserfs bug. This is reiserfs correctly handling a journal write failure. The log looks like the disk went out to lunch and then was reset, dropping existing requests on the floor and returning I/O errors. This is typically bad hardware, but Dominique followed up saying that Red Hat is tracking a bug in the marv driver.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
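
The end state described here (a filesystem switched to read-only after a journal write failure) can usually be spotted from userspace by looking at the mount options. Below is a minimal sketch, assuming the usual six-field /proc/mounts layout. `read_only_mounts` is a hypothetical helper name and the sample text is made up for illustration; it is not taken from the affected machine.

```python
def read_only_mounts(mounts_text):
    """Return (device, mountpoint, fstype) for every entry mounted read-only.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Options are comma-separated; "ro" marks a read-only mount.
        if "ro" in options.split(","):
            result.append((device, mountpoint, fstype))
    return result


# Made-up sample; on a live system read the contents of /proc/mounts instead.
SAMPLE_MOUNTS = """\
/dev/dm-0 / reiserfs ro,relatime 0 0
/dev/dm-3 /var ext3 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
"""

for device, mountpoint, fstype in read_only_mounts(SAMPLE_MOUNTS):
    print(f"{mountpoint} ({device}, {fstype}) is mounted read-only")
```

A check like this only reports the symptom. The cause still has to be read from the kernel log, as in the dmesg output at the start of the thread.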

On 3/13/2009 at 15:50, Jeff Mahoney <jeffm@suse.com> wrote:
Please file a bug, but note that our reiserfs person is on vacation for a while, so it will be a bit before he can get to stuff like this.
In the mean time, please try the Kernel-of-the-day for SLE11, it has some reiserfs fixes in it that missed the last 11.1 update kernel.
This isn't a reiserfs bug. This is reiserfs correctly handling a journal write failure. The log looks like the disk went out to lunch and then was reset, dropping existing requests on the floor and returning I/O errors. This is typically bad hardware, but Dominique followed up saying that Red Hat is tracking a bug in the marv driver.
Jeff, Greg,

Indeed, it looked like a hardware failure... but having two disks broken (ok: on the same controller; I replaced one Samsung 250GB with another) sounded a bit odd.

Just a short recap:

The system was running fine for a long time on OSS 10.2 (two disks, one data, one system).

Installed 11.1 on the system disk (dropped 10.2) and formatted the disk with ext3. The system changed the FS to r/o frequently; two days of uptime was the maximum.

Re-installed 11.1, using reiserfs instead of ext3 (ext3 had linked all the stuff into lost+found.. not that I would have liked that.. so another try, another FS). Same behaviour: the root FS goes R/O once in a while (2 days still seemed to be the max). /var is on another partition, so I can get some usable logs (as opposed to the previous install).

Swapped hard disks: used the previous data disk and re-installed 11.1 on it, with reiserfs. No change at all, so either the controller is broken or the OS.

After a lot of reading, I find similar issues reported, but only from people running advanced raid systems (the marv mentioned earlier). Not the case here: I don't run raid (only lvm, and I have only two disks in the system).

On Monday, 9.3.2009 I installed the Kernel:HEAD on this box (2.6.29-rc7). The machine now has an uptime of 3 days 20 hours. That looks like a new best for this machine while running openSUSE 11.1. I also gave it some load, like rebuilding some packages (it's an OBS instance after all)... still, the FS seems to do just fine.

The most 'scary' message in dmesg so far would be:
JBD: barrier-based sync failed on dm-3 - disabling barriers
Otherwise I don't see anything special in the dmesg output (after almost 4 days of uptime now).

So whatever it was in the openSUSE stock kernel seems to no longer be a problem in the 2.6.29 kernel. It's not very helpful to know that the shipped kernel is possibly guilty of some crashes without being able to tell exactly why, I know.

Dominique

On 3/13/2009 at 16:07, "Dominique Leuenberger" wrote:
On Monday, 9.3.2009 I installed the Kernel:HEAD on this box (2.6.29-rc7). The machine now has an uptime of 3 days 20 hours. That looks like a new best for this machine while running openSUSE 11.1.
I just wanted to send you a follow-up on this topic: since the update to 2.6.29-rc7 on March 9, my FS has not fallen back to R/O a single time. All that happened once was a kernel panic (caps lock and scroll lock were blinking.. otherwise no function).

So it looks like 'my' issue was fixable like this. But this potentially leaves other people at risk of running into it.

Jeff, Greg: shall I file a bug report with those findings? I'm not sure how widespread it is or how easy it is to reproduce.

Dominique
participants (3)
- Dominique Leuenberger
- Greg KH
- Jeff Mahoney