On 3/13/2009 at 15:50, Jeff Mahoney wrote:
Please file a bug, but note that our reiserfs person is on vacation for a while, so it will be a bit before he can get to stuff like this. In the meantime, please try the Kernel-of-the-day for SLE11; it has some reiserfs fixes in it that missed the last 11.1 update kernel.
This isn't a reiserfs bug. This is reiserfs correctly handling a journal write failure. The log looks like the disk went out to lunch and then was reset, dropping existing requests on the floor and returning I/O errors. This is typically bad hardware, but Dominique followed up saying that Red Hat is tracking a bug in the marv driver.
Jeff, Greg,

Indeed, it looked like a hardware failure... but having two disks broken (OK, on the same controller: I replaced one Samsung 250GB with another) sounded a bit odd.

Just a short recap: The system was running fine for a long time on OSS 10.2 (two disks, one data, one system). I installed 11.1 on the system disk (dropping 10.2) and formatted the disk with ext3. The system frequently switched the FS to r/o; two days of uptime was the maximum.

I re-installed 11.1, using reiserfs instead of ext3 (ext3 had linked all the stuff into lost+found, which I did not like, so: another try, another FS). Same behaviour: the root FS goes R/O once in a while (2 days still seemed to be the max). /var is on another partition, so I can get some usable logs (as opposed to the previous install).

I swapped hard disks: I used the previous data disk and re-installed 11.1 on it, again with reiserfs. No change at all, so either the controller is broken or the OS.

After a lot of reading, I found similar issues reported, but only from people running advanced RAID systems (the marv mentioned earlier). Not the case here: I don't run RAID (only LVM, and I have only two disks in the system).

On Monday, 9.3.2009, I installed Kernel:HEAD on this box (2.6.29-rc7). The machine now has an uptime of 3 days 20 hours, which looks like a new best for this machine while running openSUSE 11.1. I also gave it some load, like rebuilding some packages (it's an OBS instance after all), and still the FS seems to do just fine. The most 'scary' message in dmesg so far would be:
JBD: barrier-based sync failed on dm-3 - disabling barriers
Otherwise I don't see anything special in the dmesg output (after almost 4 days of uptime now). So whatever it was in the openSUSE stock kernel seems to no longer be a problem in the 2.6.29 kernel. I know it is not very helpful to hear that the shipped kernel is possibly responsible for some crashes without being able to tell exactly why.

Dominique