On Monday 06 January 2003 06:30, Rohit wrote:
On Mon, 6 Jan 2003, Derek Fountain wrote:
it via NFS. While writing a few gigs of data to it from another machine, it's falling about all over the place. The logs are full of errors from reiserfs and encouragements to run fsck.
Just make sure that no single file is ever larger than 2G. That is an NFS limitation, methinks... If individual files are not that big, there should be no problem. NFS and reiserfs are mature technologies, and so is LVM, so I think the problem may lie somewhere else.
Post the log errors from when it started: the very first messages, maybe the first 10-20 of them.
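One way to check the 2G-per-file suggestion above before copying anything is to scan the source tree for files at or over that size. This is only a sketch: the function name `files_at_or_over` and the default threshold are illustrative choices, and whether the limit applies at all depends on the NFS version and client in use, which the thread does not state.

```python
import os
import stat

TWO_GIB = 2 * 1024 ** 3  # assumed threshold: the 2G figure mentioned in the thread


def files_at_or_over(root, limit=TWO_GIB):
    """Yield (path, size) for regular files under root whose size is >= limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size >= limit:
                yield path, st.st_size
```

Calling `list(files_at_or_over(some_directory))` on the directory being copied returns an empty list when every file is under the threshold.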
I don't think posting dozens of errors will help much. Here's what I see: the problems started off with a few dozen "bit already cleared" errors from reiserfs, then it had a load of tree/node problems. Then it got an I/O error, and it all went south from there. On reboot, the LVM vgscan can't read the LV details from the physical volumes. All appears to suggest that the LVM layer went completely bananas.
I'm using the kernel level NFS server. I wonder if the user level one would be any better. Perhaps there's something messy in that interaction.
-- 
Australian Linux Technical Conference 2003: http://www.linux.conf.au/
Explain to your boss the benefits of you going...
On Monday 06 January 2003 12:08 am, Derek Fountain wrote:
I don't think posting dozens of errors will help much. Here's what I see: the problems started off with a few dozen "bit already cleared" errors from reiserfs, then it had a load of tree/node problems. Then it got an I/O error, and it all went south from there. On reboot, the LVM vgscan can't read the LV details from the physical volumes. All appears to suggest that the LVM layer went completely bananas.
I'm using the kernel level NFS server. I wonder if the user level one would be any better. Perhaps there's something messy in that interaction.
Have you ruled out the possibility of a physical problem [bad/loose cable, power spikes, cosmic rays, or ...?] or perhaps a physically bad controller? [had that happen once -- the controller "flipped bits" on writes -- and this was on a mini/mainframe!]
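A crude way to probe for the "flipped bits on writes" kind of fault described above is to write a known stream of data to the suspect volume, read it back, and compare checksums. The sketch below is an assumption-laden illustration, not a proper hardware diagnostic: the function name `write_and_verify` is made up for this example, and a read-back shortly after the write may be served from the page cache rather than the disk, so a passing result proves little unless the test file is larger than RAM or is re-read after a remount.

```python
import hashlib
import os


def write_and_verify(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of random data to path, read it back, compare SHA-256 digests."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            read_back.update(chunk)

    # True means the bytes read back hash the same as the bytes written.
    return written.hexdigest() == read_back.hexdigest()
```

A `False` result on a local write (no NFS involved) would point below the filesystem, at the LVM, controller, cable, or disk level.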
There don't appear to be any hardware problems, at least not consistent ones. It could have been a bad disk sector or something setting off an unpleasant chain reaction. Nothing to suggest that in the logs though. I couldn't get the user space NFS thing to work, so I stopped trying with that. I reformatted my logical volumes as ext3. So far that's working flawlessly.
I couldn't get the user space NFS thing to work, so I stopped trying with that. I reformatted my logical volumes as ext3. So far that's working flawlessly.
Correction. Errors from ext3 now. Different from the reiser ones, I get:
Jan 6 16:26:47 beetle kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #229383: rec_len is too small for name_len - offset=504, inode=229395, rec_len=36, name_len=36
and
Jan 6 16:28:34 beetle kernel: attempt to access beyond end of device
Jan 6 16:28:34 beetle kernel: 3a:00: rw=0, want=671612932, limit=5242880
on reading what appeared to be correctly written data. Looks like this stuff is broken.
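The numbers in the "attempt to access beyond end of device" message above can be put into perspective with a little arithmetic. The sketch below assumes that `want` and `limit` are counted in 1 KiB blocks and that `3a:00` is the device's major:minor pair in hexadecimal; both are assumptions about the kernel's message format, not something stated in the thread. The names `PATTERN` and `decode` are made up for this example.

```python
import re

LINE = "Jan 6 16:28:34 beetle kernel: 3a:00: rw=0, want=671612932, limit=5242880"

# Matches the "MM:mm: rw=..., want=..., limit=..." detail line.
PATTERN = re.compile(
    r"(?P<major>[0-9a-f]{2}):(?P<minor>[0-9a-f]{2}): "
    r"rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)

KIB_PER_GIB = 1024 * 1024  # assumption: want/limit are in 1 KiB units


def decode(line):
    """Return device numbers and sizes from a beyond-end-of-device detail line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a beyond-end-of-device detail line")
    want = int(m.group("want"))
    limit = int(m.group("limit"))
    return {
        "major": int(m.group("major"), 16),  # hex 3a is decimal 58
        "minor": int(m.group("minor"), 16),
        "want_gib": want / KIB_PER_GIB,
        "limit_gib": limit / KIB_PER_GIB,
        "ratio": want / limit,
    }
```

Under those assumptions, `decode(LINE)` gives major 58 and minor 0, which matches the `lvm(58,0)` device in the ext3 error, a device size of exactly 5 GiB, and a requested block roughly 640 GiB in, about 128 times past the end of the volume. A request that far out of range looks more like a corrupted block pointer than a slightly mis-sized volume, which would fit either on-disk corruption or a fault in a lower layer.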
On Mon, 6 Jan 2003, Derek Fountain wrote:
Correction. Errors from ext3 now. Different from the reiser ones, I get:
Naturally different :) Different code. The messages logged at error time may not point to the real cause, though. Looks like you are losing the HDD or cabling or something.
Rohit
It's related: http://www.zdnet.com.au/printfriendly?AT=2000034960-20270972
On Monday 6 January 2003 09:08, Derek Fountain wrote:
On Monday 06 January 2003 06:30, Rohit wrote:
On Mon, 6 Jan 2003, Derek Fountain wrote:
it via NFS. While writing a few gigs of data to it from another machine, it's falling about all over the place. The logs are full of errors from reiserfs and encouragements to run fsck.
Just make sure that no single file is ever larger than 2G. That is an NFS limitation, methinks... If individual files are not that big, there should be no problem. NFS and reiserfs are mature technologies, and so is LVM, so I think the problem may lie somewhere else.
Post the log errors from when it started: the very first messages, maybe the first 10-20 of them.
I don't think posting dozens of errors will help much. Here's what I see: the problems started off with a few dozen "bit already cleared" errors from reiserfs, then it had a load of tree/node problems. Then it got an I/O error, and it all went south from there. On reboot, the LVM vgscan can't read the LV details from the physical volumes. All appears to suggest that the LVM layer went completely bananas.
I'm using the kernel level NFS server. I wonder if the user level one would be any better. Perhaps there's something messy in that interaction.
-- Richard Bos Without a home the journey is endless
participants (4): Derek Fountain, Richard Bos, Rohit, Tom Emerson