Hi there,
I wonder if anyone could give me a few pointers? Running SuSE 8.2Pro.
I had a power cut last night while composing an email in mutt. When I brought the server back up I had the following issues:
- A lot of file permissions changed. Generally to root.root. This crashed postfix. I found all/some of the problems and postfix started. I noticed some of 'my' files had also changed in the same way.
- xinetd refuses to start. I reinstalled it from an RPM, still no joy. This means that things like dial on demand refuse to work (yep, I'm on a modem).
- I use qmail style Maildirs for message storage. Some of these appear 'stuffed'. In some cases the whole maildir has lost the dir setting, and now looks like a binary file. In other cases the 'new' or 'cur' sub-dirs have also lost the dir 'bit'.
And those are the bits I have found! I have no idea what else is 'broken'.
I'm running software RAID0 with Ext3 format.
Is there any advice anyone could offer before I back up my configs and start all over?
A good friend reminded me about running fsck over the drive (didn't do it last night due to mild panic attack <g>), so I will do this tonight. All ideas are most welcome.
--
Regards,
Roland
On Tuesday 14 September 2004 06:53 pm, Roland Hill wrote:
I would start by booting from the CD, going into the automatic repair mode, and letting SuSE try to fix the problem. Then I would go out and buy a UPS for about $50 to head off the next power glitch.
Richard
--
Old age ain't for Sissies!
On 14-Sep-04 Roland Hill wrote:
Hi there,
I wonder if anyone could give me a few pointers? Running SuSE 8.2Pro.
I had a power cut last night while composing an email in mutt. When I brought the server back up I had the following issues:
- A lot of file permissions changed. Generally to root.root. This crashed postfix. I found all/some of the problems and postfix started. I noticed some of 'my' files had also changed in the same way. [and other woes ... ]
A good friend reminded me about running fsck over the drive (didn't do it last night due to mild panic attack <g>), so I will do this tonight. All ideas are most welcome.
'fsck' may help. It's a remarkably capable program and can solve a
surprising range of filesystem problems, considering the state that
file systems can get into.
However, you are likely to find that a lot of files (and file fragments)
end up in the "lost+found" directory of the relevant filesystem.
Nevertheless, I'm sadly inclined to suspect that you have suffered
major corruption. A simple power cut should not change permissions
of files etc. -- the computer would normally simply stop, in the
middle of whatever it was doing when the power failed. As a result,
the worst you should expect after re-boot is that disk-writes from
cache were not carried out, so some file changes did not take place
and there may possibly be some inconsistencies between the contents
of different files; and "hidden" backup files created during editing
are still there, etc. That sort of thing. At worst, what you should
find when 'fsck' is run automatically (because the system was not
shut down cleanly) is that some inode deletions need cleaning up and
inode counts need resetting, all of which 'fsck' should take in its
stride.
Changed permissions are much more ominous. This suggests RAM failure.
Maybe (the more likely) they are being wrongly read from disk, so
that it looks like "root.root" but in fact is not (remember that
userids are stored as numbers in inodes, not as names). Root has
user number "0", so all the bits in the user number are zeroed.
Possibly (though unlikely because it seems several files are
involved) this took place during a disk-write from corrupt RAM.
Or -- worse still -- your hard drive circuitry has been damaged.
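
To see what is actually recorded in the inodes, rather than the name
that 'ls' prints, the numeric ids can be read directly. The following
is only a minimal Python sketch, assuming a POSIX system; the function
name files_owned_by is made up for illustration and is not part of any
tool mentioned in this thread.

```python
import os


def files_owned_by(root, uid):
    """Return sorted paths under root whose inode records the numeric owner uid."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: look at the entry itself, not a symlink target
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if st.st_uid == uid:
                matches.append(path)
    return sorted(matches)
```

Calling files_owned_by on a user's home directory with uid 0 would
list the entries that now claim to belong to root. If the list differs
between runs on an otherwise idle system, that would point at the
reads being unreliable rather than at the data on disk.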
I would suggest some hardware tests before trying to put things right,
just in case the cure turns out to be worse than the disease. Eg:
1. Run a memory tester such as "memtest86" on your RAM. This is
available as a boot-time option so you do not actually involve your
Linux system in this.
2. Physically take out each hard drive, and install it in some
other machine where you can mount its filesystems in the Linux
system running on that machine. What does the BIOS see? What
do you get when you run 'fdisk /dev/hdWhatever' on the drive?
(only use the "p" command in fdisk, to print the partition
information; DON'T TOUCH the "w" key because this will write
stuff back to the partition table. Finally quit with "q".)
Then check the contents of directories, ownerships, permissions,
etc. This enables you to separate hardware problems like corrupt
RAM from problems like damaged hard drive circuitry.
3. If you have a spare old hard drive, install it in the suspect
machine (with no other hard drives present) and try to carry
out an installation of Linux on this. Run various tasks on this
for a bit and check whether anything unwanted is happening.
This can check the motherboard/hdd interface as well as whether
the CPU/RAM are doing their job properly.
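
For the comparison in step 2, it may help to record file types,
permissions and numeric owners in a form that can be compared between
two readings of the same filesystem (for example, one on the suspect
machine and one on the known-good machine). This is a rough Python
sketch under that assumption; snapshot and differences are
hypothetical helper names, and how the snapshot is carried from one
machine to the other is left out.

```python
import os
import stat


def snapshot(root):
    """Map each path (relative to root) to (file type, permission bits, uid, gid)."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry could not be read; leave it out
            rel = os.path.relpath(path, root)
            result[rel] = (
                stat.S_IFMT(st.st_mode),   # directory, regular file, symlink, ...
                stat.S_IMODE(st.st_mode),  # permission bits only
                st.st_uid,
                st.st_gid,
            )
    return result


def differences(a, b):
    """Return sorted paths present in both snapshots whose metadata differs."""
    return sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
```

Because the file type is recorded as well, a Maildir 'cur' or 'new'
entry that shows up as a directory on one machine and as a regular
file on the other would appear in the output of differences.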
Unfortunately the power surge that sometimes accompanies a power
cut and/or its restoration can cause such hardware damage.
Living myself where brief power cuts are all too common, I have
my two desktops running off a Belkin uninterruptible power supply
(UPS). Belkin provide Linux drivers for their UPSs, and it works
well. There is a serial link from the UPS to one machine which
invokes a "shutdown" script if the power has been off more than
a certain time, and the UPS then waits for a further time before
switching itself off. Meanwhile machine 1 (from this "shutdown"
script) sends an email over the LAN to user "shutdown" on machine 2,
which has a root 'cron' job watching /usr/spool/mail/shutdown/, and
if a file appears there then machine 2 removes it and does
"shutdown -h now".
After sending the email, machine 1 shuts down. There's just time
for all this to happen before the UPS goes off. Result: any power
failure lasting more than a minute or so leads to clean shutdown
of both machines, therefore no file corruption. Brief power cuts
(up to 10 seconds at a time are frequent) are simply ignored: the
UPS continues to supply power, and all is back to normal once
power resumes.
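
The check that machine 2 runs from cron can be sketched roughly as
follows. This is not the actual script, only an illustration in
Python. It assumes that mail for user "shutdown" lands in a single
file at the path given in SPOOL_FILE (adjust it if your mail system
delivers elsewhere); the function name and the "run" parameter are
made up for the example, the latter so that the logic can be tried
without really halting anything.

```python
import os
import subprocess

# Assumed location of the trigger mailbox, based on the path mentioned above.
SPOOL_FILE = "/usr/spool/mail/shutdown"


def check_and_shutdown(spool_file=SPOOL_FILE, run=subprocess.call):
    """If the trigger mailbox exists, remove it and request a halt.

    Returns True when a shutdown was requested, False otherwise.
    """
    if not os.path.exists(spool_file):
        return False
    # Remove the trigger first so it is not still there after the next boot.
    os.remove(spool_file)
    run(["shutdown", "-h", "now"])
    return True
```

Run as root from cron every minute or so, this does nothing until the
email from machine 1 arrives. Passing a harmless function as "run"
(one that just prints its argument, say) gives a dry run.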
The UPS provides an excellent buffer against surges and spikes
coming up the mains, and I have had no problems since starting
to use it.
On the other hand, I thought I was OK with a laptop in this situation,
since a laptop has its own battery and can ride out even quite
extended power outages, and can be set up to go into 'shutdown' when
a "battery low" condition develops. However, I was rudely woken out
of this complacency one day when the power went off/on/off/on/...
several times in quick succession. After this, the laptop was dead.
The subsequent repair attempt found that the motherboard had been fried.
The replacement motherboard being of the wrong type, all this machine
is now good for is an archive of files from which, when needed, I can
recover copies by splitting tar archives into floppy-sized chunks
(yes, the PCMCIA slot doesn't work either so I can't download over
the LAN!).
Now I use a replacement laptop which is isolated from the mains
through two layers of Belkin "SurgeMaster" surge protector.
Beware!
Good luck with your attempts at recovery.
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)
Ted Harding said:
<a lot of extremely useful things>
Thanks very much for your detailed reply. I am pretty much resigned to the fact that a fresh install is required. Your advice on hardware checks etc. is appreciated.
The 'box' has been through power cuts before, but I have never had this problem, and when people who have been 'at this' longer than I have haven't seen it before, you *know* you're in trouble!
......oh well, I did want to try a new partitioning scheme.....<grin>
--
Regards,
Roland