Mailinglist Archive: opensuse (1695 mails)

Re: [opensuse] 10.2 no RAID to 11.0 RAID 1
  • From: "Andrew Joakimsen" <joakimsen@xxxxxxxxx>
  • Date: Mon, 22 Sep 2008 18:51:13 -0400
  • Message-id: <23fd749a0809221551h45dc3887w1ad24a73c1569831@xxxxxxxxxxxxxx>
On Mon, Sep 22, 2008 at 5:59 PM, Brian K. White <brian@xxxxxxxxx> wrote:

----- Original Message -----
From: "Andrew Joakimsen" <joakimsen@xxxxxxxxx>
To: "Brian K. White" <brian@xxxxxxxxx>
Cc: <opensuse@xxxxxxxxxxxx>
Sent: Monday, September 22, 2008 4:05 PM
Subject: Re: [opensuse] 10.2 no RAID to 11.0 RAID 1


On Mon, Sep 22, 2008 at 3:27 PM, Brian K. White <brian@xxxxxxxxx> wrote:

It's perfectly possible to force a rebuild.
In fact, you can force rebuilds in mdadm in situations where no firmware
raid will ever let you.
If you don't know how, that's a you problem not an mdadm problem.

I know how and issued the right command. It says /dev/sdb3 or whatnot
DOES NOT EXIST.

But if you do ll /dev/sdb3 or even cat /dev/sdb3, the device is obviously
there.

So yes, mdadm is crap and should never be used. If you run mdadm
/dev/md0 --fail /dev/sdb3 and it says sdb3 does not exist, there is a
serious issue of the developers piping their toilets into their code.
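(Aside: a "does not exist" from --fail often means the member has already
been kicked out of the array, so there is nothing left to fail; the
array-level view usually clarifies. A sketch of the read-only checks,
shown as a dry run with made-up device names so nothing is touched:)

```shell
# Read-only diagnostics, collected in $checks and only printed (dry run).
# A member that mdadm refuses to --fail is often already marked
# faulty/removed in the array-level view. Device names are examples.
checks='
cat /proc/mdstat            # kernel view of every md array and its members
mdadm --detail /dev/md0     # per-slot state: active, faulty, removed
mdadm --examine /dev/sdb3   # md superblock on the partition itself
'
printf '%s' "$checks"
```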

Wrong. (Unless you can supply enough exact commands and responses and other
observations to prove your diagnostic process and deductions aren't full of
holes. You have not done so above.)

I still have the drives. I am still looking for real instructions on
how to use mdadm. One of the "step by step" guides even shows one of
the errors as normal output! So I figured, what the hell, let me continue
anyway, and of course it did not work.

I have seen a few different things, each of which was a different problem, yet
each could have been described roughly as above, and in each case the
drive was not actually unavailable and all desired operations could be
performed somehow. The exact steps varied in each case because the exact
problem varied in each case. I don't know which of the exact problems you
actually had because, as I said, in my own limited experience there was
more than one way to get something roughly like that, so I can't say what
exactly you could or should have done that would have worked.

Ah, so there is no universal test case. There has to be. Let's assume
one drive is "bad": what, then, is the correct way to indicate this
through mdadm and start the now-degraded RAID-1 array?
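For the record, the textbook sequence for that case looks like the sketch
below. It is shown as a dry run (commands are collected in $plan and only
printed), and the device names /dev/md0, /dev/sda3, /dev/sdb3, /dev/sdc3
are examples, not taken from your boxes:

```shell
# Usual RAID-1 degrade/rebuild sequence, as a dry run: the commands are
# collected in $plan and printed, not executed. Device names are examples.
plan='
mdadm /dev/md0 --fail /dev/sdb3             # mark the bad member faulty
mdadm /dev/md0 --remove /dev/sdb3           # detach it from the array
mdadm --assemble --run /dev/md0 /dev/sda3   # (re)start degraded after reboot
mdadm /dev/md0 --add /dev/sdc3              # add replacement; resync starts
cat /proc/mdstat                            # watch rebuild progress
'
printf '%s' "$plan"
```

Whether that sequence succeeds on your disks depends on which exact
failure mode you hit, which is the whole point of the paragraphs below.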

This all assumes good hardware, by the way. A buggy disk or controller could
actually make a disk appear bad and then later good again, or good and then
lock up, etc. As far as I'm concerned, you could even have bad hardware. You
are saying something doesn't work, but you are not showing your deductive
process, so the claim is meaningless. Send me your problem disks that you
think are impossible to assemble and I bet that in a little while I can tell
you how to assemble the array, as long as there actually is enough there to
use. (If you did something stupid and blew away metadata that can't be
recreated or inferred, well, no hardware RAID card will save you from that
either.)
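The kind of thing I'd try first on such disks is a forced assembly. A
sketch, again as a dry run with example device names; note that --force is
only as safe as the freshest superblock, since anything written after the
split on the stale member is lost:

```shell
# Dry-run sketch of reassembling an array from its raw members. --force
# lets mdadm assemble even when the members' event counters disagree,
# trusting the freshest superblock. Device names are examples.
rescue='
mdadm --examine /dev/sda3 /dev/sdb3        # compare event counts and UUIDs
mdadm --stop /dev/md0                      # release any half-assembled array
mdadm --assemble --force /dev/md0 /dev/sda3 /dev/sdb3
mdadm --assemble --run /dev/md0 /dev/sda3  # or: start from one member only
'
printf '%s' "$rescue"
```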

All I can say is the systems have an ASUS P4P800-VM mainboard (Intel
865G chipset). They ran Fedora for 2 years, and then I replaced the
hard drives and installed openSUSE on md RAID. The same thing happens to
two systems physically 20 miles apart. The hard drive manufacturer's
long test "passed" on all four drives. The fact that I can mount each
of the partitions that made up /dev/md0, and that the md5 of all important
files on the system matches on both partitions (and just the fact that I can
read the data off the individual partitions), further shows that it is
not a hardware issue. I still have the drives; if I am wrong, I have
no problem admitting it.
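(That check, spelled out as a dry run: mount each former RAID-1 member
read-only and compare checksums of the same files. Device names and paths
are examples; this works here because each member of a RAID-1 mirror holds
a mountable copy of the filesystem.)

```shell
# Dry run of verifying that both former RAID-1 members hold the same,
# readable data. Device names and file paths are examples only.
verify='
mkdir -p /mnt/a3 /mnt/b3
mount -o ro /dev/sda3 /mnt/a3                # read-only: change nothing
mount -o ro /dev/sdb3 /mnt/b3
md5sum /mnt/a3/etc/fstab /mnt/b3/etc/fstab   # sums should match
umount /mnt/a3 /mnt/b3
'
printf '%s' "$verify"
```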


And I'm not even slightly an mdadm guru. I simply spent a good solid weekend,
and then several smaller incidents, experimenting. I would say it's still a
black art to me. But even at this level I have already actually performed
actions you claim are impossible, and have seen symptoms like you describe
above, except I looked at the problem longer than 13 seconds and discovered
the problem was not as it seemed and that it was perfectly solvable in every
case so far. That includes those 10 boxes I was talking about. The disks kept
failing randomly, but it was always possible to rebuild and rejoin them. It
sometimes took some poking and insight. I'm not saying it was always obvious
what to do or why. Just that it always turned out to be doable even when it
looked impossible based on the first and most obvious commands.

So far my assertion stands. You should not expect mdraid to work for you, but
that has no bearing on other people or on mdraid itself.
You are merely saying that because you don't know how to fly helicopters,
helicopters are garbage.

Prove me wrong. Because no one has been able to provide the proper
commands to rebuild an array. There is no documentation on how to do
it, the man page is vague, and the commands don't work correctly.
--
To unsubscribe, e-mail: opensuse+unsubscribe@xxxxxxxxxxxx
For additional commands, e-mail: opensuse+help@xxxxxxxxxxxx
