Re: [opensuse] software RAID vs BIOS RAID

15 Sep 2011

      On 9/15/2011 6:08 AM, James Knott wrote:
...
Per Jessen wrote:
...
John Andersen wrote:
...
At one time there was a problem having /boot on raid, and it's been a
while since I had to reconfigure a fresh box, so I don't know if this
is still the case.
With lilo it works fine, but I don't know about grub.
It doesn't work with grub. With RAID and also LVM, /boot has to be on a
regular partition. I recently set up a server with four 1 TB drives,
with LVM on RAID 4. I created a 2 GB partition to hold /boot and used
the other three 2 GB partitions for swap. Everything else is in the LVM on
RAID. That system will soon also have the data backed up to another
computer in a different country.
It works fine, it just has to be raid1.

I usually don't actually do this any more simply because it's not worth 
the fuss, but just for reference, it's perfectly doable and works fine.
I do still do this in the few cases where, for whatever reason, I can't 
install a usb thumb drive to boot from.

* make a small boot partition on one drive,
* set its fdisk type to "fd" (linux raid autodetect),
* mark it bootable,
* clone the partition table to all other drives,
* create a raid1 array using all the partition-1's,
* put /boot on that array in yast.

And you're pretty much done. When the bios boots, it picks one of the 
drives and boots grub from that drive's mbr or from that drive's bootable 
partition. Grub reads its files just fine from whatever drive the bios 
happened to pick as the boot drive. Grub does not know or care that the 
filesystem it's reading is normally a member of a raid1 array. At boot 
time it's just a plain filesystem on a plain partition. The important 
factors are:

* The mdadm raid metadata does not modify the individual filesystems 
that it's maintaining copies of. Each copy is still a valid 
free-standing filesystem as if it were never part of an array. This is 
not necessarily true for other raid implementations, but it is true for 
linux mdraid, at least with the 0.90 and 1.0 metadata formats, which keep 
the raid superblock at the end of the partition. This means that when the 
bios boots grub or another boot loader, the bootloader does not have to 
include a raid driver to read the partition or the filesystem. It can 
read any individual raid1 volume as a plain filesystem on a plain 
partition on a plain disk.

* The bootloader in most cases is purely read-only. It does not modify 
one byte of the data in the raid volume it's reading, and so a few 
seconds later, when the kernel loads up and starts looking around for 
raid arrays to assemble, all the volumes of the raid1 array are still 
consistent. The raid1 array assembles just fine every time. Once the 
kernel has done that, all further writes until power-off go to 
the array, not to any single drive, so there is no problem. (This assumes 
the OS bootloader manager tools are configured correctly to write to 
/dev/md0, not /dev/sd*, as per my other post.)
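
To check which metadata format an existing member uses before trusting it 
to a raid-unaware bootloader, something like the following would do. It is 
a rough sketch: the function name is made up for illustration, and it 
relies on the mdadm man page's description of the formats (0.90 and 1.0 
superblocks at the end of the device, 1.1 at the start, 1.2 at 4K from 
the start).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given md metadata version keeps its
# superblock at the END of the member, so the filesystem starts at offset 0.
superblock_at_end() {
    case "$1" in
        0.90*|1.0) return 0 ;;   # superblock at end of device
        *)         return 1 ;;   # 1.1, 1.2 (at or near the start), or unknown
    esac
}

# Usage on a real box (needs root), assuming the usual "Version : x.y" line
# in the output of mdadm --examine:
#   ver=$(mdadm --examine /dev/sda1 | awk '$1 == "Version" {print $3}')
#   superblock_at_end "$ver" && echo "ok for a raid-unaware bootloader"
```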

Some actual commands for an example 8-drive box:

Start a fresh install and either switch to another screen for a normal 
local console install, or for a remote text mode install, use ssh and 
don't start yast in the first place when you first log in. Either way, 
get to a shell after the install environment is loaded up but before 
yast has gone past the first screen or two.

Use fdisk or sfdisk or parted to partition one drive, /dev/sda, with say a 
512M or 1G /boot partition. You can't grow this later, and you may end up 
needing to store several versions of kernels and their accompanying 
large initrd's, not to mention other possible boot files like a knoppix 
or puppy linux whole-system-in-ram image. You don't want kernel updates 
to fail in a couple of years because /boot is out of room. 
You may even want to make /boot as big as 5G, but definitely make it at 
least 512M, just to allow normal room for kernels and initrd's if you ever 
turn on multiple versions for testing kotd etc.
Then add one big everything-else partition. Knock yourself out making more 
partitions if you want for /home, /var, swap etc., but that just shows 
even more clearly why not to do this part manually in yast during install.
Mark the /boot one active (bootable), and mark them both type "fd" (linux 
raid autodetect), not type 83 (linux).
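
The layout just described can also be written down as an sfdisk script 
instead of being typed interactively. This is only a sketch: the function 
name is invented, the sizes are examples, and it assumes the script syntax 
of the sfdisk shipped with util-linux 2.26 or later (older versions use a 
different input format).

```shell
#!/bin/sh
# Print an sfdisk script for the layout described above: a small bootable
# /boot partition plus one big partition taking the rest of the disk,
# both type fd (linux raid autodetect).
boot_layout() {
    boot_size=${1:-512MiB}
    cat <<EOF
label: dos
size=$boot_size, type=fd, bootable
type=fd
EOF
}

# Review the output first. To apply it (destructive, needs root):
#   boot_layout 1GiB | sfdisk /dev/sda
boot_layout
```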

Then clone sda to sdb:
# sfdisk -d /dev/sda |sfdisk /dev/sdb

Then use the shell history to repeat the command for the rest of the 
drives. Up-arrow, edit last character, enter, repeat 6 times, bang bang 
bang done.
# sfdisk -d /dev/sda |sfdisk /dev/sdc
# sfdisk -d /dev/sda |sfdisk /dev/sdd
# sfdisk -d /dev/sda |sfdisk /dev/sde
# sfdisk -d /dev/sda |sfdisk /dev/sdf
# sfdisk -d /dev/sda |sfdisk /dev/sdg
# sfdisk -d /dev/sda |sfdisk /dev/sdh
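
If you would rather not retype or edit the line at all, a small loop does 
the same thing. A minimal sketch (the function name is made up) that only 
prints the commands, so you can look them over before anything touches a 
disk:

```shell
#!/bin/sh
# Dry run: print one clone command per target drive instead of running it.
# First argument is the source drive, the rest are the targets.
clone_cmds() {
    src=$1; shift
    for d in "$@"; do
        echo "sfdisk -d $src | sfdisk $d"
    done
}

# The example 8-drive box: sda is the template, sdb through sdh are targets.
clone_cmds /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
# Once the list looks right, append "| sh" to the line above and run as root.
```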

This is for MSDOS partition tables which are still the norm.
Unfortunately last time I looked (not too recently) there was no equally 
efficient way to clone GUID partition tables with parted or anything else.

But luckily GPT are still not the norm and generally not necessary and 
the nice sfdisk way is available.
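
For GPT disks, the sgdisk tool from the gdisk package can do much the same 
job, assuming it is available in the install environment (not verified 
here). A sketch using its --replicate and --randomize-guids options, again 
as a dry run with an invented function name:

```shell
#!/bin/sh
# Dry run: print sgdisk commands that copy a GPT from one drive to others.
# --replicate copies the source table onto the target; --randomize-guids
# then gives the copy its own disk and partition GUIDs so the drives do
# not carry identical identifiers.
gpt_clone_cmds() {
    src=$1; shift
    for d in "$@"; do
        echo "sgdisk --replicate=$d $src"
        echo "sgdisk --randomize-guids $d"
    done
}

gpt_clone_cmds /dev/sda /dev/sdb /dev/sdc
```

Note the argument order: the target goes with --replicate and the source 
is the last argument, so it is worth reading the printed lines twice.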

Then make sure the raid modules you will need are loaded. Usually raid 
0, 1, and 456 are loaded by default, and these days raid10 is present in 
the install environment but not loaded by default. If you want raid10, 
and you want to use the nice raid10 module, which is a bit more 
sophisticated and a heck of a lot easier than manually layering raid0 and 
raid1 on top of each other, just "modprobe raid10".
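
One way to see what is already loaded is the "Personalities" line at the 
top of /proc/mdstat. A small sketch, with a made-up function name, that 
checks that line for a given raid level:

```shell
#!/bin/sh
# Succeed if the named md personality appears on the "Personalities" line
# of an mdstat-style file (second argument, default /proc/mdstat).
has_personality() {
    grep '^Personalities' "${2:-/proc/mdstat}" 2>/dev/null | grep -q "\[$1\]"
}

# Usage (modprobe needs root):
#   has_personality raid10 || modprobe raid10
```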

Then create the raid1 /boot array:
# mdadm -C -l1 -n8 /dev/md0 /dev/sd{a,b,c,d,e,f,g,h}1

Then create the / array, let's say raid5 so you don't have to worry about 
the modprobe issue:
# mdadm -C -l5 -n8 /dev/md1 /dev/sd{a,b,c,d,e,f,g,h}2
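
The same two commands spelled out with long options, plus an explicit 
metadata version, look like this. It is a dry-run sketch with an invented 
helper name. Choosing 1.0 for /boot is my assumption: it is one of the 
formats that puts the raid superblock at the end of each member, which is 
what lets grub read a member as a plain filesystem. What your mdadm 
defaults to depends on its version.

```shell
#!/bin/sh
# Dry run: print an mdadm create command for one array.
# Arguments: md device, raid level, partition number, metadata version,
# then the member drive letters.
md_create_cmd() {
    md=$1; level=$2; part=$3; meta=$4; shift 4
    members=""
    for l in "$@"; do
        members="$members /dev/sd$l$part"
    done
    echo "mdadm --create $md --level=$level --raid-devices=$# --metadata=$meta$members"
}

md_create_cmd /dev/md0 1 1 1.0 a b c d e f g h   # /boot, superblock at end
md_create_cmd /dev/md1 5 2 1.2 a b c d e f g h   # /, never read by the bootloader
```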

That is literal, valid shell syntax, and there are a few reasons to 
actually type it just that way:
* it's easier and faster than /dev/sda1 /dev/sdb1 /dev/sdc1...
* it's less error-prone: you can't accidentally forget one of the 2's or 
mistakenly make it a 1 because you imperfectly edited the previous 
command with all 1's
* the shorter syntax /dev/sd[a-h]2 only works for contiguous consecutive 
ranges, which may not be what you have, and it doesn't work in the 
installer's less feature-rich shell, and possibly not in the emergency 
shell in the initrd during a failed boot attempt either.

Then either return to the yast screen if it's already running or run 
yast now, and the arrays md0 and md1 will appear and be selectable in 
yast. Put /boot on md0 and / on md1.

You _can_ do all that manually, completely from within yast, but it's sooo 
many clicks and steps and entering values manually, correctly, 
repeatedly, into fields. It's very error-prone and tedious. But for only 
a few drives and only one machine one time, maybe it's simpler than 
going to the shell if you're not used to it.

-- 
bkw

Brian K. White