Mailinglist Archive: opensuse (818 mails)

Re: [opensuse] software RAID vs BIOS RAID
On 9/15/2011 6:08 AM, James Knott wrote:
> Per Jessen wrote:
>> John Andersen wrote:
>>
>>> At one time there was a problem having /boot on raid, and it's been a
>>> while since I had to reconfigure a fresh box, so I don't know if this
>>> is still the case.
>> With lilo it works fine, but I don't know about grub.
>
> It doesn't work with grub. With RAID and also LVM, /boot has to be on a
> regular partition. I recently set up a server with four 1 TB drives,
> with LVM on RAID 4. I created a 2 GB partition to hold /boot and used
> the other three 2 GB partitions for swap. Everything else is in the LVM on
> RAID. That system will soon also have the data backed up to another
> computer in a different country.

It works fine with grub; the /boot array just has to be raid1.

I usually don't actually do this any more simply because it's not worth the fuss, but just for reference, it's perfectly doable and works fine.
In the few cases where, for whatever reason, I can't install a usb thumb drive to boot from, I do still do this.

* make a small boot partition on one drive,
* set its type to "fd" (linux raid autodetect) in fdisk,
* mark it bootable,
* clone the partition table to all other drives,
* create a raid1 array using all the partition-1's,
* put /boot on that array in yast.

And you're pretty much done. When the bios boots, it picks one of the drives and boots grub from that drive's mbr or from that drive's bootable partition. Grub reads its files just fine from whatever drive the bios happened to pick as the boot drive; grub does not know or care that the filesystem it's reading is normally a member of a raid1 array. At boot time it's just a plain filesystem on a plain partition. The important factors are:

* The mdadm raid metadata does not modify the individual filesystems that it's maintaining copies of. Each copy is still a valid free-standing filesystem as if it were never part of an array. (This relies on the md superblock sitting at the end of each member partition, as it does with the 0.90 and 1.0 metadata formats.) This is not necessarily true for other raid implementations but it is true for linux mdraid. This means that when the bios boots grub or another boot loader, the bootloader does not have to include a raid driver to read the partition or the filesystem; it can read any individual raid1 member as a plain filesystem on a plain partition on a plain disk.

* The bootloader in most cases is purely read-only. It does not modify one byte of the data in the raid member it's reading, and so a few seconds later, when the kernel loads up and starts looking around for raid arrays to assemble, all the members of the raid1 array are still consistent. The raid1 array assembles just fine every time. Once the kernel has done that, all further writes until power-off go to the array, not to any single drive, so there is no problem. (This assumes the OS bootloader manager tools are configured correctly to write to /dev/md0, not /dev/sd*, as per my other post.)
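
If you want to check the "still consistent" part for yourself after a boot, /proc/mdstat shows how many members each array expects and how many are active. What follows is only a sketch of mine, not part of the procedure above: the function name md_all_members_up is made up, and it assumes the usual mdstat layout where the line after "md0 : active ..." carries a counter like [8/8].

```shell
# Hypothetical helper: return 0 if every expected member of the array is
# active, 1 if the array is degraded, 2 if the array was not found.
# The second argument exists only so the parser can be tried on a saved copy.
md_all_members_up() {
    array=$1
    mdstat=${2:-/proc/mdstat}
    awk -v name="$array" '
        BEGIN { rc = 2 }
        # The array line looks like "md0 : active raid1 sdb1[1] sda1[0]"
        $1 == name && $2 == ":" { found = 1; next }
        # The following line holds the counter, e.g. "[8/8] [UUUUUUUU]"
        found && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            rc = (n[1] == n[2]) ? 0 : 1
            exit
        }
        found { exit }
        END { exit rc }
    ' "$mdstat"
}

# Typical use on the running system:
#   md_all_members_up md0 && echo "md0: all members active"
```

A non-zero result is just a hint to go look at "mdadm --detail /dev/md0" and see which member dropped out.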


Some actual commands for an example 8-drive box:

Start a fresh install and either switch to another screen for a normal local console install, or for a remote text mode install, use ssh and don't start yast in the first place when you first log in. Either way, get to a shell after the install environment is loaded up but before yast has gone past the first screen or two.

Use fdisk or sfdisk or parted to partition one drive, /dev/sda, with say a 512M or 1G /boot partition. You can't grow this later, and you may end up needing to store several different versions of kernels and accompanying large initrd's, not to mention various other possible boot files like maybe a knoppix or puppy linux whole-system-in-ram image, and you don't want kernel updates to fail in a couple of years because /boot is out of room. You may want to make /boot even, say, 5G. But definitely 512M at least, just to allow normal room for kernels and initrd's if you ever turn on multiple versions for testing kotd etc.
Then make one big everything-else partition. Knock yourself out making more partitions if you want for /home, /var, swap etc... that would just point out even more why not to do this part manually in yast during install.
Mark the /boot one active (bootable), and mark them both type "fd" linux raid autodetect, not type 83 linux.
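
For what it's worth, the two-partition layout just described can also be written down as an sfdisk script instead of being typed interactively. This is a sketch under assumptions: it uses the script input format of the sfdisk in newer util-linux releases (the older sfdisk reads a different syntax), the function name raid_layout is made up, and it only prints the script so you can read it before feeding it to anything.

```shell
# Hypothetical helper: print an sfdisk script for one small bootable
# raid partition plus one big raid partition filling the rest of the disk.
# Each line is start,size,type,bootable; empty fields take the default.
raid_layout() {
    boot_size=${1:-1GiB}
    printf 'label: dos\n'
    printf ',%s,fd,*\n' "$boot_size"   # partition 1: /boot member, active
    printf ',,fd\n'                    # partition 2: everything else
}

# Nothing is written to disk here; this only shows the script.
raid_layout 1GiB

# Once it looks right, it would be applied with:
#   raid_layout 1GiB | sfdisk /dev/sda
```

Printing first and piping second is deliberate: a typo in a size is a lot cheaper to catch on the screen than on the disk.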

Then clone sda to sdb:
# sfdisk -d /dev/sda |sfdisk /dev/sdb

Then use the shell history to repeat the command for the rest of the drives. Up-arrow, edit last character, enter, repeat 6 times, bang bang bang done.
# sfdisk -d /dev/sda |sfdisk /dev/sdc
# sfdisk -d /dev/sda |sfdisk /dev/sdd
# sfdisk -d /dev/sda |sfdisk /dev/sde
# sfdisk -d /dev/sda |sfdisk /dev/sdf
# sfdisk -d /dev/sda |sfdisk /dev/sdg
# sfdisk -d /dev/sda |sfdisk /dev/sdh
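
If you would rather not lean on shell history, the same seven commands fit in a small loop. A minimal sketch, assuming the same sda-to-everything-else cloning as above; the function name clone_ptable and the DRY_RUN switch are my own invention, and by default it only prints what it would run.

```shell
# Hypothetical helper: clone the partition table of the first drive to
# every other drive named on the command line.
# With DRY_RUN=1 (the default here) the commands are printed, not executed.
clone_ptable() {
    src=$1
    shift
    for dst in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "sfdisk -d $src | sfdisk $dst"
        else
            sfdisk -d "$src" | sfdisk "$dst"
        fi
    done
}

clone_ptable /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
```

Run it once as is, check the seven lines, then repeat with DRY_RUN=0 in front to actually write the tables.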

This is for MSDOS partition tables, which are still the norm.
Unfortunately, last time I looked (not too recently) there was no equally efficient way to clone GUID partition tables with parted or anything else.

But luckily GPT is still not the norm and generally not necessary, so the nice sfdisk way is available.

Then make sure the raid modules you will need are loaded. Usually raid 0, 1, and 456 are loaded by default, and these days raid10 is present in the install environment but not loaded by default. If you want raid10 and you want to use the nice raid10 module, which is a bit more sophisticated and a heck of a lot easier than manually using raid0 and raid1 on top of each other, just "modprobe raid10".
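
To see which raid levels the running kernel actually has registered, look at the "Personalities" line at the top of /proc/mdstat. A small sketch, with a made-up function name (md_personality_loaded) and an optional file argument that is only there so it can be tried on a saved copy:

```shell
# Hypothetical helper: succeed if the named md personality (raid1, raid10,
# raid5, ...) appears on the Personalities line of /proc/mdstat.
md_personality_loaded() {
    level=$1
    mdstat=${2:-/proc/mdstat}
    grep -q "^Personalities :.*\[$level\]" "$mdstat" 2>/dev/null
}

# Typical use in the install shell:
#   md_personality_loaded raid10 || modprobe raid10
```

The brackets are part of the match on purpose, so asking for raid1 does not get a false yes from raid10.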

Then create the raid1 /boot array:
# mdadm -C -l1 -n8 /dev/md0 /dev/sd{a,b,c,d,e,f,g,h}1

Then create the / array, let's say raid5 so you don't have to worry about the modprobe issue:
# mdadm -C -l5 -n8 /dev/md1 /dev/sd{a,b,c,d,e,f,g,h}2
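
One thing worth spelling out, and this is my addition rather than something the commands above do: which superblock format you get when you don't ask for one depends on the mdadm version, and newer releases default to a format that lives near the start of the member partition. Since grub reading a raid1 member as a plain filesystem depends on the superblock being at the end, it may be safer to request that explicitly for /boot. The sketch below only prints the long-option form of the create commands; the function name print_create is hypothetical.

```shell
# Hypothetical helper: print (not run) an mdadm create command.
# Arguments: md device, raid level, metadata format, partition number,
# then the drive letters to use.
print_create() {
    md=$1
    level=$2
    meta=$3
    part=$4
    shift 4
    count=$#
    devs=
    for letter in "$@"; do
        devs="$devs /dev/sd$letter$part"
    done
    echo "mdadm --create $md --metadata=$meta --level=$level --raid-devices=$count$devs"
}

# /boot: raid1 with the superblock at the end of each member (format 1.0)
print_create /dev/md0 1 1.0 1 a b c d e f g h
# /: raid5, assembled by the kernel and initrd, so the position matters less
print_create /dev/md1 5 1.2 2 a b c d e f g h
```

Format 0.90 also keeps the superblock at the end and would serve the same purpose for /boot.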

The /dev/sd{a,b,c,d,e,f,g,h}1 form in those mdadm commands is literal, valid shell syntax, and there are a few reasons to actually type it just that way.
* easier and faster than /dev/sda1 /dev/sdb1 /dev/sdc1...
* less error-prone: you can't accidentally forget one of the 2's or mistakenly leave it a 1 because you imperfectly edited the previous command with all 1's
* the shorter syntax /dev/sd[a-h]2 only works for contiguous consecutive ranges, which may not be the case, and doesn't work in the installer's less feature-rich shell, possibly not in the emergency shell in the initrd during a failed boot attempt either.
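
The difference between the two spellings is easy to see without touching a real disk. A quick sketch, assuming bash (brace expansion is a bash feature, not plain sh); the scratch directory and the file names in it are stand-ins made up for the demonstration.

```shell
# Brace expansion is purely textual: the names are generated whether or
# not the device nodes exist, and a gap in the lettering is no problem.
brace_list=$(echo /dev/sd{a,b,d,e}2)

# A bracket pattern is a glob: it is matched against files that actually
# exist. Two empty files in a scratch directory stand in for device nodes.
scratch=$(mktemp -d)
touch "$scratch/sda2" "$scratch/sdc2"
glob_list=$(cd "$scratch" && echo sd[a-h]2)
rm -r "$scratch"

echo "brace: $brace_list"
echo "glob:  $glob_list"
```

So the braces always hand mdadm exactly the members you typed, while the bracket form hands it whatever happens to match at that moment.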

Then either return to the yast screen if it's already running or run yast now, and the arrays md0 and md1 will appear and be selectable in yast. Put /boot on md0 and / on md1.

You _can_ do all that manually completely from within yast but it's sooo many clicks and steps and entering values manually, correctly, repeatedly, into fields. It's very error-prone and tedious. But for only a few drives and only one machine one time, maybe it's simpler than going to the shell if you're not used to it.

--
bkw