[opensuse] Recover hardware-controlled RAID disks with another computer?
I plan to install openSUSE Leap 42.2 on a HPE ProLiant MicroServer Gen8. I have two options to set up RAID 1 for the disks: 1) Linux software RAID 2) Hardware RAID with the embedded HPE Dynamic Smart Array B120i Controller

I want to be prepared for a possible hardware error. I am familiar with software RAID. Even if the mainboard breaks, I can take one or both disks to another computer and recover the data with any Linux system.

But how is it with the hardware RAID (B120i controller here)? If the mainboard/CPU/controller/whatever breaks, can I use the same strategy (take the disks to another PC without hardware RAID, recover data), or do I need to buy another HPE ProLiant MicroServer Gen8 or compatible server and recover the disks there?

What is the main advantage of hardware RAID over software RAID on Linux? (I think performance, and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.)

Greetings, Björn
On 10.07.2017 at 10:52, Bjoern Voigt wrote:
I plan to install openSUSE Leap 42.2 on a HPE ProLiant MicroServer Gen8.
I have two options to set up RAID 1 for the disks: 1) Linux software RAID 2) Hardware RAID with the embedded HPE Dynamic Smart Array B120i Controller
I want to be prepared for a possible hardware error.
I am familiar with software RAID. Even if the mainboard breaks I can take one or both disks to another computer and recover the data with any Linux system.
But how is it with the hardware RAID (B120i controller here)? If the mainboard/CPU/controller/whatever breaks, can I use the same strategy (take the disks to another PC without hardware RAID, recover data) or do I need to buy another HPE ProLiant MicroServer Gen8 or compatible server and recover the disks there?
If you don't use an identical RAID controller (same model; maybe you also need the same stepping and the same BIOS version) in the new computer, you will not be able to read your disks anymore. So if you don't have a hardware support contract, the safest way is to buy two controllers and keep one as a cold spare.
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.)
Normally you get better performance. The controller takes some work away from the CPU. Many controllers also have a lot of onboard memory (on better ones, also battery-protected). Greetings, Peter
Peter Sikorski GTL wrote:
On 10.07.2017 at 10:52, Bjoern Voigt wrote:
I plan to install openSUSE Leap 42.2 on a HPE ProLiant MicroServer Gen8.
I have two options to set up RAID 1 for the disks: 1) Linux software RAID 2) Hardware RAID with the embedded HPE Dynamic Smart Array B120i Controller
I want to be prepared for a possible hardware error.
I am familiar with software RAID. Even if the mainboard breaks I can take one or both disks to another computer and recover the data with any Linux system.
But how is it with the hardware RAID (B120i controller here)? If the mainboard/CPU/controller/whatever breaks, can I use the same strategy (take the disks to another PC without hardware RAID, recover data) or do I need to buy another HPE ProLiant MicroServer Gen8 or compatible server and recover the disks there?
If you don't use an identical RAID controller (same model; maybe you also need the same stepping and the same BIOS version) in the new computer, you will not be able to read your disks anymore.
Uh, with HP Smart Array controllers, that is not a problem, they're backwards compatible. -- Per Jessen, Zürich (21.6°C) http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
Peter Sikorski GTL wrote:
If you don't use an identical RAID controller (same model; maybe you also need the same stepping and the same BIOS version) in the new computer, you will not be able to read your disks anymore. So if you don't have a hardware support contract, the safest way is to buy two controllers and keep one as a cold spare.
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.) Normally you get better performance. The controller takes some work away from the CPU. Many controllers also have a lot of onboard memory (on better ones, also battery-protected)
Per Jessen wrote:
TMK, an array built with an HP Smart Array controller can be recovered on any other HP Smart Array controller, the same or newer. I would not expect to recover on anything else.
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.) Offloading to hardware is always about speed and less load on the CPU. Thanks, Peter and Peter.
So I will go for software RAID. The server is a backup server which runs backup jobs overnight. RAID is not even necessary, but it is on the wish-list. Disk performance is not so important. Most of the CPU power will be needed for the LUKS AES encryption and for the SSH/VPN/rsync operations.

I thought that the server could run for some days with a broken disk until I go to the server location and replace the disk. But I read that this only works 50% of the time, because the HPE ProLiant MicroServer Gen8 server always boots from the first disk or from a USB key. Probably the initrd from openSUSE on the USB key selects the working disk for further booting if the first (boot) disk is broken. The lifetime of the USB key is a problem too. Some users suggest installing an SSD boot device, but this does not work as expected on this server.

Greetings, Björn
On 10/07/17 12:31, Bjoern Voigt wrote:
Peter Sikorski GTL wrote:
If you don't use an identical RAID controller (same model; maybe you also need the same stepping and the same BIOS version) in the new computer, you will not be able to read your disks anymore. So if you don't have a hardware support contract, the safest way is to buy two controllers and keep one as a cold spare.
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.) Normally you get better performance. The controller takes some work away from the CPU. Many controllers also have a lot of onboard memory (on better ones, also battery-protected)
Per Jessen wrote:
TMK, an array built with an HP Smart Array controller can be recovered on any other HP Smart Array controller, the same or newer. I would not expect to recover on anything else.
I gather Dell have a very poor reputation in this regard. The point holds, however: MAKE SURE you can recover from a broken hardware raid controller; some you can, some you can't.
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.) Offloading to hardware is always about speed and less load on the CPU. Thanks, Peter and Peter.
So I will go for software RAID. The server is a backup server which runs backup jobs overnight. RAID is not even necessary, but it is on the wish-list. Disk performance is not so important. Most of the CPU power will be needed for the LUKS AES encryption and for the SSH/VPN/rsync operations.
I thought that the server could run for some days with a broken disk until I go to the server location and replace the disk.
Raid 1? Make sure if you intend to run with broken disks that you have a 3-way mirror. I'd rather run raid-6 ... But I read that this
only works 50% of the time, because the HPE ProLiant MicroServer Gen8 server always boots from the first disk or from a USB key. Probably the initrd from openSUSE on the USB key selects the working disk for further booting if the first (boot) disk is broken. The lifetime of the USB key is a problem too. Some users suggest installing an SSD boot device, but this does not work as expected on this server.
Doing software raid, I would mirror the system partition across all drives, and also install grub on all drives (a sketch follows below). That way, if a drive disappears, the system will still boot from the first available drive. Snag is, if anything changes boot-wise, you need to make sure it gets mirrored across all drives. Cheers, Wol
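A minimal sketch of the "grub on all drives" part, assuming two RAID-1 member disks /dev/sda and /dev/sdb and openSUSE's grub2-install (device names are examples only):

  # install the boot loader into the MBR of every RAID member, so the
  # BIOS can start the system from whichever disk is still alive
  grub2-install /dev/sda
  grub2-install /dev/sdb

After kernel or boot-loader updates, re-run (or script) both commands so the disks stay in sync boot-wise.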
Wols Lists wrote:
Per Jessen wrote:
TMK, an array built with an HP Smart Array controller can be recovered on any other HP Smart Array controller, the same or newer. I would not expect to recover on anything else.
I gather Dell have a very poor reputation in this regard. The point holds, however, MAKE SURE you can recover from a broken hardware raid controller, some you can, some you can't.
Over the last 10-12 years I have moved enough arrays around across HP Smart Array Controllers to know it'll work. It just does. Chapeau, HP!
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.) Offloading to hardware is always about speed and less load on the CPU. Thanks, Peter and Peter.
So I will go for software RAID. The server is a backup server which runs backup jobs overnight. RAID is not even necessary, but it is on the wish-list. Disk performance is not so important. Most of the CPU power will be needed for the LUKS AES encryption and for the SSH/VPN/rsync operations.
I thought that the server could run for some days with a broken disk until I go to the server location and replace the disk.
Raid 1? Make sure if you intend to run with broken disks that you have a 3-way mirror. I'd rather run raid-6 ...
Yeah. The bigger the disk, the bigger the risk. -- Per Jessen, Zürich (18.1°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On 10/07/17 20:48, Per Jessen wrote:
Raid 1? Make sure if you intend to run with broken disks that you have a
3-way mirror. I'd rather run raid-6 ...
Yeah. The bigger the disk, the bigger the risk.
The other thing about raid 1 or raid 6 is that linux raid-1 always assumes the first disk is correct in the case of any discrepancy. So if you have a write problem on that disk, your data is corrupt! Raid 4/5/6 likewise assumes that the data is correct and that, if there's any error, it's the parity at fault; but there is a utility (raid6check) that you can run over a raid-6 to fix it, provided only one disk is playing up.

Raid-6 provides you with two pieces of redundant data, so assuming any ONE random block gets corrupted, raid-6 can work out which block is corrupt, and what the correct value is. Raid-5 can only recover if a disk has failed, because it only has one piece of redundant data, so it needs you to tell it which block is corrupt before it can work out what the correct value is.

Cheers, Wol
Wols Lists wrote:
On 10/07/17 20:48, Per Jessen wrote:
Raid 1? Make sure if you intend to run with broken disks that you have a
3-way mirror. I'd rather run raid-6 ...
Yeah. The bigger the disk, the bigger the risk.
The other thing about raid 1 or raid 6, is that linux raid-1 always assumes the first disk is correct in the case of any discrepancy. So if you have a write problem on that disk, your data is corrupt!
If Linux always assumed the 2nd drive was correct, the situation wouldn't be much better :-)
Although with 4/5/6 it also assumes that the data is correct and if there's any error, that it's the parity at fault, there is a utility (raid6check) that you can run over a raid-6 to fix it provided only one disk is playing up.
Raid-6 provides you with two bits of redundant data, so assuming any ONE random block gets corrupted, raid-6 can work out which block is corrupt, and what the correct value is.
Raid-5 can only recover if a disk is failed, because it only has one bit of redundant data, so it needs you to tell it which block is corrupt before it can work out what the correct value is.
We have a disk fail about once a month, on average I think. Most in the bigger/newer RAID6 arrays, surprisingly rarely in RAID1 or RAID5, but they're usually smaller and older drives. Modern 2/4/6TB SAS drives are a lot more prone to failure than ancient Compaq 9/18/36GB drives :-) -- Per Jessen, Zürich (21.6°C) http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On 07/11/2017 03:06 AM, Per Jessen wrote:
We have a disk fail about once a month, on average I think. Most in the bigger/newer RAID6 arrays, surprisingly rarely in RAID1 or RAID5, but they're usually smaller and older drives. Modern 2/4/6Tb SAS drives are a lot more prone to failure than ancient Compaq (9/18/36Gb drives:-)
I'm on a roll! Three 6TB and two 4TB within the last two months. One 4TB went out yesterday. Disappointing. BTW, no data lost, all arrays were many-disk RAID6. Regards, Lew
On 2017-07-11 16:13, Lew Wolfgang wrote:
On 07/11/2017 03:06 AM, Per Jessen wrote:
We have a disk fail about once a month, on average I think. Most in the bigger/newer RAID6 arrays, surprisingly rarely in RAID1 or RAID5, but they're usually smaller and older drives. Modern 2/4/6Tb SAS drives are a lot more prone to failure than ancient Compaq (9/18/36Gb drives:-)
I'm on a roll! three 6TB and two 4TB within the last two months. One 4TB went out yesterday. Disappointing.
BTW, no data lost, all arrays were many-disk RAID6.
Wow. Just two days ago I migrated my last 500 GB disk in my computer to a new 3 TB disk. It had 25000 hours of use, and is still good, says smartctl. -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On 11/07/17 15:13, Lew Wolfgang wrote:
On 07/11/2017 03:06 AM, Per Jessen wrote:
We have a disk fail about once a month, on average I think. Most in the bigger/newer RAID6 arrays, surprisingly rarely in RAID1 or RAID5, but they're usually smaller and older drives. Modern 2/4/6Tb SAS drives are a lot more prone to failure than ancient Compaq (9/18/36Gb drives:-)
I'm on a roll! three 6TB and two 4TB within the last two months. One 4TB went out yesterday. Disappointing.
They are proper raid drives with error recovery? If they are desktop drives without SCT/ERC, then you're asking for trouble, although I guess if you've successfully replaced them, then they are appropriate drives. For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ... (most drives perform far better than spec, I know. But you should always be prepared for what the manufacturer says is the worst acceptable case.)
BTW, no data lost, all arrays were many-disk RAID6.
Good going :-) Cheers, Wol
On 2017-07-11 22:04, Wols Lists wrote:
On 11/07/17 15:13, Lew Wolfgang wrote:
On 07/11/2017 03:06 AM, Per Jessen wrote:
We have a disk fail about once a month, on average I think. Most in the bigger/newer RAID6 arrays, surprisingly rarely in RAID1 or RAID5, but they're usually smaller and older drives. Modern 2/4/6Tb SAS drives are a lot more prone to failure than ancient Compaq (9/18/36Gb drives:-)
I'm on a roll! three 6TB and two 4TB within the last two months. One 4TB went out yesterday. Disappointing.
They are proper raid drives with error recovery? If they are desktop drives without SCT/ERC, then you're asking for trouble, although I guess if you've successfully replaced them, then they are appropriate drives.
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What would one do to set them up properly? :-? -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On Tue, 11 Jul 2017 22:18:03 +0200 "Carlos E. R." <robin.listas@telefonica.net> wrote:
On 2017-07-11 22:04, Wols Lists wrote:
On 11/07/17 15:13, Lew Wolfgang wrote:
On 07/11/2017 03:06 AM, Per Jessen wrote:
We have a disk fail about once a month, on average I think. Most in the bigger/newer RAID6 arrays, surprisingly rarely in RAID1 or RAID5, but they're usually smaller and older drives. Modern 2/4/6Tb SAS drives are a lot more prone to failure than ancient Compaq (9/18/36Gb drives:-)
I'm on a roll! three 6TB and two 4TB within the last two months. One 4TB went out yesterday. Disappointing.
They are proper raid drives with error recovery? If they are desktop drives without SCT/ERC, then you're asking for trouble, although I guess if you've successfully replaced them, then they are appropriate drives.
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What would one do to set them up properly? :-?
You need to set up the timeouts in Linux to be longer than the ones imposed by the firmware on the drive. I'm sorry but I don't remember whether it's a kernel thing or an mdadm thing. I expect the linux raid wiki knows.
On 2017-07-11 22:35, Dave Howorth wrote:
On Tue, 11 Jul 2017 22:18:03 +0200 "Carlos E. R." <> wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What would one do to set them up properly? :-?
You need to set up the timeouts in Linux to be longer than the ones imposed by the firmware on the drive. I'm sorry but I don't remember whether it's a kernel thing or a mdadm thing. I expect the linux raid wiki knows.
Wow :-( That can be minutes. -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On Tue, Jul 11, 2017 at 4:45 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
On 2017-07-11 22:35, Dave Howorth wrote:
On Tue, 11 Jul 2017 22:18:03 +0200 "Carlos E. R." <> wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What would one do to set them up properly? :-?
You need to set up the timeouts in Linux to be longer than the ones imposed by the firmware on the drive. I'm sorry but I don't remember whether it's a kernel thing or a mdadm thing. I expect the linux raid wiki knows.
Wow :-(
That can be minutes.
That's actually a big difference between "set up properly" and not.

A drive used in a raid-1/5/6 should be set to fail fast instead of retrying for a minute or two. Drives designed for use in a raid array will come from the factory that way. If you're using a desktop drive you really need to try and set the retry time down low.

Then, in addition, you should be running a scrub routinely. That will do a read of all the sectors on the physical media looking for bad sectors. If it finds any, it recreates the data from the other drives and rewrites the sector. Hopefully that fixes it.

Greg
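For drives that support SCT ERC, the "fail fast" behaviour can usually be set with smartctl; a minimal sketch, assuming the drive is /dev/sda (values are in tenths of a second, so 70 means 7 seconds; on many drives the setting does not survive a power cycle and has to be re-applied at boot):

  # show the current SCT Error Recovery Control setting
  smartctl -l scterc /dev/sda

  # cap read/write error recovery at 7 seconds, so the drive reports the
  # error instead of retrying internally for minutes
  smartctl -l scterc,70,70 /dev/sda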
On 2017-07-11 23:16, Greg Freemyer wrote:
On Tue, Jul 11, 2017 at 4:45 PM, Carlos E. R. <> wrote:
On 2017-07-11 22:35, Dave Howorth wrote:
On Tue, 11 Jul 2017 22:18:03 +0200 "Carlos E. R." <> wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What would one do to set them up properly? :-?
You need to set up the timeouts in Linux to be longer than the ones imposed by the firmware on the drive. I'm sorry but I don't remember whether it's a kernel thing or a mdadm thing. I expect the linux raid wiki knows.
Wow :-(
That can be minutes.
That's actually a big difference between "set-up properly" or not.
A drive used in a raid-1/5/6 should be set to fail fast instead of retry for a minute or two.
Drives designed for use in a raid array will come from the factory that way.
If you're using a desktop drive you really need to try and set the retry time down low.
Where do you do that? "hdparm", perhaps? I just had a look at the man page, didn't locate a setting for it. I don't use raid in production, so to speak, I prefer backups for my use case. But I like learning :-)
Then, in addition you should be running a scrub routinely. That will do a read of all the sectors on the physical media looking for bad sectors. If it finds any, it recreates the data from the other drives and rewrites the sector. Hopefully that fixes it.
Well, a surface test like the one triggered by "smartctl" could do. If not, I could use "badblocks" on the entire disk: I have reason to believe the firmware relocates sectors on running it. More than once I have seen errors listed by SMART (bad sector). I try to locate the sector with badblocks, and they disappear. -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On 11/07/17 22:16, Greg Freemyer wrote:
On Tue, Jul 11, 2017 at 4:45 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
On 2017-07-11 22:35, Dave Howorth wrote:
On Tue, 11 Jul 2017 22:18:03 +0200 "Carlos E. R." <> wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What would one do to set them up properly? :-?
You need to set up the timeouts in Linux to be longer than the ones imposed by the firmware on the drive. I'm sorry but I don't remember whether it's a kernel thing or a mdadm thing. I expect the linux raid wiki knows.
Wow :-(
That can be minutes.
That's actually a big difference between "set-up properly" or not.
A drive used in a raid-1/5/6 should be set to fail fast instead of retry for a minute or two.
Drives designed for use in a raid array will come from the factory that way.
If you're using a desktop drive you really need to try and set the retry time down low.
Except you can't :-( Read the wiki page on timeouts. It seems to be drives over 1TB, but timeout is no longer an adjustable parameter - it's set to about 2 1/2 minutes and that's that :-( That's why you have to adjust the linux timeout up appropriately. There's a script on the website that will do it for you. Note that setting a long timeout in linux on all drives isn't a problem, it's just that you daren't let linux time out quicker than the drive times out.
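A minimal sketch of raising the kernel's command timeout above the drive's own, assuming the member disk is /dev/sda and that 180 seconds is comfortably longer than the firmware's roughly 2.5 minutes (not persistent; a udev rule or boot script, such as the one on the linux-raid wiki, has to re-apply it for every member disk):

  # default is typically 30 seconds; raise it so the kernel does not kick
  # the drive out of the array while it is still retrying internally
  cat /sys/block/sda/device/timeout
  echo 180 > /sys/block/sda/device/timeout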
Then, in addition you should be running a scrub routinely. That will do a read of all the sectors on the physical media looking for bad sectors. If it finds any, it recreates the data from the other drives and rewrites the sector. Hopefully that fixes it.
Not quite true ... The point of a scrub is that it reads the drive "end to end". Nowadays drives are computers in their own right, with loads of error correction built into the drive. So if the drive has difficulty reading, it will re-write internally. Note that magnetic media decays just like ram, just on a timescale of years rather than nanoseconds. Made worse if *parts* of the drive are repeatedly rewritten - that will wear down the data next to it. But yes, if a block fails, it will get recalculated and rewritten.

The other thing is I think scrub also updates the mismatch count? This is where my knowledge is currently very patchy, but mismatch count means the data is inconsistent on disk. I think a check scrub just looks for and counts mismatches. A repair scrub will copy drive 1 over the other drive(s) for a mirror, and recalculate and overwrite parity for raid 5/6. For raids 1 & 5, that's about all you can do.

For raid 6, there's a program raid6check, which will try to work out which block is corrupt and recalculate that. It's pretty good in that it will identify and fix a single-block corruption, and is unlikely to mis-identify a more complex (and unfixable) problem as a fixable single-block error. Getting raid to do this for you automatically is highly contentious - the raid guys say there are a lot of possible causes and don't want an attempted fix to make matters worse. Personally I think the current situation is sub-optimal, but that's my opinion, not theirs ...

Cheers, Wol
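For an md array, the check and repair scrubs described above are driven through sysfs; a minimal sketch, assuming the array is /dev/md0:

  # read-only scrub: read every sector, rewrite anything unreadable, and
  # count stripes where the copies/parity disagree
  echo check > /sys/block/md0/md/sync_action

  # watch progress and look at the resulting mismatch count
  cat /proc/mdstat
  cat /sys/block/md0/md/mismatch_cnt

  # 'repair' additionally rewrites mismatched stripes (for raid-1 by
  # copying the first disk, for raid-5/6 by recomputing the parity)
  echo repair > /sys/block/md0/md/sync_action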
On 07/11/2017 04:04 PM, Wols Lists wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What about for those who do know? ;-)
On 2017-07-11 22:21, James Knott wrote:
On 07/11/2017 04:04 PM, Wols Lists wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What about for those who do know? ;-)
I don't, so please explain ;-) -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On Tue, Jul 11, 2017 at 4:32 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
On 2017-07-11 22:21, James Knott wrote:
On 07/11/2017 04:04 PM, Wols Lists wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive
s/10GB/10TB/
end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What about for those who do know? ;-)
I don't, so please explain ;-)
Carlos,

Consider a Raid-5, and then consider that no background scrubber is running. Per the specs for most desktop drives, there is a reasonable chance (20%) with 2+ TB drives that one or more of the drives will develop undetected bad sectors. Now, assume one of the drives fails. You replace it with a new drive and kick off a re-build. As soon as you hit the bad sector, your rebuild fails and you are stuck working with backups!

== Solutions
Use Raid-6, but realize that in today's era it is still only good for one failed drive at a time.
Find drives with significantly higher specs for undetected bad sectors (I don't know if drives like that exist or not).
Use a scrubber religiously to make sure there are no undetected bad sectors.
==

Here's an 8-year-old paper arguing even Raid-6 will run out of safety margin by 2019: http://queue.acm.org/detail.cfm?id=1670144 I have no idea if the assumptions it is making are still valid.

Greg
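A minimal sketch of running such a scrub "religiously", as a weekly entry in /etc/crontab (the md device name is an example; some distributions already ship an equivalent cron job or systemd timer):

  # start a read scrub of /dev/md0 every Sunday at 02:00
  0 2 * * 0  root  /bin/sh -c 'echo check > /sys/block/md0/md/sync_action'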
On 2017-07-11 23:46, Greg Freemyer wrote:
On Tue, Jul 11, 2017 at 4:32 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
On 2017-07-11 22:21, James Knott wrote:
On 07/11/2017 04:04 PM, Wols Lists wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive
s/10GB/10TB/
end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What about for those who do know? ;-)
I don't, so please explain ;-)
Carlos,
Consider a Raid-5:
Then consider no background scrubber is running.
Per the specs for most desktop drives, there is reasonable chance (20%) with 2+ TB drives that one or more of the drives will develop undetected bad sectors.
Now, assume one of the drives fail. You replace it with a new drive and kick off a re-build.
As soon as you hit the bad sector, your rebuild fails and you are stuck working with backups!
Yes, that's reasonable.
== Solutions
Use Raid-6, but realize in today's era it is still only good for one failed drive at a time.
Find drives with significantly higher specs for undetected bad sectors (I don't know if drives like that exist or not).
Use a scrubber religiously to make sure there are no undetected bad sectors.
== Here's a 8-year old paper arguing even Raid-6 will run out of safety margin by 2019.
http://queue.acm.org/detail.cfm?id=1670144
I have no idea if the assumptions it is making are still valid.
I think I read that article or a similar one some years ago. I'll try to read it later, time permitting :-) My question was rather on the issue of how to configure the disks properly. -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
Greg Freemyer wrote:
== Solutions
Use Raid-6, but realize in today's era it is still only good for one failed drive at a time.
Find drives with significantly higher specs for undetected bad sectors (I don't know if drives like that exist or not).
Use a scrubber religiously to make sure there are no undetected bad sectors.
Don't use desktop drives :-) - I don't know if their error rates are actually lower than enterprise 24/7 drives, but their specs apply to an 8 hour duty cycle. Use mirrored RAID6. For any redundant setup, that's probably normal anyway. For home use, it's perhaps not really an option.
== Here's a 8-year old paper arguing even Raid-6 will run out of safety margin by 2019.
http://queue.acm.org/detail.cfm?id=1670144
I have no idea if the assumptions it is making are still valid.
He's primarily going after the reconstruction time, i.e. the time when an array is running degraded. Earlier when I wrote "the bigger the disk, the bigger the risk", that's exactly what I meant. Where is RAID7.x? -- Per Jessen, Zürich (18.7°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
Per Jessen wrote:
Greg Freemyer wrote:
== Here's a 8-year old paper arguing even Raid-6 will run out of safety margin by 2019.
http://queue.acm.org/detail.cfm?id=1670144
I have no idea if the assumptions it is making are still valid.
He's primarily going after the reconstruction time, i.e. the time when an array is running degraded. Earlier when I wrote "the bigger the disk, the bigger the risk", that's exactly what I meant.
Where is RAID7.x ?
Um, in ZFS for starters. I knew that. Which is also where the author of the ACM Queue article above worked/works. -- Per Jessen, Zürich (21.6°C) http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
On 2017-07-12 07:37, Per Jessen wrote:
Greg Freemyer wrote:
== Solutions
Use Raid-6, but realize in today's era it is still only good for one failed drive at a time.
Find drives with significantly higher specs for undetected bad sectors (I don't know if drives like that exist or not).
Use a scrubber religiously to make sure there are no undetected bad sectors.
Don't use desktop drives :-) - I don't know if their error rates are actually lower than enterprise 24/7 drives, but their specs apply to an 8 hour duty cycle.
They should talk to Microsoft. Their use case for homes is to have the computer connected 24*7, idling and waiting for someone to tap a key and work. Not necessarily hibernated/suspended unless they close the lid. Even with the lid closed, the computer may wake on its own to do programmed tasks, like an update or virus check. -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On 11/07/17 22:46, Greg Freemyer wrote:
On Tue, Jul 11, 2017 at 4:32 PM, Carlos E. R. <robin.listas@telefonica.net> wrote:
On 2017-07-11 22:21, James Knott wrote:
On 07/11/2017 04:04 PM, Wols Lists wrote:
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive
s/10GB/10TB/
Whoops :-)
end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
What about for those who do know? ;-)
I don't, so please explain ;-)
Carlos,
Consider a Raid-5:
Then consider no background scrubber is running.
Per the specs for most desktop drives, there is reasonable chance (20%) with 2+ TB drives that one or more of the drives will develop undetected bad sectors.
Now, assume one of the drives fail. You replace it with a new drive and kick off a re-build.
As soon as you hit the bad sector, your rebuild fails and you are stuck working with backups!
<snip>
I have no idea if the assumptions it is making are still valid.
Sorry, but the facts are wrong, too :-( You're describing a hard error - where the sector is corrupt, and that's that. I'm describing a soft error, where "something" goes wrong and the drive times out. The next attempt to read it will work fine.

But as you describe, the rebuild will bomb. At which point the raid novice panics and all hell breaks loose. The fix is actually dead simple - just reassemble, with force if necessary. The rebuild will restart, and off you go ... :-)

The problem, of course, is that the larger your array gets, the more errors you can expect, and the greater the likelihood of your rebuild failing, maybe multiple times.

Cheers, Wol
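A minimal sketch of that forced reassembly, assuming a raid-5 whose members are /dev/sda1, /dev/sdb1 and /dev/sdc1 (device names are examples):

  # stop whatever is left of the failed array, then put it back together,
  # accepting members whose event counts no longer match exactly
  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1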
On 2017-07-12 14:02, Wols Lists wrote:
On 11/07/17 22:46, Greg Freemyer wrote:
On Tue, Jul 11, 2017 at 4:32 PM, Carlos E. R. <> wrote:
On 2017-07-11 22:21, James Knott wrote:
As soon as you hit the bad sector, your rebuild fails and you are stuck working with backups!
<snip>
I have no idea if the assumptions it is making are still valid.
Sorry, but the facts are wrong, too :-(
You're describing a hard error - where the sector is corrupt, and that's that. I'm describing a soft error, where "something" goes wrong, and the drive times out. The next attempt to read it will work fine.
But as you describe, the rebuild will bomb. At which point, the raid novice panics and all hell breaks lose. The fix is actually dead simple - just reassemble, with force if necessary. The rebuild will restart, and off you go ... :-)
The problem, of course, is that the larger your array gets, the more errors you can expect, and the greater the likelihood of your rebuild failing, maybe multiple times.
:-( -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On 07/11/2017 01:04 PM, Wols Lists wrote:
On 11/07/17 15:13, Lew Wolfgang wrote:
I'm on a roll! three 6TB and two 4TB within the last two months. One 4TB went out yesterday. Disappointing. They are proper raid drives with error recovery? If they are desktop drives without SCT/ERC, then you're asking for trouble, although I guess if you've successfully replaced them, then they are appropriate drives.
We use enterprise-grade Seagate SATA and SAS 3.5-inch drives. Controllers are all LSI/Avago with super-capacitor cache batteries.
For those who don't know, a desktop drive is "within spec" if it returns one soft read error per 10GB read. In other words, read a 6TB drive end-to-end twice, and the manufacturer says "if you get a read error, that's normal". But it will cause an array to fail if you haven't set it up properly ...
(most drives perform far better than spec, I know. But you should always be prepared for what the manufacturer says is the worst acceptable case.)
BTW, no data lost, all arrays were many-disk RAID6.
Good going :-)
Thanks! Now if I can only find some wood to knock on... :-) Regards, Lew
Wols Lists wrote:
Doing software raid, I would mirror the system partition across all drives, and also install grub on all drives. That way, if a drive disappears, the system will still boot from the first available drive. Snag is, if anything changes boot-wise, you need to make sure it gets mirrored across all drives. As far as I know the Grub2 version of openSUSE can boot from a /boot ext4 filesystem with RAID 1.
But I am not sure if the server continues to boot if the first drive is available but not working correctly (disk errors). Probably yes. A RAID1 system partition is no problem. Greetings, Björn
On 11/07/17 08:47, Bjoern Voigt wrote:
Wols Lists wrote:
Doing software raid, I would mirror the system partition across all drives, and also install grub on all drives. That way, if a drive disappears, the system will still boot from the first available drive. Snag is, if anything changes boot-wise, you need to make sure it gets mirrored across all drives.
As far as I know the Grub2 version of openSUSE can boot from a /boot ext4 filesystem with RAID 1.
Grub2 can boot from any "standard" raid, namely 1 or 4/5/6. But you MUST tell linux to "domdadm", and also to load the raid drivers into grub. If you don't want to boot from a raid array, but instead want to boot from just one disk of a raid array, you can only use a v1.0 mirror - any other supported raid will fail.
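A minimal sketch of creating such a v1.0 mirror for /boot (partition names are examples; v1.0 metadata puts the md superblock at the end of the partition, so each half also looks like a plain filesystem to the firmware and boot loader):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --metadata=1.0 /dev/sda1 /dev/sdb1
  mkfs.ext4 /dev/md0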
But I am not sure, if the server continues to boot, if the first drive is available, but not working correctly (disk errors). Probably yes.
Be careful. By default, if a drive fails, assembling the array will fail and there will be no system partition to boot into. You can, however, tell the system to "assemble and run regardless" - an option specifically for this situation.
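With mdadm directly, "assemble and run regardless" corresponds to the --run option (plus --force for stale metadata); a minimal sketch, assuming a two-disk mirror of which only /dev/sda1 is still present:

  # start the array degraded, with only the members that are available
  mdadm --assemble --run --force /dev/md0 /dev/sda1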
A RAID1 system partition is no problem.
Cheers, Wol
On Tue, Jul 11, 2017 at 12:50 PM, Wols Lists <antlists@youngman.org.uk> wrote:
Be careful. By default, if a drive fails, assembling the array will fail and there will be no system partition to boot into. You can, however, tell the system to "assemble and run regardless" - an option specifically for this situation.
In current openSUSE there is no such option - it is (or at least should be) done automatically.
On 07/10/2017 06:31 AM, Bjoern Voigt wrote:
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.) Offloading to hardware is always about speed and less load on the CPU. Thanks, Peter and Peter.
So I will go for software RAID.
That is the correct choice. Unless you are saturating your I/O bandwidth and need the incremental boost of a battery-backed hardware raid card with the BIOS set to write-back caching, Linux software raid is the way to go.

Even in the 386-33MHz days, software raid required minimal clock cycles to carry out raid functionality and the load was negligible. Step forward 2 decades with quad-core processors running in the GHz range and software raid is so far down in the noise that it isn't a load or performance issue at all.

Best thing about it -- if your server dies, etc., you can just pull the array (or individual disks) and throw them in any other Linux box regardless of hardware -- and you are good to go. That's a whole lot of benefit.

-- David C. Rankin, J.D.,P.E.
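A minimal sketch of what that recovery looks like on the other box, assuming the pulled disks show up there as /dev/sdb and /dev/sdc with the RAID members on the first partition (names are examples):

  # inspect the md superblocks to see which array the disks belong to
  mdadm --examine /dev/sdb1 /dev/sdc1

  # let mdadm find and start every array it can see, then mount as usual
  mdadm --assemble --scan
  cat /proc/mdstat
  mount /dev/md127 /mnt    # use whatever device name mdstat reports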
On 11/07/17 05:46, David C. Rankin wrote:
Step forward 2 decades with quad-core processors running in the GHz range and software raid is so far down in the noise that it isn't a load or performance issue at all.
Ummm ... It's being fixed as I write - there are a few performance horror stories floating around on the list at the moment of single-threaded bottlenecks. Mostly involving humungous arrays being hammered, so unlikely to be a problem for a backup server ... :-) Cheers, Wol
Bjoern Voigt wrote:
I plan to install openSUSE Leap 42.2 on a HPE ProLiant MicroServer Gen8.
I have two options to set up RAID 1 for the disks: 1) Linux software RAID 2) Hardware RAID with the embedded HPE Dynamic Smart Array B120i Controller
I want to be prepared for a possible hardware error.
I am familiar with software RAID. Even if the mainboard breaks I can take one or both disks to another computer and recover the data with any Linux system.
But how is it with the hardware RAID (B120i controller here)? If the mainboard/CPU/controller/whatever breaks, can I use the same strategy (take the disks to another PC without hardware RAID, recover data) or do I need to buy another HPE ProLiant MicroServer Gen8 or compatible server and recover the disks there?
TMK, an array built with an HP Smart Array controller can be recovered on any other HP Smart Array controller, the same or newer. I would not expect to recover on anything else.
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.)
Offloading to hardware is always about speed and less load on the CPU. -- Per Jessen, Zürich (21.3°C) http://www.hostsuisse.com/ - dedicated server rental in Switzerland.
On 07/10/2017 02:43 AM, Per Jessen wrote:
Offloading to hardware is always about speed and less load on the CPU.
And not something you really have to be concerned with any more. When we were doing raid arrays on a 386, sure. But unless you are paying BIG bucks for a raid controller, software raid will outperform hardware raid.

The fastest raid controller today will never see an upgrade. Within a year or two when the machine gets upgraded, nobody wants to spring for faster controllers. So they remain the same old controllers and everybody hopes they won't crap out, because the array may not be recoverable.

I'm not talking about the FAKE raid controllers here. I'm talking about the "quality" controllers with on-board processors and lots of cache memory (which you can never find replacement memory modules for after a few years). I've run such controllers jumpered to shut down the on-board raid processing, and used them just for their multiple independent headers in software raid arrays.

-- After all is said and done, more is said than done.
John Andersen wrote:
On 07/10/2017 02:43 AM, Per Jessen wrote:
Offloading to hardware is always about speed and less load on the CPU.
And not something you really have to be concerned with any more. When we were doing raid arrays on a 386, sure.
The general principle still holds - not just for RAID, but also for network and iSCSI. My Proliant servers even have a space for a cache card for the hardware assisted 10GigE interfaces. iSCSI is often offloaded to interfaces with dedicated hardware or even dedicated cards.
But unless you are paying BIG bucks for a raid controller, software raid will outperform hardware raid.
There are situations where swraid won't suffice. When you need more than the usual 4-6 drives for instance, you'll need separate controllers anyway. I would still bet hardware assist is faster for e.g. RAID6, maybe also RAID5. -- Per Jessen, Zürich (21.9°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On 07/10/2017 09:21 PM, Per Jessen wrote:
John Andersen wrote:
On 07/10/2017 02:43 AM, Per Jessen wrote:
Offloading to hardware is always about speed and less load on the CPU.
And not something you really have to be concerned with any more. When we were doing raid arrays on a 386, sure.
The general principle still holds - not just for RAID, but also for network and iSCSI. My Proliant servers even have a space for a cache card for the hardware assisted 10GigE interfaces. iSCSI is often offloaded to interfaces with dedicated hardware or even dedicated cards.
But unless you are paying BIG bucks for a raid controller, software raid will outperform hardware raid.
There are situations where swraid won't suffice. When you need more than the usual 4-6 drives for instance, you'll need separate controllers anyway.
I would still bet hardware assist is faster for e.g. RAID6, maybe also RAID5.
I have never benchmarked any advantage for hardware raid. Actually I found Linux softraid always faster. I guess that it's the reliability which matters. HW Raid is usually
- OS independent, even including remote storage administration, monitoring etc.
- certified for (only) certain hard drive models, so that one can be sure that the whole chain (battery-backed cache, HD cache, etc.) will survive a power failure without inconsistent file systems.
This makes HW Raid, in my experience, slower and less flexible. If you want reliability you need to have a controller cache with battery/capacitor and the HD write cache turned off. If you don't have a battery you have to turn off any HD write cache. In this case SW Raid is probably much slower than HW Raid with a reliable cache. cu, Rudi
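Turning off the drives' own write cache, as described above, can be done per disk with hdparm for SATA drives; a minimal sketch (the device name is an example):

  # disable the drive's volatile write cache ...
  hdparm -W 0 /dev/sda
  # ... and check the current setting
  hdparm -W /dev/sda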
Rüdiger Meier wrote:
I have never benchmarked any advantage for hardware raid. Actually I found Linux softraid always faster. I guess that it's the reliability which matters.
Yes, reliability and serviceability are key. I guess I could do some actual benchmarking at some point, but performance is secondary.
HW Raid is usually - OS independent, even inclusive remote storage administration, monitoring etc. - certified for (only) certain hard drive models. So that one can be sure that the whole chain (battery backup-ed Cache, HD cache, etc. will survive a power failure without inconsistent file systems.
We have been using 3ware/LSI and HP Smart Array Controllers for more than 10 years, none of them have required certified drives so far. One annoying issue is their insistence on a battery for the write cache. Our older HSG80 fibre controllers had an option "site-wide UPS". Newer HP controllers do have that option too.
If you want reliability you need to have a controller cache with battery/capacitor
Unless you have site-wide UPS.
and turned off HD write cache. If you don't have a battery you have to turn off any HD write cache.
Right. -- Per Jessen, Zürich (18.6°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On 07/10/2017 10:09 PM, Per Jessen wrote:
Rüdiger Meier wrote:
I have never benchmarked any advantage for hardware raid. Actually I found Linux softraid always faster. I guess that it's the reliability which matters.
Yes, reliability and serviceability are key. I guess I could do some actual benchmarking at some point, but performance is secondary.
HW Raid is usually - OS independent, even inclusive remote storage administration, monitoring etc. - certified for (only) certain hard drive models. So that one can be sure that the whole chain (battery backup-ed Cache, HD cache, etc. will survive a power failure without inconsistent file systems.
We have been using 3ware/LSI and HP Smart Array Controllers for more than 10 years, none of them have required certified drives sofar.
I have had many incompatibilities between Adaptec controllers and certain drives, but never any problem with onboard controllers. Anyway, with non-certified drives you can't be sure that everything works as the controller expects. There are drives where you can't disable the write cache, or which are just "always fast", telling the controller that everything was written although it's still cached. Other drives don't validate checksums or whatever.

I read an interesting article about these issues a few years ago. There was a client/server based test script producing heavy load. It was able to validate the written data after one pulled the plug and rebooted. Surprising results. My conclusion was that it makes no sense to waste money on expensive controllers without testing the whole system and drives. Now I'm using only SW Raid with onboard controllers or cheap HBAs, IMO faster and more flexible.

cu, Rudi
On 10.07.2017 at 23:09, Per Jessen wrote:
If you want reliability you need to have a controller cache with battery/capacitor
Unless you have site-wide UPS.
It does not protect against a sudden power failure in the server (e.g. PSU failure), or someone simply unplugging the server power cable.
Andrei Borzenkov wrote:
On 10.07.2017 at 23:09, Per Jessen wrote:
If you want reliability you need to have a controller cache with battery/capacitor
Unless you have site-wide UPS.
It does not protect against a sudden power failure in the server (e.g. PSU failure), or someone simply unplugging the server power cable.
Yep, that is true. All the power supplies are redundant, and someone simply unplugging both power cables, yeah, I guess it can't be discounted :-) -- Per Jessen, Zürich (18.1°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On 11/07/17 06:23, Per Jessen wrote:
Andrei Borzenkov wrote:
On 10.07.2017 at 23:09, Per Jessen wrote:
If you want reliability you need to have a controller cache with battery/capacitor
Unless you have site-wide UPS.
It does not protect against a sudden power failure in the server (e.g. PSU failure), or someone simply unplugging the server power cable.
Yep, that is true. All the power supplies are redundant, and someone simply unplugging both power cables, yeah, I guess it can't be discounted :-)
It was slightly unplanned, yes, but when I put in a new server room, I put a ring of sockets with mains power round the room. When we ran out of sockets, the other admin guy put in a large UPS that drove a matching ring. So all our servers (ProLiants) had one PSU plugged into the mains ring, and the other into the UPS ring :-) (I spec'd a double socket every 18 inches - shows how fast you use up sockets!) Cheers, Wol
On 07/10/2017 12:14 PM, John Andersen wrote:
And not something you really have to be concerned with any more. When we were doing raid arrays on a 386, sure.
But unless you are paying BIG bucks for a raid controller, software raid will outperform hardware raid.
The irony is that even if you did want a real-world hardware RAID card (a few years old), you can pick up incredible deals for under $15 on eBay (they are usually without the battery itself). I'm talking full-fledged 8-port SATA II/SAS controllers with 512M of onboard RAM that were the commercial darlings 5 years ago or so, like the LSI MegaRAID 8888ELP, etc. Hell, they even come with graphical RAID setup tools in the onboard BIOS. You can't even buy cheap throwaway controller cards for that. Now, you can get the SATA III/SAS versions in the $40-80 range.

If you need RAID or simply controller expansions for a backup project and want some older enterprise-level SATA (and importantly SAS) gear that is -- granted, slightly older and slower than top-of-the-line gear today -- but bulletproof at a trivial price, they are worth checking for. I've used them without issue when I wanted to turn a pci-slot into 8 more drives... and I keep them handy in the event I have an onboard controller failure (which I inexplicably experienced a year ago on my office server).

That said, I still prefer, and use, Linux software raid. Even though with a hardware card you are not tied to a specific server's hardware RAID format, it just adds another layer of complexity that I've never needed the benefit of. Just something to keep in your hip pocket. Today you can probably find the high-end SATA III/SAS cards readily available. Overseas shops make a living out of pulling and reselling this type of gear from retired servers, etc. I was amazed to see what was available.

-- David C. Rankin, J.D.,P.E.
On Mon, 10 Jul 2017 10:52:57 +0200, Bjoern Voigt wrote:
I plan to install openSUSE Leap 42.2 on a HPE ProLiant MicroServer Gen8.
I have two options to set up RAID 1 for the disks: 1) Linux software RAID 2) Hardware RAID with the embedded HPE Dynamic Smart Array B120i Controller
I want to be prepared for a possible hardware error.
I am familiar with software RAID. Even if the mainboard breaks I can take one or both disks to another computer and recover the data with any Linux system.
But how is it with the hardware RAID (B120i controller here)? If the mainboard/CPU/controller/whatever breaks, can I use the same strategy (take the disks to another PC without hardware RAID, recover data) or do I need to buy another HPE ProLiant MicroServer Gen8 or compatible server and recover the disks there?
What is the main advantage of hardware RAID over software RAID on Linux? (I think performance and that a hardware RAID can automatically boot from the second disk, but I am a newbie in hardware RAID and so not sure.)
Are you sure that this raid controller is supported by Leap? Some HP raid controllers require proprietary drivers in binary-only form, offered only for some Linux OSs. These are not "real" hardware RAID but MOBO-based raid controllers. Istvan
participants (14):
- Andrei Borzenkov
- Bjoern Voigt
- Carlos E. R.
- Dave Howorth
- David C. Rankin
- Greg Freemyer
- Istvan Gabor
- James Knott
- John Andersen
- Lew Wolfgang
- Per Jessen
- Peter Sikorski GTL
- Rüdiger Meier
- Wols Lists