[opensuse] Large Data Backups?
Hi Folks,
BLUF: How to back up large data stores?
Say I've got 300 TB of on-line data, most of it fairly static, while some of it changes frequently on a daily basis. I do daily incremental backups of the dynamic areas to disk filesystems on separate computers, with a monthly full disk image being stored off-line at an off-site location. But I'm worried about the bulk of the data that isn't being backed up.
Most of the data consists of lots of binary files stored on multiple hardware RAID-6 arrays. The arrays have hot-spare disks, and I've got spares on the shelf. But as the graybeards know, reliable RAID isn't backup!
One technology being considered is LTO tape. LTO-8, due out any day now, claims to store 12 TB native. A drive or two with a tape library would possibly fill the requirement.
Does anyone have any thoughts/advice? What do clouds like Google and AWS do for backups?
Thanks, Lew
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse+owner@opensuse.org
Lew Wolfgang wrote:
One technology being considered is LTO tape. LTO-8 is due out any day now that claims to store 12-TB native. A drive or two with a tape library would possibly fill the requirement.
Even one or two generations back-level and it would still be manageable.
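For a rough sense of scale, the cartridge count for one full copy of 300 TB can be worked out per LTO generation. This is only a back-of-the-envelope sketch: it uses the published native capacities (2.5 TB for LTO-6, 6 TB for LTO-7, 12 TB for LTO-8) and assumes no compression and perfectly filled tapes, which real backups will not achieve. The function name `tapes_needed` is made up for illustration.

```python
import math

# Native (uncompressed) cartridge capacities in TB, per the published LTO specs.
LTO_NATIVE_TB = {"LTO-6": 2.5, "LTO-7": 6.0, "LTO-8": 12.0}


def tapes_needed(data_tb, generation):
    """Cartridges for one full copy, assuming no compression and full tapes."""
    return math.ceil(data_tb / LTO_NATIVE_TB[generation])


if __name__ == "__main__":
    for gen in LTO_NATIVE_TB:
        print(gen, tapes_needed(300, gen))
```

Even two generations back, a single full copy fits in a mid-sized tape library, which is the point being made above.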
Does anyone have any thoughts/advice? What do clouds like Google and AWS do for backups?
I don't think they do. I think they keep such high levels of redundancy that it eliminates the need for backup.
-- Per Jessen, Zürich (5.4°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland.
On Mon, Oct 30, 2017 at 3:35 PM, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
One technology being considered is LTO tape. LTO-8 is due out any day now that claims to store 12-TB native. A drive or two with a tape library would possibly fill the requirement.
Are you familiar with LTFS, which LTO has supported since 2010 or so?
https://en.wikipedia.org/wiki/Linear_Tape_File_System#Nature
You use your library to make a full copy of the filesystem, then as files change, you write them to the tape via LTFS.
Unfortunately, that's all I know about LTFS. Questions you need to answer:
- Does openSUSE (or your OS of choice) support it?
- Does it provide point-in-time recovery, or are you stuck with only the latest version of files?
- What happens when a file in primary storage is deleted?
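The "full copy first, then write changed files" workflow might look roughly like the sketch below. It assumes the LTFS volume is already mounted as an ordinary directory (the mount path is whatever you pass in; nothing here is LTFS-specific) and that file modification times are a good enough signal for "changed". The function names `changed_since` and `copy_changed` are hypothetical, not part of any LTFS tooling.

```python
import os
import shutil


def changed_since(root, last_run):
    """Yield paths (relative to root) whose mtime is newer than last_run."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) > last_run:
                yield os.path.relpath(full, root)


def copy_changed(root, ltfs_mount, last_run):
    """Copy changed files into the mounted tape volume, keeping the layout."""
    copied = []
    for rel in changed_since(root, last_run):
        dest = os.path.join(ltfs_mount, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(root, rel), dest)  # copy2 keeps timestamps
        copied.append(rel)
    return sorted(copied)
```

Note that a sketch like this leaves the second and third questions open: it overwrites older versions on the tape volume and never notices deletions in primary storage.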
Does anyone have any thoughts/advice? What do clouds like Google and AWS do for backups?
I don't think they do. They are highly redundant. Google does resiliency testing by turning off a rack of computers at a time to see if any data is lost. That is an intentional level of testing, or so I understand.
They do unintentional testing at the DC level. What happens when a full DC has an outage? I can't recall how often that happens, but I think it does happen from time to time.
Greg
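The "turn off a rack and see if any data is lost" idea can be illustrated on paper before anyone pulls a plug. The sketch below is a toy model, not a description of how Google actually does it: `placement` is a hypothetical map from a block name to the racks holding its replicas, and a block counts as lost when every replica sits on the rack that goes down.

```python
def blocks_lost_if_rack_down(placement, rack):
    """Return the blocks whose every replica lives on the given rack."""
    return sorted(
        block for block, racks in placement.items() if set(racks) <= {rack}
    )


def survives_any_single_rack_failure(placement):
    """True if no single rack outage would make any block unavailable."""
    all_racks = {r for racks in placement.values() for r in racks}
    return all(not blocks_lost_if_rack_down(placement, r) for r in all_racks)


if __name__ == "__main__":
    # Hypothetical layout: block "b" has both replicas on the same rack.
    placement = {"a": ["r1", "r2"], "b": ["r2", "r2"], "c": ["r1", "r3"]}
    print(blocks_lost_if_rack_down(placement, "r2"))
    print(survives_any_single_rack_failure(placement))
```

The same check scales up to the data-center level by treating each DC as one "rack".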
Greg Freemyer wrote:
I don't think they do. They are highly redundant. Google does resiliency testing by turning off a rack of computers at a time to see if any data is lost. That is an intentional level of testing, or so I understand.
They do unintentional testing at the DC level. What happens when a full DC has an outage? I can't recall how often that happens, but I think it does happen from time to time.
This reminds me of Netflix using a test suite to turn off random servers, which let them verify that the infrastructure and staff would respond appropriately. I found the software again: it is appropriately named Chaos Monkey, and it was published on GitHub. It can be found here: https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey
On 10/30/2017 01:30 PM, Greg Freemyer wrote:
Are you familiar with LTFS which LTO has supported since 2010 or so:
https://en.wikipedia.org/wiki/Linear_Tape_File_System#Nature
Thanks for the pointer, Greg. I don't recall hearing about LTFS.
You use your library to make a full copy of the filesystem, then as files change, you write them to the tape via LTFS.
Unfortunately, that's all I know about LTFS. Questions you need to answer:
- Does openSUSE (or your OS of choice) have support
Looks like it's available on SLES.
- Does it provide point in time recovery. Or are you stuck with only the latest version of files.
I think you're stuck with the latest versions.
- What happens when a file in primary storage is deleted?
The blocks are marked "unavailable" and the space is not reclaimed.
I've been using rdiff-backup for years. It works well for on-line backups. It maintains a complete current image of the source filesystem and stores past restore points as diffs. I run it on a daily basis and couldn't do without it now. I can restore with one-day granularity going back for as many days as I've allowed. I usually trim it at 30 days, but it could archive for years. But I don't know how to use it with tape.
Regards, Lew
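The "complete current image plus diffs for past restore points" scheme can be shown with a toy model. To be clear, this is not rdiff-backup's code or on-disk format: the class `ReverseDiffStore` is invented for illustration, files are just dictionary entries, and each "diff" stores whole old values rather than binary deltas.

```python
class ReverseDiffStore:
    """Toy model: a full current mirror plus one reverse diff per older backup."""

    _MISSING = object()  # marks "this path did not exist in the older state"

    def __init__(self):
        self.mirror = None     # path -> content as of the latest backup
        self.increments = []   # oldest first; each maps path -> older content

    def backup(self, source):
        """Record a new backup of `source` (a dict of path -> content)."""
        if self.mirror is not None:
            reverse = {}
            for path in set(self.mirror) | set(source):
                old = self.mirror.get(path, self._MISSING)
                if old != source.get(path, self._MISSING):
                    reverse[path] = old  # what to put back to undo this backup
            self.increments.append(reverse)
        self.mirror = dict(source)

    def restore(self, backups_ago):
        """Rebuild the state as of `backups_ago` backups before the latest."""
        if self.mirror is None or not 0 <= backups_ago <= len(self.increments):
            raise ValueError("no such backup")
        state = dict(self.mirror)
        start = len(self.increments) - backups_ago
        for reverse in reversed(self.increments[start:]):
            for path, old in reverse.items():
                if old is self._MISSING:
                    state.pop(path, None)
                else:
                    state[path] = old
        return state

    def trim(self, keep):
        """Keep only the newest `keep` reverse diffs (e.g. 30 daily ones)."""
        self.increments = self.increments[-keep:] if keep else []
```

Restoring the latest state is a plain copy of the mirror, while older states cost one diff application per day walked back, which is why the newest backup is the cheapest to restore and trimming only ever drops the oldest history.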
On 30/10/17 19:35, Lew Wolfgang wrote:
Does anyone have any thoughts/advice? What do clouds like Google and AWS do for backups?
Just done a quick look on Amazon ... For a backup array you're looking at £3000 for the disks alone ... OUCH. I looked at 10TB Seagate Ironwolf raid drives, but Barracudas are even more expensive, and probably naffer drives! That said, it looks like you're looking at £1000 for a single set of LTO-8 tapes.
What's your workflow? Two 16-drive 10TB/drive arrays will give you near enough your 300TB. What filesystem are you using - btrfs? Is all the actively changing stuff being worked on at workstations?
This might well not suit your situation, but I'd be inclined to store the stuff being worked on, on the workstations. Maybe raid-5, maybe no raid. Every night, they do a btrfs push to the big raid arrays (or, if they're not running btrfs, an in-place rsync). Every morning, the raid backups do a snapshot. That gives you a daily backup on-site.
That might free up drives for an off-site raid setup, to which you can replicate over the internet. And then you can dump your raid arrays to tape as and when. Depends what level of backup and security you want, but for the price of three tape backups, you can have an off-site raid-array backup ...
If you get a tape array as someone has suggested, I'm not sure how that will integrate with btrfs, but you could have a small offsite array that takes the btrfs pushes, and then stores it all on tape.
Cheers, Wol
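The nightly-sync-then-morning-snapshot routine could be scripted along these lines. This is only a sketch of the rsync variant: it builds the command lines rather than running them, all paths are hypothetical, and it assumes the backup array is a btrfs subvolume so that `btrfs subvolume snapshot -r` can take a read-only snapshot of it. The helper names are made up for illustration.

```python
from datetime import date


def nightly_rsync_cmd(workstation_dir, array_dir):
    """In-place rsync of a workstation directory onto the backup array."""
    # -a preserves metadata; --inplace rewrites only changed parts of a file,
    # so unchanged blocks stay shared with older snapshots; --delete mirrors
    # removals made on the workstation.
    return ["rsync", "-a", "--inplace", "--delete",
            workstation_dir.rstrip("/") + "/", array_dir.rstrip("/") + "/"]


def morning_snapshot_cmd(subvolume, snapshot_dir, day):
    """Read-only btrfs snapshot of the backup subvolume, named by date."""
    return ["btrfs", "subvolume", "snapshot", "-r",
            subvolume, f"{snapshot_dir.rstrip('/')}/{day.isoformat()}"]


if __name__ == "__main__":
    # Hypothetical paths; pass each list to subprocess.run(cmd, check=True).
    print(nightly_rsync_cmd("/home/work", "/srv/backup/ws1"))
    print(morning_snapshot_cmd("/srv/backup/ws1", "/srv/backup/.snapshots",
                               date.today()))
```

Because the snapshots are read-only and dated, each morning's copy is a fixed restore point that a later tape dump or off-site replication can read at leisure.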
participants (5)
- Greg Freemyer
- jgordon
- Lew Wolfgang
- Per Jessen
- Wol's lists