Hi Folks,

I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.

The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.

I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.

I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.

Any ideas? It seems to be a common requirement, maybe someone's done it already?

Regards,
Lew
On 2022-10-08 01:01, Lew Wolfgang wrote:
Hi Folks,
I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.
The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.
I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
I'm not familiar with JBOD.

I would output as many whole files (each 7 GB) as can fit in a single hard disk, leaving a chunk of free space of less than 7 GB "wasted" at the end, then jump to the next disk. I understand this is archival, so independent disks and whole files have more chances of survival.

And, if the data can be compressed, I would use compressed BTRFS partitions, which is transparent.

If you do not want to waste those last 7 GB or less, then I would use perhaps dd to split the last file to size, and script it. But then you need some sort of catalog and scripting to extract the whole. But I would not use tar, I think. Maybe reinventing the wheel, though.

(because they are all 7GB sized files)

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
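A minimal sketch of that dd idea, splitting the last file across two disks; the mount points and the GNU df usage are assumptions, and the sizes would need checking in practice:

  # how many 1-MiB blocks still fit on the first disk
  free_mb=$(df -BM --output=avail /mnt/jbod1 | tail -n1 | tr -dc '0-9')
  # write what fits, then continue the same file on the next disk
  dd if=bigfile.bin of=/mnt/jbod1/bigfile.part1 bs=1M count="$free_mb"
  dd if=bigfile.bin of=/mnt/jbod2/bigfile.part2 bs=1M skip="$free_mb"
  # reassemble later with: cat bigfile.part1 bigfile.part2 > bigfile.bin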
On 10/8/22 04:33, Carlos E. R. wrote:
On 2022-10-08 01:01, Lew Wolfgang wrote:
Hi Folks,
I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.
The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.
I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
I'm not familiar with JBOD.
JBOD is "Just a Bunch of Disks", which in this case means the destination disks are just independent single disk drives with their own filesystems.
I would output as many whole files (each 7 GB) as can fit in a single hard disk, leaving a chunk of free space of less than 7 GB "wasted" at the end, then jump to the next disk. I understand this is archival, so independent disks and whole files have more chances of survival.
Yes, that's what I was thinking too.
And, if the data can be compressed, I would use compressed BTRFS partitions, which is transparent.
The data are binary, and encrypted, so compression would be a waste in this case.
If you do not want to waste those last 7 GB or less, then I would use perhaps dd to split the last file to size, and script it. But then you need some sort of catalog and scripting to extract the whole.
That gets complicated!
But I would not use tar, I think. Maybe reinventing the wheel, though.
(because they are all 7GB sized files)
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here. Regards, Lew
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf. -- Per Jessen, Zürich (14.4°C) http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.

With backups, it would write alternately to disk A or B. When disk A was full, it switched automatically to B and told you to change disk A. And the reverse. You never had to press any key to continue, it would find out without wasting a second. If the machine had a single floppy drive, at the end of each floppy it simply asked to replace the disk, which you did, not pressing any key, and the motor not stopping.

It was amazing how fast it would go: faster than I could label the disks. On restore, it read floppies faster than the hard disk could write the files, especially small ones (they involved more head movements to data, FAT, and directory tables). AND, it would both compress the data and add forward error recovery data, so that the operation could survive bad sectors without losing a file.

PCtools backup. Later Microsoft bought a license to include it with MSdos 6 (not the fully featured program, though).

It kept a catalogue of what was backed up and where on the hard disk. This catalogue could be retrieved as well from the last floppy (or more).

I wish I could get a similar thing on Linux. Not for floppies, of course.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On Sat, 8 Oct 2022 20:21:37 +0200 Carlos E. R. wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
[snipped]
I wish I could get a similar thing on Linux. Not for floppies, of course.
I have a backup script written in python. Rsync does the initial backup (using rsync filters) to a btrfs formatted drive, then the backup is snapshotted, giving me incremental backups as far back as I want. It's worked well for years. -- Bob Williams No HTML please. Plain text preferred. https://useplaintext.email/
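The core of that scheme fits in two commands; a rough sketch, where the filter file, the paths, and the "current" subvolume name are assumptions:

  # mirror the source into a btrfs subvolume, then freeze it read-only
  rsync -aH --delete --filter='merge /etc/backup.filters' /home/ /mnt/backup/current/
  btrfs subvolume snapshot -r /mnt/backup/current "/mnt/backup/snap-$(date +%F)"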
On Sat, 8 Oct 2022 20:21:37 +0200 "Carlos E. R." <robin.listas@telefonica.net> wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
With backups, it would write alternately to disk A or B. When disk A was full, it switched automatically to B and told you to change disk A. And the reverse. You never had to press any key to continue, it would find out without wasting a second.
[snip]

A long, long time ago there was an operating system called GEORGE 3. It did all of this and managed multiple copies on multiple tapes with no trouble. Many times I've viewed software developments as reinventing parts of G3.
Carlos E. R. wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
Years ago, in the very beginning of my career, we could copy anything to anything - disk, tape, floppy, various other exotic types of storage. They were all treated the same.
I wish I could get a similar thing on Linux. Not for floppies, of course.
You _can_ and other people _have_ for decades. In a nutshell, "tar with mtst".

Downstairs I have four tape libraries - two HP Storageworks and two Storage Technology. The HP Storageworks libraries take 40 and 200 volumes each, the STK libraries take 800 and 1600 volumes each. The HP libs each have 2 and 4 drives respectively, the STK libs have 4 and 16 drives.

HP Storageworks are cheap these days, you can have either of mine for the price of transport.

-- Per Jessen, Zürich (14.4°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On 2022-10-08 21:38, Per Jessen wrote:
Carlos E. R. wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
Years ago, in the very beginning of my career, we could copy anything to anything - disk, tape, floppy, various other exotic types of storage. They were all treated the same.
I wish I could get a similar thing on Linux. Not for floppies, of course.
You _can_ and other people _have_ for decades. In a nutshell, "tar with mtst".
What is mtst?
Downstairs I have four tape libraries - two HP Storageworks and two Storage Technology. The HP Storageworks libraries take 40 and 200 volumes each, the STK libraries take 800 and 1600 volumes each. The HP libs each have 2 and 4 drives respectively, the STK libs have 4 and 16 drives.
HP Storageworks are cheap these days, you can have either of mine for the price of transport.
-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
Carlos E. R. wrote:
What is mtst?
Sorry, it is mt-st. Magnetic Tape Control tools for Linux SCSI tapes. Google is your friend. -- Per Jessen, Zürich (14.5°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
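For reference, the classic pattern with those tools looks roughly like this; the device name is an assumption:

  mt -f /dev/nst0 rewind      # position the tape
  tar -cvf /dev/nst0 /data    # write an archive to the no-rewind device
  mt -f /dev/nst0 offline     # rewind and eject when done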
On 10/8/22 10:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
I used to use Exabyte and DLT tape libraries for backups. They were great! I used the classic "dump" program with a Tower of Hanoi dump-level plan. It saved my bacon many times.

Indeed, I was perplexed when SuSE went to Reiserfs by default. Dump doesn't support Reiserfs. I remember complaining about it on a Usenet list and received a rather curt reply from none other than Hans Reiser himself telling me to just use tar instead. This was obviously before he murdered his wife and ended up in prison. I wonder what he's up to these days?

Regards,
Lew
Lew Wolfgang wrote:
On 10/8/22 10:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
I used to use Exabyte and DLT tape libraries for backups. They were
Professionally I grew up with IBM 6250, 3480 and 3490 tape drives. I jumpstarted my career by analysing their use and how to optimize it. Later on I wrote the software for those.
Indeed, I was perplexed when SuSE went to Reiserfs by default.
Defaults are just that, meant to be changed. I've never used reiserfs. -- Per Jessen, Zürich (14.4°C)
Lew Wolfgang wrote:
Hi Folks,
I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.
The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.
I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Pseudo-code:

  while more-to-do
  do
      prompt to mount new disk
      while rsync raid6:file disk:file
      do
          file=file+1
      done
      prompt to unmount current disk
  done

-- Per Jessen, Zürich (16.2°C) Слава Україні! Slava Ukraini!
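Fleshed out, a minimal bash sketch of that loop; the source path, mount point, fstab entry for remounting, and the 8-GiB margin are all assumptions:

  #!/bin/bash
  SRC=/raid6/data             # assumed source directory
  MNT=/mnt/jbod               # assumed mount point with an fstab entry
  NEED=$((8 * 1024 * 1024))   # keep more than one 7-GB file free (1K blocks)

  for f in "$SRC"/*; do
      avail=$(df --output=avail "$MNT" | tail -n1)
      if [ "$avail" -lt "$NEED" ]; then
          umount "$MNT"
          read -rp "Disk full. Swap in the next JBOD, then press Enter: "
          mount "$MNT"
      fi
      # --partial lets an aborted copy restart where it left off
      rsync -a --partial "$f" "$MNT"/ || exit 1
  done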
On 2022-10-07 18:01:49 Lew Wolfgang wrote:
|Hi Folks,
|
|I'm anticipating a requirement where I'll have lots of data on a large
|RAID-6 partition. The data consists of many 7-GB files that were
|uploaded over several Ethernet channels.
|
|The files then need to be copied to individual "JBOD" disks plugged
|into the same chassis as the RAID-6 array. Alas, there is more data
|on the RAID-6 than will fit on one JBOD disk, so I've got to come up
|with a process that will copy the files to a disk, then stop when the
|disk is full, prompt the operator to unmount and remove the JBOD,
|then plug in another one, and mount to continue the process. Repeat
|until all the data are copied to as many JBODs as it takes.
|
|I think I can convince tar to write to multiple volumes and prompt
|to change media, but I don't want to treat the JBODs as character
|devices, or like a tape drive in other words. I'd like to take any one
|of the disks in a series and mount it on another host and read the
|files without having to load the other disks in that series.
|
|I'm thinking a fancy shell script is called for here. I bet that rsync
|should be leveraged to allow restarting from an aborted copy process.
|
|Any ideas? It seems to be a common requirement, maybe someone's
|done it already?
|
|Regards,
|Lew
dar (disk archive) can do this; it has built-in prompting for waiting for operator intervention to swap media.

Leslie
-- Platform: Linux Distribution: openSUSE Leap 15.4 x86_64
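By way of illustration, an invocation along these lines; the slice size and paths are assumptions, sized to leave headroom on an 8-TB disk:

  # create the archive in slices, pausing before each new slice so the
  # operator can swap disks; dar waits for confirmation to continue
  dar -c /mnt/jbod/archive -R /raid6/data -s 7900G -p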
On 10/9/22 21:50, J Leslie Turriff wrote:
On 2022-10-07 18:01:49 Lew Wolfgang wrote:
|Hi Folks,
|
|I'm anticipating a requirement where I'll have lots of data on a large
|RAID-6 partition. The data consists of many 7-GB files that were
|uploaded over several Ethernet channels.
|
|The files then need to be copied to individual "JBOD" disks plugged
|into the same chassis as the RAID-6 array. Alas, there is more data
|on the RAID-6 than will fit on one JBOD disk, so I've got to come up
|with a process that will copy the files to a disk, then stop when the
|disk is full, prompt the operator to unmount and remove the JBOD,
|then plug in another one, and mount to continue the process. Repeat
|until all the data are copied to as many JBODs as it takes.
|
|I think I can convince tar to write to multiple volumes and prompt
|to change media, but I don't want to treat the JBODs as character
|devices, or like a tape drive in other words. I'd like to take any one
|of the disks in a series and mount it on another host and read the
|files without having to load the other disks in that series.
|
|I'm thinking a fancy shell script is called for here. I bet that rsync
|should be leveraged to allow restarting from an aborted copy process.
|
|Any ideas? It seems to be a common requirement, maybe someone's
|done it already?
|
dar (disk archive) can do this; it has built-in prompting for waiting for operator intervention to swap media.
Leslie
Thanks for the suggestion, Leslie! I've already got someone working on our own script, but we'll take a look at dar. Regards, Lew
* Lew Wolfgang <wolfgang@sweet-haven.com> [10-10-22 02:51]:
On 10/9/22 21:50, J Leslie Turriff wrote:
On 2022-10-07 18:01:49 Lew Wolfgang wrote:
|Hi Folks,
|
|I'm anticipating a requirement where I'll have lots of data on a large
|RAID-6 partition. The data consists of many 7-GB files that were
|uploaded over several Ethernet channels.
|
|The files then need to be copied to individual "JBOD" disks plugged
|into the same chassis as the RAID-6 array. Alas, there is more data
|on the RAID-6 than will fit on one JBOD disk, so I've got to come up
|with a process that will copy the files to a disk, then stop when the
|disk is full, prompt the operator to unmount and remove the JBOD,
|then plug in another one, and mount to continue the process. Repeat
|until all the data are copied to as many JBODs as it takes.
|
|I think I can convince tar to write to multiple volumes and prompt
|to change media, but I don't want to treat the JBODs as character
|devices, or like a tape drive in other words. I'd like to take any one
|of the disks in a series and mount it on another host and read the
|files without having to load the other disks in that series.
|
|I'm thinking a fancy shell script is called for here. I bet that rsync
|should be leveraged to allow restarting from an aborted copy process.
|
|Any ideas? It seems to be a common requirement, maybe someone's
|done it already?
|
dar (disk archive) can do this; it has built-in prompting for waiting for operator intervention to swap media.
Leslie
Thanks for the suggestion, Leslie! I've already got someone working on our own script, but we'll take a look at dar.
there is even a gui which makes complicated dar much easier to understand. caveat: you must compile it yourself. and dar is excellent and supported by the author. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
On 10/7/22 18:01, Lew Wolfgang wrote:
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Well...

That really depends on what is in the files. If they are binary, then you are going to need to split on a record boundary if you hope to mount some arbitrary JBOD copy somewhere and read something other than gibberish. If they are text, that's a bit easier, but the storage would be less than optimal from a size standpoint.

There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.

There are no limits listed in the man page (man 1 split), but I suspect a 64-bit size is likely the upper limit (more than enough for your needs). I haven't tried it on large multi-gig files, but there's no reason it would work any differently than on smaller files.

-- David C. Rankin, J.D.,P.E.
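A toy version of that idea, with hypothetical names; any disk-swap logic would go inside the --filter command:

  # cut a big file into 1-GB pieces; split runs the filter once per piece
  # with $FILE set to that piece's name, so a script here could umount
  # one JBOD and mount the next between pieces
  split -b 1G --filter='cat > /mnt/jbod/$FILE' bigfile.bin part_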
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Well...
That really depends on what is in the files. If they are binary, then you are going to need to split on a record boundary if you hope to mount some arbitrary JBOD copy somewhere and read something other than gibberish.
We don't want to split a file between disks. The files are all 7-GB binary, and the disks are 8-TB. That should give us about 1,100 files on a disk, we're not going to worry about the wasted space.
If they are text, that's a bit easier, but the storage would be less than optimal from a size standpoint.
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time. Thanks for the suggestions! Regards, Lew
* Lew Wolfgang <wolfgang@sweet-haven.com> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Well...
That really depends on what is in the files. If they are binary, then you are going to need to split on a record boundary if you hope to mount some arbitrary JBOD copy somewhere and read something other than gibberish.
We don't want to split a file between disks. The files are all 7-GB binary, and the disks are 8-TB. That should give us about 1,100 files on a disk, we're not going to worry about the wasted space.
If they are text, that's a bit easier, but the storage would be less than optimal from a size standpoint.
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.

AFAIU he wants the files saved as the same files, complete, not split, not changed.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
On 10/10/22 15:12, Patrick Shanahan wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote: ...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Right, we wouldn't want that. The files have to be accessible by any host with a SATA interface, including possibly Windows, without having to install dependencies for this requirement.

We've started writing scripts to handle our rather unique requirement. Looks like we'll have one master script that will take the file inventory from the four separate RAID directories and figure out how to parse the files to the destination disks, then pass the file list to four secondary scripts to do the actual simultaneous transfers, prompting the user to swap disks when ready. The transfer scripts, since they know how many whole files will fit on the disk, will stop before the disk-full condition. The scripts will mount/unmount the disks as appropriate to make the process as foolproof as possible.

It should be interesting, I'll let you know how it works out.

Regards,
Lew
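In outline, that master/worker split might look like this; the directory names and transfer.sh are hypothetical stand-ins for the scripts being written:

  # one file list per RAID directory, one worker per JBOD slot
  for i in 1 2 3 4; do
      find /raid6/dir$i -type f > /tmp/list$i
      ./transfer.sh /tmp/list$i /mnt/jbod$i &   # hypothetical worker script
  done
  wait   # all four transfers run simultaneously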
* Lew Wolfgang <wolfgang@sweet-haven.com> [10-10-22 20:01]:
On 10/10/22 15:12, Patrick Shanahan wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
> On 10/7/22 18:01, Lew Wolfgang wrote:
...
> There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed. DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Right, we wouldn't want that. The files have to be accessible by any host with a SATA interface, including possibly Windows, without having to install dependencies for this requirement.
according to wikipedia, dar is available for windows.
https://en.wikipedia.org/wiki/Dar_(disk_archiver)
and a browser/extractor plugin for mc.
and the gui leaves you with a copy of the script used to archive. may give you pointers on writing your own script.
We've started writing scripts to handle our rather unique requirement. Looks like we'll have one master script that will take the file inventory from the four separate RAID directories and figure out how to parse the files to the destination disks, then pass the file list to four secondary scripts to do the actual simultaneous transfers, prompting the user to swap disks when ready. The transfer scripts, since they know how many whole files will fit on the disk, will stop before the disk-full condition. The scripts will mount/unmount the disks as appropriate to make the process as foolproof as possible.
It should be interesting, I'll let you know how it works out.
tks, -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
Lew, et al --

...and then Lew Wolfgang said...
% ...
% We've started writing scripts to handle our rather unique requirement.
...
% It should be interesting, I'll let you know how it works out.

Hope you can share the finished version! Despite all of the lovely mergerfs discussion, I can see this and especially the copy forking as very handy.

% Regards,
% Lew

TIA & HAND :-D

-- David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
On Mon, 10 Oct 2022 18:12:14 -0400, Patrick Shanahan wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk, that can be just read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Mergerfs [1], a union fuse filesystem, provides some of the needed features. It merges an assortment of filesystems (your archive disks) that can be mounted and unmounted "at will". When a filesystem is not part of the union, it can be used normally, without running mergerfs. You would read and write whole files, not pieces.

It does not prompt when a filesystem fills up, but it can be configured to switch to writing to the next in priority filesystem when the free space is too small (your 7GB max filesize) by setting the 'minfreespace' option.

So, for your process of archiving to a pile of disks, if you somehow detect when writing has transitioned to another disk, you could unmount the first one, prompt for the next, join it to the union, and then wait for another disk to get filled up by your writing process. This depends on configuring a write policy for mergerfs that, instead of trying to even out the percent full-ness of the component disks, gives priority to a disk that already has more on it.

[1] https://github.com/trapexit/mergerfs

-- Robert Webb
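Going by the project README, a possible invocation; the branch paths are assumptions, and 'lfs' (least free space) is the create policy that keeps filling the most-used disk first:

  # pool three JBODs; skip any branch with under 8 GB free, and send each
  # new file to the branch that already has the least free space
  mergerfs -o minfreespace=8G,category.create=lfs \
      /mnt/jbod1:/mnt/jbod2:/mnt/jbod3 /mnt/pool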
On 10/10/22 19:39, Robert Webb wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
> On 10/7/22 18:01, Lew Wolfgang wrote:
...
> There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Mergerfs [1], a union fuse filesystem, provides some of the needed features. It merges an assortment of filesystems (your archive disks) that can be mounted and unmounted "at will". When a filesystem is not part of the union, it can be used normally, without running mergerfs. You would read and write whole files, not pieces.
It does not prompt when a filesystem fills up, but it can be configured to switch to writing to the next in priority filesystem when the free space is too small (your 7GB max filesize) by setting the 'minfreespace' option.
So, for your process of archiving to a pile of disks, if you somehow detect when writing has transitioned to another disk, you could unmount the first one, prompt for the next, join it to the union, and then wait for another disk to get filled up by your writing process. This depends on configuring a write policy for mergerfs that, instead of trying to even out the percent full-ness of the component disks, gives priority to a disk that already has more on it.
[1] https://github.com/trapexit/mergerfs -- Robert Webb
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.

Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.

Regards,
Lew
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".

When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.

-- Robert Webb
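That runtime reconfiguration goes through extended attributes on a control file, per the README; the option and value here are illustrative only:

  # query and change the create policy on a live mergerfs mount
  getfattr -n user.mergerfs.category.create /mnt/pool/.mergerfs
  setfattr -n user.mergerfs.category.create -v lfs /mnt/pool/.mergerfs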
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.

The set of destination disks could be connected simultaneously if done using USB3.

The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On 10/11/22 10:50, Carlos E. R. wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
That would be the goal.
The set of destination disks could be connected simultaneously if done using USB3.
We can connect five SATA disks simultaneously in a 2U chassis. The other 19 disks are part of a RAID6 array.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.
But even the five disks might fill up, so changing will probably be required. Regards, Lew
On 2022-10-11 19:57, Lew Wolfgang wrote:
On 10/11/22 10:50, Carlos E. R. wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
That would be the goal.
The set of destination disks could be connected simultaneously if done using USB3.
We can connect five SATA disks simultaneously in a 2U chassis. The other 19 disks are part of a RAID6 array.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.
But even the five disks might fill up, so changing will probably be required.
Just connect more. On USB there is no practical limit. -- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On 10/11/22 11:02, Carlos E. R. wrote:
On 2022-10-11 19:57, Lew Wolfgang wrote:
On 10/11/22 10:50, Carlos E. R. wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
That would be the goal.
The set of destination disks could be connected simultaneously if done using USB3.
We can connect five SATA disks simultaneously in a 2U chassis. The other 19 disks are part of a RAID6 array.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.
But even the five disks might fill up, so changing will probably be required.
Just connect more. On USB there is no practical limit.
There are some physical constraints I can't get into here. Regards, Lew
On Tue, 11 Oct 2022 19:50:20 +0200, "Carlos E. R." <robin.listas@telefonica.net> wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
Yes, the main purpose of mergerfs is to have a whole set of disks active and to appear as one filesystem hierarchy. Each of the files exists whole, as one file in the native filesystem of an individual disk. The copy command would think everything is normal, usually, while mergerfs would be deciding to which disk each particular file gets written, according to a configurable policy.

Things can get complicated for general filesystem use or if you just collect a bunch of independently written disks into a union set. For instance, multiple disks could have a file at the same path, and then mergerfs has to choose the one to present, according to some policy.

But, the use case of copying a hierarchy onto a set of disks means that there is no path duplication of files (the same directory may appear on multiple disks), and after a disk fills up, you are done with those files (because you know that those paths will not be written to, or read, again) and the disk can be unmounted. So you can run the process with just one or two disks mounted at a time, or with the whole set. But, you may not be able to change the set of disks in the union during a single copy command, or maybe you can. IDK.

As far as I can tell, mergerfs does not have an idea of a complete set for the union filesystem, other than the sum of whatever disks are included at any moment.
The set of destination disks could be connected simultaneously if done using USB3.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed. -- Robert Webb
On 2022-10-12 01:12, Robert Webb wrote:
On Tue, 11 Oct 2022 19:50:20 +0200, "Carlos E. R." <> wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <> wrote:
On 10/10/22 19:39, Robert Webb wrote:
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
Yes, the main purpose of mergerfs is to have a whole set of disks active and to appear as one filesystem hierarchy. Each of the files exists whole, as one file in the native filesystem of an individual disk. The copy command would think everything is normal, usually, while mergerfs would be deciding to which disk each particular file gets written, according to a configurable policy.
Things can get complicated for general filesystem use or if you just collect a bunch of independently written disks into a union set. For instance, multiple disks could have a file at the same path, and then mergerfs has to choose the one to present, according to some policy.
But, the use case of copying a hierarchy onto a set of disks means that there is no path duplication of files (the same directory may appear on multiple disks), and after a disk fills up, you are done with those files (because you know that those paths will not be written to, or read, again) and the disk can be unmounted. So you can run the process with just one or two disks mounted at a time, or with the whole set. But, you may not be able to change the set of disks in the union during a single copy command, or maybe you can. IDK.
As far as I can tell, mergerfs does not have an idea of a complete set for the union filesystem, other than the sum of whatever disks are included at any moment.
This seems to be a technology I was seeking for ages.

With technologies like raid0, one disk dies, the whole lot dies. Here only the files on one disk die. I like the idea.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On 2022-10-12 01:12, Robert Webb wrote:
This seems to be a technology I was seeking for ages.
With technologies like raid0, one disk dies, the whole lot dies. Here only the files on one disk die. I like the idea.
Yes, agreed. I've been using it for around 6 months now, and it works amazingly well. I can take a bunch of disks, mergerfs them together, and then NFS export the result. -Nick
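Exporting a FUSE mount over NFS needs an explicit fsid; a sketch of the /etc/exports line, with the client network as an assumption:

  /mnt/pool 192.168.1.0/24(ro,fsid=1,no_subtree_check)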
On 10/10/22 14:22, Carlos E. R. wrote:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk, that can be just read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed.
Yes, that's true. dar looks interesting for other applications though.

We've been using rdiff-backup for many years, but we ran into a problem with Leap 15.4. The rdiff-backup version in 15.4 is incompatible with older Leap versions due to python dependencies. 15.4 rdiff-backup can't make network backups to/from older versions of Leap. That means that we have to do something else for the various hosts we upgrade to 15.4 until we upgrade our central server. I've been using rsync temporarily.

Rhetorical question: Would ALP help here?

Regards,
Lew
On 2022-10-11 00:59, Lew Wolfgang wrote:
On 10/10/22 14:22, Carlos E. R. wrote:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk, that can be just read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed.
Yes, that's true. dar looks interesting for other applications though.

We've been using rdiff-backup for many years, but we ran into a problem with Leap 15.4. The rdiff-backup version in 15.4 is incompatible with older Leap versions due to python dependencies. 15.4 rdiff-backup can't make network backups to/from older versions of Leap. That means that we have to do something else for the various hosts we upgrade to 15.4 until we upgrade our central server. I've been using rsync temporarily.
Rhetorical question: Would ALP help here?
Unrelated. Irrelevant. :-) Oh, you could use the unused space for a catalogue. -- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
participants (11)
- Bob Williams
- Carlos E. R.
- Dave Howorth
- David C. Rankin
- David T-G
- J Leslie Turriff
- Lew Wolfgang
- Nick LeRoy
- Patrick Shanahan
- Per Jessen
- Robert Webb