Hi Folks,

I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.

The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.

I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.

I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.

Any ideas? It seems to be a common requirement, maybe someone's done it already?

Regards,
Lew
On 2022-10-08 01:01, Lew Wolfgang wrote:
Hi Folks,
I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.
The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.
I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
I'm not familiar with JBOD.

I would output as many whole files (each 7 GB) as can fit in a single hard disk, leaving a chunk of free space of less than 7 GB "wasted" at the end, then jump to the next disk. I understand this is archival, so independent disks and whole files have more chances of survival.

And, if the data can be compressed, I would use compressed BTRFS partitions, which is transparent.

If you do not want to waste those last 7 GB or less, then I would use perhaps dd to split the last file to size, and script it. But then you need some sort of catalog and scripting to extract the whole. But I would not use tar, I think. Maybe reinventing the wheel, though.

(because they are all 7GB sized files)

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
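A minimal sketch of that dd idea, splitting the last file across two disks; the mount points and the GNU df usage are assumptions, and the sizes would need checking in practice:

  # how many 1-MiB blocks still fit on the first disk
  free_mb=$(df -BM --output=avail /mnt/jbod1 | tail -n1 | tr -dc '0-9')
  # write what fits, then continue the same file on the next disk
  dd if=bigfile.bin of=/mnt/jbod1/bigfile.part1 bs=1M count="$free_mb"
  dd if=bigfile.bin of=/mnt/jbod2/bigfile.part2 bs=1M skip="$free_mb"
  # reassemble later with: cat bigfile.part1 bigfile.part2 > bigfile.bin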
On 10/8/22 04:33, Carlos E. R. wrote:
On 2022-10-08 01:01, Lew Wolfgang wrote:
Hi Folks,
I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.
The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.
I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
I'm not familiar with JBOD.
JBOD is "Just a Bunch of Disks", which in this case means the destination disks are just independent single disk drives with their own filesystems.
I would output as many whole files (each 7 GB) as can fit in a single hard disk, leaving a chunk of free space of less than 7 GB "wasted" at the end, then jump to the next disk. I understand this is archival, so independent disks and whole files have more chances of survival.
Yes, that's what I was thinking too.
And, if the data can be compressed, I would use compressed BTRFS partitions, which is transparent.
The data are binary, and encrypted, so compression would be a waste in this case.
If you do not want to waste those last 7 GB or less, then I would use perhaps dd to split the last file to size, and script it. But then you need some sort of catalog and scripting to extract the whole.
That gets complicated!
But I would not use tar, I think. Maybe reinventing the wheel, though.
(because they are all 7GB sized files)
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here. Regards, Lew
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf. -- Per Jessen, Zürich (14.4°C) http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.

With backups, it would write alternately to disk A or B. When disk A was full, it switched automatically to B and told you to change disk A. And the reverse. You never had to press any key to continue, it would find out without wasting a second. If the machine had a single floppy drive, at the end of each floppy it simply asked to replace the disk, which you did, not pressing any key, and the motor not stopping.

It was amazing how fast it would go: faster than I could label the disks. On restore, it read floppies faster than the hard disk could write the files, especially small ones (they involved more head movements to data, FAT, and directory tables). AND, it would both compress the data and add forward error recovery data, so that the operation could survive bad sectors without losing a file.

PCtools backup. Later Microsoft bought a license to include it with MSdos 6 (not the fully featured program, though).

It kept a catalogue of what was backed up and where on the hard disk. This catalogue could be retrieved as well from the last floppy (or more).

I wish I could get a similar thing on Linux. Not for floppies, of course.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On Sat, 8 Oct 2022 20:21:37 +0200 Carlos E. R. wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
[snipped]
I wish I could get a similar thing on Linux. Not for floppies, of course.
I have a backup script written in python. Rsync does the initial backup (using rsync filters) to a btrfs formatted drive, then the backup is snapshotted, giving me incremental backups as far back as I want. It's worked well for years. -- Bob Williams No HTML please. Plain text preferred. https://useplaintext.email/
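The core of that scheme fits in two commands; a rough sketch, where the filter file, the paths, and the "current" subvolume name are assumptions:

  # mirror the source into a btrfs subvolume, then freeze it read-only
  rsync -aH --delete --filter='merge /etc/backup.filters' /home/ /mnt/backup/current/
  btrfs subvolume snapshot -r /mnt/backup/current "/mnt/backup/snap-$(date +%F)"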
On Sat, 8 Oct 2022 20:21:37 +0200 "Carlos E. R." <robin.listas@telefonica.net> wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
With backups, it would write alternately to disk A or B. When disk A was full, it switched automatically to B and told you to change disk A. And the reverse. You never had to press any key to continue, it would find out without wasting a second.
[snip]

A long, long time ago there was an operating system called GEORGE 3. It did all of this and managed multiple copies on multiple tapes with no trouble. Many times I've viewed software developments as reinventing parts of G3.
Carlos E. R. wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
Years ago, in the very beginning of my career, we could copy anything to anything - disk, tape, floppy, various other exotic types of storage. They were all treated the same.
I wish I could get a similar thing on Linux. Not for floppies, of course.
You _can_ and other people _have_ for decades. In a nutshell, "tar with mtst".

Downstairs I have four tape libraries - two HP Storageworks and two Storage Technology. The HP Storageworks libraries take 40 and 200 volumes each, the STK libraries take 800 and 1600 volumes each. The HP libs each have 2 and 4 drives respectively, the STK libs have 4 and 16 drives.

HP Storageworks are cheap these days, you can have either of mine for the price of transport.

-- Per Jessen, Zürich (14.4°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
On 2022-10-08 21:38, Per Jessen wrote:
Carlos E. R. wrote:
On 2022-10-08 19:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
Ages ago, when I used MsDOS, I had a wonderful backup program. I used it with floppies, but it would work with tapes too, although I never tried.
Years ago, in the very beginning of my career, we could copy anything to anything - disk, tape, floppy, various other exotic types of storage. They were all treated the same.
I wish I could get a similar thing on Linux. Not for floppies, of course.
You _can_ and other people _have_ for decades. In a nutshell, "tar with mtst".
What is mtst?
Downstairs I have four tape libraries - two HP Storageworks and two Storage Technology. The HP Storageworks libraries take 40 and 200 volumes each, the STK libraries take 800 and 1600 volumes each. The HP libs each have 2 and 4 drives respectively, the STK libs have 4 and 16 drives.
HP Storageworks are cheap these days, you can have either of mine for the price of transport.
-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
Carlos E. R. wrote:
What is mtst?
Sorry, it is mt-st. Magnetic Tape Control tools for Linux SCSI tapes. Google is your friend. -- Per Jessen, Zürich (14.5°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
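For reference, the classic pattern with those tools looks roughly like this; the device name is an assumption:

  mt -f /dev/nst0 rewind      # position the tape
  tar -cvf /dev/nst0 /data    # write an archive to the no-rewind device
  mt -f /dev/nst0 offline     # rewind and eject when done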
On 10/8/22 10:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
I used to use Exabyte and DLT tape libraries for backups. They were great! I used the classic "dump" program with a Tower of Hanoi dump-level plan. It saved my bacon many times.

Indeed, I was perplexed when SuSE went to Reiserfs by default. Dump doesn't support Reiserfs. I remember complaining about it on a Usenet list and received a rather curt reply from none other than Hans Reiser himself telling me to just use tar instead. This was obviously before he murdered his wife and ended up in prison. I wonder what he's up to these days?

Regards,
Lew
Lew Wolfgang wrote:
On 10/8/22 10:05, Per Jessen wrote:
Lew Wolfgang wrote:
Agree. I think a script being smart with rsync is the way to go. I didn't find anything on-the-shelf that would make sense here.
In my experience - but I only work with tape - anything that involves human intervention is rarely found off-the-shelf.
I used to use Exabyte and DLT tape libraries for backups. They were
Professionally I grew up with IBM 6250, 3480 and 3490 tape drives. I jumpstarted my career by analysing their use and how to optimize it. Later on I wrote the software for those.
Indeed, I was perplexed when SuSE went to Reiserfs by default.
Defaults are just that, meant to be changed. I've never used reiserfs. -- Per Jessen, Zürich (14.4°C)
Lew Wolfgang wrote:
Hi Folks,
I'm anticipating a requirement where I'll have lots of data on a large RAID-6 partition. The data consists of many 7-GB files that were uploaded over several Ethernet channels.
The files then need to be copied to individual "JBOD" disks plugged into the same chassis as the RAID-6 array. Alas, there is more data on the RAID-6 than will fit on one JBOD disk, so I've got to come up with a process that will copy the files to a disk, then stop when the disk is full, prompt the operator to unmount and remove the JBOD, then plug in another one, and mount to continue the process. Repeat until all the data are copied to as many JBODs as it takes.
I think I can convince tar to write to multiple volumes and prompt to change media, but I don't want to treat the JBODs as character devices, or like a tape drive in other words. I'd like to take any one of the disks in a series and mount it on another host and read the files without having to load the other disks in that series.
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Pseudo-code:

  while more-to-do
  do
      prompt to mount new disk
      while rsync raid6:file disk:file
      do
          file=file+1
      done
      prompt to unmount current disk
  done

-- Per Jessen, Zürich (16.2°C) Слава Україні! Slava Ukraini!
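Fleshed out, a minimal bash sketch of that loop; the source path, mount point, fstab entry for remounting, and the 8-GiB margin are all assumptions:

  #!/bin/bash
  SRC=/raid6/data             # assumed source directory
  MNT=/mnt/jbod               # assumed mount point with an fstab entry
  NEED=$((8 * 1024 * 1024))   # keep more than one 7-GB file free (1K blocks)

  for f in "$SRC"/*; do
      avail=$(df --output=avail "$MNT" | tail -n1)
      if [ "$avail" -lt "$NEED" ]; then
          umount "$MNT"
          read -rp "Disk full. Swap in the next JBOD, then press Enter: "
          mount "$MNT"
      fi
      # --partial lets an aborted copy restart where it left off
      rsync -a --partial "$f" "$MNT"/ || exit 1
  done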
On 2022-10-07 18:01:49 Lew Wolfgang wrote:
|Hi Folks,
|
|I'm anticipating a requirement where I'll have lots of data on a large
|RAID-6 partition. The data consists of many 7-GB files that were
|uploaded over several Ethernet channels.
|
|The files then need to be copied to individual "JBOD" disks plugged
|into the same chassis as the RAID-6 array. Alas, there is more data
|on the RAID-6 than will fit on one JBOD disk, so I've got to come up
|with a process that will copy the files to a disk, then stop when the
|disk is full, prompt the operator to unmount and remove the JBOD,
|then plug in another one, and mount to continue the process. Repeat
|until all the data are copied to as many JBODs as it takes.
|
|I think I can convince tar to write to multiple volumes and prompt
|to change media, but I don't want to treat the JBODs as character
|devices, or like a tape drive in other words. I'd like to take any one
|of the disks in a series and mount it on another host and read the
|files without having to load the other disks in that series.
|
|I'm thinking a fancy shell script is called for here. I bet that rsync
|should be leveraged to allow restarting from an aborted copy process.
|
|Any ideas? It seems to be a common requirement, maybe someone's
|done it already?
|
|Regards,
|Lew
dar (disk archive) can do this; it has built-in prompting for waiting for operator intervention to swap media.

Leslie
-- Platform: Linux Distribution: openSUSE Leap 15.4 x86_64
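By way of illustration, an invocation along these lines; the slice size and paths are assumptions, sized to leave headroom on an 8-TB disk:

  # create the archive in slices, pausing before each new slice so the
  # operator can swap disks; dar waits for confirmation to continue
  dar -c /mnt/jbod/archive -R /raid6/data -s 7900G -p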
On 10/9/22 21:50, J Leslie Turriff wrote:
On 2022-10-07 18:01:49 Lew Wolfgang wrote:
|Hi Folks,
|
|I'm anticipating a requirement where I'll have lots of data on a large
|RAID-6 partition. The data consists of many 7-GB files that were
|uploaded over several Ethernet channels.
|
|The files then need to be copied to individual "JBOD" disks plugged
|into the same chassis as the RAID-6 array. Alas, there is more data
|on the RAID-6 than will fit on one JBOD disk, so I've got to come up
|with a process that will copy the files to a disk, then stop when the
|disk is full, prompt the operator to unmount and remove the JBOD,
|then plug in another one, and mount to continue the process. Repeat
|until all the data are copied to as many JBODs as it takes.
|
|I think I can convince tar to write to multiple volumes and prompt
|to change media, but I don't want to treat the JBODs as character
|devices, or like a tape drive in other words. I'd like to take any one
|of the disks in a series and mount it on another host and read the
|files without having to load the other disks in that series.
|
|I'm thinking a fancy shell script is called for here. I bet that rsync
|should be leveraged to allow restarting from an aborted copy process.
|
|Any ideas? It seems to be a common requirement, maybe someone's
|done it already?
|
dar (disk archive) can do this; it has built-in prompting for waiting for operator intervention to swap media.
Leslie
Thanks for the suggestion, Leslie! I've already got someone working on our own script, but we'll take a look at dar. Regards, Lew
* Lew Wolfgang <wolfgang@sweet-haven.com> [10-10-22 02:51]:
On 10/9/22 21:50, J Leslie Turriff wrote:
On 2022-10-07 18:01:49 Lew Wolfgang wrote:
|Hi Folks,
|
|I'm anticipating a requirement where I'll have lots of data on a large
|RAID-6 partition. The data consists of many 7-GB files that were
|uploaded over several Ethernet channels.
|
|The files then need to be copied to individual "JBOD" disks plugged
|into the same chassis as the RAID-6 array. Alas, there is more data
|on the RAID-6 than will fit on one JBOD disk, so I've got to come up
|with a process that will copy the files to a disk, then stop when the
|disk is full, prompt the operator to unmount and remove the JBOD,
|then plug in another one, and mount to continue the process. Repeat
|until all the data are copied to as many JBODs as it takes.
|
|I think I can convince tar to write to multiple volumes and prompt
|to change media, but I don't want to treat the JBODs as character
|devices, or like a tape drive in other words. I'd like to take any one
|of the disks in a series and mount it on another host and read the
|files without having to load the other disks in that series.
|
|I'm thinking a fancy shell script is called for here. I bet that rsync
|should be leveraged to allow restarting from an aborted copy process.
|
|Any ideas? It seems to be a common requirement, maybe someone's
|done it already?
|
dar (disk archive) can do this; it has built-in prompting for waiting for operator intervention to swap media.
Leslie
Thanks for the suggestion, Leslie! I've already got someone working on our own script, but we'll take a look at dar.
there is even a gui which makes complicated dar much easier to understand. caveat: you must compile it yourself. and dar is excellent and supported by the author. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
On 10/7/22 18:01, Lew Wolfgang wrote:
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Well...

That really depends on what is in the files. If they are binary, then you are going to need to split on a record boundary if you hope to mount some arbitrary JBOD copy somewhere and read something other than gibberish. If they are text, that's a bit easier, but the storage would be less than optimal from a size standpoint.

There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.

There are no limits listed in the man page (man 1 split), but I suspect a 64-bit size is likely the upper limit (more than enough for your needs). I haven't tried it on large multi-gig files, but there's no reason it would work any differently than on smaller files.

-- David C. Rankin, J.D.,P.E.
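A toy version of that idea, with hypothetical names; any disk-swap logic would go inside the --filter command:

  # cut a big file into 1-GB pieces; split runs the filter once per piece
  # with $FILE set to that piece's name, so a script here could umount
  # one JBOD and mount the next between pieces
  split -b 1G --filter='cat > /mnt/jbod/$FILE' bigfile.bin part_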
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Well...
That really depends on what is in the files. If they are binary, then you are going to need to split on a record boundary if you hope to mount some arbitrary JBOD copy somewhere and read something other than gibberish.
We don't want to split a file between disks. The files are all 7-GB binary, and the disks are 8-TB. That should give us about 1,100 files on a disk, we're not going to worry about the wasted space.
If they are text, that's a bit easier, but the storage would be less than optimal from a size standpoint.
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time. Thanks for the suggestions! Regards, Lew
* Lew Wolfgang <wolfgang@sweet-haven.com> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
I'm thinking a fancy shell script is called for here. I bet that rsync should be leveraged to allow restarting from an aborted copy process.
Any ideas? It seems to be a common requirement, maybe someone's done it already?
Well...
That really depends on what is in the files. If they are binary, then you are going to need to split on a record boundary if you hope to mount some arbitrary JBOD copy somewhere and read something other than gibberish.
We don't want to split a file between disks. The files are all 7-GB binary, and the disks are 8-TB. That should give us about 1,100 files on a disk, we're not going to worry about the wasted space.
If they are text, that's a bit easier, but the storage would be less than optimal from a size standpoint.
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.

AFAIU he wants the files saved as the same files, complete, not split, not changed.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files. -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
On 10/10/22 15:12, Patrick Shanahan wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote: ...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Right, we wouldn't want that. The files have to be accessible by any host with a SATA interface, including possibly Windows, without having to install dependencies for this requirement.

We've started writing scripts to handle our rather unique requirement. Looks like we'll have one master script that will take the file inventory from the four separate RAID directories and figure out how to parse the files to the destination disks, then pass the file list to four secondary scripts to do the actual simultaneous transfers, prompting the user to swap disks when ready. The transfer scripts, since they know how many whole files will fit on the disk, will stop before the disk-full condition. The scripts will mount/unmount the disks as appropriate to make the process as foolproof as possible.

It should be interesting, I'll let you know how it works out.

Regards,
Lew
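In outline, that master/worker split might look like this; the directory names and transfer.sh are hypothetical stand-ins for the scripts being written:

  # one file list per RAID directory, one worker per JBOD slot
  for i in 1 2 3 4; do
      find /raid6/dir$i -type f > /tmp/list$i
      ./transfer.sh /tmp/list$i /mnt/jbod$i &   # hypothetical worker script
  done
  wait   # all four transfers run simultaneously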
* Lew Wolfgang <wolfgang@sweet-haven.com> [10-10-22 20:01]:
On 10/10/22 15:12, Patrick Shanahan wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
> On 10/7/22 18:01, Lew Wolfgang wrote:
...
> There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed. DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Right, we wouldn't want that. The files have to be accessible by any host with a SATA interface, including possibly Windows, without having to install dependencies for this requirement.
according to wikipedia, dar is available for windows.
https://en.wikipedia.org/wiki/Dar_(disk_archiver)
and a browser/extractor plugin for mc.
and the gui leaves you with a copy of the script used to archive. may give you pointers on writing your own script.
We've started writing scripts to handle our rather unique requirement. Looks like we'll have one master script that will take the file inventory from the four separate RAID directories and figure out how to parse the files to the destination disks, then pass the file list to four secondary scripts to do the actual simultaneous transfers, prompting the user to swap disks when ready. The transfer scripts, since they know how many whole files will fit on the disk, will stop before the disk-full condition. The scripts will mount/unmount the disks as appropriate to make the process as foolproof as possible.
It should be interesting, I'll let you know how it works out.
tks, -- (paka)Patrick Shanahan Plainfield, Indiana, USA @ptilopteri http://en.opensuse.org openSUSE Community Member facebook/ptilopteri Photos: http://wahoo.no-ip.org/piwigo paka @ IRCnet oftc
Lew, et al --

...and then Lew Wolfgang said...
% ...
% We've started writing scripts to handle our rather unique requirement.
...
% It should be interesting, I'll let you know how it works out.

Hope you can share the finished version! Despite all of the lovely mergerfs discussion, I can see this and especially the copy forking as very handy.

% Regards,
% Lew

TIA & HAND :-D

-- David T-G
See http://justpickone.org/davidtg/email/
See http://justpickone.org/davidtg/tofu.txt
On Mon, 10 Oct 2022 18:12:14 -0400, Patrick Shanahan wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk, that can be just read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Mergerfs [1], a union fuse filesystem, provides some of the needed features. It merges an assortment of filesystems (your archive disks) that can be mounted and unmounted "at will". When a filesystem is not part of the union, it can be used normally, without running mergerfs. You would read and write whole files, not pieces.

It does not prompt when a filesystem fills up, but it can be configured to switch to writing to the next in priority filesystem when the free space is too small (your 7GB max filesize) by setting the 'minfreespace' option.

So, for your process of archiving to a pile of disks, if you somehow detect when writing has transitioned to another disk, you could unmount the first one, prompt for the next, join it to the union, and then wait for another disk to get filled up by your writing process. This depends on configuring a write policy for mergerfs that, instead of trying to even out the percent full-ness of the component disks, gives priority to a disk that already has more on it.

[1] https://github.com/trapexit/mergerfs

-- Robert Webb
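Going by the project README, a possible invocation; the branch paths are assumptions, and 'lfs' (least free space) is the create policy that keeps filling the most-used disk first:

  # pool three JBODs; skip any branch with under 8 GB free, and send each
  # new file to the branch that already has the least free space
  mergerfs -o minfreespace=8G,category.create=lfs \
      /mnt/jbod1:/mnt/jbod2:/mnt/jbod3 /mnt/pool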
On 10/10/22 19:39, Robert Webb wrote:
* Patrick Shanahan <paka@opensuse.org> [10-10-22 17:48]:
* Carlos E. R. <robin.listas@telefonica.net> [10-10-22 17:25]:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
> On 10/7/22 18:01, Lew Wolfgang wrote:
...
> There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk that can just be read or copied directly from the (backup) disk without using dar for recovery.
AFAIU he wants the files saved as the same files, complete, not split, not changed.
DAR will store files rather than compress files. files stored are identical copies, complete and not split unless told to do so.
but "dar" is necessary to access the files.
Mergerfs [1], a union fuse filesystem, provides some of the needed features. It merges an assortment of filesystems (your archive disks) that can be mounted and unmounted "at will". When a filesystem is not part of the union, it can be used normally, without running mergerfs. You would read and write whole files, not pieces.
It does not prompt when a filesystem fills up, but it can be configured to switch to writing to the next in priority filesystem when the free space is too small (your 7GB max filesize) by setting the 'minfreespace' option.
So, for your process of archiving to a pile of disks, if you somehow detect when writing has transitioned to another disk, you could unmount the first one, prompt for the next, join it to the union, and then wait for another disk to get filled up by your writing process. This depends on configuring a write policy for mergerfs that, instead of trying to even out the percent full-ness of the component disks, gives priority to a disk that already has more on it.
[1] https://github.com/trapexit/mergerfs -- Robert Webb
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.

Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.

Regards,
Lew
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".

When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.

-- Robert Webb
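That runtime reconfiguration goes through extended attributes on a control file, per the README; the option and value here are illustrative only:

  # query and change the create policy on a live mergerfs mount
  getfattr -n user.mergerfs.category.create /mnt/pool/.mergerfs
  setfattr -n user.mergerfs.category.create -v lfs /mnt/pool/.mergerfs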
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.

The set of destination disks could be connected simultaneously if done using USB3.

The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On 10/11/22 10:50, Carlos E. R. wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
That would be the goal.
The set of destination disks could be connected simultaneously if done using USB3.
We can connect five SATA disks simultaneously in a 2U chassis. The other 19 disks are part of a RAID6 array.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.
But even the five disks might fill up, so changing will probably be required. Regards, Lew
On 2022-10-11 19:57, Lew Wolfgang wrote:
On 10/11/22 10:50, Carlos E. R. wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
That would be the goal.
The set of destination disks could be connected simultaneously if done using USB3.
We can connect five SATA disks simultaneously in a 2U chassis. The other 19 disks are part of a RAID6 array.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.
But even the five disks might fill up, so changing will probably be required.
Just connect more. On USB there is no practical limit. -- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On 10/11/22 11:02, Carlos E. R. wrote:
On 2022-10-11 19:57, Lew Wolfgang wrote:
On 10/11/22 10:50, Carlos E. R. wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
[...] [1] https://github.com/trapexit/mergerfs
Thanks for the pointer, Robert! I've never used anything like this, but I did dabble with zfs once.
Does mergerfs allow each of the drives in a pool to be mounted individually on another host? That's one of our requirements, that the drives not be aggregated or raided together. Our requirement isn't for archiving, but for data dissemination to possibly non-Linux systems not administrated by us.
Yes, according to the README.md on the linked page (That's all the info I have. I have never used it), all the disks/filesystems can be used independently, without mergerfs. If one of the disks fails, it only affects the files on that one drive. There is no extra error correction. The filesystems aren't even required to be the same: "Works with heterogeneous filesystem types".
When you do use multiple drives together under mergerfs, its ability to use a changed configuration at runtime might be useful to control which drive has priority to get written to.
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
That would be the goal.
The set of destination disks could be connected simultaneously if done using USB3.
We can connect five SATA disks simultaneously in a 2U chassis. The other 19 disks are part of a RAID6 array.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed.
But even the five disks might fill up, so changing will probably be required.
Just connect more. On USB there is no practical limit.
There are some physical constraints I can't get into here. Regards, Lew
On Tue, 11 Oct 2022 19:50:20 +0200, "Carlos E. R." <robin.listas@telefonica.net> wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <wolfgang@sweet-haven.com> wrote:
On 10/10/22 19:39, Robert Webb wrote:
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
Yes, the main purpose of mergerfs is to have a whole set of disks active and to appear as one filesystem hierarchy. Each of the files exists whole, as one file in the native filesystem of an individual disk. The copy command would think everything is normal, usually, while mergerfs would be deciding to which disk each particular file gets written, according to a configurable policy.

Things can get complicated for general filesystem use or if you just collect a bunch of independently written disks into a union set. For instance, multiple disks could have a file at the same path, and then mergerfs has to choose the one to present, according to some policy.

But, the use case of copying a hierarchy onto a set of disks means that there is no path duplication of files (the same directory may appear on multiple disks), and after a disk fills up, you are done with those files (because you know that those paths will not be written to, or read, again) and the disk can be unmounted. So you can run the process with just one or two disks mounted at a time, or with the whole set. But, you may not be able to change the set of disks in the union during a single copy command, or maybe you can. IDK.

As far as I can tell, mergerfs does not have an idea of a complete set for the union filesystem, other than the sum of whatever disks are included at any moment.
The set of destination disks could be connected simultaneously if done using USB3.
The advantage would be not needing to be there to change the disks; just issue a single command and have the copy process run for hours or days, undisturbed. -- Robert Webb
On 2022-10-12 01:12, Robert Webb wrote:
On Tue, 11 Oct 2022 19:50:20 +0200, "Carlos E. R." <> wrote:
On 2022-10-11 08:54, Robert Webb wrote:
On Mon, 10 Oct 2022 21:16:23 -0700, Lew Wolfgang <> wrote:
On 10/10/22 19:39, Robert Webb wrote:
Can you connect all the destination disks at once, and issue a single copy command to write all the files? And the files not being split.
Yes, the main purpose of mergerfs is to have a whole set of disks active and to appear as one filesystem hierarchy. Each of the files exists whole, as one file in the native filesystem of an individual disk. The copy command would think everything is normal, usually, while mergerfs would be deciding to which disk each particular file gets written, according to a configurable policy.
Things can get complicated for general filesystem use or if you just collect a bunch of independently written disks into a union set. For instance, multiple disks could have a file at the same path, and then mergerfs has to choose the one to present, according to some policy.
But, the use case of copying a hierarchy onto a set of disks means that there is no path duplication of files (the same directory may appear on multiple disks), and after a disk fills up, you are done with those files (because you know that those paths will not be written to, or read, again) and the disk can be unmounted. So you can run the process with just one or two disks mounted at a time, or with the whole set. But, you may not be able to change the set of disks in the union during a single copy command, or maybe you can. IDK.
As far as I can tell, mergerfs does not have an idea of a complete set for the union filesystem, other than the sum of whatever disks are included at any moment.
This seems to be a technology I was seeking for ages.

With technologies like raid0, one disk dies, the whole lot dies. Here only the files on one disk die. I like the idea.

-- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
On 2022-10-12 01:12, Robert Webb wrote:
This seems to be a technology I was seeking for ages.
With technologies like raid0, one disk dies, the whole lot dies. Here only the files on one disk die. I like the idea.
Yes, agreed. I've been using it for around 6 months now, and it works amazingly well. I can take a bunch of disks, mergerfs them together, and then NFS export the result. -Nick
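Exporting a FUSE mount over NFS needs an explicit fsid; a sketch of the /etc/exports line, with the client network as an assumption:

  /mnt/pool 192.168.1.0/24(ro,fsid=1,no_subtree_check)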
On 10/10/22 14:22, Carlos E. R. wrote:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk, that can be just read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed.
Yes, that's true. dar looks interesting for other applications though.

We've been using rdiff-backup for many years, but we ran into a problem with Leap 15.4. The rdiff-backup version in 15.4 is incompatible with older Leap versions due to python dependencies. 15.4 rdiff-backup can't make network backups to/from older versions of Leap. That means that we have to do something else for the various hosts we upgrade to 15.4 until we upgrade our central server. I've been using rsync temporarily.

Rhetorical question: Would ALP help here?

Regards,
Lew
On 2022-10-11 00:59, Lew Wolfgang wrote:
On 10/10/22 14:22, Carlos E. R. wrote:
On 2022-10-10 22:43, Patrick Shanahan wrote:
* Lew Wolfgang <> [10-10-22 15:30]:
On 10/10/22 12:07, David C. Rankin wrote:
On 10/7/22 18:01, Lew Wolfgang wrote:
...
There is a coreutils utility called 'split' that allows you to slice large files into smaller files by lines or bytes and it also has a --filter=command option to filter through a shell script which would give you the ability to umount/mount new JBOD devices between parts of the original being written.
Yes, split is great and would work, but we're not going to need it this time.
and once more, dar will accomplish all that for you very quickly, especially using the gui
AFAIK, dar doesn't store full independent files on the disk, that can be just read or copied directly from the (backup) disk without using dar for recovery.
AFAIU He wants the files saved as the same files, complete, not split, not changed.
Yes, that's true. dar looks interesting for other applications though.

We've been using rdiff-backup for many years, but we ran into a problem with Leap 15.4. The rdiff-backup version in 15.4 is incompatible with older Leap versions due to python dependencies. 15.4 rdiff-backup can't make network backups to/from older versions of Leap. That means that we have to do something else for the various hosts we upgrade to 15.4 until we upgrade our central server. I've been using rsync temporarily.
Rhetorical question: Would ALP help here?
Unrelated. Irrelevant. :-) Oh, you could use the unused space for a catalogue. -- Cheers / Saludos, Carlos E. R. (from 15.3 x86_64 at Telcontar)
participants (11)
- Bob Williams
- Carlos E. R.
- Dave Howorth
- David C. Rankin
- David T-G
- J Leslie Turriff
- Lew Wolfgang
- Nick LeRoy
- Patrick Shanahan
- Per Jessen
- Robert Webb