David,

That was fantastically nice of you to put that together for me.

Thank you,
Greg
--
Greg Freemyer
Advances are made by answering questions.
Discoveries are made by questioning answers.
  -- Bernard Haisch

On Sat, Aug 11, 2018 at 5:47 AM, David C. Rankin <drankinatty@suddenlinkmail.com> wrote:
On 08/10/2018 06:00 PM, Greg Freemyer wrote:
All,
I don't know the origin of these files, but I have 100GB of corrupted PST files.
From what I can tell, some sort of processing/extraction tool went haywire and prepended binary junk to the front of the real data. The actual start of the data is a header with !BDN as the first 4 chars.
From what I've seen, the prepended junk is roughly 10 to 500 binary octets (chars). It is sort of like RAM slack, but at the start of the files. (No idea how that happened.)
If I knew for certain that the binary junk didn't have any newlines in it, this sed script would get rid of the junk:
find . -name \*.pst -exec sed -e '1s/^.*!BDN/!BDN/' -i "{}" \;
I know I can write a program that does the same thing while working in binary, so intervening newlines don't matter.
Is there a relatively straightforward way to accomplish the above?
fyi: I'm going to try and get replacement uncorrupted data files as well, but that might be easier said than done.
Thanks,
Greg
I'm not sure scripting is your friend here. In C it is straightforward to look for your "!BDN" mark and copy it and the rest of the file to a new output file (say original_name+SUFFIX). Your mark "!BDN" must be ASCII or UTF-8 and not some strange multi-byte character set that just looks like "!BDN" on the terminal.
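The attachment itself is not reproduced in this message, so what follows is only an independent, minimal sketch of the approach described above, not the attached program. The names `find_mark`, `copy_from_mark`, and `HEAD_MAX` are made up for illustration, and the code assumes the mark falls within the first 4096 bytes of the input, which fits the 10 to 500 bytes of junk reported but is an assumption all the same.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the mark starts within the first HEAD_MAX bytes of input. */
enum { HEAD_MAX = 4096 };

/* Binary-safe search: return the offset of mark in buf, or -1 if absent.
 * memcmp is used so NUL bytes and newlines in the junk are harmless. */
static long find_mark(const unsigned char *buf, size_t len,
                      const char *mark, size_t mlen)
{
    assert(mlen > 0);
    for (size_t i = 0; i + mlen <= len; i++)
        if (memcmp(buf + i, mark, mlen) == 0)
            return (long)i;
    return -1;
}

/* Copy everything from the first occurrence of mark to the end of `in`
 * into `out`. Returns 0 on success, -1 if the mark is not found in the
 * first HEAD_MAX bytes or an I/O error occurs. */
static int copy_from_mark(FILE *in, FILE *out, const char *mark)
{
    unsigned char buf[HEAD_MAX];
    size_t mlen = strlen(mark);
    size_t n = fread(buf, 1, sizeof buf, in);
    long off = find_mark(buf, n, mark, mlen);

    if (off < 0)
        return -1;

    /* Write the tail of the first chunk, starting at the mark. */
    size_t tail = n - (size_t)off;
    if (fwrite(buf + off, 1, tail, out) != tail)
        return -1;

    /* Stream the remainder of the file unchanged. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n)
            return -1;

    return ferror(in) ? -1 : 0;
}
```

A caller would open the input and output with `fopen(..., "rb")` and `fopen(..., "wb")` and pass "!BDN" as the mark. Writing to a new file rather than editing in place leaves the corrupted original untouched if something goes wrong.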
If your PST files are hundreds of megabytes each, then rather than a read/write loop to the end of the PST, it would be better to mmap the file or use sendfile.
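For the mmap route, a rough sketch under stated assumptions might look like the following. It is not the attached implementation; the function name `mmap_copy_from_mark` is hypothetical, it targets a POSIX system, and it scans the whole mapping for the mark rather than only the first few hundred bytes. A production version would probably also retry `write` on `EINTR`.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map src read-only, locate mark, and write everything from the mark
 * onward to dst. Returns 0 on success, -1 on error or if the mark is
 * not found (in which case dst is not created or modified). */
static int mmap_copy_from_mark(const char *src, const char *dst,
                               const char *mark)
{
    size_t mlen = strlen(mark);
    assert(mlen > 0);

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0 || st.st_size < (off_t)mlen) {
        close(in);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
    close(in);                      /* the mapping remains valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* Binary-safe scan for the first occurrence of the mark. */
    size_t off = 0;
    int found = 0;
    for (; off + mlen <= len; off++) {
        if (memcmp(p + off, mark, mlen) == 0) {
            found = 1;
            break;
        }
    }

    int rc = -1;
    if (found) {
        int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out >= 0) {
            rc = 0;
            /* write() may be partial, so loop until the tail is flushed. */
            while (off < len) {
                ssize_t w = write(out, p + off, len - off);
                if (w < 0) {
                    rc = -1;
                    break;
                }
                off += (size_t)w;
            }
            if (close(out) < 0)
                rc = -1;
        }
    }

    munmap(p, len);
    return rc;
}
```

With the file mapped, the kernel pages data in on demand and no user-space read buffer is needed, which is the main appeal for large PST files.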
A short implementation is attached. Just compile it and its usage is:
$ progname <input file> [mark (default !BDN)]
Where progname is whatever you compile it to, the first argument <input file> is the file to search, and the optional second argument is the mark to find in that file. If no arguments are given, it reads from stdin, searching for the default mark.
The default new filename will be "input file_from_data_mark" or "stdin_from_data_mark" if reading stdin. Compile instructions are at top of file.
**Example Input File**
$ cat dat/psttest
unwanted!crap!BDNwanted!PSTdata
**Example Use/Output**
$ ./bin/extract_from_mark dat/psttest
$ cat dat/psttest_from_data_mark
!BDNwanted!PSTdata
-- David C. Rankin, J.D.,P.E.