Re: [opensuse] How to remove binary garbage from the front of a binary file??

12 Aug 2018

David,

That was fantastically nice of you to put that together for me.

Thank you,
Greg
--
Greg Freemyer
Advances are made by answering questions. Discoveries are made by
questioning answers.
— Bernard Haisch

On Sat, Aug 11, 2018 at 5:47 AM, David C. Rankin
<drankinatty@suddenlinkmail.com> wrote:
...
On 08/10/2018 06:00 PM, Greg Freemyer wrote:
...
All,
I don't know the origin of these files, but I have 100GB of
corrupted PST files.
From what I can tell, some sort of processing / extraction tool went
haywire and prepended binary junk to the front of the real data.  The
actual start of the data is a header with !BDN as the first 4 chars.
From what I've seen, the prepended junk is roughly 10-500 binary
octets (chars).  It is sort of like RAM slack, but at the start
of the files.  (No idea how that happened.)
If I knew for certain that the binary junk didn't have any newlines in
it, this sed script would get rid of the junk:
find . -name \*.pst -exec sed -e '1s/^.*!BDN/!BDN/' -i "{}" \;
I know I could write a program to do the same thing, working in binary
and not worrying about intervening newlines.
Is there a relatively straightforward way to accomplish the above?
fyi: I'm going to try and get replacement uncorrupted data files as
well, but that might be easier said than done.
Thanks
Greg
I'm not sure scripting is your friend here. In C it is straightforward to look
for your "!BDN" mark and copy it and the rest of the file to a new output file
(say original_name+SUFFIX). Your mark "!BDN" must be ASCII or UTF-8, and not
some strange multi-byte character set that just looks like "!BDN" on the terminal.
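Roughly, the search step could look like the sketch below. It is not the attached program, and the name find_mark is made up for illustration. It scans byte by byte with memcmp(), so NUL bytes and newlines in the junk are irrelevant (glibc's memmem() does the same job when _GNU_SOURCE is defined).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first occurrence of
 * mark[0..mark_len) in buf[0..buf_len), or -1 if it is not there.
 * Unlike strstr(), it does not stop at NUL bytes, and unlike sed it
 * does not care about newlines. */
static ptrdiff_t find_mark(const unsigned char *buf, size_t buf_len,
                           const unsigned char *mark, size_t mark_len)
{
    assert(mark_len > 0);
    for (size_t i = 0; i + mark_len <= buf_len; i++) {
        /* cheap first-byte test before the full comparison */
        if (buf[i] == mark[0] && memcmp(buf + i, mark, mark_len) == 0)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

On the sample line unwanted!crap!BDNwanted!PSTdata it returns 13: the "!" of "!crap" passes the first-byte test but fails the memcmp().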
If your PST files are hundreds of megabytes each, then rather than a
read/write loop to the end of the PST, it would be better to mmap the file or
use sendfile.
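A minimal sketch of the mmap route, assuming a POSIX system whose libc provides memmem() (a GNU/BSD extension, hence the _GNU_SOURCE define). copy_from_mark is a hypothetical name, and this is not the attached implementation. On Linux, sendfile(2) could replace the write loop by passing the mark's offset as its offset argument.

```c
#define _GNU_SOURCE /* exposes memmem() in glibc's <string.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from the first occurrence of `mark`
 * through the end of `in_path` into `out_path`.
 * Returns 0 on success, 1 if the mark is absent, -1 on an I/O error. */
static int copy_from_mark(const char *in_path, const char *out_path,
                          const char *mark)
{
    size_t mark_len = strlen(mark);
    assert(mark_len > 0);

    int in = open(in_path, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }
    if (st.st_size == 0) { /* mmap() rejects a zero-length mapping */
        close(in);
        return 1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
    close(in); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* Binary-safe search: NUL bytes and newlines in the junk are fine. */
    const unsigned char *p = memmem(map, len, mark, mark_len);
    if (p == NULL) {
        munmap(map, len);
        return 1;
    }

    int rc = 0;
    int out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        rc = -1;
    } else {
        size_t left = len - (size_t)(p - map);
        while (left > 0) { /* write() may be partial, so loop */
            ssize_t n = write(out, p, left);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                break;
            }
            p += n;
            left -= (size_t)n;
        }
        if (close(out) < 0)
            rc = -1;
    }
    munmap(map, len);
    return rc;
}
```

The kernel pages the file in on demand, so only the pages actually searched and copied are read; nothing is buffered by hand.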
A short implementation is attached. Just compile it; its usage is:
$ progname <input file> [mark (default !BDN)]
Where progname is whatever you compile it to, the 1st argument <input file> is
the file to search for the mark, and the 2nd argument mark is the mark in the
file to find. If no arguments are given, it will read from stdin, searching for
the default mark.
The default new filename will be "<input file>_from_data_mark", or
"stdin_from_data_mark" if reading stdin. Compile instructions are at the top of
the file.
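The argument handling described above might be sketched like this. The helper names and macros are illustrative guesses, not taken from the attachment; they only reproduce the documented defaults (mark !BDN, stdin when no file is given, and the _from_data_mark suffix).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_MARK "!BDN"
#define OUT_SUFFIX "_from_data_mark"

/* Build "<input file>_from_data_mark", or "stdin_from_data_mark" when
 * input is NULL (no file argument). Caller frees; NULL if out of memory. */
static char *output_name(const char *input)
{
    const char *base = input ? input : "stdin";
    size_t len = strlen(base) + sizeof OUT_SUFFIX; /* sizeof counts the NUL */
    char *name = malloc(len);
    if (name != NULL)
        snprintf(name, len, "%s%s", base, OUT_SUFFIX);
    return name;
}

/* argv[1] is the optional input file; NULL means read stdin. */
static const char *choose_input(int argc, char **argv)
{
    assert(argc >= 1); /* expects at least argv[0] */
    return argc > 1 ? argv[1] : NULL;
}

/* argv[2] is the optional mark; otherwise use the default. */
static const char *choose_mark(int argc, char **argv)
{
    assert(argc >= 1);
    return argc > 2 ? argv[2] : DEFAULT_MARK;
}
```

In a main(), output_name(choose_input(argc, argv)) then gives the name of the file to create.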
**Example Input File**
$ cat dat/psttest
    unwanted!crap!BDNwanted!PSTdata
**Example Use/Output**
$ ./bin/extract_from_mark dat/psttest
$ cat dat/psttest_from_data_mark
    !BDNwanted!PSTdata
--
David C. Rankin, J.D.,P.E.