Re: [opensuse] BASH - howto 'read' line in file and preserve leading whitespace?

4 Sep 2008

      ----- Original Message ----- 
From: "David C. Rankin" 
To: "suse" 
Sent: Thursday, September 04, 2008 2:10 AM
Subject: [opensuse] BASH - howto 'read' line in file and preserve leading 
whitespace?
...
Listmates,
I stumbled on a problem trying to read a file (email mailbox) line-by-line 
in bash. Using the built-in read, it strips the leading whitespace from 
each line making the subsequent write impossible. I was using a while loop 
as follows:
{ while read XTAG VALUE LINE; do
        if [[ ${XTAG} == "X-Mozilla-Status:" ]]; then
                case ${VALUE} in
                1019 ) NEWVAL=1011;;
                1009 ) NEWVAL=1001;;
                001b ) NEWVAL=0013;;
                0019 ) NEWVAL=0011;;
                000b ) NEWVAL=0003;;
                0009 ) NEWVAL=0001;;
                 *   ) NEWVAL=${VALUE};;
                esac
                echo -e "${XTAG} ${NEWVAL} ${LINE}" >>  ${NEWFILE}
        else
                echo -e "${XTAG} ${VALUE} ${LINE}" >>  ${NEWFILE}
        fi
XTAG=''; VALUE=''; LINE=''; NEWVAL=''
done } < ~/.thunderbird/2k12pnl0.default/Mail/pop.suddenlinkmail.com/openSuSE.sav
All of the lines in the mailbox with leading whitespace were written 
without the leading whitespace like:
original file:
Received: from edge03.suddenlink.net ([195.135.221.135])
          by imta03.suddenlink.net
newfile:
Received: from edge03.suddenlink.net ([195.135.221.135])
by imta03.suddenlink.net
Is there a bash trick that will preserve the whitespace?
Try this:

---
# ~ is not expanded inside double quotes, so use $HOME instead
INFILE="$HOME/.thunderbird/2k12pnl0.default/Mail/pop.suddenlinkmail.com/openSuSE.sav"
OUTFILE="some_file"

DONE=false
until $DONE ; do
  IFS="" read -r || DONE=true   # -r keeps backslashes in the line literal

  # group echo and continue so that continue only runs when the test fails
  [[ "${REPLY%%:*}" == "X-Mozilla-Status" ]] || { echo "$REPLY" ; continue ; }

  VAL=${REPLY#*:}

  VAL=${VAL// /}

  case "$VAL" in
    1019) VAL=1011 ;;
    1009) VAL=1001 ;;
    001b) VAL=0013 ;;
    0019) VAL=0011 ;;
    000b) VAL=0003 ;;
    0009) VAL=0001 ;;
  esac

  echo "${REPLY%%:*}: $VAL"

done <"$INFILE" >"$OUTFILE"
---

explanation:
IFS="" Eliminates any word separator so read sees the whole line as one big 
word, including the leading, trailing, and all other spaces. read still 
ends each line on linefeed, and the IFS change only affects the read 
command, nothing else in the script.
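
A quick way to see the difference for yourself. This is only a sketch: the 
sample line and the variable names (sample, stripped, kept) are made up for 
the demo, and the -r flag is my addition so backslashes are left alone.

```bash
#!/usr/bin/env bash
# Made-up continuation line with leading whitespace.
sample='          by imta03.suddenlink.net'

# Default IFS: read trims the leading (and trailing) whitespace.
read -r stripped <<< "$sample"

# Empty IFS for this one command only: the line is kept intact.
IFS="" read -r kept <<< "$sample"

printf '[%s]\n' "$stripped"
printf '[%s]\n' "$kept"

# The assignment was a prefix to read, so the shell's IFS is untouched.
printf 'IFS afterwards: %q\n' "$IFS"
```

The first printf shows the line with the indent gone, the second shows it 
preserved.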

read (with no variable) is just my minimalist nature. We happen to only 
need one variable, and read happens to supply a variable, REPLY, if no 
other is specified.

${REPLY%%:*} expands to the part of $REPLY from the beginning up to the 
first ":" , non-inclusive.
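
To make the %% behaviour concrete, here is a small sketch. The sample 
strings and the variable names (line, multi, cont and the tag_* results) 
are invented for illustration.

```bash
#!/usr/bin/env bash
line='X-Mozilla-Status: 0019'
multi='Received: from host; Thu, 04 Sep 2008 02:10:00'
cont='          by imta03.suddenlink.net'

# %% removes the LONGEST suffix matching ":*", i.e. from the first colon on.
tag=${line%%:*}
tag_long=${multi%%:*}

# A single % removes the SHORTEST such suffix, i.e. from the last colon on.
tag_short=${multi%:*}

# No colon at all: nothing matches, the value comes back unchanged.
tag_none=${cont%%:*}

printf '[%s]\n' "$tag" "$tag_long" "$tag_short" "$tag_none"
```

The last case is why continuation lines never match "X-Mozilla-Status": 
they simply come back as themselves.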

For sanity's sake, you should always try to compare things in the same 
context. So either quote, or don't quote, the values on both sides of the 
test comparator. If you have to quote one side for any reason, then quote 
the other side too. Most times when either side is a variable you should 
quote, to account for the possibility of the variable being empty.
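
A rough illustration of what quoting buys you. The variable names here are 
made up, and the old-style [ ] test is included only for contrast since the 
script itself uses [[ ]].

```bash
#!/usr/bin/env bash
EMPTY=""

# Old-style [ ]: unquoted, the empty variable vanishes and the test is
# malformed, so it returns an error status instead of a clean "false".
UNQUOTED_STATUS=0
[ $EMPTY = "X-Mozilla-Status" ] 2>/dev/null || UNQUOTED_STATUS=$?

# Quoted on both sides: an ordinary "false" (status 1).
QUOTED_STATUS=0
[ "$EMPTY" = "X-Mozilla-Status" ] || QUOTED_STATUS=$?

# In [[ ]], quoting the right side also decides literal vs. pattern match.
VAL="1019"
[[ "$VAL" == 10* ]]   && GLOB=match    || GLOB=nomatch
[[ "$VAL" == "10*" ]] && LITERAL=match || LITERAL=nomatch

echo "unquoted=$UNQUOTED_STATUS quoted=$QUOTED_STATUS"
echo "glob=$GLOB literal=$LITERAL"
```

Inside [[ ]] the left side is not word-split, so an empty variable there is 
harmless, but quoting both sides keeps the habit consistent.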

If that doesn't match, then echo the entire line verbatim and skip the rest 
of the loop.
I re-arranged the loop that way because 99% of lines will not match, so this 
way 99% of the time we do almost no work. Also, this way we are almost 
impervious to the content of the line. We don't care what's in it or have to 
parse all the possible types of lines and reconstruct them, we just spit the 
whole line back out without even looking at it.
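
Here is one way to sketch that "echo and skip" step on its own. The 
function name (process), the MATCHED: marker and the sample input are all 
made up for the demo. The brace group ties both the output and the continue 
to the failed test, and printf is used instead of echo so a line that looks 
like an echo option is still printed as-is.

```bash
#!/usr/bin/env bash
# Pass non-matching lines through verbatim; mark the matching ones.
process() {
  while IFS="" read -r; do
    [[ "${REPLY%%:*}" == "X-Mozilla-Status" ]] || {
      printf '%s\n' "$REPLY"   # untouched, leading whitespace and all
      continue
    }
    printf 'MATCHED: %s\n' "$REPLY"
  done
}

out=$(printf '%s\n' \
  'Received: from edge03.suddenlink.net' \
  '          by imta03.suddenlink.net' \
  'X-Mozilla-Status: 0019' | process)

printf '%s\n' "$out"
```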

The rest only ever happens on those rare lines when we didn't skip out 
above:

VAL=${REPLY#*:}  VAL=everything after the first colon, to the end of the line

VAL=${VAL// /}   VAL with all spaces stripped out.

I'm assuming that on THESE particular lines this is reasonable, i.e. that 
these lines have the format "tag: value" with no other junk after the 
colon. So taking everything after the colon, and then stripping all spaces 
anywhere, will leave you with a clean "####" for case-matching.

If there is any other stuff, well no problem, since these are mail headers 
and they have an intentionally regular, defined, parseable structure. So if 
we need to do another split on a comma or a semicolon or something, it's 
just another VAL=${VAL%%,*} or some such.
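
Walking one line through those steps, as a sketch. The sample header 
values are made up, and "X-Example" with its comma-separated value is a 
purely hypothetical header used to show the extra split.

```bash
#!/usr/bin/env bash
REPLY='X-Mozilla-Status: 0019'

VAL=${REPLY#*:}    # " 0019"  (everything after the first colon)
VAL=${VAL// /}     # "0019"   (all spaces removed)

case "$VAL" in
  0019) VAL=0011 ;;
esac

REBUILT="${REPLY%%:*}: $VAL"
printf '%s\n' "$REBUILT"

# Hypothetical header with more after the value: split again on the comma.
EXTRA='X-Example: abc, def'
FIRST=${EXTRA#*:}      # " abc, def"
FIRST=${FIRST%%,*}     # " abc"
FIRST=${FIRST// /}     # "abc"
printf '%s\n' "$FIRST"
```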

There's really no need for val & newval, just start with val, and sometimes 
overwrite it.

echo "${REPLY%%:*}: $VAL"
Whether we changed VAL or not, either way we simply (re)create the line out 
of the parts the same way every time.
We word-split on the : so we have to put one back in ourselves.

Finally, if the input file happens to end at the end of the last line (i.e., 
no final linefeed), then with a plain "while read" loop that last line will 
not be processed and will NOT appear in the output. read still fills REPLY 
with the partial line, but it returns an exit status above 0 on hitting the 
end of file, so the rest of the loop does not get a chance to run that one 
last time.

To allow for that possibility you need to put the read within a different 
loop that exits on a variable instead of directly on the exit status of read 
itself. Then in the loop merely remember the exit status of read but proceed 
to do that iteration of the loop. (One side effect: when the file does end 
with a linefeed, the final read fails with an empty REPLY, so in the script 
above that extra iteration writes one empty line at the end of the output.)

So, this:

while IFS="" read ;do
  [...]
done <infile >outfile

Becomes this:

DONE=false
until $DONE ;do
  IFS="" read || DONE=true
  [...]
done <infile >outfile
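
To see the two loops side by side, here is a small sketch. The function 
names (copy_while, copy_until) and the test input are made up. The guard 
that breaks out when read fails with an empty REPLY is my own addition, not 
part of the loop above; it is there so a file that does end with a linefeed 
doesn't gain an extra empty line.

```bash
#!/usr/bin/env bash
# Plain while-read: drops a final line that has no trailing linefeed.
copy_while() {
  while IFS="" read -r; do
    printf '%s\n' "$REPLY"
  done
}

# until-loop: remembers read's failure but still handles what was read.
copy_until() {
  local DONE=false
  until $DONE; do
    IFS="" read -r || DONE=true
    # Added guard: clean EOF leaves REPLY empty, so there is nothing to print.
    if $DONE && [[ -z $REPLY ]]; then break; fi
    printf '%s\n' "$REPLY"
  done
}

# Input whose last line ("two") has no linefeed after it.
WITH_WHILE=$(printf 'one\ntwo' | copy_while)
WITH_UNTIL=$(printf 'one\ntwo' | copy_until)

printf 'while: [%s]\n' "$WITH_WHILE"
printf 'until: [%s]\n' "$WITH_UNTIL"
```

Another spelling some people use for the same effect is 
while IFS="" read -r || [[ -n $REPLY ]] ; do ... done, which keeps the 
while form but also runs the body when read fails with a non-empty REPLY.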

Brian K. White    brian@aljex.com    http://www.myspace.com/KEYofR
+++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
filePro  BBx    Linux  SCO  FreeBSD    #callahans  Satriani  Filk!
