Hi David,

On 07.06.2012 08:28, David Haller wrote:
Hello,
On Tue, 29 May 2012, Stefan Seyfried wrote:
On 29.05.2012 12:03, Guido Berhoerster wrote:
Try mawk, it's an order of magnitude faster than gawk and can be configured as /bin/awk using update-alternatives.
Yeah, and it has incompatible syntax. The following snippet to extract name and version from a Debian control file does not work with mawk:
eval $(awk -F":[[:space:]*]" \
    '/^Package:/{print "NAME=\""$2"\""}; /^Version:/{print "VERSION=\""$2"\""}' CONTROL/control)

^ are you sure the * belongs there?

No, but it works. Should the * be behind the second ]?

'/^Package:/{printf("NAME=\"%s\"\n", $2);}; /^Version:/{printf("VERSION=\"%s\"\n", $2);}' CONTROL/control)

How does that help? $2 was empty. Guido explained this to me (in private mail, as this topic is off-topic here): Debian ships an obsolete version of mawk which does not support character classes. The code now uses [\t ]* instead of [[:space:]*].
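For reference, here is a runnable sketch of the fixed variant; the sample control file contents below are invented for illustration, the real code reads CONTROL/control:

```shell
# Sketch of the mawk-compatible extraction: the field separator
# ':[\t ]*' replaces the POSIX character class that Debian's old
# mawk does not support. The sample file is made up.
tmp=$(mktemp)
printf 'Package: example\nVersion: 1.0-1\n' > "$tmp"
eval $(awk -F':[\t ]*' \
    '/^Package:/{print "NAME=\""$2"\""}
     /^Version:/{print "VERSION=\""$2"\""}' "$tmp")
rm -f "$tmp"
echo "$NAME $VERSION"
```

This works with both gawk and mawk, since a multi-character FS is treated as an extended regular expression by both.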
But that's now going to be seriously off-topic here.
grep and two cuts are probably still cheaper.
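For comparison, a minimal sketch of the grep-plus-cut approach; the file contents are again invented:

```shell
# One grep and one cut per field: grep selects the line,
# cut splits on the space after the colon.
tmp=$(mktemp)
printf 'Package: example\nVersion: 1.0-1\n' > "$tmp"
NAME=$(grep '^Package:' "$tmp" | cut -d' ' -f2)
VERSION=$(grep '^Version:' "$tmp" | cut -d' ' -f2)
rm -f "$tmp"
echo "$NAME $VERSION"
```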
If you use a modern shell such as bash or ksh93 with built-in support for regexes neither of them should be ever needed.
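A minimal sketch of what that looks like in bash (the sample line is invented); BASH_REMATCH holds the capture groups, so no external process is forked at all:

```shell
#!/bin/bash
# Extract a field with bash's built-in [[ =~ ]] regex matching.
line='Package: example'
if [[ $line =~ ^Package:[[:space:]]*(.+)$ ]]; then
    NAME=${BASH_REMATCH[1]}
fi
echo "$NAME"
```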
Whenever I see "while read a b c; do some string processing in shell; done < foo" code, I always wonder if a simple grep|cut would have been more efficient. The shell is probably not as optimized for such stuff as the specialized tools are. And often it is more readable, too.
Usually, the "startup penalty" of _any_ external tool by far outweighs a dozen lines of var="${var/%foo/}" and similar substitutions.
and the readability penalty of

val=""
while read a b c; do
    case $a in
        frob) val=$b; break ;;
    esac
done < /foo/bar/baz

by far outweighs

val=$(awk '/^frob /{ print $2 }' /foo/bar/baz)
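Both variants, run side by side on an invented input file, yield the same value:

```shell
tmp=$(mktemp)
printf 'frob 42\nblah 7\n' > "$tmp"
# shell-only loop: stop at the first line whose first word is "frob"
val=
while read a b c; do
    case $a in frob) val=$b; break ;; esac
done < "$tmp"
# awk one-liner: print the second field of lines starting with "frob "
val2=$(awk '/^frob /{ print $2 }' "$tmp")
rm -f "$tmp"
echo "$val $val2"
```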
Oh, and awk is rather fast in startup:
$ time awk 'BEGIN{exit;}'
real    0m0.066s
$ time awk 'BEGIN{exit;}'
real    0m0.004s
$ time perl -e 1
real    0m0.006s
$ time perl -e 1
real    0m0.003s
On further runs, awk varies between .005 and .002, perl between .003 and .008.
So, think again about using grep + cut in a pipe sequence ...
This benchmark is like most benchmarks: irrelevant. In practice, those calls happen after cross-compiling a complete operating system => cold caches and stuff in swap. perl loads ~7 MB for doing nothing, awk loads "only" ~4 MB AFAICT. I *guess* (not tested) that as soon as you do anything useful, perl will load even more.
PS: for comparison, this is an AMD Athlon II X2 250 with 2 * 3.0 GHz cores.
CPU is mostly irrelevant for this kind of workload, it's I/O. But this is still off-topic here. And the problem is solved now.

Best regards
-- 
Stefan Seyfried

"Dispatch war rocket Ajax to bring back his body!"