Hi David,

On 07.06.2012 08:28, David Haller wrote:
Hello,
On Tue, 29 May 2012, Stefan Seyfried wrote:
On 29.05.2012 12:03, Guido Berhoerster wrote:
Try mawk, it's an order of magnitude faster than gawk and can be configured as /bin/awk using update-alternatives.
Yeah, and it has incompatible syntax. The following snippet to extract name and version from a Debian control file does not work with mawk:
eval $(awk -F":[[:space:]*]" \
    '/^Package:/{print "NAME=\""$2"\""}; /^Version:/{print "VERSION=\""$2"\""}' CONTROL/control)

^ are you sure the * belongs there?

No, but it works. Should the * be behind the second ]?

'/^Package:/{printf("NAME=\"%s\"\n", $2);}; /^Version:/{printf("VERSION=\"%s\"\n", $2);}' CONTROL/control)

How does that help? $2 was empty. Guido explained this to me (in private mail, as this topic is off-topic here): Debian ships an obsolete version of mawk which does not support character classes. The code now uses [\t ]* instead of [[:space:]*].
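For reference, here is a runnable sketch of the fixed variant; the sample control file contents below are invented for illustration, the real code reads CONTROL/control:

```shell
# Sketch of the mawk-compatible extraction: the field separator
# ':[\t ]*' replaces the POSIX character class that Debian's old
# mawk does not support. The sample file is made up.
tmp=$(mktemp)
printf 'Package: example\nVersion: 1.0-1\n' > "$tmp"
eval $(awk -F':[\t ]*' \
    '/^Package:/{print "NAME=\""$2"\""}
     /^Version:/{print "VERSION=\""$2"\""}' "$tmp")
rm -f "$tmp"
echo "$NAME $VERSION"
```

This works with both gawk and mawk, since a multi-character FS is treated as an extended regular expression by both.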
But that's now going to be seriously off-topic here.
grep and two cuts are probably still cheaper.
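For comparison, a minimal sketch of the grep-plus-cut approach; the file contents are again invented:

```shell
# One grep and one cut per field: grep selects the line,
# cut splits on the space after the colon.
tmp=$(mktemp)
printf 'Package: example\nVersion: 1.0-1\n' > "$tmp"
NAME=$(grep '^Package:' "$tmp" | cut -d' ' -f2)
VERSION=$(grep '^Version:' "$tmp" | cut -d' ' -f2)
rm -f "$tmp"
echo "$NAME $VERSION"
```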
If you use a modern shell such as bash or ksh93 with built-in support for regexes neither of them should be ever needed.
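A minimal sketch of what that looks like in bash (the sample line is invented); BASH_REMATCH holds the capture groups, so no external process is forked at all:

```shell
#!/bin/bash
# Extract a field with bash's built-in [[ =~ ]] regex matching.
line='Package: example'
if [[ $line =~ ^Package:[[:space:]]*(.+)$ ]]; then
    NAME=${BASH_REMATCH[1]}
fi
echo "$NAME"
```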
Whenever I see "while read a b c; do some string processing in shell; done < foo" code, I always wonder if a simple grep|cut would have been more efficient. The shell is probably not as optimized for such stuff as the specialized tools are. And often it is more readable, too.
Usually, the "startup penalty" of _any_ external tool by far outweighs a dozen lines of var="${var/%foo/}" and similar substitutions.
and the readability penalty of

val=""
while read a b c; do
    case $a in
        frob) val=$b; break ;;
    esac
done < /foo/bar/baz

by far outweighs

val=$(awk '/^frob /{ print $2 }' /foo/bar/baz)
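Both variants, run side by side on an invented input file, yield the same value:

```shell
tmp=$(mktemp)
printf 'frob 42\nblah 7\n' > "$tmp"
# shell-only loop: stop at the first line whose first word is "frob"
val=
while read a b c; do
    case $a in frob) val=$b; break ;; esac
done < "$tmp"
# awk one-liner: print the second field of lines starting with "frob "
val2=$(awk '/^frob /{ print $2 }' "$tmp")
rm -f "$tmp"
echo "$val $val2"
```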
Oh, and awk is rather fast in startup:
$ time awk 'BEGIN{exit;}'
real    0m0.066s
$ time awk 'BEGIN{exit;}'
real    0m0.004s
$ time perl -e 1
real    0m0.006s
$ time perl -e 1
real    0m0.003s
On further runs, awk varies between .005 and .002, perl between .003 and .008.
So, think again about using grep + cut in a pipe sequence ...
This benchmark is like most benchmarks: irrelevant. In practice, those calls happen after cross-compiling a complete operating system => cold caches and stuff in swap. perl loads ~7 MB for doing nothing, awk loads "only" ~4 MB AFAICT. I *guess* (not tested) that as soon as you do anything useful, perl will load even more.
PS: for comparison, this is an AMD Athlon II X2 250 with 2 * 3.0 GHz cores.
CPU is mostly irrelevant for this kind of workload, it's I/O. But this is still off-topic here. And the problem is solved now.

Best regards
-- 
Stefan Seyfried

"Dispatch war rocket Ajax to bring back his body!"