[opensuse] Performance of "cut" and "uniq"
The "cut" binary that comes with openSUSE-12.1 is much slower than the "cut" I get by building from the coreutils srcpackage files that are from the openSUSE-12.1 repository.

i | coreutils | package    | 8.14-3.1.2 | x86_64 | openSUSE-12.1-Oss
i | coreutils | package    | 8.14-3.1.2 | x86_64 | openSUSE-12.1-12.1-1.4
  | coreutils | srcpackage | 8.14-3.1.2 | noarch | openSUSE-12.1-Source

The difference in performance is erased by setting LANG=C (the default is en_US.UTF-8); that change makes the installed binary much faster but doesn't change the performance of the built-from-source binary. See examples below.

"uniq" seems to be in the same boat as "cut".

Is it a known issue that building from source lets the CPU safely take advantage of short-cuts to make things faster even under UTF-8 that a packaged binary cannot? Or perhaps is there a bug in the build-from-source that makes it faster, but somehow unsafe for UTF-8?

perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time cut -d, -f2 |wc -c
71.93user 1.36system 1:13.58elapsed 99%CPU (0avgtext+0avgdata 3328maxresident)k

perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c
3.81user 0.85system 0:05.08elapsed 91%CPU (0avgtext+0avgdata 3104maxresident)k

perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| LANG=C /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c
3.79user 0.89system 0:05.08elapsed 92%CPU (0avgtext+0avgdata 2368maxresident)k

perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| LANG=C /usr/bin/time cut -d, -f2 |wc -c
3.68user 0.89system 0:05.00elapsed 91%CPU (0avgtext+0avgdata 2432maxresident)k

Cheers,
Jeff

--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On 28/04/12 00:24, Jeff Janes wrote:
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time cut -d, -f2 |wc -c 71.93user 1.36system 1:13.58elapsed 99%CPU (0avgtext+0avgdata 3328maxresident)k
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c 3.81user 0.85system 0:05.08elapsed 91%CPU (0avgtext+0avgdata 3104maxresident)k
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| LANG=C /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c 3.79user 0.89system 0:05.08elapsed 92%CPU (0avgtext+0avgdata 2368maxresident)k
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| LANG=C /usr/bin/time cut -d, -f2 |wc -c 3.68user 0.89system 0:05.00elapsed 91%CPU (0avgtext+0avgdata 2432maxresident)k
Cheers,
Jeff
Please file a bug report ;)
Hello,

On Sat, 28 Apr 2012, Cristian Rodríguez wrote:
On 28/04/12 00:24, Jeff Janes wrote:
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time cut -d, -f2 |wc -c 71.93user 1.36system 1:13.58elapsed 99%CPU (0avgtext+0avgdata 3328maxresident)k
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c 3.81user 0.85system 0:05.08elapsed 91%CPU (0avgtext+0avgdata 3104maxresident)k
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| LANG=C /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c 3.79user 0.89system 0:05.08elapsed 92%CPU (0avgtext+0avgdata 2368maxresident)k [..] Please file a bug report ;)
Seconded. ISTR something much like that some time ago that also was locale (UTF-8?) specific. So, it probably is a regression or basically the same bug cropping up in a different utility.

JFTR (first run, second was actually a tad slower):

$ perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'|\
    /usr/bin/time cut -d, -f2 |wc -c
3.82user 1.05system 0:05.38elapsed 90%CPU (0avgtext+0avgdata 2944maxresident)k
96inputs+0outputs (1major+233minor)pagefaults 0swaps
$ echo $LANG
en_US.iso885915
$ perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| \
    LANG=en_US.UTF-8 /usr/bin/time cut -d, -f2 |wc -c
52.52user 1.34system 0:54.25elapsed 99%CPU (0avgtext+0avgdata 3152maxresident)k
680inputs+0outputs (11major+235minor)pagefaults 0swaps

So, it's quite definitely UTF-8 related. Probably how cut parses lines to find the separator in UTF-8 vs. 1-byte charsets.

For comparison:

$ perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| \
    LANG=en_US.UTF-8 /usr/bin/time awk -F, '{print $2;}' | wc -c
35.68user 1.34system 0:37.51elapsed 98%CPU (0avgtext+0avgdata 4368maxresident)k
720inputs+0outputs (1major+336minor)pagefaults 0swaps
$ perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| \
    /usr/bin/time awk -F, '{print $2;}' | wc -c
2.00user 0.98system 0:03.38elapsed 88%CPU (0avgtext+0avgdata 4432maxresident)k
0inputs+0outputs (0major+342minor)pagefaults 0swaps

So awk seems to have the same problem (whoah: it's faster than 'cut' ;)

Feel free to add the above to the bug, and/or mail me the bug number / add me to the CC list of the bug and I'll add the above myself.

HTH,
-dnh

--
NT is the only OS that has caused me to beat a piece of hardware to death with my bare hands. -- Derry Hamilton
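For anyone who wants to repeat the locale comparison at a smaller scale, here is a minimal sketch. The helper names gen_lines and cut_bytes are made up for illustration, and awk stands in for the perl one-liner only to keep the generator self-contained; the input has the same shape as in the thread ("123456," followed by 1024 "x" characters). It sets LC_ALL rather than LANG because LC_ALL takes precedence if both are set. Whether a slowdown shows up at all depends on the cut binary carrying the multibyte patch and on the UTF-8 locale being installed.

```shell
# gen_lines N: print N lines shaped like the thread's test input,
# "123456," followed by 1024 'x' characters. (Hypothetical helper.)
gen_lines() {
    awk -v n="$1" 'BEGIN {
        x = sprintf("%1024s", ""); gsub(/ /, "x", x)
        for (i = 0; i < n; i++) print "123456," x
    }'
}

# cut_bytes LOCALE N: run the thread's cut command under LOCALE on
# N generated lines and print how many bytes it emits.
# (Hypothetical helper.)
cut_bytes() {
    gen_lines "$2" | LC_ALL="$1" cut -d, -f2 | wc -c | tr -d ' '
}

# To compare locales, time each call. In bash, the "time" keyword
# can wrap a shell function directly:
#   time cut_bytes C 100000
#   time cut_bytes en_US.UTF-8 100000
```

Each output line is 1024 bytes plus a newline, so the byte count doubles as a sanity check that both locales produced the same result.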
On 28/04/12 04:10, David Haller wrote:
Seconded. ISTR something much like that some time ago that also was locale (UTF-8?) specific. So, it probably is a regression or basically the same bug cropping up in a different utility.
Yeah, and whatever this bug is about, it is still present in Factory.
On Sat, Apr 28, 2012 at 12:10 AM, David Haller <dnh@opensuse.org> wrote:
Hello,
On Sat, 28 Apr 2012, Cristian Rodríguez wrote:
On 28/04/12 00:24, Jeff Janes wrote:
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time cut -d, -f2 |wc -c 71.93user 1.36system 1:13.58elapsed 99%CPU (0avgtext+0avgdata 3328maxresident)k
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c 3.81user 0.85system 0:05.08elapsed 91%CPU (0avgtext+0avgdata 3104maxresident)k
perl -le 'my $x="x"x1024; print "123456,$x" foreach 1..1e6'| LANG=C /usr/bin/time ~/coreutils_suse_source/bin/cut -d, -f2 |wc -c 3.79user 0.89system 0:05.08elapsed 92%CPU (0avgtext+0avgdata 2368maxresident)k [..] Please file a bug report ;)
OK, I've now filed bug report 759814. (I was expecting someone to tell me it wasn't a bug, just the way things worked.)
Feel free to add above to the bug, and/or mail the Bug-No / add me to the CC-list of the bug and I'll add the above myself.
I couldn't figure out how to add you on cc, but the report is: https://bugzilla.novell.com/show_activity.cgi?id=759814

Thanks,
Jeff
On Sat, 28 Apr 2012 09:10:23 +0200, David Haller <dnh@opensuse.org> wrote:
Seconded. ISTR something much like that some time ago that also was locale (UTF-8?) specific. So, it probably is a regression or basically the same bug cropping up in a different utility.
No bug! Multibyte processing is slower, period. Either file a bug upstream (which will be rejected, as upstream doesn't do multibyte processing) or find ways to speed up the current i18n patch. All other bug reports on this will be closed by me, as I'm the current coreutils maintainer.
So, it's quite definitely UTF-8 related. Probably how cut parses lines to find the separator in UTF-8 vs. 1-byte charsets.
Of course it is! Just look at the i18n patch that we apply and you'll see the differences.

Philipp
On Fri, 27 Apr 2012 20:24:32 -0700, Jeff Janes <jeff.janes@gmail.com> wrote:
The difference in performance is erased by setting LANG=C (the default is en_US.UTF-8), which change makes the installed binary much faster but doesn't change the built-from-source binary performance. See examples below.
That's to be expected. Many distributions (I know of at least Fedora and Arch Linux besides SUSE) use a patch that teaches the coreutils how to handle multibyte locales. The downside is that multibyte processing slows down the utils.
Is it a known issue that building from source lets the CPU safely take advantage of short-cuts to make things faster even under UTF-8 that a packaged binary cannot? Or perhaps is there a bug in the build-from-source that makes it faster, but somehow unsafe for UTF-8?
Something's wrong if you get different results from a binary package vs. binaries built by building your own rpm. If you do not use our spec file and the patches we supply, it's no wonder you get faster binaries. So the real question is: how exactly did you build the binaries? If you built the package locally by using build, there should be no differences.

Philipp
On Sat, Apr 28, 2012 at 12:24 PM, Philipp Thomas <Philipp.Thomas2@gmx.net> wrote:
On Fri, 27 Apr 2012 20:24:32 -0700, Jeff Janes <jeff.janes@gmail.com> wrote:
The difference in performance is erased by setting LANG=C (the default is en_US.UTF-8), which change makes the installed binary much faster but doesn't change the built-from-source binary performance. See examples below.
That's to be expected. Many distributions (I know of at least Fedora and Arch Linux besides SUSE) use a patch that teaches the coreutils how to handle multibyte locales. The downside is that multibyte processing slows down the utils.
OK, thanks. I think there might be a way to suffer the slowdown only when the delimiter is actually a wide character (sort, for example, doesn't seem to be slowed down when the input happens to have no wide characters), but I don't know how to go about doing that. I'll probably just set LANG=C in my login rc file and call it a day.
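One way to approximate that idea from the outside, without touching the i18n patch, is a small wrapper that drops to the C locale only when the delimiter is a single byte. This is a sketch under stated assumptions, not how coreutils or the patch behaves: cut_fields is a made-up name, it covers only the -d/-f form used in this thread, and it assumes UTF-8 input, where every byte of a multibyte character is 0x80 or above, so a one-byte delimiter such as "," can never occur inside one. That guarantee does not hold for every multibyte encoding, and it does not carry over to cut -c, which counts characters. LC_ALL is used because it overrides LANG if both are set.

```shell
# cut_fields DELIM FIELDS: behaves like "cut -d DELIM -f FIELDS" on
# stdin, but runs cut in the C locale when DELIM is one byte long.
# (cut_fields is a hypothetical name, not part of coreutils.)
cut_fields() {
    # Count the delimiter's bytes rather than its characters.
    if [ "$(printf '%s' "$1" | wc -c | tr -d ' ')" -eq 1 ]; then
        # Fast path: byte-wise field splitting is safe here, given
        # the UTF-8 assumption stated above.
        LC_ALL=C cut -d "$1" -f "$2"
    else
        # Multibyte delimiter: keep the caller's locale.
        cut -d "$1" -f "$2"
    fi
}
```

For example, "printf 'a,b\n' | cut_fields , 2" takes the fast path, and non-ASCII field contents pass through unchanged because the bytes are copied as-is.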
Is it a known issue that building from source lets the CPU safely take advantage of short-cuts to make things faster even under UTF-8 that a packaged binary cannot? Or perhaps is there a bug in the build-from-source that makes it faster, but somehow unsafe for UTF-8?
Something's wrong if you get different results from a binary package vs. binaries built by building your own rpm.
If you do not use our spec file and the patches we supply it's no wonder you get faster binaries. So the real question is, how did you build the binaries precisely.
I built it by following the instructions in the INSTALL file from the tarball that "zypper install -t srcpackage coreutils" put in /usr/src/packages/SOURCES. So I guess I did the wrong thing there.
If you built the package locally by using build, there should be no differences.
OK, thanks. I haven't been able to get build to work for me yet; I'll keep trying.

Thanks,
Jeff
On Mon, 30 Apr 2012 13:28:24 -0700, Jeff Janes <jeff.janes@gmail.com> wrote:
I built it from the instructions from INSTALL file that came in the tarball put in /usr/src/packages/SOURCES by running zypper install -t srcpackage coreutils. So I guess I did the wrong thing there.
Nope, nothing wrong, but installing the source rpm is unnecessary. I'd propose you get an openSUSE login in order to use the open source build system at build.opensuse.org. Once you have a login, install osc, then download the sources in question via 'osc co' and use 'osc build' to build a new package.

Philipp
On Tuesday, 2012-05-01 at 19:31 +0200, Philipp Thomas wrote:
Nope, nothing wrong, but installing the source rpm is unnecessary. I'd propose you get an openSUSE login in order to use the open source build system at build.opensuse.org. Once you have a login, install osc, then download the sources in question via 'osc co' and use 'osc build' to build a new package.
What, and cause another RAID outage for the entire community? :-P

--
Cheers,
Carlos E. R. (from 11.4 x86_64 "Celadon" at Telcontar)
participants (5)
- Carlos E. R.
- Cristian Rodríguez
- David Haller
- Jeff Janes
- Philipp Thomas