[opensuse-factory] Why does OpenSuSE disabled the default of allowing multi-core use in Coreutils 'sort'??
I wondered why I wasn't getting multi-cpu support in my sorts of >128k line files, even when I specified that it should use multiple threads. It seems SuSE adds a patch to set the max CPU's my machine has to 1, so no matter what I set in --parallel, it uses the smaller of (1, number I ask for). The default is to use min(available CPU's, number I ask for)... Yet the OSuSE version is hard-coded to disallow parallel usage. Could someone point me at the discussion in the archives as to why this was decided? -- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
Linda Walsh wrote:
I wondered why I wasn't getting multi-cpu support in my sorts of >128k line files, even when I specified that it should use multiple threads.
It seems SuSE adds a patch to set the max CPU's my machine has to 1, so no matter what I set in --parallel, it uses the smaller of (1, number I ask for).
The default is to use min(available CPU's, number I ask for)...
Yet the OSuSE version is hard-coded to disallow parallel usage.
Could someone point me at the discussion in the archives as to why this was decided?
I don't remember such a discussion, but maybe check the changelog, there ought to be an explanation. -- Per Jessen, Zürich (17.7°C)
On Mon, Sep 10, 2012 at 07:57:01AM +0200, Per Jessen wrote:
Linda Walsh wrote:
I wondered why I wasn't getting multi-cpu support in my sorts of >128k line files, even when I specified that it should use multiple threads.
It seems SuSE adds a patch to set the max CPU's my machine has to 1, so no matter what I set in --parallel, it uses the smaller of (1, number I ask for).
The default is to use min(available CPU's, number I ask for)...
Yet the OSuSE version is hard-coded to disallow parallel usage.
Could someone point me at the discussion in the archives as to why this was decided?
I don't remember such a discussion, but maybe check the changelog, there ought to be an explanation.
changes has:
-------------------------------------------------------------------
Fri Jan 14 14:13:28 CET 2011 - uli@suse.de
- sort threading still broken, it deadlocks occasionally; set default number of threads to 1 as a workaround

As there were some new versions in between, perhaps it's better now, but someone needs to try.
Ciao, Marcus
On Sun, 09 Sep 2012 17:19:26 -0700, Linda Walsh <suse@tlinx.org> wrote:
Could someone point me at the discussion in the archives as to why this was decided?
There is no discussion in the archives and it would be ridiculous to require that every patch to a package first has to undergo discussion on a mailing list. It was done last year as one of my colleagues encountered deadlocks as Marcus quoted from the changelog (where you could have looked yourself ...). I need to be convinced that sort threading works on all platforms openSUSE/SLES support in order to disable that patch. If you're willing to help and got a nice sample work load to stress test sort threading I'd try to turn that into a test for the coreutils-testsuite package that I could then build and thus run on all the supported platforms. hth Philipp
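A stress test along the lines Philipp asks for could start from something like the sketch below. It is only an illustration: the function name, the input size and the timeout are made up, and it sorts a plain file, so it may well not trigger the deadlock mentioned in the changelog (the thread does not say under which conditions that one occurred). A hang would show up as a timeout, a wrong result as a mismatch against the single-threaded reference.

```shell
#!/bin/sh
# Sort the same shuffled input several times with N threads and compare each
# result with a single-threaded reference. Names and defaults are examples.
stress_sort() (
    lines=${1:-200000}   # input size; the thread talks about files > 128k lines
    threads=${2:-4}      # value passed to --parallel
    rounds=${3:-20}      # number of parallel runs
    limit=${4:-60}       # seconds before a run counts as hung

    dir=$(mktemp -d) || exit 1
    seq "$lines" | shuf > "$dir/input"
    LC_ALL=C sort --parallel=1 "$dir/input" > "$dir/expected"

    result=0
    i=1
    while [ "$i" -le "$rounds" ] && [ "$result" -eq 0 ]; do
        # timeout(1) kills a stuck sort and returns a non-zero status
        if ! LC_ALL=C timeout "$limit" sort --parallel="$threads" "$dir/input" > "$dir/actual"; then
            echo "round $i: sort failed or timed out" >&2
            result=1
        elif ! cmp -s "$dir/expected" "$dir/actual"; then
            echo "round $i: output differs from single-threaded sort" >&2
            result=1
        fi
        i=$((i + 1))
    done

    rm -rf "$dir"
    [ "$result" -eq 0 ] && echo "ok: $rounds rounds with --parallel=$threads"
    exit "$result"
)
```

Calling `stress_sort 1000000 8 50` would run fifty 8-thread sorts of a million lines; feeding sort through pipes instead of a file would be the obvious next variation to try.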
Philipp Thomas wrote:
[...] from the changelog (where you could have looked yourself ...).
I need to be convinced that sort threading works on all platforms openSUSE/SLES support in order to disable that patch.
I wasn't aware that openSUSE supported all platforms. However the people who wrote sort regularly do support 'many' platforms -- far more than what openSuSE supports. Do you regularly disable features from upstream, requiring someone else to provide extra proof that they work? Did you have some reason to suspect that their fixes didn't work? Did you submit a bug report upstream on the issue? I could easily have missed it, but don't recall seeing one. If you don't submit bug reports they won't get fixed. Too often, I see patches going back 7-10 versions for bugs openSuSE has fixed in various progs/utils that should have been passed back upstream -- but it doesn't *appear* that they have been -- if they had been, you wouldn't need so many custom patches at build time.
If you're willing to help and got a nice sample work load to stress test sort threading I'd try to turn that into a test for the coreutils-testsuite package that I could then build and thus run on all the supported platforms.
They worked through multiple iterations of this algorithm to find a balance that worked and you just throw away their work without question? Do you have reason to believe they have buggy code by default? I'll Cc' the coreutils bug-list on this and open a bug-report on this as should have been done originally, and maybe your questions can be addressed.
tags 12427 notabug thanks Linda Walsh wrote: ...
Do you have reason to believe they have buggy code by default? I'll Cc' the coreutils bug-list on this and open a bug-report on this as should have been done originally, and maybe your questions can be addressed.
Hi Linda, Do you realize that by merely sending a message to bug-coreutils, you have created an entry in our bug-tracking software? Here it is: http://bugs.gnu.org/12427 If you can describe what you think is a bug in upstream coreutils, we welcome such reports, but when you are not sure (as your message implies), please address your mail to the coreutils@gnu.org mailing list instead. That is a more general forum, e.g., for discussion, where each new thread does not create a new bug-tracking issue that someone will end up having to re-read, maybe mark as "notabug" and close some day. In the future, if you open an issue, you're welcome (encouraged, even) to close it yourself if/when you realize that the issue is not considered a bug. To close bug DDDDD, just send an email to DDDDD-done@debbugs.gnu.org where DDDDD is your bug number. That will save others the time and trouble of having to close it for you, leaving them more time to address issues that *are* deemed to be bugs. The entire set of bugs: http://debbugs.gnu.org/coreutils Here are some graphs: http://debbugs.gnu.org/rrd/coreutils.html I've gone ahead and closed this issue (via the Cc' above), but if you have details on a bug, please start a new thread here to create a new one.
It seems upstream doesn't know if you are asserting there is a bug, or you are just feeling uncomfortable with "something"... parallel processing in general? Did you disable, or have you planned to disable, parallel functionality in make, for example, or anything else...

Given your philosophy, why doesn't OpenSUSE boot up with a default number of usable processors = 1 on SMP machines, until all software is stress tested to whatever level is the standard for openSuSE?

The largest single complaint about the forced move away from faster clock speeds and towards increased parallelism was that programs weren't written to take advantage of parallelism.

Now we see that when they do, some people disable that ability by default.

Now I am a big believer in stress testing... I got stories.. but the most recent was in trying to convince openSuSE to test their builds on a fully populated openSuSE development system. That would be guaranteed to catch corner cases and unforeseen interactions and allow them, if not fixed, to at least be documented in the release's 'errata'... But I was told that such a level of assurance wasn't reasonable or expected, and that my level of compatibility testing was unreasonable and they only wanted to support clean and sterile builds... *wonderful...* so robust!

So...um... I know the sort people did SOME testing... and spent considerable effort tweaking that after it was found to interact badly with default buffer sizes in pipes being apparently so huge that 4-8 of them would bring a machine to a crawl (?? seems odd to me, but that's how I read it... )...their algorithm WAS horrid -- more than 8 items to sort -- split to another process... ACK!!!

I spent weeks tuning a parallel sort-merge of rpm NVR's to allow me to sort out and get rid of old versions in a directory -- and each required a query through rpm... (fortunately you can do more than one query / invocation)... but they still would compete...
I used a % of the processors (75%) as a start, then aimed for a minimum of 50 queries/process. If # queries came out to <50, I recalced the cpu's to use at most 1 greater than what was needed to be over my minimum. So I used a fraction of my cpu's and made sure the work load on each was sufficient to justify breaking the workload into another piece.

Then the sorted lists were passed off to each, for reducing, with the final results passed back to 1 collector who still did another compare (as the ends of each segment might have values that would drop off) -- but no disk queries were required in that phase, as they were already done in the children. Then the merge routine returned the result to the initial process -- it was only in the initial process that the list would be processed and duplicates moved off to a volume-recycle bin (shared by my samba files).

I don't think their routine will scale as well to larger numbers of processes, and they use a fixed max of '8' threads as an upper maximum, min'ed with the # of processors and the user-settable number of threads (via ENV var OMP_NUM_THREADS). So at most 8, and usually < 8 except on larger setups, and even there there are ARCANE ways to limit the default cases...

For example: if you had just set OMP_NUM_THREADS=1 in the environment, then their sort algorithm would have required no *source* patch. And no matter if you think their testing is adequate or not -- it seems setting an env var in the sysconfig vars, to be propagated to user env's, is FAR preferable to hard-coding the max, in the source, to 1.

If you really feel there is a problem there, I ask what evidence (or lack of evidence) you are basing your decision on that couldn't also be applied to the default usage of SMP cores>1 or parallel make (or xargs, or anything else)?

Otherwise, use the ENV var to limit things and tell the user what the var means... i.e. something changeable in system settings/yast, or that the user can override in their ENV... It gives the same effect, without the harder-to-change, hard-coded value.

???Comments?

Philipp Thomas wrote:
I need to be convinced that sort threading works on all platforms openSUSE/SLES support in order to disable that patch. If you're willing to help and got a nice sample work load to stress test sort threading I'd try to turn that into a test for the coreutils-testsuite package that I could then build and thus run on all the supported platforms.
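For reference, the default described in the message above (available processors, capped at 8, with an explicit request taking precedence) can be written down as a few lines of shell. This is only a model of the behaviour as reported in this thread, with a made-up function name; it does not look at the real sort binary, and it leaves out the environment-variable override.

```shell
# Model of the thread count discussed above. Arguments:
#   $1  number of available processors
#   $2  value given to --parallel (0 or omitted means "not given")
default_sort_threads() (
    ncpu=$1
    requested=${2:-0}
    cap=8                      # fixed upper bound mentioned in the thread

    if [ "$requested" -gt 0 ]; then
        echo "$requested"      # an explicit --parallel=N is taken as is
    elif [ "$ncpu" -lt "$cap" ]; then
        echo "$ncpu"
    else
        echo "$cap"
    fi
)
```

With the SUSE patch, the only line that changes in this model is the default: it would print 1 whenever no explicit value is requested.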
-------- Original Message --------
Subject: Re: bug#12427: Why does OpenSuSE disabled the default of allowing multi-core use in Coreutils 'sort'??
Date: Wed, 12 Sep 2012 16:45:00 -0700
From: Paul Eggert <eggert@cs.ucla.edu>
Organization: UCLA Computer Science Department
To: Linda Walsh <suse@tlinx.org>
CC: Jim Meyering <jim@meyering.net>, 12427@debbugs.gnu.org
References: <504D320E.5050405@tlinx.org> <7bss48hf3m8ik45nq67dol986m1ohgtb40@4ax.com> <5050E3DF.1090504@tlinx.org> <505105CB.7060706@tlinx.org>

On 09/12/2012 02:59 PM, Linda Walsh wrote:
OpenSuSE's maintainer/integrator of the gnu sort package believes it to be faulty
That is not a correct summary of the email that you forwarded. That email merely said that he was not convinced that it works. If no bugs are known, there's no point to filing a bug report.

-------- Original Message --------
Subject: Re: bug#12427: Why does OpenSuSE disabled the default of allowing multi-core use in Coreutils 'sort'??
Date: Wed, 12 Sep 2012 17:28:19 -0600
From: Eric Blake <eblake@redhat.com>
Organization: Red Hat
To: Linda Walsh <suse@tlinx.org>
CC: Jim Meyering <jim@meyering.net>, 12427@debbugs.gnu.org
References: <504D320E.5050405@tlinx.org> <7bss48hf3m8ik45nq67dol986m1ohgtb40@4ax.com> <5050E3DF.1090504@tlinx.org> <505105CB.7060706@tlinx.org>

On 09/12/2012 03:59 PM, Linda Walsh wrote:
OpenSuSE's maintainer/integrator of the gnu sort package believes it to be faulty -- that's why I forwarded it here, in hopes that his concerns would be heard/dealt with.
Nothing can be dealt with if it is not first identified what needs to be dealt with.
If the downstream maintainer thinks there is a bug in sort, then isn't submitting that bug back upstream the correct thing to do?
Yes, reporting bugs upstream is generally the correct thing to do. But please report the actual bug, and not just fill the bug tracker with a 'me too' report. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
On Wed, Sep 12, 2012 at 10:12 PM, Linda Walsh <suse@tlinx.org> wrote:
Given your philosophy, why doesn't OpenSUSE boot up with a default number of usable processors = 1 on SMP machines, until all software is stress tested to whatever level is the standard for openSuSE?
The largest single complaint of the forced move away from faster clock speeds and towards increased parallelism was that programs weren't written to take advantage of parallelism.
Now we see that when they do, some people disable that ability by default.
...
So...um... I know the sort people did SOME testing... and spent considerable effort tweaking that after it was found to interact badly with default buffer sizes in pipes being apparently so huge that 4 -8 of them would bring a machine to a crawl (?? seems odd to me, but that's how I read it... )...their algorithm WAS horrid -- more than 8 items to sort -- split to another process... ACK!!!
I'm having trouble following your mails, perhaps it's the long rants, or perhaps it's only because I'm not a native English speaker, so bear with me. But I do believe you're not understanding what a deadlock is (as mentioned in the changelog at Fri Jan 14 14:13:28 CET 2011). It's not slower performance, it's a stall, a crash, the program freezes. That's serious. And since coreutils are used for everything, that's VERY serious. It would kill the entire build service I'd imagine, for one. All of our PCs, as long as updatedb ran, for two. So... it's not a matter of "I don't trust it". It's a matter of "it's broken, it's critical, disable broken functionality". The path to re-enablement involves stress testing to make sure the bug is fixed (either you, or anyone, including upstream). It's a pity the patch hasn't been associated with a bug report, so we don't know if there's an upstream bug about it. Search?
Claudio Freire wrote:
But I do believe you're not understanding what a deadlock is (as mentioned on the changelog at Fri Jan 14 14:13:28 CET 2011). It's not slower performance, it's a stall, a crash, the program freezes.
There were bugs and discussions about this 2 years ago that mentioned this type of behavior. See http://thread.gmane.org/gmane.comp.gnu.coreutils.general/878/focus=880 and check whether the symptoms sound similar.
That's serious.
And since coreutils are used for everything, that's VERY serious. It would kill the entire build service I'd imagine, for one. All of our pcs, as long as updatedb ran, for two.
Then wouldn't it be better to set OMP_NUM_THREADS in your environment rather than "fixing" it for everyone, when it wasn't everyone's problem? OR, if it was, then set OMP_NUM_THREADS=1 in the system startup script from a read-in value in /etc/sysconfig/coreutils, so people will know how to configure the behavior they want. Seems to me that would be the best possible solution -- as it solves your problem and tells everyone how to get the results they want.
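As a rough sketch of that sysconfig idea: a login script could read a value from /etc/sysconfig/coreutils and export it as OMP_NUM_THREADS unless the user has already set one. Both the SORT_THREADS key and the idea of shipping this as a profile script are assumptions made for illustration; no such file or key exists in the packages discussed here. Note also that OMP_NUM_THREADS is read by other OpenMP-aware programs, not only sort.

```shell
# Hypothetical profile snippet, e.g. /etc/profile.d/sort-threads.sh.
# Expects a sysconfig-style file containing a line such as SORT_THREADS="1".
set_sort_threads() {
    conf=${1:-/etc/sysconfig/coreutils}

    # A value the user already exported wins over the system-wide one.
    [ -n "${OMP_NUM_THREADS:-}" ] && return 0
    [ -r "$conf" ] || return 0

    SORT_THREADS=
    . "$conf"

    # Ignore empty or non-numeric values.
    case $SORT_THREADS in
        ''|*[!0-9]*) return 0 ;;
    esac

    OMP_NUM_THREADS=$SORT_THREADS
    export OMP_NUM_THREADS
}
```

A profile script would end with a plain `set_sort_threads` call. Anything not started from a login shell (cron jobs, services) would not see the variable unless it is set there as well.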
So... it's not a matter of "I don't trust it". It's a matter of "it's broken, it's critical, disable broken functionality".
I bet if you set SMP=1, it would have solved it as well, why not do the "least-touch" solution?
The path to re-enablement involves stress testing to make sure the bug is fixed (either you, or anyone, including upstream).
--- The path to disablement involved no thinking. It was a knee-jerk reaction to a problem. When I have problems in my build that are quite a bit more involved, do I patch the source for everyone else? Or am I told to figure out my problem? They could have set an ENV var, and gotten the same behavior... As the path to disablement involved no thought, are you saying that, upon reconsideration and thought, a solution that doesn't remove a key feature that would serve most customers wouldn't be wise? That a lower impact on all of SUSE's customers wouldn't be wise?
It's a pity the patch hasn't been associated, so we don't know if there's an upstream bug about it. Search?
--- See the above discussion... .. the behavior was fixed and should have been disabled by setting an ENV var rather than by modifying the source which affects everyone, IMO... And BTW -- we are seeing more costs of the build service -- is that what I hear you saying? ;-)
On Wed, Sep 12, 2012 at 11:48 PM, Linda Walsh <suse@tlinx.org> wrote:
And since coreutils are used for everything, that's VERY serious. It would kill the entire build service I'd imagine, for one. All of our pcs, as long as updatedb ran, for two.
---- Then wouldn't it be better to set OMP_NUM_THREADS in your environment rather than "fixing" it for everyone, when it wasn't everyone's problem? OR, if it was, then set OMP_NUM_THREADS=1 in the system startup script from a read-in value in /etc/sysconfig/coreutils, so people will know how to configure the behavior they want. Seems to me that would be the best possible solution -- as it solves your problem and tells everyone how to get the results they want.
It's great what upstream did, but it's not clear it fixes the pipe-related deadlock. That aside, IMO, the original post you linked is spot on: people assume sort will use 1 core, so the default should be 1 core. Only an explicit --parallel argument should result in parallelism being exploited. And that's what coreutils' patch does.
The path to re-enablement involves stress testing to make sure the bug is fixed (either you, or anyone, including upstream).
--- The path to disablement involved no thinking. It was a knee jerk reaction to a problem.
The patch does exactly what the post you linked suggested if I read it correctly. sort --parallel N still gives you N threads. In fact, I just tested it.
When I have problem in my build, that are quite a bit more involved -- do I patch the source for everyone else? Or am I told to figure out my problem? They could have set an ENV var, and gotten the same behavior...
Actually, environment variables don't propagate everywhere. It would have been a rather fragile solution, not to mention one that would have required touching a lot more packages (ie: whoever used sort, rather than just coreutils).
As the path to disablement involved no thought, are you saying that upon reconsideration and thought, that a solution involving not removing a key feature that would service most customers, wouldn't be wise?
It hasn't been removed.
Claudio Freire wrote:
As the path to disablement involved no thought, are you saying that upon reconsideration and thought, that a solution involving not removing a key feature that would service most customers, wouldn't be wise?
It hasn't been removed.
So how do I have sort automatically take advantage of my extra cores? What environment variable do I change?
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change?
alias sort="sort --parallel=N" ?
Claudio Freire wrote:
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change?
alias sort="sort --parallel=N" ?
So I put that where to have it in effect for my 45 minute runs of updatedb? The point is you removed the feature of even allowing the user to choose their default. You chose for EVERYONE... If you had chosen to only set it to 1 if the ENV var wasn't set, then I might not like it because it is not well documented, but there would be a ready workaround. But you didn't just choose to limit sort to using 1/4-1/12 of a machine's capacity; you chose to remove the user's ability to configure the default. The default was configurable. You removed the ability to configure it. That's what's annoying... removing choice is always a bad thing...
On Thursday 2012-09-13 07:54, Linda Walsh wrote:
Claudio Freire wrote:
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change?
alias sort="sort --parallel=N" ?
So I put that where to have it in effect for my 45 minute runs of updatedb?
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
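The one-liner above relies on `echo -en`, which not every /bin/sh supports, and it leaves the new file without execute permission. A slightly more careful variant, still only a sketch, might look like this; the function name and the default values are made up, and N still has to be chosen by whoever installs it.

```shell
# Write a wrapper named "sort" into a directory that precedes the real sort
# in PATH. Arguments: target directory, thread count, path of the real sort.
install_sort_wrapper() (
    dest=${1:-/usr/local/bin}
    threads=${2:-4}
    real=${3:-/usr/bin/sort}

    # printf handles the \n portably; "$@" is written literally into the file
    printf '#!/bin/sh\nexec %s --parallel=%s "$@"\n' "$real" "$threads" > "$dest/sort" &&
    chmod 755 "$dest/sort"
)
```

Run as root, `install_sort_wrapper /usr/local/bin 8` would create the same wrapper as the one-liner, with N set to 8.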
Jan Engelhardt wrote:
On Thursday 2012-09-13 07:54, Linda Walsh wrote:
Claudio Freire wrote:
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change? alias sort="sort --parallel=N" ? So I put that where to have it in effect for my 45 minute runs of updatedb?
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
Clever, but it sorta misses the point as it will be overwritten on the next update if it isn't fixed. But still, clever.
Quoting Linda Walsh <suse@tlinx.org>:
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
Clever, but it sorta misses the point as it will be overwritten on the next update if it isn't fixed.
/usr/local/bin is never overwritten by a package... or the package does not follow packaging conventions. Yet /usr/local/bin is in $PATH before /usr/bin, thus has a higher priority. Dominique
Hi, On Thu, Sep 13, Linda Walsh wrote:
What environment variable do I change? alias sort="sort --parallel=N" ? So I put that where to have it in effect for my 45 minute runs of updatedb?
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
Clever, but it sorta misses the point as it will be overwritten on the next update if it isn't fixed.
Which update overwrites stuff in /usr/local/bin ? Hubert Mantel
On Thursday 2012-09-13 11:07, Linda Walsh wrote:
Jan Engelhardt wrote:
On Thursday 2012-09-13 07:54, Linda Walsh wrote:
Claudio Freire wrote:
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change? alias sort="sort --parallel=N" ? So I put that where to have it in effect for my 45 minute runs of updatedb?
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
Clever, but it sorta misses the point as it will be overwritten on the next update if it isn't fixed.
It won't be overwritten, it's in /usr/local.
Jan Engelhardt wrote:
On Thursday 2012-09-13 11:07, Linda Walsh wrote:
Jan Engelhardt wrote:
On Thursday 2012-09-13 07:54, Linda Walsh wrote:
Claudio Freire wrote:
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change?
alias sort="sort --parallel=N" ?
So I put that where to have it in effect for my 45 minute runs of updatedb?
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
Clever, but it sorta misses the point as it will be overwritten on the next update if it isn't fixed.
It won't be overwritten, it's in /usr/local.
---- Then that won't work, as many utils -- including, specifically, updatedb -- set the path and do not include /usr/local/bin, as it's not an official path. updatedb looks for the location of the binary 'find' (/usr/bin on my system), and creates its own path:

PATH=/bin:/usr/bin:${BINDIR}; export PATH

So the script you provided won't change the behavior of 'sort' globally, nor in the originally mentioned util, "updatedb". You are trying to work around a systemic bug introduced by a bug in the source. If the source used a systemic variable to base the thread limit on, then one could have a systemic solution. However, with a source that no longer supports a systemic setting, getting systemic behavior will be prone to error, as the above illustrates...

Note: I didn't know it wouldn't work until you mentioned it was in "/usr/local/bin". I knew that wasn't a standard system path, so it would be unlikely to be referenced in a system utility like updatedb. In checking the source, I confirmed that it is not used.

The only way to test if it works in a given application is to test it with that application, since no amount of stress testing with test cases that do not accurately represent your workload or program will provide any guarantee that it will work when you run it with your workload.

If only 'sort' was configurable with an ENV var, then it might be easier for it to be tested... (i.e. in the unpatched source it would be far simpler to test, as the ENV only has to be modified in one place -- versus finding each place that sort is called and modifying the source of each place that calls sort -- a far greater, work-intensive task).
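The effect described above is easy to check by hand: once a script resets PATH to the system directories, a wrapper placed in another directory is no longer the one found. The helper below is only an illustration, and its name is made up; the PATH values in the usage note mirror the line quoted from updatedb.

```shell
# Print which "sort" a command lookup resolves to under a given PATH.
# The PATH change is confined to the subshell.
resolve_sort() (
    PATH=$1
    command -v sort
)
```

With a wrapper installed in /usr/local/bin, `resolve_sort "/usr/local/bin:/bin:/usr/bin"` would print the wrapper, while `resolve_sort "/bin:/usr/bin"` falls back to the distribution's sort.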
On 2012-09-14 00:38, Linda Walsh wrote:
Jan Engelhardt wrote:
On Thursday 2012-09-13 11:07, Linda Walsh wrote:
Jan Engelhardt wrote:
On Thursday 2012-09-13 07:54, Linda Walsh wrote:
Claudio Freire wrote:
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change?
alias sort="sort --parallel=N" ?
So I put that where to have it in effect for my 45 minute runs of updatedb?
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
--- Clever, but it sorta misses the point as it will be overwritten on the next update if it isn't fixed.
It won't be overwritten, it's in /usr/local.
---- Then that won't work, as many utils -- including, specifically, updatedb, set the path and do not include /usr/local/bin, as it's not an official path.
Frankly, I would not want updatedb to run the search with all cores. I would not be able to do anything else while it runs. :-( -- Cheers / Saludos, Carlos E. R. (from 12.1 x86_64 "Asparagus" at Telcontar)
On Friday 14 September 2012, Carlos E. R. wrote:
On 2012-09-14 00:38, Linda Walsh wrote:
Jan Engelhardt wrote:
On Thursday 2012-09-13 11:07, Linda Walsh wrote:
Jan Engelhardt wrote:
On Thursday 2012-09-13 07:54, Linda Walsh wrote:
Claudio Freire wrote:
On Thu, Sep 13, 2012 at 2:36 AM, Linda Walsh <suse@tlinx.org> wrote:
What environment variable do I change?
alias sort="sort --parallel=N" ?
So I put that where to have it in effect for my 45 minute runs of updatedb?
echo -en '#!/bin/sh\nexec /usr/bin/sort --parallel=N "$@"' >/usr/local/bin/sort;
--- Clever, but it sorta misses the point as it will be overwritten on the next update if it isn't fixed.
It won't be overwritten, it's in /usr/local.
---- Then that won't work, as many utils -- including, specifically, updatedb, set the path and do not include /usr/local/bin, as it's not an official path.
Frankly, I would not want updatedb to run the search with all cores. I would not be able to do anything else while it runs. :-(
It's niced 19, so shouldn't be a problem. You could also edit /etc/cron.daily/suse-updatedb or /usr/bin/updatedb
cu, Rudi
On 2012-09-14 14:16, Ruediger Meier wrote:
Frankly, I would not want updatedb to run the search with all cores.
I would not be able to do anything else while it runs. :-( It's niced 19, so shouldn't be a problem. You could also edit /etc/cron.daily/suse-updatedb or /usr/bin/updatedb
That negates the use of all cores. I see no point in that combination. -- Cheers / Saludos, Carlos E. R. (from 12.1 x86_64 "Asparagus" at Telcontar)
On Fri, 2012-09-14 at 15:41 +0200, Carlos E. R. wrote:
On 2012-09-14 14:16, Ruediger Meier wrote:
Frankly, I would not want updatedb to run the search with all cores.
I would not be able to do anything else while it runs. :-( It's niced 19, so shouldn't be a problem. You could also edit /etc/cron.daily/suse-updatedb or /usr/bin/updatedb
That negates the use of all cores. I see no point in that combination.
CPU usage isn't a problem with updatedb at nice 19, but IO induced latency remains, and can be highly annoying. -Mike
On Fri, Sep 14, 2012 at 8:50 AM, Carlos E. R. <robin.listas@telefonica.net> wrote:
---- Then that won't work, as many utils -- including, specifically, updatedb, set the path and do not include /usr/local/bin, as it's not an official path.
Frankly, I would not want updatedb to run the search with all cores. I would not be able to do anything else while it runs. :-(
And, also, sort is not the bottleneck. find is.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2012-09-14 16:24, Claudio Freire wrote:
On Fri, Sep 14, 2012 at 8:50 AM, Carlos E. R. <> wrote:
And, also, sort is not the bottleneck. find is.
Or rather disk i/o. -- Cheers / Saludos, Carlos E. R. (from 12.1 x86_64 "Asparagus" at Telcontar)
Quoting Linda Walsh <suse@tlinx.org>:
You chose for EVERYONE...
That's the nature of a distribution: it chooses for everybody. Maybe you should head over to LFS and choose for yourself? Dominique
"Dominique Leuenberger a.k.a DimStar" <DimStar@openSUSE.org> writes:
Quoting Linda Walsh <suse@tlinx.org>:
You chose for EVERYONE...
That's the nature of a distribution: it chooses for everybody.
Maybe you should head over to LFS and choose for yourself?
Huh? Linda is spot on: we have "over-solved" the problem, which was that multi-threaded sort was buggy in some circumstances.
Specifically, we have changed sort to deviate in a non-obvious way from what people familiar with GNU coreutils expect. We have furthermore changed it in a way that does not follow standard Unix philosophy (use environment variables) and that can only be worked around in unreliable ways.
I completely agree with Linda that a less intrusive workaround for the sometimes-broken multi-core feature of sort(1) would have been an environment variable that, on SUSE, is assumed to be 1 if unset. It's much easier to export an environment variable in your .profile than it is to create a replacement script and ensure it's used everywhere.
Would the coreutils gatekeepers accept a patch to sort that replaces coreutils-8.9-singlethreaded-sort.patch with a patch using an environment variable?
And how do we, on SUSE, highlight how we deviate from expected behaviour? In the manpage? Or in README.SUSE? The patch should highlight this special behaviour.
S. -- Susanne Oberhauser SUSE LINUX Products GmbH +49-911-74053-574 Maxfeldstraße 5 Processes and Infrastructure 90409 Nürnberg GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
On Thu, Sep 20, 2012 at 10:54 AM, Susanne Oberhauser <froh@suse.com> wrote:
I completely agree with Linda that a less intrusive workaround for the sometimes-broken multi-core feature of sort(1) would have been an environment variable that, on SUSE, is assumed to be 1 if unset.
Well, yes. That would seem to be what the patch does, although a further test suggests it's not:
python -c 'for i in xrange(10000000): print "\n".join([str(i)]*10)' | OMP_MAX_THREADS=4 sort > /dev/null
That command line will stress sort, yet it does not seem to use multiple threads.
However,
python -c 'for i in xrange(10000000): print "\n".join([str(i)]*10)' | sort --parallel=4 > /dev/null
works like a charm.
Maybe I used the wrong environment variable (strings /usr/bin/sort doesn't show it). Or maybe a better patch is in order.
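The one-liner above uses Python 2 syntax (xrange and the print statement). A rough Python 3 equivalent of the input generator might look like the sketch below. The function name write_lines and its parameters are made up for illustration and are not part of the thread.

```python
import sys


def write_lines(count, repeats=10, out=sys.stdout):
    """Write every integer in range(count) to `out`, `repeats` times, one per line.

    Hypothetical helper: mirrors the Python 2 one-liner, which used
    count=10000000 and repeats=10 to give sort enough input to work on.
    """
    for i in range(count):
        out.write("\n".join([str(i)] * repeats) + "\n")
```

Calling write_lines(10000000) from a script and piping its output into sort --parallel=4 > /dev/null should reproduce the load used in the test above.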
Thanks for using OMP_MAX_THREADS, like Linda did :) That made me google it, and I discovered openmp.org. The Right Fix, given OpenMP, seems to be: make GNU sort use
Claudio Freire <klaussfreire@gmail.com> writes:
On Thu, Sep 20, 2012 at 10:54 AM, Susanne Oberhauser <froh@suse.com> wrote:
I completely agree with Linda that a less intrusive workaround for the sometimes-broken multi-core feature of sort(1) would have been an environment variable that, on SUSE, is assumed to be 1 if unset.
Well, yes. That would seem to be what the patch does, although a further test suggests it's not:
python -c 'for i in xrange(10000000): print "\n".join([str(i)]*10)' | OMP_MAX_THREADS=4 sort > /dev/null
That command line will stress sort, yet it will not seem to use multiple threads.
However,
python -c 'for i in xrange(10000000): print "\n".join([str(i)]*10)' | sort --parallel=4 > /dev/null
this works like a charm.
Maybe I used the wrong environment variable (strings /usr/bin/sort doesn't show it). Or maybe a better patch is in order.
Correct. The patch we have just sets parallelism to 1 if it was not set with --parallel. The multi-threading of GNU sort predates OpenMP, thus it does not respect OMP_MAX_THREADS from openmp.org yet. In a way that's exactly the issue Linda stumbled across :) S.
On Thu, Sep 20, 2012 at 12:34 PM, Susanne Oberhauser <froh@suse.com> wrote:
correct. the patch we have just sets parallelism to 1, if it was not set with --parallel.
The multi-threading of GNU sort predates openMP, thus it does not respect OMP_MAX_THREADS from openmp.org yet. In a way that's exactly the issue Linda stumbled across :)
That explains why the variable wasn't on the man pages or strings. Then all this is moot, the patch is correct and the bug report invalid. Because, if sort doesn't use openmp, then there's no reason to expect it to pay attention to OMP_MAX_THREADS. Closed.
Claudio Freire <klaussfreire@gmail.com> writes:
On Thu, Sep 20, 2012 at 12:34 PM, Susanne Oberhauser <froh@suse.com> wrote:
correct. the patch we have just sets parallelism to 1, if it was not set with --parallel.
The multi-threading of GNU sort predates openMP, thus it does not respect OMP_MAX_THREADS from openmp.org yet. In a way that's exactly the issue Linda stumbled across :)
That explains why the variable wasn't on the man pages or strings.
Then all this is moot, the patch is correct and the bug report invalid.
Because, if sort doesn't use openmp, then there's no reason to expect it to pay attention to OMP_MAX_THREADS.
Closed.
Not so fast :) I checked the source... coreutils does respect OMP_NUM_THREADS (the variable is not called OMP_MAX_THREADS). For that it uses its coreutils-specific num_processors utility helper. Now if we look at the patch, we see that we have killed that environment variable incidentally, by just patching out num_processors(). This is a bug in the patch!
Index: src/sort.c
===================================================================
--- src/sort.c.orig 2012-04-16 13:17:12.342019601 +0200
+++ src/sort.c 2012-04-16 13:17:12.463016705 +0200
@@ -5288,8 +5288,8 @@ main (int argc, char **argv)
     {
       if (!nthreads)
         {
-          unsigned long int np = num_processors (NPROC_CURRENT_OVERRIDABLE);
-          nthreads = MIN (np, DEFAULT_MAX_THREADS);
+          //unsigned long int np = num_processors (NPROC_CURRENT_OVERRIDABLE);
+          nthreads = 1; //MIN (np, DEFAULT_MAX_THREADS);
         }
 
       /* Avoid integer overflow later. */
===================================================================
I think it should be instead like this (UNTESTED!):
Index: src/sort.c
===================================================================
--- src/sort.c.orig
+++ src/sort.c
@@ -5288,8 +5288,12 @@ main (int argc, char **argv)
     {
       if (!nthreads)
         {
-          unsigned long int np = num_processors (NPROC_CURRENT_OVERRIDABLE);
+          unsigned long int np;
+          if (getenv("OMP_NUM_THREADS"))
+            np = num_processors (NPROC_CURRENT_OVERRIDABLE);
+          else
+            np = 1;
           nthreads = MIN (np, DEFAULT_MAX_THREADS);
         }
 
       /* Avoid integer overflow later. */
S.
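To make the difference between the three variants in this message easier to see, here is a small Python model of how the default thread count (no --parallel given) would come out under upstream, under the current SUSE patch, and under the proposed patch. It is only a sketch under assumptions: the value 8 for DEFAULT_MAX_THREADS is assumed, and the num_processors stand-in models only OMP_NUM_THREADS, ignoring everything else the real helper may look at. All Python names are made up for illustration.

```python
DEFAULT_MAX_THREADS = 8  # assumed value of the cap defined in src/sort.c


def num_processors(env, ncpus):
    """Simplified stand-in for num_processors (NPROC_CURRENT_OVERRIDABLE).

    Assumption: a positive integer in OMP_NUM_THREADS overrides the CPU
    count; anything else falls back to the number of CPUs.
    """
    try:
        requested = int(env.get("OMP_NUM_THREADS", ""))
    except ValueError:
        requested = 0
    return requested if requested > 0 else ncpus


def default_upstream(env, ncpus):
    # Upstream: CPU count (or the OMP_NUM_THREADS override), capped.
    return min(num_processors(env, ncpus), DEFAULT_MAX_THREADS)


def default_current_patch(env, ncpus):
    # Current SUSE patch: always 1, so the environment is never consulted.
    return 1


def default_proposed_patch(env, ncpus):
    # Proposed patch: 1 unless OMP_NUM_THREADS is set at all, like getenv().
    np = num_processors(env, ncpus) if "OMP_NUM_THREADS" in env else 1
    return min(np, DEFAULT_MAX_THREADS)
```

With an empty environment on a 16-CPU machine this model gives 8, 1 and 1 respectively; with OMP_NUM_THREADS=4 it gives 4, 1 and 4, which is the behaviour the proposed patch is meant to restore.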
On Thu, Sep 20, 2012 at 1:35 PM, Susanne Oberhauser <froh@suse.com> wrote:
I think it should be instead like this (UNTESTED!):
Index: src/sort.c
===================================================================
--- src/sort.c.orig
+++ src/sort.c
@@ -5288,8 +5288,12 @@ main (int argc, char **argv)
     {
       if (!nthreads)
         {
-          unsigned long int np = num_processors (NPROC_CURRENT_OVERRIDABLE);
+          unsigned long int np;
+          if (getenv("OMP_NUM_THREADS"))
+            np = num_processors (NPROC_CURRENT_OVERRIDABLE);
+          else
+            np = 1;
           nthreads = MIN (np, DEFAULT_MAX_THREADS);
         }
 
       /* Avoid integer overflow later. */
A better way would be to patch num_processors to do that for all coreutils utilities.
Claudio Freire <klaussfreire@gmail.com> writes:
On Thu, Sep 20, 2012 at 1:35 PM, Susanne Oberhauser <froh@suse.com> wrote:
I think it should be instead like this (UNTESTED!):
Index: src/sort.c
===================================================================
--- src/sort.c.orig
+++ src/sort.c
@@ -5288,8 +5288,12 @@ main (int argc, char **argv)
     {
       if (!nthreads)
         {
-          unsigned long int np = num_processors (NPROC_CURRENT_OVERRIDABLE);
+          unsigned long int np;
+          if (getenv("OMP_NUM_THREADS"))
+            np = num_processors (NPROC_CURRENT_OVERRIDABLE);
+          else
+            np = 1;
           nthreads = MIN (np, DEFAULT_MAX_THREADS);
         }
 
       /* Avoid integer overflow later. */
A better way would be to patch num_processors to do that for all coreutils utilities.
To my understanding, it's just sort(1) that has the multi-threading issue. The proposed patch would thus turn multiprocessing off just for sort, which has the problem, unless OMP_NUM_THREADS is set or --parallel was given. By the way, the code has another non-obvious feature, which it has upstream too: --parallel can override DEFAULT_MAX_THREADS, but OMP_NUM_THREADS can't. But that's an upstream feature :) S.
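The asymmetry mentioned at the end of this message can be illustrated with a tiny Python model of the proposed patch. This is a sketch, not the sort source: the function name effective_threads, its parameters, and the value 8 for DEFAULT_MAX_THREADS are assumptions made for illustration.

```python
DEFAULT_MAX_THREADS = 8  # assumed value of the cap in src/sort.c


def effective_threads(parallel=0, omp_num_threads=None):
    """Model the thread count sort would pick under the proposed patch.

    parallel: value given to --parallel (0 means the option was not used).
    omp_num_threads: positive integer from OMP_NUM_THREADS, or None if unset.
    """
    if parallel:
        # An explicit --parallel skips the block that applies the cap.
        return parallel
    if omp_num_threads is None:
        # Neither option nor variable: the patch falls back to one thread.
        return 1
    # The environment variable only feeds np, which is then capped.
    return min(omp_num_threads, DEFAULT_MAX_THREADS)
```

In this model, asking for 16 threads via --parallel yields 16, while asking for 16 via OMP_NUM_THREADS yields only 8.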
Susanne Oberhauser <froh@suse.com> writes:
the proposed patch would thus turn multiprocessing off just for sort, which has the problem, except if OMP_NUM_THREADS is set, or --parallel was given.
FYI: https://build.opensuse.org/request/show/135850 S.
participants (13)
- Carlos E. R.
- Claudio Freire
- Dominique Leuenberger a.k.a DimStar
- Hubert Mantel
- Jan Engelhardt
- Jim Meyering
- Linda Walsh
- Marcus Meissner
- Mike Galbraith
- Per Jessen
- Philipp Thomas
- Ruediger Meier
- Susanne Oberhauser