Kaare Rasmussen
Hi
Why is the Perl included in SUSE 9.1 built with thread support?
It seems that there are modules that don't work because of this, notably CPAN...
I'm installing Interchange. It will not install on a threaded Perl, claiming that some of the modules that Interchange requires won't work with threads enabled.
What benefits come from this choice?
On a multiprocessor (or hyper-threading) machine it buys you a WHOLE lot because the threads can execute at the same time. On a single processor it is wasteful of machine cycles, but it does allow the programmer to run things in the background slightly more efficiently than forking allows. SuSE probably would have been wise to have included both. You _can_ change two lines in the spec file for perl and compile a non-threaded version, but that could break all the SuSE-supplied modules, so if you rely on them you should know what you might be getting into. (Take a look at the patches that SuSE applied to perl as well, since they may assume a threaded perl.)
Mark Gray wrote:
Kaare Rasmussen writes:
Hi
Why is the Perl included in SUSE 9.1 built with thread support?
It seems that there are modules that don't work because of this, notably CPAN...
I'm installing Interchange. It will not install on a threaded Perl, claiming that some of the modules that Interchange requires won't work with threads enabled.
What benefits come from this choice?
On a multiprocessor (or hyper-threading) machine it buys you a WHOLE lot because the threads can execute at the same time.
That depends VERY MUCH on the machine, especially the memory setup, caches and FSB, as well as on the specific application. I've written some multithreaded numerical code myself, and I get very good performance on a dual-Opteron machine and very sad results on older Pentium-III hardware, because if the problem is essentially memory-bound you gain nothing. That said, it buys you a lot for day-to-day work when one application operates on a small set of data while another one, e.g. X or seti, runs on the other node. However, this does NOT necessarily apply to hyper-threading, since that technique requires the actual instructions to be parallelizable. This is not possible in my case, so hyper-threading only buys me a confused top, time(), etc. I find it hard to write good MT code and impossible to write good HT code, though examples of the latter are said to exist. Maybe that's why I'm only Optio and not Centurio (Asterix and the Soothsayer) :-)
On a single processor it is wasteful of machine cycles, but it does allow the programmer to run things in the background slightly more efficiently than forking allows.
If I understand this right, that's not true. From man fork: "... Under Linux, fork is implemented using copy-on-write pages, so the only penalty incurred by fork is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child. ..." I take this to mean that fork()'ing and pthread_create()'ing are essentially the same. I have no clue, though, about native pthreads. Again, I'm only optio, if that much.
SuSE probably would have been wise to have included both.
I concur fully. KK
Kolja, On Monday 06 September 2004 16:08, Kolja Kauder wrote:
Mark Gray wrote:
...
On a multiprocessor (or hyper-threading) machine it buys you a WHOLE lot because the threads can execute at the same time.
That depends VERY MUCH on the machine, especially the memory setup, caches and FSB, as well as on the specific application. I've written some multithreaded numerical code myself, and I get very good performance on a dual-Opteron machine and very sad results on older Pentium-III hardware, because if the problem is essentially memory-bound you gain nothing. That said, it buys you a lot for day-to-day work when one application operates on a small set of data while another one, e.g. X or seti, runs on the other node. However, this does NOT necessarily apply to hyper-threading, since that technique requires the actual instructions to be parallelizable. This is not possible in my case, so hyper-threading only buys me a confused top, time(), etc. I find it hard to write good MT code and impossible to write good HT code, though examples of the latter are said to exist. Maybe that's why I'm only Optio and not Centurio (Asterix and the Soothsayer) :-)
Performance analysis, let alone optimization, just gets harder and harder as hardware gets more and more sophisticated. Oops, I mean it gets more and more interesting... I'm not familiar with the breakdown of execution units and how they relate to x86 instructions (let alone to high-level language constructs), nor with how much redundancy there is in execution units and data pathways within a Hyper-Threaded Pentium 4 or Xeon processor. My intuition is that a priori there'd be considerable opportunity for overlap within the CPU itself, and that patterns of primary storage access (especially overall L2 and L3 cache hit rates) are the dominant factor. After all, for lots of common execution patterns, access to RAM is the limiting factor.
On a single processor it is wasteful of machine cycles, but it does allow the programmer to run things in the background slightly more efficiently than forking allows.
If I understand this right, that's not true. From man fork: "... Under Linux, fork is implemented using copy-on-write pages, so the only penalty incurred by fork is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child. ..." I take this to mean that fork()'ing and pthread_create()'ing are essentially the same. I have no clue, though, about native pthreads. Again, I'm only optio, if that much.
Creating a thread does not require a full set of new page table entries, since all the threads of a given process share with their siblings all of their virtual memory environment except for their stacks.
...
KK
Randall Schulz
Randall R Schulz wrote:
Kolja,
Performance analysis, let alone optimization, just gets harder and harder as hardware gets more and more sophisticated. Oops. I mean it gets more and more interesting...
I'm not familiar with the breakdown of execution units and how they relate to x86 instructions (let alone to high-level language constructs), nor with how much redundancy there is in execution units and data pathways within a Hyper-Threaded Pentium 4 or Xeon processor. My intuition is that a priori there'd be considerable opportunity for overlap within the CPU itself, and that patterns of primary storage access (especially overall L2 and L3 cache hit rates) are the dominant factor. After all, for lots of common execution patterns, access to RAM is the limiting factor.
To clarify, my knowledge of HT comes primarily from http://arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.html and my own experiments. As far as I know, in the P4 architecture there are only two parallel execution paths, and only one of them can be used for "complex" instructions, so it's only "one and a half" processors at the best of times. Now, if RAM access is the limiting factor, no SMP solution can help, and the Opteron's NUMA capability is a great improvement. An optimal HT situation would be one simple process and a more complex one running simultaneously, with one operating on a small data set, i.e. one that fits into L1 (which in turn is very small in a P4) with space left, and the other one on a larger but contiguous data set to allow streaming and SIMD. These would still have to be written and scheduled well to intertwine fruitfully. If, on the other hand, the scheduler sees two (virtual) processors, it would give "one", i.e. half the CPU time, to your one important process while wasting the other half on unimportant processes, e.g. seti@home.
Randall Schulz
KK
Hi, sorry for the late answer, but time is scarce.
On a multiprocessor (or hyper-threading) machine it buys you a WHOLE lot because the threads can execute at the same time. On a single
But still. Is it worth it when there are still many modules that break, even standard ones like CPAN? I'm installing Interchange, and it won't even begin the install on a threaded system. Not because Interchange itself has issues with threads, but because it relies on some modules that do. And the users will blame Interchange :-)
have included both. You _can_ change two lines in the spec file for perl and compile a non-threaded version, but that could break all the
Well, the solution is to compile a perl from scratch and install all necessary modules. Takes a while, but I'm thankful that the days of 56 kbps modems are gone :-)
--
Kaare Rasmussen        --Linux, spil,--        Tlf:   3816 2582
Kaki Data              tshirts, merchandize    Fax:   3816 2501
Nordre Fasanvej 12     Åben 12.00-18.00        Email: kar@kakidata.dk
2000 Frederiksberg     Lørdag 12.00-16.00      Web:   www.suse.dk
participants (4)
-
Kaare Rasmussen
-
Kolja Kauder
-
Mark Gray
-
Randall R Schulz