Peter Hofer wrote:
show_stat() in fs/proc/stat.c generates the /proc/stat content and prints the CPU-wide values, updated indirectly from account_user_time(), and also converts them via cputime_to_clock_t().
I can easily have missed something -- and if I have, please tell me -- but it seems to me that user and system times from /proc/stat and /proc/pid/stat should have matching units. My next best guess is that there are things attributed to the utime/stime of a process which are attributed to different activities on the CPU level.
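One way to check the units question from userspace is to read both files and convert the tick counts to seconds. The sketch below assumes the documented field layout of /proc/stat and /proc/<pid>/stat and that both report clock ticks (USER_HZ, readable via sysconf(_SC_CLK_TCK)); the helper names are made up for illustration.

```python
import os


def cpu_times_from_proc_stat(text):
    """Return (user, system) ticks from the aggregate 'cpu' line of /proc/stat."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Layout: cpu user nice system idle iowait ...
            return int(fields[1]), int(fields[3])
    raise ValueError("no aggregate 'cpu' line found")


def cpu_times_from_pid_stat(text):
    """Return (utime, stime) ticks from the contents of /proc/<pid>/stat."""
    # comm (field 2) may contain spaces or ')', so split after the last ')'.
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]), int(rest[12])


def ticks_to_seconds(ticks, hz=None):
    """Convert clock ticks to seconds; hz defaults to the system's USER_HZ."""
    if hz is None:
        hz = os.sysconf("SC_CLK_TCK")
    return ticks / hz


if __name__ == "__main__":
    with open("/proc/stat") as f:
        user, system = cpu_times_from_proc_stat(f.read())
    with open("/proc/self/stat") as f:
        utime, stime = cpu_times_from_pid_stat(f.read())
    print("system-wide user/system [s]:",
          ticks_to_seconds(user), ticks_to_seconds(system))
    print("this process utime/stime [s]:",
          ticks_to_seconds(utime), ticks_to_seconds(stime))
```

Summing utime/stime over all processes and comparing against the system-wide line would show how far the two accountings drift apart.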
Double and triple sigh! The docs out of date w/r/t the code? Unheard of!! ;^)

But the jiffies would seem to be a questionable indicator of cpu time, since they don't seem to be based on an absolute time, but on an amount of cpu time. That amount would vary with the speed each of the cores is running at. I don't know about your specific cpu model, but there are things like 'turbo mode', where the clock runs 1-4 or more additional multiples of 133MHz faster when some or all of the other cores are idle.

From the data displayed by 'cpufreq-info', my cpu shows:

  maximum transition latency: 10.0 us.
  hardware limits: 1.60 GHz - 2.79 GHz
  available frequency steps: 2.79 GHz, 2.66 GHz, 2.53 GHz, 2.39 GHz, 2.26 GHz, 2.13 GHz, 2.00 GHz, 1.86 GHz, 1.73 GHz, 1.60 GHz

or 12x - 21x the base bus speed of 133MHz. My chip, an X5660, while rated at 2.8GHz, can go as fast as 24x the base bus speed, or 3.2GHz, in turbo mode when only 1 core is active. Of note is the 10.0 us max transition latency, which could skew the numbers a bit depending on how often the cpu transitions.

Maybe if you disabled power control and forced your cpu into 1 speed the numbers might get closer?

Erratum depressing:
------------------

The worst news to me is how accurate perf events and timer events really are. The processor erratum for the 5600 series includes dropped interrupts for the timer, among others (see http://www.intel.com/content/dam/www/public/us/en/documents/specification-up... for the full 5-page erratum, which is rather depressing).

Thread v. Procs & parallelism
-----------------------------
interesting to see how well python does parallelization. [...] Your 8 threads utilize about 1.9 cores.
Interesting! That's probably because of CPython's global interpreter lock (GIL). I didn't really have full utilization in mind for that script, since the effect seems to occur whenever there's more than one thread involved.
Python's multiprocessing package should offer better utilization since it avoids the GIL by spawning processes instead of threads, which in this case would defeat the purpose of the script.
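A figure like "8 threads utilize about 1.9 cores" can be estimated by dividing the CPU time a process consumed by the wall-clock time that elapsed. The following is a minimal sketch of that measurement, not the script discussed in this thread; burn() and cores_used() are hypothetical names, and the busy loop is an assumed stand-in workload.

```python
import threading
import time


def burn(n):
    """CPU-bound busy loop; in CPython it holds the GIL almost all the time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cores_used(num_threads, iterations):
    """Run num_threads busy threads and return CPU time / wall time."""
    threads = [threading.Thread(target=burn, args=(iterations,))
               for _ in range(num_threads)]
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system time of the whole process
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


if __name__ == "__main__":
    print(f"approx. cores used: {cores_used(8, 2_000_000):.2f}")
```

With a pure-Python loop like this the ratio should stay near one core because of the GIL; workloads that release the GIL (I/O, some C extensions) can push it higher.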
---
But mapping it to procs would be a better mapping to reality on linux, as linux threads are built on procs with varying amounts of memory shared. On linux, there's rarely a use case where threads will outperform procs. Windows is a different matter -- there, thread spawning is cheaper than process creation, but I think that, because linux process creation is so efficient compared to windows, even with threads built on top of procs linux's thread creation time is still faster than windows's thread creation time.

It wasn't always the case, but more work was put into the kernel to optimize thread creation, as threads were notably, "some 'delta'", above the process create time when they first came out. I think they might be some small percent faster now, but I wouldn't wager one way or the other -- certainly nowhere near the benefit threads are on windows.

Perl, BTW, went the way of linux -- building its threads on procs, which is a likely contributing cause to their efficiency on linux.
----

How timing is calculated...(?!)
--------------------------------

Oh, another "gotcha" is how timing is done. During boot, on my system, I see:

  [ 0.000000] hpet clockevent registered
  [ 0.000000] tsc: Fast TSC calibration using PIT
  [ 0.001000] tsc: Detected 2792.844 MHz processor
  [ 0.000004] Calibrating delay loop (skipped), value calculated using timer frequency.. 5585.68 BogoMIPS (lpj=2792844)

It appears that to calculate jiffies (I think lpj=loops/jiffy) it is using the Hz of the processor. Then sometime later I see:

  [ 1.307104] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
  [ 1.312644] hpet0: 4 comparators, 64-bit 14.318180 MHz counter
  [ 1.320680] Switched to clocksource hpet

So the jiffies, calculated via the processor's speed, are counted in terms of the HPET clocksource, which is accurate to about .014ms. @ 5586 jiffies, that equates to a maximum accuracy of ~80 jiffies (79.98) / tick. That alone could account for quite a skew.
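As a sanity check on the boot log above, the lpj and BogoMIPS figures can be related to each other with a little integer arithmetic. This is a sketch under two assumptions that are not stated in the log: that the kernel prints BogoMIPS as lpj / (500000 / HZ), as I recall from its calibration code, and that this kernel was built with HZ=1000. The function names are made up for illustration.

```python
def bogomips_from_lpj(lpj, hz):
    """Format BogoMIPS from loops-per-jiffy using integer math (assumed formula)."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return f"{whole}.{frac:02d}"


def loops_per_second_mhz(lpj, hz):
    """Loops per second implied by lpj, in MHz (lpj * HZ / 1e6)."""
    return lpj * hz / 1e6


if __name__ == "__main__":
    LPJ = 2792844   # from the boot log
    HZ = 1000       # assumption: jiffies per second for this kernel build
    print(bogomips_from_lpj(LPJ, HZ), "BogoMIPS")
    print(loops_per_second_mhz(LPJ, HZ), "MHz")
```

If the HZ=1000 assumption holds, both printed values line up with the "5585.68 BogoMIPS" and "2792.844 MHz" lines in the log, which supports reading lpj as loops per jiffy derived from the detected TSC frequency.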
Given that skewing, and the erratum problems, I wouldn't expect too much in the way of accurate timing info on one of these processors... *cough*...(sigh)