Mailinglist Archive: opensuse-kernel (75 mails)

[opensuse-kernel] Re: user/sys ticks for process exceed overall user/sys ticks
Peter Hofer wrote:


show_stat() in fs/proc/stat.c generates the /proc/stat content and
prints the CPU-wide values, updated indirectly from
account_user_time(), and also converts them via
cputime_to_clock_t().

I can easily have missed something -- and if I have, please tell me
-- but it seems to me that user and system times from /proc/stat and
/proc/pid/stat should have matching units. My next best guess is
that there are things attributed to the utime/stime of a process
which are attributed to different activities on the CPU level.
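To check the units side by side, both files can be read and converted
with the same USER_HZ rate. This is a minimal sketch, assuming Linux
and Python 3.9+. The field positions follow the proc(5) man page, and
the helper names (`proc_pid_times`, `proc_stat_cpu_times`,
`ticks_to_seconds`) are hypothetical:

```python
import os

def proc_pid_times(stat_text: str) -> tuple[int, int]:
    """Return (utime, stime) in clock ticks from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is parenthesised and may contain spaces or ')', so split
    # only what follows the *last* ')'.
    rest = stat_text[stat_text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15 (1-based).
    return int(rest[11]), int(rest[12])

def proc_stat_cpu_times(stat_text: str) -> tuple[int, int]:
    """Return (user, system) in clock ticks from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # Columns after 'cpu': user nice system idle iowait irq softirq ...
            return int(parts[1]), int(parts[3])
    raise ValueError("no aggregate 'cpu' line found")

def ticks_to_seconds(ticks: int) -> float:
    # Both files count USER_HZ ticks; sysconf(SC_CLK_TCK) reports that rate.
    return ticks / os.sysconf("SC_CLK_TCK")

if __name__ == "__main__" and os.path.exists("/proc/self/stat"):
    with open("/proc/self/stat") as f:
        utime, stime = proc_pid_times(f.read())
    with open("/proc/stat") as f:
        user, system = proc_stat_cpu_times(f.read())
    print(f"this process: user {ticks_to_seconds(utime):.2f}s, sys {ticks_to_seconds(stime):.2f}s")
    print(f"all CPUs:     user {ticks_to_seconds(user):.2f}s, sys {ticks_to_seconds(system):.2f}s")
```

According to proc(5), both files report clock ticks (USER_HZ), so
dividing by the same SC_CLK_TCK value puts them on one scale.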
---

Double and triple sigh! The docs out of date w.r.t. the code?
Unheard of!! ;^) But jiffies would seem to be a questionable
indicator of CPU time, since they don't seem to be based on absolute
time but on an amount of CPU time, and that amount would vary with
the speed each core is running at.

I don't know about your specific CPU model, but consider things like
'turbo mode', where the clock runs 1-4 (or more) extra multiples of
133MHz faster when some or all of the other cores are idle.


The output of 'cpufreq-info' shows my CPU with:

maximum transition latency: 10.0 us.
hardware limits: 1.60 GHz - 2.79 GHz
available frequency steps: 2.79 GHz, 2.66 GHz, 2.53 GHz, 2.39 GHz,
2.26 GHz, 2.13 GHz, 2.00 GHz, 1.86 GHz,
1.73 GHz, 1.60 GHz

i.e. 12x to 21x the base bus speed of 133MHz. My chip, an X5660,
while rated at 2.8GHz, can go as fast as 24x the base bus speed, or
3.2GHz, in turbo mode when only 1 core is active.
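The multipliers can be checked directly against the frequency steps
listed above. This is a quick sketch, assuming the nominal base clock
is exactly 133 1/3 MHz (the post only says 133MHz):

```python
BCLK_MHZ = 400 / 3  # nominal 133.33 MHz base clock (assumed exact value)

# "available frequency steps" from the cpufreq-info output, in GHz
steps_ghz = [2.79, 2.66, 2.53, 2.39, 2.26, 2.13, 2.00, 1.86, 1.73, 1.60]

# Each step should be an integer multiple of the base clock; rounding
# absorbs the two-decimal display rounding in cpufreq-info.
multipliers = [round(f * 1000 / BCLK_MHZ) for f in steps_ghz]

# Single-core turbo ratio quoted for the X5660.
turbo_ghz = 24 * BCLK_MHZ / 1000

if __name__ == "__main__":
    print(multipliers)
    print(f"24x turbo ~ {turbo_ghz:.2f} GHz")
```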

Of note is the 10.0 us (0.01 ms) maximum transition latency, which
could skew the numbers a bit depending on how often the CPU
transitions.
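To get a feel for the size of that effect, the worst case is simply
transitions per second times the latency. This is a back-of-the-envelope
sketch, assuming every transition stalls the core for the full 10 us.
The transition rates below are hypothetical inputs, not measurements:

```python
MAX_TRANSITION_LATENCY_S = 10.0e-6  # "maximum transition latency: 10.0 us"

def worst_case_stall_fraction(transitions_per_sec: float) -> float:
    """Upper bound on the fraction of each second spent mid-transition,
    assuming every frequency change costs the full worst-case latency."""
    return min(1.0, transitions_per_sec * MAX_TRANSITION_LATENCY_S)

if __name__ == "__main__":
    for rate in (10, 100, 1000):  # hypothetical transition rates
        print(f"{rate:>5} transitions/s -> at most {worst_case_stall_fraction(rate):.3%} stalled")
```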

Maybe if you disabled power control and locked your CPU to a single
speed, the numbers would get closer?
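One way to try that is to switch every core to the 'performance'
cpufreq governor through sysfs. This is a minimal sketch, assuming the
standard cpufreq sysfs layout under /sys/devices/system/cpu and root
privileges. `set_governor` is a hypothetical helper, and depending on
the driver, turbo mode may still kick in on top of the pinned
frequency:

```python
from pathlib import Path

def set_governor(governor: str = "performance",
                 sysfs_root: str = "/sys/devices/system/cpu") -> list[Path]:
    """Write `governor` into every CPU's cpufreq scaling_governor file.

    Needs root when pointed at the real sysfs tree. Returns the files written.
    """
    written = []
    # Matches cpu0, cpu1, ... but not sibling entries such as cpufreq/ or cpuidle/.
    for gov_file in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        gov_file.write_text(governor + "\n")
        written.append(gov_file)
    return written
```

From a shell, `cpupower frequency-set -g performance` should do the
same where the cpupower tool is installed.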


Erratum depressing:
------------------


The worst news to me is how (in)accurate perf events and timer events
really are. The specification update for the 5600 series lists errata
that include dropped timer interrupts, among others.
(see http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5600-specification-update.pdf
for the full 5-page erratum, which is rather depressing).





Thread v. Procs & parallelism
-----------------------------



interesting to see how well python does parallelization. [...]
Your 8 threads utilize about 1.9 cores.

Interesting! That's probably because of CPython's global interpreter
lock (GIL). I didn't really have full utilization in mind for that
script, since the effect seems to occur whenever there's more than
one thread involved.

Python's multiprocessing package should offer better utilization
since it avoids the GIL by spawning processes instead of threads,
which in this case would defeat the purpose of the script.
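The "cores utilized" figure quoted above is process CPU time (user +
system, summed over all threads) divided by wall-clock time. This is a
minimal sketch for reproducing that kind of measurement, assuming a
standard CPython build with the GIL. On a free-threaded build the
figure will differ. `burn` and `cores_used` are hypothetical names,
not the script from this thread:

```python
import threading
import time

def burn(n: int) -> int:
    # Pure-Python CPU-bound loop: runs bytecode, so it needs the GIL throughout.
    total = 0
    for i in range(n):
        total += i * i
    return total

def cores_used(num_threads: int, n: int = 1_000_000) -> float:
    """Process CPU time / wall time while num_threads threads each run burn(n)."""
    threads = [threading.Thread(target=burn, args=(n,)) for _ in range(num_threads)]
    cpu0, wall0 = time.process_time(), time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return (time.process_time() - cpu0) / (time.perf_counter() - wall0)

if __name__ == "__main__":
    for k in (1, 2, 8):
        print(f"{k} threads: ~{cores_used(k):.2f} cores")
```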

---


But mapping the work onto procs would better match reality on Linux,
since Linux threads are built on procs (kernel tasks) with varying
amounts of memory shared.
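On Linux, each thread shows up as its own kernel task, with its own ID
under /proc/<pid>/task. This is one concrete way to see threads and
processes sharing the same machinery. The following is a minimal sketch,
assuming Linux and Python 3.8+ (for `threading.get_native_id`).
`thread_tids` is a hypothetical helper:

```python
import os
import threading

def thread_tids() -> tuple[int, int, list[int]]:
    """Start one thread; return (pid, that thread's kernel TID, TIDs in /proc/self/task)."""
    seen = {}
    ready = threading.Event()
    done = threading.Event()

    def worker() -> None:
        seen["tid"] = threading.get_native_id()  # the kernel's ID for this task
        ready.set()
        done.wait()  # stay alive while the main thread lists /proc/self/task

    t = threading.Thread(target=worker)
    t.start()
    ready.wait()
    tasks = sorted(int(name) for name in os.listdir("/proc/self/task"))
    done.set()
    t.join()
    return os.getpid(), seen["tid"], tasks

if __name__ == "__main__":
    pid, tid, tasks = thread_tids()
    print(f"pid={pid} thread tid={tid} tasks={tasks}")
```

The main thread's task ID equals the process ID, while the extra
thread gets a new ID that is listed right next to it.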

On Linux, there's rarely a use case where threads will outperform
procs. Windows is a different matter: there, spawning a thread is much
cheaper than creating a process. But I think Linux process creation is
so efficient compared to Windows that, even with threads built on top
of procs, Linux's thread-creation time is still faster than Windows'
thread-creation time. That wasn't always the case: when Linux threads
first came out, creating one took notably longer ("some 'delta'")
than creating a process, and more work was put into the kernel to
optimize thread creation. I think threads might be some small percent
faster now, but I wouldn't wager one way or the other, and it's
certainly nowhere near the benefit threads have on Windows.
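A crude way to compare creation cost on a given box is to time
fork()+wait against a thread's start+join. This is a minimal sketch,
assuming a Unix system (os.fork is not available on Windows). The
helper names are hypothetical, and each iteration also pays CPython
interpreter overhead, so the result says more about Python than about
raw kernel clone() cost. A C benchmark around fork() and
pthread_create() would be the fairer test:

```python
import os
import threading
import time

def time_forks(count: int) -> float:
    """Average seconds to fork a child that exits immediately, then reap it."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: exit at once, skipping Python cleanup
        os.waitpid(pid, 0)     # parent: wait for (reap) the child
    return (time.perf_counter() - start) / count

def time_threads(count: int) -> float:
    """Average seconds to start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(count):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / count

if __name__ == "__main__":
    print(f"fork+wait:  {time_forks(200) * 1e6:.1f} us")
    print(f"start+join: {time_threads(200) * 1e6:.1f} us")
```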

Perl, BTW, went the way of Linux -- building its threads on procs,
which is a likely contributing cause to their efficiency on Linux.

----
How timing is calculated...(?!)
--------------------------------


Oh, another "gotcha" is how timing is done. During boot on my
system, I see:

[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.001000] tsc: Detected 2792.844 MHz processor
[ 0.000004] Calibrating delay loop (skipped), value calculated using timer frequency.. 5585.68 BogoMIPS (lpj=2792844)
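The numbers in that log are self-consistent with a kernel tick rate of
HZ=1000. As far as I can tell from the kernel's calibration code, when
the delay-loop calibration is skipped, lpj is derived as TSC kHz x
1000 / HZ. BogoMIPS is then printed as lpj / (500000 / HZ) using
integer maths. Treat this as my reading of the code, not gospel. Here
is a quick check (HZ is an assumption, since it is not printed in the
log):

```python
HZ = 1000               # assumed tick rate; not printed in the log
TSC_KHZ = 2_792_844     # "tsc: Detected 2792.844 MHz processor"

# With calibration skipped, loops-per-jiffy comes straight from the TSC rate.
lpj = TSC_KHZ * 1000 // HZ

def bogomips(loops_per_jiffy: int, hz: int) -> str:
    """Format BogoMIPS as the boot message does, using integer division only."""
    whole = loops_per_jiffy // (500_000 // hz)
    frac = (loops_per_jiffy // (5_000 // hz)) % 100
    return f"{whole}.{frac:02d}"

if __name__ == "__main__":
    print(lpj, bogomips(lpj, HZ))
```

Both printed values (lpj=2792844 and 5585.68 BogoMIPS) match the boot
line. That suggests lpj is a delay-loop count per 1 ms tick, scaled
from the CPU's clock.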

It appears that to calibrate loops per jiffy (lpj, the delay-loop
count for one tick) it is using the Hz of the processor. Then
sometime later I see:

[ 1.307104] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
[ 1.312644] hpet0: 4 comparators, 64-bit 14.318180 MHz counter
[ 1.320680] Switched to clocksource hpet

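So the loop count per jiffy is calibrated from the processor's (TSC)
speed, while time is then kept by the HPET clocksource, a separate
14.318180 MHz counter. One HPET count lasts about 69.8 ns (roughly
0.00007 ms). So with HZ=1000, which the lpj and BogoMIPS figures above
imply, a 1 ms jiffy spans about 14,318 HPET counts.

That mismatch between the two time sources could, by itself, account
for some skew.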
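For reference, the HPET's resolution follows directly from the counter
rate in the boot log: period = 1 / 14.318180 MHz. This is a quick
check, assuming HZ=1000 (not printed in the log, but consistent with
the lpj and BogoMIPS values):

```python
HPET_HZ = 14_318_180   # "64-bit 14.318180 MHz counter" from the boot log
HZ = 1000              # assumed kernel tick rate

period_ns = 1e9 / HPET_HZ          # time between successive HPET counts
counts_per_jiffy = HPET_HZ / HZ    # HPET counts that elapse in one 1 ms jiffy

if __name__ == "__main__":
    print(f"HPET period ~ {period_ns:.2f} ns, ~{counts_per_jiffy:.0f} counts per jiffy")
```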


Given that skewing, and the erratum problems, I wouldn't expect too much in the way of accurate timing info on one of these processors...
*cough*...(sigh)





