RE: [suse-amd64] many lost ticks
On Thursday 27 October 2005 19:42, Langsdorf, Mark wrote:
Unofficially, clock=pmtmr should at least reduce the number of lost ticks.
I doubt it will help on 64bit kernels, which don't have this option (ok, placebos are sometimes known to help on computers too). clock= only exists on 32bit.
I'll defer to you, Andi, but 2.6.13.4 definitely has PM timer support for AMD64. Is there really no way to force a particular time source on AMD64?
notsc might do the trick on 64bit though. However, it will only work on multiprocessor systems. On uniprocessor systems there is currently no way to force another clock.
Okay. Would a patch to change this be accepted? -Mark Langsdorf AMD, Inc.
On Thursday 27 October 2005 21:42, Langsdorf, Mark wrote:
On Thursday 27 October 2005 19:42, Langsdorf, Mark wrote:
Unofficially, clock=pmtmr should at least reduce the number of lost ticks.
I doubt it will help on 64bit kernels, which don't have this option (ok, placebos are sometimes known to help on computers too). clock= only exists on 32bit.
I'll defer to you, Andi, but 2.6.13.4 definitely has PM timer support for AMD64. Is there really no way to force a particular time source on AMD64?
64bit doesn't have the modular timer drivers of 32bit. There are the notsc, nohpet and nopmtimer options to disable specific time sources. But TSC use on UP is somewhat hardcoded and doesn't go through the decision path of the SMP version, so for now it always uses the TSC. As a workaround for SUSE users it might work to install an SMP kernel manually: during the installation, deselect the kernel-default kernel and select kernel-smp in the package selection. Then notsc and the other options should work, I think.
notsc might do the trick on 64bit though. However, it will only work on multiprocessor systems. On uniprocessor systems there is currently no way to force another clock.
Okay. Would a patch to change this be accepted?
Probably yes -Andi
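Since the options only take effect if the kernel actually received them, it can help to check the running kernel's command line in /proc/cmdline. The following Python sketch is a hypothetical helper (check_cmdline is not part of any SUSE or kernel tool). It assumes the option names mentioned in this thread and uses the standard difflib module to flag near-miss spellings, for example nostc instead of notsc. It only reports what was passed, not whether the kernel honours it; per the explanation above, a UP 64bit kernel ignores notsc regardless.

```python
import difflib
from pathlib import Path

# Timer-related options named in this thread (assumption: this list is not exhaustive;
# clock= exists only on 32bit, so it is deliberately not listed here).
KNOWN_TIMER_OPTIONS = ["notsc", "nohpet", "nopmtimer", "noapic"]


def check_cmdline(cmdline: str) -> dict:
    """Report which known timer options appear, and tokens that look like typos of them."""
    # Compare option names only, so "clock=pmtmr" is checked as "clock".
    names = [token.split("=", 1)[0] for token in cmdline.split()]
    found = [name for name in names if name in KNOWN_TIMER_OPTIONS]
    suspicious = {}
    for name in names:
        if name in KNOWN_TIMER_OPTIONS:
            continue
        # A close but not exact match (e.g. "nostc" vs "notsc") is probably a typo.
        close = difflib.get_close_matches(name, KNOWN_TIMER_OPTIONS, n=1, cutoff=0.7)
        if close:
            suspicious[name] = close[0]
    return {"found": found, "suspicious": suspicious}


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        report = check_cmdline(cmdline_path.read_text())
        print("timer options passed:", report["found"] or "none")
        for typo, meant in report["suspicious"].items():
            print(f"{typo!r} is not a known option; did you mean {meant!r}?")
```

Running it as a normal user should be enough, since /proc/cmdline is normally world-readable. The 0.7 similarity cutoff is an arbitrary choice that catches transposed letters without matching unrelated options such as root= or quiet.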
Hi All, Quite enjoying finding out just how screwed up my system is ;) Err, I think my machine was automatically detected as SMP, with the appropriate kernel installed (it's a dual core machine). Incidentally, do these problems only affect X2 chips and dual Opterons? If so, won't they all get SMP kernels by default? So the only thing needed is adding notsc to the boot command? The only reason I ask is that I added the following to my boot command: noapic clock=pmtmr nostc, but I still find that the clock is running too fast: it probably gains 10 mins over the space of an hour. Last thing (and apologies if someone has already mentioned this): what is the definitive way to check whether a machine is losing ticks? Best wishes, Jon.
-- Jonathan Brooks (Ph.D.) Research Assistant. PaIN Group, Department of Human Anatomy & Genetics, University of Oxford, South Parks Road, Oxford, OX1 3QX tel: +44(0)1865-282654 fax: +44(0)1865-282656 web: http://www.fmrib.ox.ac.uk/~jon
On Thursday 27 October 2005 22:12, Jonathan Brooks wrote:
Err, I think my machine was automatically detected as SMP, with the appropriate kernel installed (it's a dual core machine).
Then notsc should work if it's really needed.
Incidentally, do these problems only affect X2 chips and dual Opterons? If so, won't they all get SMP kernels by default? So the only thing needed is adding notsc to the boot command?
The only reason I ask is that I added the following to my boot command: noapic clock=pmtmr nostc, but I still find that the clock is running too fast: it probably gains 10 mins over the space of an hour.
It would surprise me if notsc would fix that, but you can try.
Last thing (and apologies if someone has already mentioned this): what is the definitive way to check whether a machine is losing ticks?
The kernel prints a message in the log. If you don't see it you don't have the problem. -Andi
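Following that advice, the log check can be scripted. This Python sketch uses a hypothetical helper, scan_for_lost_ticks, and assumes the warnings contain either "Lost N timer tick(s)" or "many lost ticks" (the subject of this thread suggests the latter). The exact wording varies between kernel versions, so treat the regular expression as an assumption to adjust against your own dmesg output.

```python
import re
import subprocess

# Assumed wording of the lost-tick warnings; verify against your own kernel log.
LOST_TICKS_RE = re.compile(r"lost (\d+) timer tick|many lost ticks", re.IGNORECASE)


def scan_for_lost_ticks(log_text: str) -> tuple[int, int]:
    """Return (number of matching log lines, total ticks explicitly reported as lost)."""
    matching_lines = 0
    ticks_lost = 0
    for line in log_text.splitlines():
        match = LOST_TICKS_RE.search(line)
        if match:
            matching_lines += 1
            # Only the "Lost N timer tick(s)" form carries an explicit count.
            if match.group(1):
                ticks_lost += int(match.group(1))
    return matching_lines, ticks_lost


if __name__ == "__main__":
    try:
        log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        log = None  # dmesg missing, or not permitted for this user
    if log is None:
        print("could not read the kernel log (try running as root)")
    else:
        lines, ticks = scan_for_lost_ticks(log)
        print(f"{lines} lost-tick message(s), {ticks} tick(s) explicitly counted")
```

Because the kernel ring buffer is overwritten over time, an empty result only covers the period still held in the buffer; checking over several hours, as suggested later in the thread, gives a more reliable picture.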
Hi Andi, I recently saw a suggestion to run cat /proc/interrupts, or something like that, to check on the timing of updates. Periodically I do see many lost ticks with dmesg, hence starting this thread ;) I confess to knowing absolutely nothing about timing in Linux, so I can't comment on whether these kernel parameters should affect the time reported by the OS, but it seems logical to me...? Best wishes, Jon.
On Thursday 27 October 2005 23:54, Jonathan Brooks wrote:
I recently saw a suggestion to run cat /proc/interrupts, or something like that, to check on the timing of updates.
Run watch -n1 cat /proc/interrupts and check that you get roughly 250 interrupt 0s each second (or 1000 on SUSE 9.3). But it doesn't sound like this is your problem.
Periodically I do see many lost ticks with dmesg, hence starting this thread ;)
You mean you see some of them in every hour in which you see the slowdown? Check it for a few hours and compare the timestamps in the log.
I confess to knowing absolutely nothing about timing in Linux, so I can't comment on whether these kernel parameters should affect the time reported by the OS, but it seems logical to me...?
You can test if notsc helps, but it's only a small chance imho. -Andi
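The /proc/interrupts check can also be done without eyeballing watch output. This Python sketch (irq0_total and irq0_rate are hypothetical helper names) reads the file twice and divides the difference in IRQ 0 counts by the elapsed time. It assumes the usual layout: a header row of CPU names, then rows starting with the IRQ number and a colon. The expected figure of roughly 250 per second, or 1000 on 9.3, comes from the advice above.

```python
import time
from pathlib import Path


def irq0_total(text: str) -> int:
    """Sum the IRQ 0 (timer) counts over all CPU columns of /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty interrupts table")
    ncpus = len(lines[0].split())  # header row looks like "CPU0 CPU1 ..."
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "0:":
            return sum(int(count) for count in fields[1:1 + ncpus])
    raise ValueError("no IRQ 0 line found")


def irq0_rate(interval: float = 2.0, path: str = "/proc/interrupts") -> float:
    """Sample the table twice, `interval` seconds apart; return IRQ 0 interrupts per second."""
    before = irq0_total(Path(path).read_text())
    start = time.monotonic()
    time.sleep(interval)
    after = irq0_total(Path(path).read_text())
    return (after - before) / (time.monotonic() - start)


if __name__ == "__main__":
    if Path("/proc/interrupts").exists():
        try:
            rate = irq0_rate()
            print(f"IRQ 0: about {rate:.0f}/s (expected roughly 250, or 1000 on 9.3)")
        except ValueError as exc:
            print(f"could not measure: {exc}")
```

The sketch sums across CPU columns because the timer interrupt may be delivered to more than one CPU; on a dual core machine the per-CPU columns can be read separately if needed. time.monotonic is used so that a clock that is itself drifting or being corrected does not distort the measured interval as much as wall-clock time would.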
Hi, Further to the last email, I found the command to see what was happening with the time: watch -n1 'date ; cat /proc/interrupts'
Oh, and it seems like one of the following two things has "cured" my missing seconds:
1.) adding noapic clock=pmtmr notsc to the boot command, or
2.) using Nvidia's Nforce driver (nvnet) instead of forcedeth for the Nforce4's CK802(?) NIC, with the module settings optimization=1 (CPU load optimization) and the polling interval set to 1000 us (poll_interval_in_us=1000).
I hate it when I come to threads like this, with a big long list of things that might possibly be helping cure the problem, but never getting to the bottom of it, and hence never properly understanding it :( Does anyone want to take a step back and summarise the problem for the enlightenment of the non-understanding plebs out here (like me)? Cheers, Jon
Andi Kleen wrote:
On Thursday 27 October 2005 23:54, Jonathan Brooks wrote:
I recently saw a suggestion to run cat /proc/interrupts, or something like that, to check on the timing of updates.
Run watch -n1 cat /proc/interrupts and check that you get roughly 250 interrupt 0s each second (or 1000 on SUSE 9.3). But it doesn't sound like this is your problem.
Okay, on CPU 0 I get approximately 250 ticks each second, and on CPU 1 I get 7 or 8.
Periodically I do see many lost ticks with dmesg, hence starting this thread ;)
You mean you see some of them in every hour in which you see the slowdown? Check it for a few hours and compare the timestamps in the log.
I confess to knowing absolutely nothing about timing in Linux, so I can't comment on whether these kernel parameters should affect the time reported by the OS, but it seems logical to me...?
You can test if notsc helps, but it's only a small chance imho.
Well something seems to have helped :)
-Andi
Hi Andi, Just that - nothing else. Machine is now keeping perfect time, and there are now no messages about lost ticks (at least not for the last 12 hours). Last thing: when I look at the output of dmesg I get the following (see below). I am slightly worried, since I have read somewhere that the kernel needs to be patched to include a more recent version of powernow-k8. Is this error expected, or should I patch? Best wishes, Jon.
powernow-k8: Found 2 AMD Athlon 64 / Opteron processors (version 1.50.3)
powernow-k8: 0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
powernow-k8: 1 : fid 0x2 (1000 MHz), vid 0x12 (1100 mV)
cpu_init done, current fid 0xe, vid 0x8
limiting to cpu 2 failed
limiting to cpu 3 failed
[... the same "limiting to cpu N failed" line repeats for every N from 4 to 126 ...]
limiting to cpu 127 failed
Andi Kleen wrote:
On Friday 28 October 2005 00:29, Jonathan Brooks wrote:
You can test if notsc helps, but it's only a small chance imho.
Well something seems to have helped :)
What did you do exactly? Add these options? Anything else?
-Andi
On Friday 28 October 2005 11:45, Jonathan Brooks wrote:
Hi Andi,
Just that - nothing else. Machine is now keeping perfect time, and there are now no messages about lost ticks (at least not for the last 12 hours).
Ok so it is safe to say that notsc helped in your case? Or did you change other things too?
Last thing: when I look at the output of dmesg I get the following (see below). I am slightly worried, since I have read somewhere that the kernel needs to be patched to include a more recent version of powernow-k8. Is this error expected, or should I patch?
Known problem, but harmless. Just ignore for now. Some future update kernel will likely fix it. -Andi
Hi Andi, Provided that a) my switching from the forcedeth to the nvnet NIC module didn't somehow fix this, and b) clock=pmtmr does nothing on my system, then yes, notsc has fixed the problem. Oh, I could try removing noapic from the boot command and see if that has any influence. Will do that now. I can't switch back to forcedeth, since it was causing me so many problems and I need to work :( Cheers, Jon
participants (3)
- Andi Kleen
- Jonathan Brooks
- Langsdorf, Mark