[Bug 350980] New: The system stops doing things till I press a key or move the mouse: every things halts.
https://bugzilla.novell.com/show_bug.cgi?id=350980

           Summary: The system stops doing things till I press a key or move the mouse: every things halts.
           Product: openSUSE 10.3
           Version: Final
          Platform: i686
        OS/Version: openSUSE 10.3
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: robin.listas@telefonica.net
         QAContact: qa@suse.de
          Found By: ---

This is very weird. Sometimes, the system simply stops "working". It doesn't crash; all tasks simply seem to be stopped. Nothing refreshes in the display. The clock stops. A script running a loop in a console:

#!/bin/bash
while true ; do
    # date +"%T" | tee -a /home/cer/marca.log
    date +"%T" >> /home/cer/marca.log
    sleep 1
done

stops as well. Then, as soon as I move the mouse or touch a key (shift, for instance) everything continues and the clock jumps some seconds or minutes to the correct time.

I have tried several things, as discussed in the "opensuse" mail list (<http://lists.opensuse.org/opensuse/2007-12/msg01780.html>). No effect.

- check the desktop for powersave features - nothing, and there is nothing configurable that would kick in within seconds of no user activity.
- pass cpufreq=no as a boot argument --> no good.
- Recompile the kernel with "NO_HZ" --> even worse (this is the current state). I will recompile again to undo.

And more things I forgot, but you can read the whole story in the mail list or the link above.

Sometimes, very randomly, the system stops completely and does not respond to the keyboard, but it awakes when I boot another PC, which I think tries to connect or ask for a service (ntp or dns). My current hack is to leave my router continuously pinging me. My guess is that the pings cause the network card to interrupt the kernel, so it keeps awake. Keyboard or mouse also provoke interrupts. I'm not sure about usb activity.

This might be related to other bugs reported by me, I don't know (spamd timeout?)

I have been "playing" with the clock (Bug 344356 and a new one not reported yet). The thing is that the kernel defaults to using the "acpi_pm" clock, and this one misses whole minutes. I have to force the use of "tsc" (which the kernel disabled at boot). jiffies crashes.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
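The report above mentions switching between the "acpi_pm", "tsc" and "jiffies" clocks. A minimal sketch for seeing which clocksource a running kernel is using, assuming the kernel exposes the usual sysfs clocksource directory; the `show_clocksource` function and the `CS_DIR` variable are names introduced here for illustration and are not part of the original report:

```shell
#!/bin/bash
# Sketch: report the kernel's current and available clocksources.
# CS_DIR is a helper variable introduced for this example; the default is
# the sysfs directory used by the generic clocksource framework.
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

show_clocksource() {
    local dir="${1:-$CS_DIR}"
    local f
    for f in current_clocksource available_clocksource; do
        if [ -r "$dir/$f" ]; then
            # Each file holds a single line of text.
            printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s: (not readable)\n' "$f"
        fi
    done
}

show_clocksource
```

On kernels that support it, the clocksource can be chosen at boot with the `clocksource=` parameter (for example `clocksource=tsc`); whether that is how it was forced in this report is not stated.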
https://bugzilla.novell.com/show_bug.cgi?id=350980

User oneukum@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c3

Oliver Neukum <oneukum@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

--- Comment #3 from Oliver Neukum <oneukum@novell.com> 2008-01-07 02:11:41 MST ---
Could you add this line:

cat /proc/interrupts >> /home/cer/marca.log

to your script and post a part of the log taken when the system runs normally and while it was stopped?
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c5

Carlos Robinson <robin.listas@telefonica.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|robin.listas@telefonica.net |

--- Comment #5 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-07 18:01:41 MST ---
Created an attachment (id=189654)
 --> (https://bugzilla.novell.com/attachment.cgi?id=189654)
requested info

Reply to Comment #3 From Oliver Neukum
> Could you add this line:
> cat /proc/interrupts >> /home/cer/marca.log

Done.

Today of all days it did not want to cooperate by being lazy while I was looking: I guess it knew you were waiting and behaved naughtily. I had to stop mail fetch and the ntp daemon, and wait for a minute in several tries, but at last I got it.

Search for the time mark "=====> 01:35:15" in the attached file: it jumps to "01:35:43", almost half a minute stopped. There are a few smaller jumps, like "01:32:46..48", "01:33:55..57"; ah, another big one: "01:34:31..01:35:06". A one second jump could be doubtful, but a half a minute one I suppose not.

I don't know what you are looking for, but the timer interrupt seems to be ticking :-?
https://bugzilla.novell.com/show_bug.cgi?id=350980

User oneukum@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c6

--- Comment #6 from Oliver Neukum <oneukum@novell.com> 2008-01-08 14:59:40 MST ---
In the jump from 01:35:15 to 01:35:43 you just have 73 timer ticks.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c7

--- Comment #7 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-08 16:59:44 MST ---
Ah! How can that be? If they are still 19.2 ticks/second, I should have had 537 ticks.

I don't know how the timer works in linux, but I assume that, as the clock shows the correct time, the system uses TSC to estimate the lost interrupts correctly. Is that so?

Now, questions... can this be a hardware problem, where the timer chip doesn't produce the interrupts? However, in the test the other day with 10.2 there were no symptoms. Is the kernel reprogramming the chip to go slower? Is the kernel ignoring the interrupts?

The only thing I can do, as I lack the knowledge of kernel internals, is to try again in 10.2 and measure the ticks over a period. Do you want me to try that?

Note 1: This thing must be related to Bug 350981

Note 2: These days I'm using my personalized kernel, instead of the default suse kernel I used the other day for testing (which has the same symptoms). I play with different settings in an attempt to solve this.

These are my different settings (diff -y --suppress-common-lines .config /boot/config-2.6.22.13-0.3-default):

mine                                        yours
# Linux kernel version: 2.6.22.13         <
# Sun Dec 30 14:30:44 2007                <
CONFIG_BROKEN_ON_SMP=y                    | CONFIG_LOCK_KERNEL=y
CONFIG_LOCALVERSION="-cer"                | CONFIG_LOCALVERSION="-default"
                                          > CONFIG_CPUSETS=y
                                          > CONFIG_STOP_MACHINE=y
CONFIG_HIGH_RES_TIMERS=y                  | # CONFIG_HIGH_RES_TIMERS is not set
# CONFIG_SMP is not set                   | CONFIG_SMP=y
# CONFIG_PARAVIRT is not set              | CONFIG_PARAVIRT=y
                                          > CONFIG_VMI=y
# CONFIG_M586 is not set                  | CONFIG_M586=y
CONFIG_MPENTIUM4=y                        | # CONFIG_MPENTIUM4 is not set
                                          > CONFIG_X86_PPRO_FENCE=y
                                          > CONFIG_X86_F00F_BUG=y
CONFIG_X86_GOOD_APIC=y                    | CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_USE_PPRO_CHECKSUM=y            <
CONFIG_X86_TSC=y                          <
CONFIG_X86_CMOV=y                         <
                                          > CONFIG_NR_CPUS=32
                                          > CONFIG_SCHED_SMT=y
                                          > CONFIG_SCHED_MC=y
                                          > # CONFIG_PREEMPT_BKL is not set
                                          > CONFIG_X86_MCE_P4THERMAL=y
# CONFIG_TOSHIBA is not set               | CONFIG_TOSHIBA=m
# CONFIG_I8K is not set                   | CONFIG_I8K=m
                                          > # CONFIG_IRQBALANCE is not set
# CONFIG_HZ_250 is not set                | CONFIG_HZ_250=y
CONFIG_HZ_300=y                           | # CONFIG_HZ_300 is not set
CONFIG_HZ=300                             | CONFIG_HZ=250
                                          > CONFIG_HOTPLUG_CPU=y
                                          > CONFIG_SUSPEND_SMP=y
                                          > CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_DMASCC is not set                <
# CONFIG_IRPORT_SIR is not set            <
# CONFIG_SBPCD is not set                 <
# CONFIG_CM206 is not set                 <
# CONFIG_CDU31A is not set                <
# CONFIG_NI5010 is not set                <
# CONFIG_PCMCIA_XIRTULIP is not set       <
# CONFIG_ISDN_DRV_LOOP is not set         <
# CONFIG_HYSDN is not set                 <
# CONFIG_RISCOM8 is not set               <
# CONFIG_STALLION is not set              <
# CONFIG_ISTALLION is not set             <
# CONFIG_I2C_ELEKTOR is not set           <
# CONFIG_DEBUG_INFO is not set            | CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_RODATA=y                     | # CONFIG_DEBUG_RODATA is not set
                                          > CONFIG_GENERIC_PENDING_IRQ=y
                                          > CONFIG_X86_SMP=y
                                          > CONFIG_X86_HT=y
                                          > CONFIG_X86_TRAMPOLINE=y
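The tick arithmetic from comments #6 and #7 can be checked with a few lines of shell. This is only a sketch: `tick_report` is a helper name made up for this example, and the 19.2 ticks/second rate is simply the figure quoted in comment #7, not something measured here.

```shell
#!/bin/bash
# tick_report START END RATE OBSERVED
#   START, END : wall-clock marks as HH:MM:SS (same day)
#   RATE       : assumed timer ticks per second
#   OBSERVED   : timer ticks actually counted between the two marks
tick_report() {
    awk -v s="$1" -v e="$2" -v r="$3" -v o="$4" 'BEGIN {
        split(s, a, ":"); split(e, b, ":")
        secs = (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
        if (secs <= 0) { print "end mark must be later than start mark"; exit 1 }
        printf "%d s elapsed; expected %.1f ticks at %s/s; observed %d (%.1f/s)\n",
               secs, secs * r, r, o, o / secs
    }'
}

# The stall discussed in comments #5 to #7.
tick_report 01:35:15 01:35:43 19.2 73
```

For that stall the gap is 28 seconds, so the expected count at the quoted rate is 28 x 19.2 = 537.6, against 73 ticks observed.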
https://bugzilla.novell.com/show_bug.cgi?id=350980

User jeffm@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c8

Jeff Mahoney <jeffm@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

--- Comment #8 from Jeff Mahoney <jeffm@novell.com> 2008-01-08 17:17:22 MST ---
What happens when you boot an openSUSE kernel with nohz=off?
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c9

--- Comment #9 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-08 17:45:37 MST ---
One of my tests was to recompile with NO_HZ last December 25th: no effect. But if you want to be sure, I'll reboot with the default kernel with that option.

cpufreq=no -->
<http://lists.opensuse.org/opensuse/2007-12/msg02136.html> (no effect)

<http://lists.opensuse.org/opensuse/2007-12/msg02754.html>
kernel recompiled with NO_HZ disabled: no effect.

Ok, I edited my grub menu.lst and will reboot now. I will report back.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c10

--- Comment #10 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-08 19:22:03 MST ---
Ok, I rebooted with

title MAIN openSUSE 10.3 (default)
    root (hd0,5)
    kernel /vmlinuz root=/dev/disk/by-id/ata-ST3160021A_5JS4VV1F-part6 vga=0x317 hwprobe=-modules.pata resume=/dev/hda5 splash=verbose showopts apic nohz=off
    initrd /initrd

and I watched the clock for a few minutes, with no "laziness" on the part of the clock. I will have to leave the script running for a longer period, and then analyze the log watching for lapses. Do you have an idea to detect those? With 'grep "=====>" marca.log' I can get the time marks, but then I need to evaluate the time difference from one to the next and calculate the time...

Ah, got it, I'll use this script:

#!/bin/bash
ANTERIOR_U=`date +%s`
sleep 1
while true ; do
    ACTUAL_U=`date +%s`
    ACTUAL=`date +"%T"`
    DIFF=$(( $ACTUAL_U - $ANTERIOR_U ))
    if test $DIFF -gt 1 ; then
        echo -e "=====> " $ACTUAL\\t "(*******" $DIFF "*******)" \
            | tee -a /home/cer/marca.log
    else
        echo -e "=====> " $ACTUAL\\t\($DIFF\) | \
            tee -a /home/cer/marca.log
    fi
    ANTERIOR_U=$ACTUAL_U
    cat /proc/interrupts >> /home/cer/marca.log
    sleep 1
done

It took me some time to get this running, don't laugh too much at my scripting O:-)

I'll leave it running some time. Too late already, I should be sleeping, perhaps tomorrow.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c11

Carlos Robinson <robin.listas@telefonica.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|robin.listas@telefonica.net |

--- Comment #11 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-08 20:38:45 MST ---
Worse luck. :-(

I sent the computer to hibernate to disk, as I always do, and it crashed (keyboard stuck), about 1.5 centimeters into the progress bar: and that's all I could see, as there is no verbose mode for hibernating, except that damn cute green display with a progress bar that doesn't allow us "hackers" to see what is going on, or not going on, as this time.

First time in one or two months. I hesitate to restart it again in nohz=off mode, I hope you understand that.

I also had a warning during boot in this mode: the first time it stopped during the network start phase, and I rebooted it (I thought the adsl might be down just at that moment). The second time it slowed for a few seconds there, then went on, so I didn't give it much thought. But seeing that it crashed/locked later when attempting to hibernate, all I can say is that the nohz=off mode causes my computer to be unstable.

I don't remember it being unstable when I selected this option at kernel compile time, though...

And now I'm really off to sleep. You don't see it in the bugzilla log, but here it is 4:35 AM. I just rebooted to write this for you now instead of tomorrow.
https://bugzilla.novell.com/show_bug.cgi?id=350980

Oliver Neukum <oneukum@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

https://bugzilla.novell.com/show_bug.cgi?id=350980

User oneukum@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c12

Oliver Neukum <oneukum@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

--- Comment #12 from Oliver Neukum <oneukum@novell.com> 2008-01-21 02:32:07 MST ---
Time to break out the sledgehammer. Could you try with "noapic" added to the kernel command line?
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c13

--- Comment #13 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-21 03:43:53 MST ---
Hum! Long ago I had to add "apic" to the kernel command line, because without it the machine was very slow, if I remember correctly... but ok, I'll try.

What I tried two days ago was to boot again with "nohz=off" in runlevel 3. The computer doesn't doze off, but it locks when I order it to hibernate - and the cute green splash screen with progress bar prevents me from seeing a log and reporting the point where it locks. Unfortunately, I hibernate this machine two or three times a day, so that's a no-no.

I can try recompiling again with that option: I did that already, but perhaps I made a mistake.

Ok, off to reboot...
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c14

Carlos Robinson <robin.listas@telefonica.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|robin.listas@telefonica.net |

--- Comment #14 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-21 05:03:59 MST ---
Ok, I'm now in "noapic" mode.

I left the computer running in runlevel 3, router powered off, with my script detecting pauses, and none was felt - except a curious 2" pause about every minute:

=====> 12:04:50 (******* 2 *******)
=====> 12:05:50 (******* 2 *******)
=====> 12:06:52 (******* 2 *******)
=====> 12:07:54 (******* 2 *******)
=====> 12:08:56 (******* 2 *******)
=====> 12:09:58 (******* 2 *******)
=====> 12:10:59 (******* 2 *******)
=====> 12:12:01 (******* 2 *******)
=====> 12:13:02 (******* 2 *******)

The computer doesn't feel slow, contrary to what I thought. I have saved the output of dmesg and /var/log/boot.msg in case you want them.

I will try now to hibernate.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c15

--- Comment #15 from Carlos Robinson <robin.listas@telefonica.net> 2008-01-21 09:56:22 MST ---
Bad news: after thawing, it locks completely.

It recovered fine after hibernation. I thought the computer was a bit slow. I went to tty1 ([ctrl][alt][F1]). I typed [Alt][Left] to see the text console dedicated to output messages from the hibernation process: it only showed the single line "resuming..." or something similar. I then continued typing [Alt][Left], a process that would normally get me to tty10 after some keys, and... the screen was zipping past with messages so fast I couldn't read them, something about an unhandled interrupt.

The keyboard locked: I couldn't even toggle the [bloq num] light, a good indicator of a hard lock or crash. I tried pinging, no response. I had to hit the power off button.

On restart, I went to the kernel log, and sure enough, there were these (and we are fortunate that the filesystem could cope and record them):

Jan 21 14:41:41 nimrodel kernel: swsusp: Basic memory bitmaps created
Jan 21 17:12:33 nimrodel kernel: Stopping tasks ... 
29, desc: c0381e80, depth: 1, count: 0, unhandled: 0
Jan 21 17:12:35 nimrodel kernel: ->handle_irq(): c014f3c7, handle_bad_irq+0x0/0x1e9
Jan 21 17:12:35 nimrodel kernel: ->chip(): c03598dc, 0xc03598dc
Jan 21 17:12:35 nimrodel kernel: ->action(): 00000000
Jan 21 17:12:35 nimrodel kernel: IRQ_DISABLED set
Jan 21 17:12:35 nimrodel kernel: unexpected IRQ trap at vector 81
Jan 21 17:12:35 nimrodel kernel: irq 129, desc: c0381e80, depth: 1, count: 0, unhandled: 0
Jan 21 17:12:35 nimrodel kernel: ->handle_irq(): c014f3c7, handle_bad_irq+0x0/0x1e9
Jan 21 17:12:35 nimrodel kernel: ->chip(): c03598dc, 0xc03598dc
Jan 21 17:12:35 nimrodel kernel: ->action(): 00000000
Jan 21 17:12:35 nimrodel kernel: IRQ_DISABLED set
Jan 21 17:12:35 nimrodel kernel: unexpected IRQ trap at vector 81
Jan 21 17:12:35 nimrodel kernel: irq 129, desc: c0381e80, depth: 1, count: 0, unhandled: 0
Jan 21 17:12:39 nimrodel kernel: ->handle_irq(): c014f3c7, handle_bad_irq+0x0/0x1e9
Jan 21 17:12:39 nimrodel kernel: ->chip(): c03598dc, 0xc03598dc
Jan 21 17:12:39 nimrodel kernel: ->action(): 00000000
Jan 21 17:12:39 nimrodel kernel: IRQ_DISABLED set
Jan 21 17:12:39 nimrodel kernel: unexpected IRQ trap at vector 81
Jan 21 17:12:39 nimrodel kernel: irq 129, desc: c0381e80, depth: 1, count: 0, unhandled: 0

There are about twelve thousand lines of that, in under a minute and a half! No wonder the machine felt slow.

So... good shot: "noapic" kind of solves the laziness, but the machine is unstable, terribly so.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c16

--- Comment #16 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-02 05:04:24 MST ---
I have tried the new kernel: vmlinuz-2.6.22.16-0.1-default. At first I thought things were better, but no, the problem remains.

Then I recompiled the kernel with the "tickless" option disabled:

# CONFIG_NO_HZ is not set

which I assume is more or less equivalent to "nohz=off" in the startup line; however, the computer gets lazy, as usual (i.e., the problem remains), but hibernate works (with "nohz=off" it crashes). This is the kernel version I have currently active.

Go figure.

This is a log of "laziness" with my kernel during normal use:

=====> 19:08:43 (******* 2 *******)
=====> 19:08:57 (******* 5 *******)
=====> 19:09:21 (******* 14 ***************************)
=====> 19:09:58 (******* 34 ***************************)
=====> 19:10:35 (******* 28 ***************************)
=====> 19:10:41 (******* 2 *******)
=====> 19:11:13 (******* 30 ***************************)
=====> 19:11:51 (******* 34 ***************************)
=====> 19:11:52 (******* 2 *******)
=====> 19:12:28 (******* 36 ***************************)
=====> 19:13:05 (******* 36 ***************************)
=====> 19:13:43 (******* 37 ***************************)
=====> 19:14:20 (******* 36 ***************************)
=====> 19:14:52 (******* 31 ***************************)
https://bugzilla.novell.com/show_bug.cgi?id=350980

User oneukum@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c17

Oliver Neukum <oneukum@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

--- Comment #17 from Oliver Neukum <oneukum@novell.com> 2008-02-04 04:58:09 MST ---
Can you try that kernel with your current config options with

a) noapic
b) irqfixup

in the kernel command line?
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c18

--- Comment #18 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-04 06:36:58 MST ---
(In reply to comment #17 from Oliver Neukum)
> Can you try that kernel with your current config options with
> a) noapic b) irqfixup
> in the kernel command line?

Is that two different tests, or one test with both parameters?

As to "noapic", I already did, see #15. No good. Do you expect different results with this kernel?

I will try "irqfixup" perhaps tonight.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User oneukum@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c19

--- Comment #19 from Oliver Neukum <oneukum@novell.com> 2008-02-05 01:30:26 MST ---
Yes, two separate tests. Yes, I hope "noapic" works better with the newer kernel.
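Since the two tests differ only in the kernel command line, it helps to confirm which parameters the running kernel was actually booted with before reading the results. A minimal sketch, assuming /proc/cmdline is available; `has_boot_param` is a hypothetical helper written for this example, and its optional second argument exists only so it can be tried against saved text:

```shell
#!/bin/bash
# has_boot_param PARAM [FILE]
#   Returns 0 if PARAM appears on the kernel command line as a whole word
#   or as PARAM=value, 1 if it does not, 2 if FILE cannot be read.
#   Whole-word matching matters here: "apic" must not match "noapic".
has_boot_param() {
    local param="$1" file="${2:-/proc/cmdline}" word
    local -a words
    [ -r "$file" ] || return 2
    read -r -a words < "$file"
    for word in "${words[@]}"; do
        case "$word" in
            "$param"|"$param"=*) return 0 ;;
        esac
    done
    return 1
}

# Report the parameters discussed in this bug for the running kernel.
for p in apic noapic irqfixup nohz; do
    if has_boot_param "$p"; then
        echo "$p: present"
    else
        echo "$p: absent"
    fi
done
```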
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c20 Carlos Robinson <robin.listas@telefonica.net> changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEEDINFO |ASSIGNED Info Provider|robin.listas@telefonica.net | --- Comment #20 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-05 06:15:32 MST ---
a) noapic
The boot "feels" slow. No "laziness" in the brief test (brief because I didn't go for a coffee to let it time to be lazy). Hibernate works. It does work better. There is this imprecise feeling of being slow, though. I'd have to use a chronometer to time the boot to make sure.
b) irqfixup
I used "apic irqfixup".

The boot stopped for ±30" on the line:

Starting udevd udevd-event[1691]: node_symlink: device node '/dev/rtc0' already exists, link '/dev/rtc' will not overwrite it

No idea why.

Using a test user (cer2) pressing the power off button did not offer the option to hibernate. The normal user (cer) is hibernated, no questions asked (same thing for test "a").

Saw this in the log - it's new to me:

Feb 5 12:33:19 nimrodel syslog-ng[3811]: last message repeated 4 times
Feb 5 12:33:19 nimrodel gdm[7475]: GLib-CRITICAL: g_key_file_get_string: assertion `key_file != NULL' failed
Feb 5 12:33:19 nimrodel gdm[7475]: GLib-CRITICAL: g_key_file_get_string: assertion `key_file != NULL' failed
Feb 5 12:33:19 nimrodel gdm[7475]: GLib-CRITICAL: g_key_file_free: assertion `key_file != NULL' failed

The computer felt ok, no laziness. Then I went for a coffee, and then, yes, I saw a 37" pause at 12:52:35

=====> 12:51:58 (1)
           CPU0
  0:      91435    IO-APIC-edge     timer
  1:       2452    IO-APIC-edge     i8042
  6:          4    IO-APIC-edge     floppy
  7:          0    IO-APIC-edge     parport0
  8:         12    IO-APIC-edge     rtc
  9:          6    IO-APIC-fasteoi  acpi
 10:          2    IO-APIC-edge     MPU401 UART
 12:      18436    IO-APIC-edge     i8042
 14:      30660    IO-APIC-edge     ide0
 15:      82384    IO-APIC-edge     ide1
 16:       1751    IO-APIC-fasteoi  ehci_hcd:usb1, eth0
 17:          0    IO-APIC-fasteoi  uhci_hcd:usb2
 18:          0    IO-APIC-fasteoi  uhci_hcd:usb3
 19:       1780    IO-APIC-fasteoi  Intel 82801BA-ICH2
NMI:          0
LOC:     126068
ERR:          0
MIS:          0
=====> 12:52:35 (******* 37 ***************************)
           CPU0
  0:      91468    IO-APIC-edge     timer
  1:       2452    IO-APIC-edge     i8042
  6:          4    IO-APIC-edge     floppy
  7:          0    IO-APIC-edge     parport0
  8:         12    IO-APIC-edge     rtc
  9:          6    IO-APIC-fasteoi  acpi
 10:          2    IO-APIC-edge     MPU401 UART
 12:      18436    IO-APIC-edge     i8042
 14:      30668    IO-APIC-edge     ide0
 15:      82425    IO-APIC-edge     ide1
 16:       1751    IO-APIC-fasteoi  ehci_hcd:usb1, eth0
 17:          0    IO-APIC-fasteoi  uhci_hcd:usb2
 18:          0    IO-APIC-fasteoi  uhci_hcd:usb3
 19:       1780    IO-APIC-fasteoi  Intel 82801BA-ICH2
NMI:          0
LOC:     126110
ERR:          0
MIS:          0

I have the boot logs saved as
/var/log/boot_test_apic_irqfixup.msg
/var/log/boot_test_noapic.msg, if you want them I'll attach them.
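Comparing the IRQ 0 (timer) count between consecutive marks is how the stalls in this thread were diagnosed. A rough sketch of automating that comparison for a log in the format the marker script writes; `timer_deltas` is a name invented for this example, and the sample input below is trimmed from the two snapshots quoted in comment #20:

```shell
#!/bin/bash
# timer_deltas [LOGFILE...]
#   Reads marker-script output (a "=====> TIME (...)" line followed by a
#   /proc/interrupts dump) from the given files, or from standard input,
#   and prints how much the IRQ 0 (timer) count grew since the previous mark.
timer_deltas() {
    awk '
        $1 == "=====>" { mark = $2; next }      # remember the latest time mark
        $1 == "0:" {                            # IRQ 0 line of the dump
            if (seen) printf "%s timer +%d\n", mark, $2 - prev
            prev = $2; seen = 1
        }
    ' "$@"
}

# Trimmed sample: only the lines the function looks at, plus LOC.
timer_deltas <<'EOF'
=====> 12:51:58 (1)
           CPU0
  0:      91435    IO-APIC-edge     timer
LOC:     126068
=====> 12:52:35 (******* 37 ***************************)
           CPU0
  0:      91468    IO-APIC-edge     timer
LOC:     126110
EOF
```

On the sample this reports a growth of 33 timer interrupts (91468 - 91435) across the 37-second pause. On the real log it would be run as `timer_deltas /home/cer/marca.log`.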
https://bugzilla.novell.com/show_bug.cgi?id=350980

User oneukum@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c21

Oliver Neukum <oneukum@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

--- Comment #21 from Oliver Neukum <oneukum@novell.com> 2008-02-06 01:49:12 MST ---
Please test with noapic long enough to be sure. And please provide /proc/interrupts for both test cases and the new kernel with an unmodified kernel command line.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c22

--- Comment #22 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-06 12:02:32 MST ---
Created an attachment (id=193458)
 --> (https://bugzilla.novell.com/attachment.cgi?id=193458)
Test number 5, noapic

https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c23

--- Comment #23 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-06 12:10:34 MST ---
Created an attachment (id=193463)
 --> (https://bugzilla.novell.com/attachment.cgi?id=193463)
Second test (7) with noapic, two hours long

I left the computer alone for about two hours, and it didn't show laziness. Attached goes the marking log, with interrupts, logged every second for two hours.

Note: remember that I'm forcing clock TSC, even though the kernel says it is unstable, because clock acpi_pm loses the time (Bug 350981).

https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c24

--- Comment #24 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-06 12:11:45 MST ---
Created an attachment (id=193464)
 --> (https://bugzilla.novell.com/attachment.cgi?id=193464)
test (6) with apic irqfixup active

https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c25

--- Comment #25 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-06 12:16:34 MST ---
Created an attachment (id=193466)
 --> (https://bugzilla.novell.com/attachment.cgi?id=193466)
boot log of noapic test (7)

https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c26

Carlos Robinson <robin.listas@telefonica.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|robin.listas@telefonica.net |

--- Comment #26 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-06 12:18:19 MST ---
Created an attachment (id=193467)
 --> (https://bugzilla.novell.com/attachment.cgi?id=193467)
test 6, boot log (irqfixup)

Attached go the boot logs of both tests, which have the real kernel command line used.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User oneukum@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c27

--- Comment #27 from Oliver Neukum <oneukum@novell.com> 2008-02-07 01:58:58 MST ---
That shows that your box loses interrupts when the APIC is used.
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c29

--- Comment #29 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-07 03:51:55 MST ---
(In reply to comment #27 from Oliver Neukum)
> That shows that your box loses interrupts when the APIC is used.

Only when I use 10.3. Remember that I tested 10.2 from my other partition and it runs OK. That proves that it is the 10.3 kernel that is bad, not my hardware.

https://bugzilla.novell.com/show_bug.cgi?id=350981#c4
https://bugzilla.novell.com/show_bug.cgi?id=350980

User robin.listas@telefonica.net added comment
https://bugzilla.novell.com/show_bug.cgi?id=350980#c30

--- Comment #30 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-07 06:24:07 MST ---
(In reply to comment #29 from Carlos Robinson)
> (In reply to comment #27 from Oliver Neukum)
> > That shows that your box loses interrupts when the APIC is used.
> Only when I use 10.3. Remember that I tested 10.2 from my other partition and it runs OK. That proves that it is the 10.3 kernel who is bad, not my hardware.

In case you don't believe me, I have just rebooted into my 10.2 partition, stopped the ntp daemon, and run the marker test there; I left it running, and went out shopping for half an hour. Not a single problem, aside from beagle using the cpu, and many updates pending.

I'm currently in 10.2

minas-tirith:~ # grep "Kernel command line" /var/log/boot.msg
<5>Kernel command line: root=/dev/disk/by-label/320_test_a vga=0x317 resume=/dev/hda5 splash=verbose apic
minas-tirith:~ #

See? apic.

I will now attach files.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c31 --- Comment #31 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-07 06:26:10 MST --- Created an attachment (id=193604) --> (https://bugzilla.novell.com/attachment.cgi?id=193604) marker log inside 10.2 with apic
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c32 --- Comment #32 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-07 06:27:03 MST --- Created an attachment (id=193605) --> (https://bugzilla.novell.com/attachment.cgi?id=193605) The marker script I'm using
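The attached marker script is not reproduced in this archive. As a rough sketch only, assuming it extends the one-second date loop from the original report with a gap check, something like the following would produce output in the "=====> time (gap)" style quoted in this thread. The function names, the 2-second threshold, and the use of GNU date's %N are assumptions, not details taken from the attachment.

```shell
# format_gap PREV NOW
# Prints "(N)" for a normal step, or a starred marker when more than
# 2 seconds (an assumed threshold) passed between two samples.
format_gap() {
    gap=$(( $2 - $1 ))
    if [ "$gap" -gt 2 ]; then
        echo "(******* $gap *******)"
    else
        echo "($gap)"
    fi
}

# marker_loop: sample the clock once per second and flag stalls.
# Hypothetical reconstruction; runs until interrupted.
marker_loop() {
    prev=$(date +%s)
    while true; do
        sleep 1
        now=$(date +%s)
        # %N (nanoseconds) is a GNU date extension.
        echo "=====> $(date +%T.%N) $(format_gap "$prev" "$now")"
        prev=$now
    done
}
```

Running `marker_loop >> marca.log` in a console and later searching the log for starred lines shows when, and for how long, the machine stalled.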
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c33 --- Comment #33 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-07 06:28:28 MST --- Created an attachment (id=193606) --> (https://bugzilla.novell.com/attachment.cgi?id=193606) Boot log of 10.2 session
https://bugzilla.novell.com/show_bug.cgi?id=350980 User oneukum@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c34 --- Comment #34 from Oliver Neukum <oneukum@novell.com> 2008-02-08 02:31:05 MST --- Sorry for the misunderstanding. Yes, this is a kernel bug. But it is in the part of the kernel concerned with interrupt handling, not in the timer handling code. That is an important distinction.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c35 --- Comment #35 from Carlos Robinson <robin.listas@telefonica.net> 2008-02-08 03:56:39 MST --- Ah, ok. However, it is curious that both things fail, and that few people are affected. Couldn't the timer go wrong because the kernel loses interrupts? I really don't know how the timer works nowadays, and even less in Linux; the only one I know about is the old ticker interrupt at about 17 times per second. But it is a curious coincidence.
https://bugzilla.novell.com/show_bug.cgi?id=350980

Oliver Neukum <oneukum@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|oneukum@novell.com          |kernel-maintainers@forge.provo.novell.com
             Status|ASSIGNED                    |NEW
https://bugzilla.novell.com/show_bug.cgi?id=350980 User stefan.fent@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c37 --- Comment #37 from Stefan Fent <stefan.fent@novell.com> 2008-05-13 05:26:53 MST --- It would be nice to know which system we're talking about, so could you please add the output of lspci -nn?
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c38

Thomas Renninger <trenn@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |trenn@novell.com

--- Comment #38 from Thomas Renninger <trenn@novell.com> 2008-05-13 06:08:50 MST --- And cat /proc/cpuinfo
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c39 --- Comment #39 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-13 06:26:54 MST --- Created an attachment (id=214761) --> (https://bugzilla.novell.com/attachment.cgi?id=214761) lspci -nn output
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c40 --- Comment #40 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-13 06:27:46 MST --- Created an attachment (id=214762) --> (https://bugzilla.novell.com/attachment.cgi?id=214762) cat /proc/cpuinfo
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c41 --- Comment #41 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-13 06:39:10 MST --- (In reply to comment #37 from Stefan Fent)
It would be nice to know which system we're talking about, so could you please add the output of lspci -nn?
Done (In reply to comment #38 from Thomas Renninger)
And cat /proc/cpuinfo
Done. I have detected on the mail list somebody with possibly the same problem: http://lists.opensuse.org/opensuse/2008-05/msg00293.html I told him to report here but he hasn't replied. I have insisted (a minute ago) again by direct mail, it's all I can do. But at least you can see that there are more people out there with this problem, only that they don't report. I suppose that's typical.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c42

Thomas Renninger <trenn@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

--- Comment #42 from Thomas Renninger <trenn@novell.com> 2008-05-13 07:00:06 MST --- Better try with 11.0. This probably came in with the clockevents/no_hz patches. Quite some work has been done meanwhile. It's not worth looking at for 10.3. Please reopen if you still see this problem in 11.0. If this is still in 11.0, this could even get a bit higher priority...
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c43 --- Comment #43 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-13 07:11:39 MST --- (In reply to comment #42 from Thomas Renninger)
Better try with 11.0. This probably came in with the clockevents/no_hz patches. Quite some work has been done meanwhile. It's not worth looking at for 10.3. Please reopen if you still see this problem in 11.0. If this is still in 11.0, this could even get a bit higher priority...
I want to, but I can't. For two weekends I haven't been able to upgrade factory because the OSS repo was broken. That has now been solved, but it is a 1.4 GiB download that takes a minimum of 4 hours, in practice 8 hours, meaning that the repo has changed again by the time I finish updating... so I will not be able to upgrade my factory till Saturday afternoon. Hopefully. There are several tests that I want to do in factory and I can't.
https://bugzilla.novell.com/show_bug.cgi?id=350980

Torsten Duwe <duwe@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #214761|application/octet-stream    |text/plain
          mime type|                            |
https://bugzilla.novell.com/show_bug.cgi?id=350980

Carlos Robinson <robin.listas@telefonica.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #214762|application/octet-stream    |text/plain
          mime type|                            |
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c44 --- Comment #44 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-14 12:10:39 MST --- (In reply to comment #41 from Carlos Robinson)
I have detected on the mail list somebody with possibly the same problem:
http://lists.opensuse.org/opensuse/2008-05/msg00293.html
I told him to report here but he hasn't replied. I have insisted (a minute ago) again by direct mail, it's all I can do. But at least you can see that there are more people out there with this problem, only that they don't report. I suppose that's typical.
That's Bug #389774. It could be this same thing, perhaps not. I'm telling you so that you can look at it from this point of view ;-) Regarding factory, I haven't been able to update it yet. I'll try again tonight.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c45 --- Comment #45 from Carlos Robinson <robin.listas@telefonica.net> 2008-05-16 05:13:45 MST --- (In reply to comment #42 from Thomas Renninger)
Better try with 11.0. This probably came in with the clockevents/no_hz patches. Quite some work has been done meanwhile. It's not worth looking at for 10.3. Please reopen if you still see this problem in 11.0. If this is still in 11.0, this could even get a bit higher priority...
Finally, I have been able to update factory, and I have run a test. I unplugged the network, left my marker script running, and went away for a cup of tea and a stroll. Twenty minutes later, I detected no "laziness". The screen saver had kicked in, beagle had finished, and the script didn't appear to have logged any problem. So far so good... I can't certify that this bug is finally solved, but it certainly looks that way. Nice! :-)
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c46 --- Comment #46 from Thomas Renninger <trenn@novell.com> 2008-05-16 05:31:03 MST --- Digging into the depths of the high-res and clocksource patches for 10.3 is not worth it, and chances are high that a fix would be too risky anyway. Thanks for verifying that things work on 11.0!
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c47

Carlos Robinson <robin.listas@telefonica.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
          Component|Kernel                      |Kernel
         OS/Version|openSUSE 10.3               |openSUSE 11.0
            Product|openSUSE 10.3               |openSUSE 11.0
         Resolution|WONTFIX                     |

--- Comment #47 from Carlos Robinson <robin.listas@telefonica.net> 2008-11-23 13:25:17 MST ---
Sorry, chaps... it's back! :-(

The symptoms are back in 11.0 with kernel 2.6.25.18-0.2-pae; it did not happen until the kernel was updated.

Linux version 2.6.25.18-0.2-pae (geeko@buildhost) (gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036] (SUSE Linux) ) #1 SMP 2008-10-21 16:30:26 +0200

I didn't notice it earlier because the network was active (network interrupts keep this problem from happening). That job finished, the network is quiet, and the symptoms came back. The occurrences are quite rare, but I have had a few.

This is what caught my attention in the "warn" log (yes, I keep a terminal tailing that log):

Nov 23 13:59:55 nimrodel upsmon[11226]: Poll UPS [myups@localhost] failed - Data stale
Nov 23 13:59:55 nimrodel named[3192]: *** POKED TIMER ***
Nov 23 13:59:55 nimrodel spamd[484]: prefork: sysread(10) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 648.
Nov 23 13:59:55 nimrodel spamd[514]: prefork: sysread(11) failed after 300 secs at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 648.
Nov 23 13:59:55 nimrodel spamd[11694]: Use of uninitialized value $selerr in concatenation (.) or string at /usr/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/SpamdForkScaling.pm line 329.
Nov 23 13:59:56 nimrodel spamd[11694]: prefork: select returned error on server filehandle:

And sure enough, at that hour my "marker" script output shows a hiccup:

=====> 13:59:56.051383857 (******* 1453 ***************************)

The machine paused for a full 1453 seconds while I lunched! No, it was not a hibernation pause; those are logged on my system. Interrupts were lost, of course:

=====> 13:35:43.415850755 (1)
           CPU0
  0:   29297542   IO-APIC-edge      timer
  1:     332079   IO-APIC-edge      i8042
  3:          8   IO-APIC-edge
  4:          8   IO-APIC-edge
  6:          7   IO-APIC-edge      floppy
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 10:          9   IO-APIC-edge      MPU401 UART
 12:    1292007   IO-APIC-edge      i8042
 14:    1887391   IO-APIC-edge      ide0
 15:    2101920   IO-APIC-edge      ide1
 16:    1530281   IO-APIC-fasteoi   ehci_hcd:usb2, eth0
 17:     779636   IO-APIC-fasteoi   Intel 82801BA-ICH2
 18:          0   IO-APIC-fasteoi   uhci_hcd:usb3
 19:     471397   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4
NMI:          0   Non-maskable interrupts
LOC:   13316637   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

=====> 13:59:56.051383857 (******* 1453 ***************************)
           CPU0
  0:   29297656   IO-APIC-edge      timer
  1:     332079   IO-APIC-edge      i8042
  3:          8   IO-APIC-edge
  4:          8   IO-APIC-edge
  6:          7   IO-APIC-edge      floppy
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 10:          9   IO-APIC-edge      MPU401 UART
 12:    1292007   IO-APIC-edge      i8042
 14:    1887404   IO-APIC-edge      ide0
 15:    2101966   IO-APIC-edge      ide1
 16:    1530282   IO-APIC-fasteoi   ehci_hcd:usb2, eth0
 17:     779636   IO-APIC-fasteoi   Intel 82801BA-ICH2
 18:          0   IO-APIC-fasteoi   uhci_hcd:usb3
 19:     471420   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4
NMI:          0   Non-maskable interrupts
LOC:   13316723   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

=====> 13:59:57.088127523 (1

I have more pauses logged. Often they last 10..20 seconds.

The system clocksource is currently tsc. Any more info you want? Hopefully you can track it to some change made in the last kernel update. And no, I'm afraid I can't test in factory; it hard-locks on me. Sorry.

Old related bugs: 350981, 344356.
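The two /proc/interrupts snapshots in comment #47 can be compared mechanically rather than by eye. Below is a minimal sketch, assuming each snapshot has been saved to its own file with one counter per line and that the machine has a single CPU column, as on the reporter's box. irq_delta is a hypothetical helper name, not a tool mentioned in the thread.

```shell
# irq_delta BEFORE AFTER
# Prints "<irq>: <increase>" for every counter that changed between two
# saved single-CPU /proc/interrupts snapshots. Hypothetical helper.
irq_delta() {
    awk '
        # Counter lines look like "0: 29297542 ..." or "LOC: 13316637 ...".
        $1 ~ /:$/ && $2 ~ /^[0-9]+$/ {
            if (NR == FNR) { before[$1] = $2; next }   # first file
            if (($1 in before) && $2 != before[$1])
                printf "%s %d\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}
```

For the snapshots in comment #47 this reports an increase of 114 on IRQ 0 (timer) and 86 on LOC across the 1453-second gap, which is the lost-interrupt pattern the comment describes.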
https://bugzilla.novell.com/show_bug.cgi?id=350980

Stefan Fent <stefan.fent@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aosthof@novell.com
         AssignedTo|stefan.fent@novell.com      |trenn@novell.com
             Status|REOPENED                    |NEW
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c48

Thomas Renninger <trenn@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

--- Comment #48 from Thomas Renninger <trenn@novell.com> 2008-11-24 07:55:41 MST --- Have you already searched for a BIOS update? Please also attach dmidecode and, ideally, hwinfo as well, to get an even better idea of what machine we have.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c49

Carlos Robinson <robin.listas@telefonica.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|robin.listas@telefonica.net |

--- Comment #49 from Carlos Robinson <robin.listas@telefonica.net> 2008-11-24 11:29:10 MST --- (In reply to comment #48 from Thomas Renninger)
Have you already searched for a BIOS update?
No, because:
- this machine is from about 2001 (P-IV @ 1800 MHz)
- it is a kernel issue. It happens only with certain SUSE versions and not with others; there is nothing wrong with the machine. openSUSE 11.0 was working fine until the last security kernel update. All versions up to and including 10.2 worked fine; 10.3 was the first one to show this problem.
Anyway, I did search for a BIOS update years ago. There is one for the other BIOS make, not mine.
Please also attach dmidecode and, ideally, hwinfo as well, to get an even better idea of what machine we have.
Ok.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c50 --- Comment #50 from Carlos Robinson <robin.listas@telefonica.net> 2008-11-24 11:30:28 MST --- Created an attachment (id=254913) --> (https://bugzilla.novell.com/attachment.cgi?id=254913) dmidecode requested
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c51 --- Comment #51 from Carlos Robinson <robin.listas@telefonica.net> 2008-11-24 11:31:23 MST --- Created an attachment (id=254914) --> (https://bugzilla.novell.com/attachment.cgi?id=254914) "hwinfo --all --log hwinfo.log"
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c52

Thomas Renninger <trenn@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Priority|P5 - None                   |P3 - Medium

--- Comment #52 from Thomas Renninger <trenn@novell.com> 2008-11-25 05:11:50 MST ---
It happens only with certain suse versions
Ok, I think I got it, sorry about that.
Let's try to limit this to as few patches as possible, then try out the ones that look guilty. This 11.0 kernel was the first which did not work: 2.6.25.18-0.2-pae. Did you do updates before, and can you remember the last working kernel? For 10.3 all kernels (you tested) were broken. 10.2 and before worked well. Is this all about i386-pae kernels, or did you also try x86_64 or i386 kernels without -pae support?
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c53 --- Comment #53 from Carlos Robinson <robin.listas@telefonica.net> 2008-11-25 08:40:03 MST --- (In reply to comment #52 from Thomas Renninger)
It happens only with certain suse versions Ok, I think I got it, sorry about that.
Don't worry, this bug has a long story.
Let's try to limit this to as few patches as possible, then try out the ones that look guilty.
This 11.0 kernel was the first which did not work: 2.6.25.18-0.2-pae
Correct.
Did you do updates before, and can you remember the last working kernel?
I did all the updates offered by YaST Online Update. I don't remember which one was the previous kernel, but it has to be the previous one on the update server. Perhaps the info is in y2logRPM. Yes, here:

nimrodel:/var/log/YaST2 # grep kernel-pae- y2logRPM
2008-09-09 22:53:16 kernel-pae-2.6.25.5-1.1.i586.rpm installed ok
2008-09-10 11:44:16 kernel-pae-2.6.25.11-0.1.i586.rpm installed ok
2008-09-11 20:56:28 kernel-pae-2.6.25.16-0.1.i586.rpm installed ok
2008-10-30 02:14:24 kernel-pae-2.6.25.18-0.2.i586.rpm installed ok

So, this kernel was installed on 2008-10-30, but I didn't notice the problem earlier because I had a lot of network activity (from 08-11-01 to 08-11-11), and this produces interrupts which keep the machine awake.

Grepping the warn log for the string "failed after 300 secs", which is the tell-tale, I first see it on Nov 19 15:19:40. I also see that I installed a bunch of things on Nov 12, but nothing on the 19th, or at least nothing that looks interesting. If you want, I could attach the list with dates, produced by:

rpm -q -a --queryformat "%{INSTALLTIME}\t%{INSTALLTIME:day} \
%{BUILDTIME:day} %-30{NAME}\t%15{VERSION}-%-7{RELEASE}\t%{arch} \
%25{PACKAGER}\n" | sort | cut --fields="2-"
For 10.3 all kernels (you tested) were broken.
Yes, as far as I remember. At least, we couldn't find anything that worked. I played recompiling kernels, too. Most of it is logged in this Bugzilla and related ones. Maybe later I can dig out the numbers, the history of this problem.
10.2 and before worked well.
Right.
Is this all about i386-pae kernels, or did you also try x86_64 or i386 kernels without -pae support?
Well, 10.2 and 10.3 were non pae, pae started to be shipped with 11.0, afaik. This machine is 32 bits, so x86_64 is out of the question. Another detail I remembered: [Bug 426592] clocksource 'acpi_pm' looses time. So I'm forcing the TSC clocksource; I think I have been doing that since 10.3
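The y2logRPM lookup in comment #53 identifies the last working kernel by reading the install history by hand. A minimal sketch of automating that step follows, assuming the log holds one "date time package status" entry per line in the format quoted in the comment. previous_kernel is a hypothetical helper name, and the kernel-pae- prefix is hard-coded only because that is the flavour discussed here.

```shell
# previous_kernel LOGFILE BAD_VERSION
# Prints the kernel-pae package logged immediately before the one whose
# version is BAD_VERSION. Hypothetical helper; assumes the package file
# name is the third whitespace-separated field of each log line.
previous_kernel() {
    awk -v bad="kernel-pae-$2." '
        index($3, bad) == 1 { print prev; exit }   # reached the bad kernel
        $3 ~ /^kernel-pae-/ { prev = $3 }          # remember the last one seen
    ' "$1"
}
```

With the four entries quoted in comment #53, `previous_kernel y2logRPM 2.6.25.18-0.2` prints kernel-pae-2.6.25.16-0.1.i586.rpm, which narrows the regression to the changes between those two updates.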
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c54

Thomas Renninger <trenn@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

--- Comment #54 from Thomas Renninger <trenn@novell.com> 2009-01-30 08:32:37 MST --- Sorry, I lost track of this one. Can you give 11.1 a try, please. This might have been fixed meanwhile.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c55 --- Comment #55 from Carlos Robinson <robin.listas@telefonica.net> 2009-01-30 11:51:33 MST --- (In reply to comment #54)
Sorry, I lost track of this one. Can you give 11.1 a try, please. This might have been fixed meanwhile.
I'm sorry, but due to Bug 448007 (*) I can't use 11.1 at all, it locks before I can test anything. (*) 11.1 GM hard-locks (beagle and reiserfs problem)
https://bugzilla.novell.com/show_bug.cgi?id=350980 User robin.listas@telefonica.net added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c56 --- Comment #56 from Carlos Robinson <robin.listas@telefonica.net> 2009-02-10 07:22:17 MST --- Bug 448007 counts as fixed, but the fix means I have to use an experimental kernel: kernel-debug-2.6.27.13-SL111_BRANCH_20090203134609_5784a3e1.i586.rpm There is no info on whether this fix will be propagated to standard 11.1, which means that, with the info I have, I'll have to skip upgrading to 11.1, even though I have your complimentary DVD (and torch!) for my collaboration as translator and tester. So, if you want me to test this bug with the above experimental kernel (in my test partition), I'll try. If not, we'll have to wait till 11.2. And no, before you ask: I cannot install factory, because that's the 11.1 test partition.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c57 --- Comment #57 from Thomas Renninger <trenn@novell.com> 2009-02-10 08:45:00 MST --- It's not about the debug kernel, but about the kernel of the day (the 20090203134609 in the name is the date and time the kernel was built). This is not an experimental kernel, and it will become the 11.1 update kernel at some point. Taking this one as the 11.1 kernel is safe.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c58 --- Comment #58 from Thomas Renninger <trenn@novell.com> 2009-02-10 08:45:55 MST --- To be clear: do not take the debug, but -default + -default-base kernel packages from the SLE11-BRANCH/arch directory.
https://bugzilla.novell.com/show_bug.cgi?id=350980 User trenn@novell.com added comment https://bugzilla.novell.com/show_bug.cgi?id=350980#c59

Thomas Renninger <trenn@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|robin.listas@telefonica.net |
         Resolution|                            |WONTFIX

--- Comment #59 from Thomas Renninger <trenn@novell.com> 2009-03-12 03:06:59 MST --- As the machine is from 2001 and the bug history is getting really long, I had better close this one. I cannot put more time into this, sorry. It would be great if you could give the latest kernels a try at some point and inform us and others whether it works or not: ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/x86_64/kernel-default.rpm ftp://ftp.suse.com/pub/projects/kernel/kotd/HEAD/x86_64/kernel-default-base.rpm