Mailinglist Archive: opensuse-kernel (119 mails)
[opensuse-kernel] Re: x86_86 kernel lock-ups with recent (post Beta1) kernels II
- From: Warren Stockton <wns@xxxxxxxxxxx>
- Date: Wed, 22 Aug 2007 19:34:40 -0600
- Message-id: <200708221934.41165.wns@xxxxxxxxxxx>
On Wednesday 22 August 2007 14:00:39 Andi Kleen wrote:
> For testing I built you a couple of kernel RPMS with various groups
> of patches disabled.
>
> ftp://ftp.suse.com/pub/people/ak/test2/0 ... 4
> [sync still running, might be up in a few minutes)
>
> 0 is the kernel with all patches for a control run, 1-4 are without
> ACPI, libata, ALSA, ieee1394 respectively.
>
> Can you please test which kernel doesn't show the problem?
63df85b3deb2e413cd22d761f0e29dc2  0/kernel-default-2.6.22.4-20070822145631.x86_64.rpm
    locked up after 7 minutes
8001b2044c30759eaa1cd0f156bf179b  1/kernel-default-2.6.22.4-7.x86_64.rpm
    locked up after 14 minutes
3c2cc20694224819c32fb007af3fad83  2/kernel-default-2.6.22.4-7.x86_64.rpm
    locked up after 15 minutes
d758e90c49b033b79fa6a9a7eb479d7c  3/kernel-default-2.6.22.4-7.x86_64.rpm
    locked up after 19 minutes
827eb27d1529ba70969ae63dbc91fd98  4/kernel-default-2.6.22.4-7.x86_64.rpm
    locked up after 26 minutes
I don't think there is much value in how long each one ran before the lockup
occurred. I have seen the lockups occur while executing boot scripts (which
also happened with the '3 - no ALSA patch' kernel).
> The netconsole output would be also still useful.
I ran the "insmod ...netconsole..." from /etc/init.d/boot.local. (I wasted
some time until I realized it did not need an insserv.)
The only netconsole output captured for each of these kernels was the
netconsole "device eth0 not up yet" and "Disabling IRQ #7" as below:
netconsole: local port 4444
netconsole: local IP 192.168....
netconsole: interface eth0
netconsole: remote port 9353
netconsole: remote IP 192.168....
netconsole: remote ethernet address 00:03:47:23:e3:df
netconsole: device eth0 not up yet, forcing it
netconsole: carrier detect appears untrustworthy, waiting 4 seconds
netconsole: network logging started
Disabling IRQ #7
The "Disabling IRQ #7" message always took a couple of minutes to appear and
always preceded the lockup by several minutes.
I also noticed that "Disabling IRQ" never occurred when I booted a -default
kernel with pollirq.
While these kernels were running, I looked at /proc/interrupts a couple of
times and noted that IRQ 7 always had more interrupts than any other
device, which is surprising since this device is not being actively used.
(There is not even an SD card in the reader, yet it nearly always shows as
many interrupts as the timer.) This pattern appeared no matter which of the
above kernels was running.
# cat /proc/interrupts
           CPU0       CPU1
  0:     631500      42345  XT-PIC-XT        timer
  1:         45        348  IO-APIC-edge     i8042
  5:          0          2  IO-APIC-fasteoi  ohci1394
  7:      41585     628689  IO-APIC-fasteoi  sdhci:slot0
  8:          0          1  IO-APIC-edge     rtc
  9:         84       1112  IO-APIC-fasteoi  acpi
 12:       7624       2493  IO-APIC-edge     i8042
 14:          1         24  IO-APIC-edge     libata
 15:          0          0  IO-APIC-edge     libata
 20:         58     272987  IO-APIC-fasteoi  eth0
 21:        728        866  IO-APIC-fasteoi  HDA Intel
 22:          0          0  IO-APIC-fasteoi  ohci_hcd:usb1, ehci_hcd:usb2
 23:      39853      11734  IO-APIC-fasteoi  sata_nv
NMI:        838        734
LOC:     673356     673614
ERR:          0
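For anyone who wants to repeat the comparison without eyeballing the columns,
here is a rough sketch that sums the per-CPU counts for each numbered IRQ and
ranks them. The function names (parse_interrupts, busiest) and the SAMPLE
string are made up for illustration; SAMPLE is just a subset of the output
above, and the parsing assumes the usual "label: count per CPU, then
description" layout.

```python
# A subset of the /proc/interrupts output quoted above (illustrative only).
SAMPLE = """\
CPU0 CPU1
0: 631500 42345 XT-PIC-XT timer
7: 41585 628689 IO-APIC-fasteoi sdhci:slot0
20: 58 272987 IO-APIC-fasteoi eth0
23: 39853 11734 IO-APIC-fasteoi sata_nv
NMI: 838 734
LOC: 673356 673614
ERR: 0
"""


def parse_interrupts(text):
    """Return {irq_number: (total_count, description)} for numbered IRQ rows."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        counts = [int(f) for f in fields[:ncpus]]
        totals[int(label)] = (sum(counts), " ".join(fields[ncpus:]))
    return totals


def busiest(totals, n=3):
    """Return the n IRQs with the highest total count, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)[:n]


for irq, (count, desc) in busiest(parse_interrupts(SAMPLE)):
    print(f"IRQ {irq:>3}: {count:>8}  {desc}")
```

On a live system the same functions could be fed the contents of
/proc/interrupts instead of SAMPLE; taking two readings a few seconds apart
and subtracting the totals would give a rate rather than a lifetime count.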
I will blacklist sdhci and then see if a -default kernel will run without
pollirq.
--
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@xxxxxxxxxxxx
For additional commands, e-mail: opensuse-kernel+help@xxxxxxxxxxxx