[Bug 231149] New: Thinkpad T23 hangs under load
https://bugzilla.novell.com/show_bug.cgi?id=231149

           Summary: Thinkpad T23 hangs under load
           Product: openSUSE 10.2
           Version: Final
          Platform: Other
        OS/Version: SuSE Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: kq8z67r6309fo9001@sneakemail.com
         QAContact: qa@suse.de

A T23 upgraded from 10.0 (with which it worked well) to 10.2 doesn't like something about the new version: it freezes after a time when doing any sort of disk- or processor-intensive work. Most interestingly, the rescue system, repair install, and installer have the same problem. Somehow the initial upgrade install worked fine, but subsequent attempts to do anything with the system involving 10.2 cause it to freeze completely (can't even do anything with sysrq or via network) after it has been loaded for a while. A 10.1 rescue system also seems to be OK.

As this occurs with non-installed 10.2 environments on this hardware, I suspect it may be kernel related, but I have no direct evidence for that, as the system dies without writing anything interesting in /var/log/messages. The usual combinations of boot options (acpi=off, nolapic, noapic) don't have any effect in the installer, rescue system, or the installed system.

It is somewhat difficult to provide better information on this, as anything that loads the system freezes it very quickly.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=231149

------- Comment #1 from kq8z67r6309fo9001@sneakemail.com 2006-12-30 15:26 MST -------
After a little additional experimentation, the combination of apm=off acpi=off noapic maxcpus=0 nosmp (which seems redundant...) works, at least in the sense that the machine no longer hangs. Unfortunately, running with acpi=off is not at all convenient, as it means forgoing S3/S4. Is there any chance this is an SMP issue? Would building a uniprocessor kernel help?

ACPI worked perfectly on this laptop with 10.0 and 10.1 (and earlier with other distros), but I haven't had a chance to try it with another distribution that has a kernel newer than 10.1's, so I can't guess whether this is a vanilla 2.6 kernel bug, a SUSE kernel bug, or a kernel config bug.
https://bugzilla.novell.com/show_bug.cgi?id=231149

------- Comment #2 from kkeil@novell.com 2007-01-02 15:55 MST -------
You can try the UP kernel from ftp://ftp.suse.com//pub/people/kkeil/testing/10.2/i386 to verify this.
https://bugzilla.novell.com/show_bug.cgi?id=231149

------- Comment #3 from kq8z67r6309fo9001@sneakemail.com 2007-01-02 22:04 MST -------
The kernel in comment #2 works fine so far: no lockups, no need for kernel command line args, and S3 and S4 both work.

Having in the meantime tried the SMP kernel quite a few times, I think the problem is that the laptop is overheating; I don't think the fans are being run properly with ACPI. This would also explain why the initial installation worked: it was performed in a very cold (ca. 5 degrees centigrade) environment. Why the fans would not be run properly, I don't know. There was nothing in any of the log files to indicate such a problem.

This is a somewhat serious state of affairs if it occurs with other T23s, as it could cause permanent damage if the machine is in fact overheating. Whether it is better to make the UP kernel more generally available, try to fix ACPI in the SMP kernel, or just ignore what is probably a small set of people, I can't say. Presumably there was a good reason for making kernel-smp=kernel-default apart from saving some build time, but on this system at least, SMP+ACPI doesn't seem to work.

In case this isn't convenient to address before there is a security update for the kernel, is there any difference between the UP kernel above and the SMP kernel apart from CONFIG_SMP?
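The overheating hypothesis in comment #3 could be checked by logging temperatures while the machine is loaded. The following is only a minimal Python sketch under stated assumptions: it assumes the 2.6-era ACPI procfs layout, in which each thermal zone exposes a file such as /proc/acpi/thermal_zone/<zone>/temperature containing a line like "temperature: 45 C". That path, the line format, and the helper names are assumptions for illustration and are not taken from the report; a T23 may expose its sensors differently.

```python
import glob
import os
import re

# Assumed location of ACPI thermal-zone files on a 2.6-era kernel
# (an assumption for illustration, not something stated in the report).
THERMAL_GLOB = "/proc/acpi/thermal_zone/*/temperature"

# Assumed line format: "temperature:             45 C"
_TEMP_RE = re.compile(r"temperature:\s*(-?\d+)\s*C")


def parse_temperature(text):
    """Return degrees Celsius from a 'temperature: NN C' line, or None."""
    match = _TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def read_zones(pattern=THERMAL_GLOB):
    """Map zone name -> temperature for every readable zone file."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as handle:
                value = parse_temperature(handle.read())
        except OSError:
            continue  # zone vanished or is unreadable; skip it
        if value is not None:
            readings[zone] = value
    return readings


if __name__ == "__main__":
    # Prints nothing if no zone files match the assumed pattern.
    for zone, celsius in read_zones().items():
        print(f"{zone}: {celsius} C")
```

Because the machine reportedly hangs without writing anything to its logs, readings would have to be taken repeatedly and written out immediately (or watched over the network) for the last value before a freeze to be of any use.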
https://bugzilla.novell.com/show_bug.cgi?id=231149

gregkh@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |kq8z67r6309fo9001@sneakemail.com

------- Comment #4 from gregkh@novell.com 2007-01-11 20:49 MST -------
How about running with just "nosmp" on the kernel command line? Does that help out?
https://bugzilla.novell.com/show_bug.cgi?id=231149

kq8z67r6309fo9001@sneakemail.com changed:

           What    |Removed                          |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                         |NEW
      Info Provider|kq8z67r6309fo9001@sneakemail.com |

------- Comment #5 from kq8z67r6309fo9001@sneakemail.com 2007-01-12 12:41 MST -------
No, even with just nosmp, acpi=off was required. noapic was not actually necessary, as the machine seems to be blacklisted. apm=off is probably also unnecessary, but I didn't test it. maxcpus=0 I picked up from another bug report, I think, and it doesn't appear to be necessary in this case. Probably nosmp acpi=off is minimal. Why nosmp does not work but the UP kernel does, I cannot guess; I don't really know how a running SMP kernel booted with nosmp differs from a UP one.

Probably the only additional information that I can offer is that in non-working configurations (lack of acpi=off + apparently anything, with the SMP kernel) I occasionally noticed an error message during boot. It may not be related, as the system froze whether or not the message appeared (it is also possible that I missed it, but I watched for it closely after first noticing), and I cannot supply the exact message, as whatever was printing it did so in such a way that it didn't end up anywhere in /var/log or in the kernel ring buffer. The message said something about inability to load /lib/modules/.../thermal.ko and appeared fairly early in the boot process (loaded from the initrd?). I ran rpm -V on kernel-default, which was apparently unaltered. I then rebuilt the initrd, which had no effect. This information may be spurious, but it seemed interesting in light of my own overheating hypothesis.
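When testing combinations of boot options like those discussed in comment #5, it can help to record exactly which options the running kernel was booted with. A minimal Python sketch, assuming a Linux system where /proc/cmdline holds the boot command line; the function names and the sample root= value are hypothetical and only for illustration:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {option: value}.

    Options of the form key=value map to their value; bare flags such as
    'nosmp' map to None. If an option is repeated, the last one wins.
    """
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the running kernel."""
    with open(path) as handle:
        return parse_cmdline(handle.read())


# Example with a hypothetical command line (root= value is made up):
sample = parse_cmdline("root=/dev/hda2 acpi=off nosmp")
print(sorted(sample.items(), key=lambda item: item[0]))
```

Saving the result of read_cmdline() next to the outcome of each test run would make it easier to confirm later which set of options was minimal.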
https://bugzilla.novell.com/show_bug.cgi?id=231149

gregkh@novell.com changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
         AssignedTo|kernel-maintainers@forge.provo.novell.com |trenn@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=231149

trenn@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #6 from trenn@novell.com 2007-02-28 03:01 MST -------
Hmm, I wonder if this is the same problem as #216205. In the other report the system does not freeze completely, but the rest would make sense.

You can try the broken kernel with max_cstate=1 passed on the command line. If this helps, the problem should get fixed with the next update kernel, which is coming out soon.
https://bugzilla.novell.com/show_bug.cgi?id=231149

trenn@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |kq8z67r6309fo9001@sneakemail.com
https://bugzilla.novell.com/show_bug.cgi?id=231149

------- Comment #7 from kq8z67r6309fo9001@sneakemail.com 2007-03-01 15:22 MST -------
It doesn't sound like bug #216205 has anything to do with this; the symptoms seem very different. In that case, there was a high load at idle; in this case, there was the expected load at idle and hard lockups under real load.

I'm afraid I can't do any further testing on this at the moment. This bug was reported two months ago, and as no activity was evident and I had exhausted my own troubleshooting options, I loaded the UP kernel on the machine, removed the SMP kernel, disabled kernel updates, and sent the laptop out after a couple of weeks. Nevertheless, I will try to provide the requested information at the earliest opportunity, but I am unsure when that will be possible, as the laptop is now several thousand kilometers distant and the user is unlikely to want to deal with it right now.
https://bugzilla.novell.com/show_bug.cgi?id=231149#c8
Thomas Renninger