[Bug 672008] New: Complete system freeze at start
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c0

Summary: Complete system freeze at start
Classification: openSUSE
Product: openSUSE 11.4
Version: RC 1
Platform: x86-64
OS/Version: SuSE Other
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: vadimuzzz@inbox.ru
QAContact: qa@suse.de
Found By: ---
Blocker: ---

Created an attachment (id=414064) --> (http://bugzilla.novell.com/attachment.cgi?id=414064) Default kernel log

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; ru; rv:1.9.2.13) Gecko/20101203 SUSE/3.6.13-3.1 Firefox/3.6.13

After the update from 11.3 to 11.4-M6 (x86_64), my system (an hp-compaq 6720s laptop) totally freezes on boot. It happens almost always (~9 times out of 10, roughly). On the console I see only "Creating device nodes with udev", that's all. I already saw this problem in 11.3 with newer kernels (2.6.36, 2.6.37). After the update to 11.4-RC1 the problem still persists. I experimented with kernel parameters (like nomodeset), but to no effect. The system hangs very early (before KMS switches the video mode).

Reproducible: Always

Steps to Reproduce:
1. Reboot.
2. About 9 times out of 10 the system freezes.
3. Otherwise the system boots normally and works fine.

-- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c1
--- Comment #1 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c2
--- Comment #2 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c3
--- Comment #3 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c4
Jiri Slaby
totally freezes on boot. It happens almost always (~9 times out of 10, roughly). On the console I see only "Creating device nodes with udev", that's all.
Could you boot the default kernel with the debug option and _without_ the quiet and splash=silent options? And if there is a trace, take a photo...
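The requested change to the boot options can be sketched as a small string transformation. This is only an illustration: the sample command line (device names, `showopts`) is hypothetical and not taken from the reporter's machine, and the helper name `debug_cmdline` is made up here. Where the command line is actually edited (boot menu or bootloader config) is not stated in the thread.

```python
def debug_cmdline(cmdline: str) -> str:
    """Return a kernel command line that drops the options hiding boot
    messages (quiet, splash=silent) and adds the debug option."""
    drop = {"quiet", "splash=silent"}
    kept = [opt for opt in cmdline.split() if opt not in drop]
    if "debug" not in kept:
        kept.append("debug")
    return " ".join(kept)


# Hypothetical example line; the real one differs per installation.
example = "root=/dev/sda2 resume=/dev/sda1 splash=silent quiet showopts"
print(debug_cmdline(example))
```

The resulting string is what would be typed at the boot prompt for a one-off debug boot.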
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c5
--- Comment #5 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c6
--- Comment #6 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c7
--- Comment #7 from Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c8
--- Comment #8 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c9
--- Comment #9 from Jiri Slaby
Created an attachment (id=414698) --> (http://bugzilla.novell.com/attachment.cgi?id=414698) [details] default kernel + debug initcall_debug
Great, so i915 and uhci_hcd init functions didn't return. Could you move uhci_hcd.ko away from /lib/modules/*/kernel/drivers/usb/host/uhci-hcd.ko (e.g. to root), run mkinitrd and retry initcall_debug?
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c10
--- Comment #10 from Jiri Slaby
(In reply to comment #8)
Created an attachment (id=414698) --> (http://bugzilla.novell.com/attachment.cgi?id=414698) [details] [details] default kernel + debug initcall_debug
Great, so i915 and uhci_hcd init functions didn't return. Could you move uhci_hcd.ko away from /lib/modules/*/kernel/drivers/usb/host/uhci-hcd.ko (e.g. to root), run mkinitrd and retry initcall_debug?
And if it doesn't help, move it back, then move /lib/modules/*/kernel/drivers/gpu/drm/i915/i915.ko away instead, run mkinitrd and retry.
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c11
--- Comment #11 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c12
--- Comment #12 from Vadim Kotelnikov
And if it doesn't help, move it back and try to kill /lib/modules/*/kernel/drivers/gpu/drm/i915/i915.ko mkinitrd and retry.
Yes, it works. The system always boots.
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c13
Jiri Slaby
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c14
Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c15
--- Comment #15 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c16
--- Comment #16 from Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c19
--- Comment #19 from Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c20
--- Comment #20 from Vadim Kotelnikov
Then try to boot without ssb module, at least.
I've tried to do it, see above.
edit /etc/sysconfig/kernel and set NO_KMS_IN_INITRD="yes", run mkinitrd
Interesting results: the system hangs _later_ (but with the same probability), and still before the video mode is switched. I then removed i915.ko and rebooted into runlevel 3. modprobe worked and KMS switched the mode, but I did not see any messages from modprobe. It takes too long to reboot every time in debug mode. Can I run "modprobe i915; rmmod i915" several times to collect statistics?
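The sysconfig edit quoted above (set NO_KMS_IN_INITRD="yes" in /etc/sysconfig/kernel, then run mkinitrd) can be sketched as a text rewrite. This is a minimal sketch, assuming the file uses plain NAME="value" lines; the helper name `set_sysconfig_var` and the sample file content are illustrative only, and mkinitrd still has to be run afterwards as described in the thread.

```python
import re


def set_sysconfig_var(text: str, name: str, value: str) -> str:
    """Set NAME="value" in sysconfig-style text, replacing an existing
    assignment or appending a new line if the variable is absent."""
    line = f'{name}="{value}"'
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash processing in `line`.
        return pattern.sub(lambda _m: line, text)
    sep = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{sep}{line}\n"


# Illustrative file content, not copied from a real system.
sample = 'INITRD_MODULES="ahci"\nNO_KMS_IN_INITRD="no"\n'
print(set_sysconfig_var(sample, "NO_KMS_IN_INITRD", "yes"))
```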
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c21
--- Comment #21 from Takashi Iwai
Then try to boot without ssb module, at least.
I've tried to do it, see above.
edit /etc/sysconfig/kernel and set NO_KMS_IN_INITRD="yes", run mkinitrd
Interesting results: system hangs _later_ (but with the same probability). And before switching video mode.
When you boot into runlevel 3 with NO_KMS_IN_INITRD set, i915 won't be loaded, so you don't have to remove it. The i915 module is loaded automatically only once you start X.
I then removed i915.ko and reboot into runlevel 3. modprobe worked, KMS switched mode, but I have not seen the messages from modprobe.
So, no hang happened now? Interesting. Can you start X at this stage? (But anyway test without removing i915 in runlevel 3.) If you get any crash after boot by loading i915 or starting X, then we have a better chance to get logs. As mentioned, you can take kernel messages via netconsole on another machine connected via ethernet.
It's too long to reboot every time in debug mode. Can I do "modprobe i915; rmmod i915" several times to collect statistics?
Once the boot has succeeded, we basically don't need the initcall_debug option any more. From here on it's ordinary debugging.
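The load/unload loop the reporter asks about could look roughly like the sketch below. Several things are assumptions: the helper name `cycle_module` is made up, the script would need root, and the thread does not establish whether i915 can be unloaded at all while the console is using it. Since the failure mode is a hard freeze, the return value is only seen if no cycle hangs; recording progress somewhere durable (or watching via netconsole, as suggested above) is left out here.

```python
import subprocess
from typing import Callable, Sequence


def cycle_module(
    name: str,
    cycles: int,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> int:
    """Load and unload a kernel module repeatedly.

    Returns the number of complete modprobe/rmmod cycles that exited
    with status 0. `run` is injectable so the loop can be dry-run.
    """
    clean = 0
    for _ in range(cycles):
        if run(["modprobe", name]) != 0:
            break
        if run(["rmmod", name]) != 0:
            break
        clean += 1
    return clean


# Dry run with a fake runner that only records the commands.
issued = []
print(cycle_module("i915", 3, run=lambda cmd: issued.append(list(cmd)) or 0))
```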
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c22
--- Comment #22 from Vadim Kotelnikov
When you boot in runlevel 3 and NO_KMS_IN_INITRD, i915 won't be loaded. So you don't have to remove it. i915 module is loaded automatically first when you start X.
Are you sure? lsmod shows that i915 is loaded, and KMS switched the video mode.
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c23
--- Comment #23 from Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c24
--- Comment #24 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c25
--- Comment #25 from Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c26
--- Comment #26 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c27
--- Comment #27 from Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c28
--- Comment #28 from Vadim Kotelnikov
If you have another machine, could you set up netconsole and catch up the kernel log there?
I have another machine, but it has no network adapter. Setting up the netconsole client will take time (no earlier than Monday).
Also, one more interesting test would be to try kernel-vanilla package to see whether the same problem occurs.
Well, the vanilla kernel works fine: 15 boots out of 15, no freezes ("blacklist i915" removed). But the experiment is not quite conclusive, because the vanilla kernel does not support bootsplash.
Note that you can install multiple kernel packages via "rpm -ivh kernel-*.rpm --force", and choose in grub menu.
zypper rules! Just "zypper in kernel-vanilla". I'll try the desktop kernel too.
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c29
--- Comment #29 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c30
--- Comment #30 from Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c31
--- Comment #31 from Vadim Kotelnikov
Does the hang happen when you pass splash=none, too?
Yes, but not as often (~1 time out of 10). Also, I could not reproduce the bug with the -vanilla or -desktop kernel.
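The freeze rates reported in this thread (about 9 in 10 with the default options, about 1 in 10 with splash=none, and 15 clean boots out of 15 on the vanilla kernel) can be put in perspective with a short calculation. It assumes, as a simplification not verified anywhere in the thread, that every boot freezes independently with a fixed probability; the function name is illustrative.

```python
def chance_of_clean_run(freeze_rate: float, boots: int) -> float:
    """Probability that `boots` consecutive boots all succeed, assuming
    each boot freezes independently with probability `freeze_rate`."""
    return (1.0 - freeze_rate) ** boots


# Rate seen with the default options: 15 clean boots would be
# practically impossible if the bug were still present.
print(chance_of_clean_run(0.9, 15))

# Rate seen with splash=none: 15 clean boots happen by chance
# about one time in five.
print(chance_of_clean_run(0.1, 15))
```

Under that assumption, a 15-out-of-15 result clearly rules out the 90% freeze rate, but on its own it is only weak evidence against a 10% rate, so a longer series would be needed to separate the effect of the kernel flavor from the effect of the missing bootsplash.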
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c
Egbert Eich
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c32
Egbert Eich
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c33
Florian Grannemann
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c34
--- Comment #34 from Florian Grannemann
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c35
Takashi Iwai
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c36
--- Comment #36 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c37
--- Comment #37 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c38
--- Comment #38 from Vadim Kotelnikov
Did you (re-)test with the 11.4 GM kernel (2.6.37.1 or higher; the kernel debugged via netconsole seems to be a plain 2.6.37)? This patch is included there:
rpm -qp --changelog kernel-desktop.rpm |less
* Fri Feb 18 2011 jslaby@suse.cz
- Update to 2.6.37.1:
  - obsoletes:
    - patches.arch/x86-mtrr-avoid-MTRR-reprogramming-on-BP-during-boot-on.patch
Please make sure that the kernel you test has the patch included; this might be a duplicate of bug #623393.

Nothing has changed: the "-vanilla" and "-desktop" kernels work, but "-default" does not.
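Checking whether a given kernel package carries the update mentioned above can be scripted around the same `rpm -qp --changelog` call. A minimal sketch, assuming the package file is available locally; the helper names are made up, and the sample text only reuses the changelog entry quoted in this thread.

```python
import subprocess


def read_changelog(rpm_path: str) -> str:
    """Return the changelog of an RPM package file via rpm -qp --changelog."""
    result = subprocess.run(
        ["rpm", "-qp", "--changelog", rpm_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def changelog_mentions(changelog: str, needle: str) -> bool:
    """True if any changelog line contains `needle`."""
    return any(needle in line for line in changelog.splitlines())


# Entry as quoted in the thread; a real changelog is much longer.
sample = (
    "* Fri Feb 18 2011 jslaby@suse.cz\n"
    "- Update to 2.6.37.1:\n"
    "- patches.arch/x86-mtrr-avoid-MTRR-reprogramming-on-BP-during-boot-on.patch\n"
)
print(changelog_mentions(sample, "Update to 2.6.37.1"))
```

On a real package, `changelog_mentions(read_changelog("kernel-desktop.rpm"), "Update to 2.6.37.1")` would be the equivalent check.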
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c39
--- Comment #39 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c40
--- Comment #40 from Vadim Kotelnikov
Does this one appear only once (or seldom?) and you can retrieve full, sane battery info? I expect this also shows up with i915 workarounds? Hm, best should be to resolve the one issue first and open another Embedded Controller bug if above persists.
I do not quite understand what you wrote. Please give detailed instructions on how to collect the information. When I type:
grep ACPI /var/log/messages | grep Error
or
grep ACPI /var/log/messages | grep Exception
there are no errors.
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c41
Thomas Renninger
I do not quite understand what you wrote
Forget about the ACPI error (AE_TIME), as said it's probably unrelated.
The mtrr issue does not happen on a -desktop kernel, where i915 is able to change memory settings via set_mtrr, but things get stuck on -default flavoured kernels? The only related option I can see is preemption: on -desktop preemption is on (working), on -default it's off (not working). Could it be that?

I compiled a default kernel with the same preempt settings as the desktop flavor:
ftp.suse.com/pub/people/trenn/mtrr_problems_11.4/preempt_default/*
and additionally the exact same sources with the stock -default settings, to double-check whether the problem still exists:
ftp.suse.com/pub/people/trenn/mtrr_problems_11.4/plain_default/*
If the first works, it's related to (non-)preemption.

Possibly this is related to the stop_cpu framework (was it already added in 2.6.36, which is also reported not to work?).

Suresh/Tejun: Have you already seen the backtrace from comment #39? Is this known, or can you help?
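The preemption difference between the two flavors could be confirmed by comparing their kernel configs. The sketch below assumes the configs are available as text (for example from /boot/config-* files) and that the relevant options share the CONFIG_PREEMPT prefix; the sample config snippets are invented for illustration and are not the actual SUSE settings.

```python
def preempt_settings(config_text: str) -> dict:
    """Collect CONFIG_PREEMPT* options that are set in a kernel config."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Lines like "# CONFIG_X is not set" are comments and are skipped.
        if line.startswith("CONFIG_PREEMPT") and "=" in line:
            key, _, value = line.partition("=")
            settings[key] = value
    return settings


def preempt_diff(config_a: str, config_b: str) -> dict:
    """Map each differing option to its (value_in_a, value_in_b) pair;
    None means the option is not set in that config."""
    a, b = preempt_settings(config_a), preempt_settings(config_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Invented snippets standing in for the -default and -desktop configs.
default_cfg = "CONFIG_PREEMPT_NONE=y\n# CONFIG_PREEMPT is not set\n"
desktop_cfg = "# CONFIG_PREEMPT_NONE is not set\nCONFIG_PREEMPT=y\n"
print(preempt_diff(default_cfg, desktop_cfg))
```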
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c42
--- Comment #42 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c43
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c
Egbert Eich
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c44
Egbert Eich
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c45
Youquan Song
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c46
--- Comment #46 from Youquan Song
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c47
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c48
--- Comment #48 from Youquan Song
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c49
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c50
--- Comment #50 from Suresh Siddha
If not, I'll look deeper in Youquan's suggestions.
The MTRR code is clearly buggy, and I could reproduce the deadlock by introducing some delays in the code. Anyway, I just posted a couple of patches to address the MTRR rendezvous implementation using __stop_machine. http://marc.info/?l=linux-kernel&m=130740236011259&w=2
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c51
--- Comment #51 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c52
--- Comment #52 from Vadim Kotelnikov
Kernels to test (-desktop and -desktop-base) can be found here:
Can you provide a -default version? I have not seen the problem in the -desktop and -vanilla kernels.
Can you provide -default version?
Sure. All flavors will be built and exported. This will still take some time (an hour or more). Best check the date of the files and double-check with:
rpm -qp --changelog kernel-xy.rpm |head
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c53
--- Comment #53 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c54
--- Comment #54 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c55
--- Comment #55 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c56
--- Comment #56 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c57
Suresh Siddha
I keep need-info, Suresh is probably happy for another test. Not sure whether it's urgently necessary, though.
Vadim, Thomas: It looks like we are getting close to a consensus, and we would like one more test of the proposed patches. I am going to attach a patchset of 4 patches. Could you please run two tests? One test with a kernel that has only the patch "mtrr_stop_machine_quick_fix.patch", and another test with a kernel that has all four patches in the patchset applied. Due to the complexity of the patches, we want to push only the first patch into 3.0 and the stable series, and the rest of the cleanup patches into 3.1. Hence the request for two separate tests. Thanks for your help.
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c58
--- Comment #58 from Suresh Siddha
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c59
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c60
--- Comment #60 from Suresh Siddha
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c61
--- Comment #61 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c62
--- Comment #62 from Vadim Kotelnikov
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c63
--- Comment #63 from Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c64
Thomas Renninger
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c65
Swamp Workflow Management
https://bugzilla.novell.com/show_bug.cgi?id=672008
https://bugzilla.novell.com/show_bug.cgi?id=672008#c66
Swamp Workflow Management