[Bug 944978] New: Installation hangs every time
http://bugzilla.opensuse.org/show_bug.cgi?id=944978
Bug ID: 944978
Summary: Installation hangs every time
Classification: openSUSE
Product: openSUSE Distribution
Version: 42.1 Milestone 2
Hardware: x86-64
OS: openSUSE 42.1
Status: NEW
Severity: Critical
Priority: P5 - None
Component: Installation
Assignee: yast2-maintainers@suse.de
Reporter: alberto.zacchetti@tiscali.it
QA Contact: jsrain@suse.com
Found By: ---
Blocker: ---
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36
Build Identifier:
After booting from the DVD, everything goes well until the first graphical screen, "initializing installation". At that point, after a few seconds, the system hangs completely.
Reproducible: Always
Steps to Reproduce:
1. Boot from the DVD.
2. Select "install".
Actual Results:
The PC hangs.
My PC is an ASUS PU551J with an Intel i3, 4 GB of RAM, and Intel integrated graphics.
-- You are receiving this mail because: You are on the CC list for the bug.
Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c1
--- Comment #1 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c3
--- Comment #3 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c4
--- Comment #4 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c5
--- Comment #5 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c6
--- Comment #6 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c7
--- Comment #7 from Dominique Leuenberger
OK, I found the problem: in runlevel 3 and above, NetworkManager.service hangs the system. So during installation the system hangs because of KMS (booting with nomodeset solves this), while from the first reboot onward it hangs because of NetworkManager.service.
Do you have any log file supporting the theory that NM is the task that hangs?
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c8
--- Comment #8 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c9
--- Comment #9 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c10
--- Comment #10 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c11
--- Comment #11 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c13
--- Comment #13 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c14
--- Comment #14 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c15
--- Comment #15 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c16
--- Comment #16 from Takashi Iwai
I'm sorry, maybe I can't express myself well in English... Tumbleweed started having the same problem (panic and CPU stall) in early September, so I went back to 13.2. I don't think it would be useful to try the Tumbleweed kernel if it has the same problem. However, I will now try updating to kernel 4.1.8 and let you know.
I see. In that case, try to install the old openSUSE 13.2 kernel on top of the Leap system. It should still work in general, and that way we can see whether it's a kernel regression. If the 4.1.8 kernel still doesn't work, try the kernels in the OBS Kernel:stable and Kernel:HEAD repos. They contain 4.2.x and 4.3-rc kernels, respectively. If these upstream kernels have the issue, we should report it upstream.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c17
--- Comment #17 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c18
--- Comment #18 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c19
--- Comment #19 from Alberto Zacchetti
The latest kernel changes in Tumbleweed were:
r284 | dimstar_suse | 2015-09-01 22:34:59 CET | 6d1d260b454cded6af4d70132227e386 | 4.1.6 | rq327899
Automatic submission by obs-autosubmit
----------------------------------------------------------------------------
r283 | coolo | 2015-08-26 08:00:49 CET | 39ded2c9e4e1308d6b36bd1f76484759 | 4.1.6 | rq325300
The one checked in on September 1 was released to the FTP mirrors as part of the 20150903 snapshot (after build and QA, it ended up on the FTP mirrors on Sep 5).
Does that date roughly coincide with your statement about the 'beginning of September'?
Could you please tell me the full URL of the repo for kernel 4.1.8? Thank you!
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c20
--- Comment #20 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c21
--- Comment #21 from Alberto Zacchetti
Unfortunately, I left Tumbleweed and went back to 13.2, but I think it was the 2015-09-01 change, because I run updates quite often.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c22
--- Comment #22 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c23
--- Comment #23 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c24
--- Comment #24 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c25
--- Comment #25 from Larry Finger
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c26
--- Comment #26 from Alberto Zacchetti
Thanks. Can you capture the kernel messages at the time of the crash by any chance?
If not, try enabling kdump. Unfortunately, the kdump package in Leap Beta1 doesn't work well as-is for now (reported in boo#947816). Try the following instead:
- Add the kdump OBS repo:
  # zypper ar obs://Kernel:/kdump/openSUSE_Factory kdump
- Install kdump and kexec-tools from that repo:
  # zypper in -r kdump kdump kexec-tools
- Install the yast2-kdump package:
  # zypper in yast2-kdump
- Set up kdump via "yast2 kdump". Enable kdump with the checkbox, then configure the memory size. yast2-kdump suggests some odd values there; just ignore them, set low memory to 256, and leave high memory at 0.
- Try to enable magic sysrq. Add the following line to /etc/sysctl.d/99-sysctl.conf:
kernel.sysrq = 1
- Reboot the system. Check "systemctl status kdump" and see that it's loaded properly without error.
- Check whether kdump works. You can trigger it via magic SysRq with the Alt-SysRq-c key combination. (The SysRq key is sometimes the Print Screen key, often activated with the Fn key on laptops.)
If nothing happens (or the system just stays frozen), something went wrong. Normally it switches to the crash-dump kernel after a few seconds, then starts dumping verbosely. Once the dump has finished, run "reboot -f" there.
- Once you have confirmed that kdump works, reboot again and load the Wi-Fi driver manually. If the kernel panics, it should trigger kdump by itself. If nothing happens, press Alt-SysRq-c to trigger kdump manually.
Once you get the dump, please attach the dmesg output found in the resulting crash directory to Bugzilla.
OK, I installed kdump and did everything you said, but when I copy dmesg.txt somewhere on the filesystem, I can't find it after rebooting. What should I do?
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c27
--- Comment #27 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c28
--- Comment #28 from Alberto Zacchetti
The files are saved in the /var/crash/* directory. Look there.
This directory is empty. I have looked through the whole tree...
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c29
--- Comment #29 from Takashi Iwai
You didn't roll back or boot from a btrfs snapshot, right? If the directory is empty, the crash dump didn't succeed. First try without loading the Wi-Fi module: just trigger the crash to see whether the dump works.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c30
--- Comment #30 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c31
--- Comment #31 from Takashi Iwai
I had already tried a test without loading the driver, and the dmesg.txt file is created. When I copy it manually somewhere on the filesystem, it appears where I chose, but after rebooting it is gone: neither the original nor the copy remains.
Where did you copy it? The crash-dump kernel boots in its own ramdisk, so its root filesystem is gone after reboot. Usually kdump mounts the local filesystem under /kdump/mnt/*, so you can copy there, too. You can also copy to a USB disk or copy remotely via ssh, IIRC. The likely reason you didn't get any dumps is a lack of free space. By default, openSUSE assigns a fairly small root partition, and the dump doesn't fit in /var/crash. You can change the dump location to another partition, e.g. /home/crash, via yast2 kdump, then reboot and retest.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c32
--- Comment #32 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c33
--- Comment #33 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c34
--- Comment #34 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c35
--- Comment #35 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c36
--- Comment #36 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c37
--- Comment #37 from Takashi Iwai
Hi. Is it possible to increase the system log level?
Well, the dmesg output already contains the information. You can increase the console log level (run "dmesg -n 8", for example). Also, you might have a better chance with the kernel-debug package instead of kernel-desktop.
Could I get more information on the cause of the hangs? Did you check whether there is a repo from which I could install a Tumbleweed kernel release from before September?
You can find some old kernels in the OBS repos home:tiwai:kernel:3.18, 3.19, 4.0, etc. They contain the latest stable updates of SUSE kernels.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c38
--- Comment #38 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c39
--- Comment #39 from Larry Finger
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c40
--- Comment #40 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c41
--- Comment #41 from Takashi Iwai
OK, I set up a quick-fix KMP for the Leap kernel in the OBS home:tiwai:bnc944978/rtl8821ae repo. It will take some time until it's built and published.
To be clear: I've set up a repo that contains a kernel module package (KMP) including the fix patch Larry proposed.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c42
--- Comment #42 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c43
--- Comment #43 from Takashi Iwai
I tried using your module (from home:/tiwai:/bnc944978/standard) with the standard kernel-desktop (4.1.6) and with kernel-desktop 4.1.9 from your repo, but when I load the module with modprobe I get this message:
modprobe: ERROR: could not insert 'rtl8821ae': Exec format error
Am I doing something wrong?
The standard one looks incompatible for now: there is no 4.1.8 kernel available in the repo, although the KMP was built against it. So remove it for now. Instead, install only the one in the devel repo, currently rtl8821ae-kmp-desktop-4.1_k4.1.9-*.rpm. The easiest way would be:
  zypper ar obs://home:/tiwai:/bnc944978/devel rtl-test
  zypper ar obs://Kernel:/openSUSE-42.1/standard kernel-test
  zypper ref
  zypper in rtl8821ae-kmp-desktop
After installation, you can remove these repos:
  zypper rr rtl-test
  zypper rr kernel-test
Then boot with the 4.1.9 kernel and try "modprobe rtl8821ae". If you still get an error, please post the dmesg output.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c44
--- Comment #44 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c45
--- Comment #45 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c46
--- Comment #46 from Dominique Leuenberger
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c47
--- Comment #47 from Larry Finger
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c48
--- Comment #48 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c50
--- Comment #50 from Alberto Zacchetti
The patch was sent upstream with the explanation that it should be applied to kernel 4.3 and the appropriate stable versions. Despite my wording and efforts, it was applied to wireless-drivers-next, not wireless-drivers. As such, it will not be applied to mainline until the post-4.3 merge. At that point, it will be applied to stable.
This patch was applied to wireless-drivers-next as commit 54328e64047a54b8fc2362c2e1f0fa16c90f739f.
So will this patch be released in the coming days, or in months? Sorry for the question, but I don't know the developer workflow.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c51
--- Comment #51 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c52
--- Comment #52 from Alberto Zacchetti
The fix has already been backported to openSUSE kernels, so it is already fixed for openSUSE users. The upstream kernel will have it in the upcoming 4.4-rc1.
But my 42.1 system still hangs...
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c53
--- Comment #53 from Larry Finger
Based on the similarity of the symptoms, I thought that patch would fix your problems. Unfortunately, none of your posted logs give any helpful traceback info. Please boot a rescue system, mount the partition containing / for your 42.1 system on /mnt, and run the command:
  echo "blacklist rtl8821ae" > /mnt/etc/modprobe.d/50-rtl8821ae.conf
If rtl8821ae is the problem, you should then be able to boot the system normally, as loading of that driver will be blocked. After you have verified that everything works OK, run "sudo modprobe -v rtl8821ae", which will load the Wi-Fi driver. If it then fails, generate and post the crash dump.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c54
--- Comment #54 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c55
--- Comment #55 from Larry Finger
Created attachment 655064 [details] panic log after modprobe -v rtl8821ae
When I asked about the patch, it was because I had already verified that the rtl8821ae driver still crashes the system. I have attached a dump of the panic that follows loading the driver.
There is an error in the logic of this parameter. To handle the critical cases, the logic should enable interrupt clearing and allow the user to disable it if that works OK. The problem is that the code makes the wrong test. At this point, I'm not sure what change I should make. In the meantime, you should edit /etc/modprobe.d/50-rtl8821ae.conf, and change the blacklist line to say "options rtl8821ae int_clear=0".
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c56
--- Comment #56 from Alberto Zacchetti
In the meantime, you should edit /etc/modprobe.d/50-rtl8821ae.conf, and change the blacklist line to say "options rtl8821ae int_clear=0".
OK, it works fine this way. I did notice that configuring the wireless settings froze the desktop, but I think that is a KDE bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c57
--- Comment #57 from Takashi Iwai
(In reply to Alberto Zacchetti from comment #54)
Maybe some wild interrupts are left unhandled? If so, setting the debug option might show something?
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c58
--- Comment #58 from Takashi Iwai
Maybe some wild interrupts are left unhandled?
Scratch that, I was confused. But...
If so, setting the debug option might show something?
... this may still be valid. Alberto, could you blacklist the driver, then load it manually with the debug=5 option but without int_clear=0? (Remove int_clear=0 if it is set in modprobe.d/*.) This should lead to a stall again, but with more verbose logs. Reading back through this bug, I wonder why the old KMP works. Do you still have 4.1.9 and the KMP on your system?
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c59
--- Comment #59 from Alberto Zacchetti
... this may still be valid. Alberto, could you blacklist the driver, then load it manually with the debug=5 option but without int_clear=0? (Remove int_clear=0 if it is set in modprobe.d/*.) This should lead to a stall again, but with more verbose logs.
OK, I'll try it and let you know.
Reading back through this bug, I wonder why the old KMP works. Do you still have 4.1.9 and the KMP on your system?
I'm sorry, but I did a fresh installation.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c60
--- Comment #60 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c61
--- Comment #61 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c62
--- Comment #62 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c63
--- Comment #63 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c64
--- Comment #64 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c65
--- Comment #65 from Alberto Zacchetti
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c66
--- Comment #66 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c67
--- Comment #67 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c68
--- Comment #68 from Takashi Iwai
There is an error in the logic of this parameter. To handle the critical cases, the logic should enable interrupt clearing and allow the user to disable it if that works OK. The problem is that the code makes the wrong test. At this point, I'm not sure what change I should make.
Let's start from the clear definition of the flag and the parameter :)
In one place:
  MODULE_PARM_DESC(int_clear, "Set to 1 to disable interrupt clear before set (default 0)\n");
In another place:
  struct rtl_mod_params {
          ....
          /* default 0: 1 means do not disable interrupts */
          bool int_clear;
and yet it's evaluated like:
  if (!rtlpci->int_clear)
          rtl8821ae_clear_interrupt(hw);/*clear it here first*/
IMO, what "int_clear" implies is to clear interrupts, so if it's 1, it should clear interrupts. The description should be aligned with it.
One thing I'm not sure about is your intention for the default int_clear behavior. Should it clear by default (fixing this kind of bug at the risk of other breakage), or not clear (keeping the 4.0 behavior as is and just providing an option to fix it)?
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c69
--- Comment #69 from Larry Finger
The interrupt-clear change proposal came from Realtek, which found that clearing the interrupts caused pauses in transmission when using iperf. When it was reported that this commit caused a regression that led to this lockup on a few rtl8821ae units, reverting the entire patch was proposed. I NACKed that request and came up with this change. As you noted, I botched it.
For maximum availability, the default value of int_clear should enable the clearing of interrupts. If a user then complains about the TX pauses, we suggest that they change the parameter. The parameter should be set to 1 (true), and interrupts should be cleared under that option. I will be submitting a patch through the normal wireless channels. Once it hits wireless-drivers-next, I will report the commit here.
Just for the record, this problem affects very few systems. Neither Realtek's setup nor mine has this problem, which complicates the debugging.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c70
--- Comment #70 from Takashi Iwai
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c71
--- Comment #71 from Larry Finger
Yeah, thanks, that'll be helpful. Meanwhile we can live with a workaround for now (passing the option).
The patch has been pushed to the "net" maintainer as commit eeec5d0ef7ee ("rtlwifi: rtl8821ae: Fix lockups on boot"). It will appear in 4.4-rcX.
http://bugzilla.opensuse.org/show_bug.cgi?id=944978#c72
--- Comment #72 from Takashi Iwai