[Bug 774557] New: Kernel 3.4.6 needs patch to make Realtek 8111 NIC work (backport from 3.5)
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c0

Summary: Kernel 3.4.6 needs patch to make Realtek 8111 NIC work (backport from 3.5)
Classification: openSUSE
Product: openSUSE 12.2
Version: RC 2
Platform: x86-64
OS/Version: openSUSE 12.2
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: omega@online.de
QAContact: qa-bugs@suse.de
Found By: ---
Blocker: ---

User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0

The Realtek 8111 NIC with the r8169 kernel module frequently gets stuck; network connectivity is lost for several seconds until the kernel watchdog kicks in and brings the NIC back to life. This makes any box with this NIC unusable. I already observed this issue with openSUSE 12.1 Tumbleweed and kernel 3.4 and could only solve the problem by stepping back to kernel 3.1. This is a quite frequently observed problem with these on-board Realtek chipsets.

A patch was recently posted here: https://bugzilla.kernel.org/attachment.cgi?id=73504 (see https://bugzilla.kernel.org/show_bug.cgi?id=14962#c27 for details). The patch has made it into kernel 3.5 but not into 3.4.6 or 3.4.7. To my knowledge openSUSE delivers kernels with extra patches. I therefore suggest integrating that simple patch into the openSUSE kernel 3.4.

Reproducible: Always

Steps to Reproduce: Use a box with a mainboard with the Realtek 8111B chipset, install a 3.4 kernel, and initiate some network activity while pinging another host. The pings will stop every now and then and, when the NIC comes back to life, the round-trip time shown is about 10 to 20 seconds(!).

I will ask Francois Romieu to backport his patch to the 3.4.6 and 3.4.7 kernel series.

-- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug.
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c

Jeff Mahoney <jeffm@suse.com> changed:

What      |Removed                                  |Added
----------------------------------------------------------------------------
AssignedTo|kernel-maintainers@forge.provo.novell.com|bpoirier@suse.com
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c1

Benjamin Poirier <bpoirier@suse.com> changed:

What        |Removed |Added
----------------------------------------------------------------------------
Status      |NEW     |NEEDINFO
InfoProvider|        |omega@online.de

--- Comment #1 from Benjamin Poirier <bpoirier@suse.com> 2012-08-06 20:18:26 UTC ---
Given your email, I presume that you are the one who reported that the patch fixes your issue in https://bugzilla.kernel.org/show_bug.cgi?id=14962#c29
If you can confirm that's the case, I'll apply the following commit to 12.2:
eb2dc35 r8169: RxConfig hack for the 8168evl.

Could you also post the XID line for your nic? It appears in dmesg when the card is probed (at boot) and looks something like this:
r8169 0000:0b:00.0: eth0: RTL8168c/8111c at 0xffffc9000005e000, 00:23:54:91:8a:2b, XID 1c4000c0 IRQ 49
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c2 --- Comment #2 from Boris Neubert <omega@online.de> 2012-08-06 20:44:38 UTC --- (In reply to comment #1)
> Given your email, I presume that you are the one who reported that the patch fixes your issue in https://bugzilla.kernel.org/show_bug.cgi?id=14962#c29
Oh, that's a random coincidence - it was not me and I cannot confirm that it works. Francois Romieu answered me that he submitted the patch for 3.4-stable on 29/07/2012 and that he will handle things back starting from 2012/08/17. As far as I have seen, the patch has entered the 3.5 development series but is in neither the 3.4.6 nor the 3.4.7 branch.
> Could you also post the XID line for your nic? It appears in dmesg when the card is probed (at boot) and looks something like this:
Here you are:
r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xffffc9001109e000, xx:xx:xx:xx:xx:xx, XID 18000000 IRQ 59

Kind regards,
Boris
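For anyone collecting these probe lines from several machines, the fields can be pulled out mechanically. The following is only a sketch: the regular expression is an assumption derived from the two sample lines quoted in this thread, and `parse_probe_line` is a hypothetical helper name, not part of any tool mentioned here.

```python
import re

# Layout assumed from the two r8169 probe lines quoted in this thread;
# other driver versions may print the line differently.
PROBE_RE = re.compile(
    r"r8169 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"(?P<iface>\w+): (?P<chip>\S+) at 0x[0-9a-f]+, "
    r"(?P<mac>\S+), XID (?P<xid>[0-9a-f]{8}) IRQ (?P<irq>\d+)"
)


def parse_probe_line(line):
    """Return the probe fields as a dict, or None if the line does not match."""
    match = PROBE_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xffffc9001109e000, "
        "xx:xx:xx:xx:xx:xx, XID 18000000 IRQ 59"
    )
    print(parse_probe_line(sample))
```

The chipset name and the XID are the two fields that matter for deciding which patches can apply to a given board.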
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c

Boris Neubert <omega@online.de> changed:

What        |Removed        |Added
----------------------------------------------------------------------------
Status      |NEEDINFO       |NEW
InfoProvider|omega@online.de|
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c3

Benjamin Poirier <bpoirier@suse.com> changed:

What        |Removed |Added
----------------------------------------------------------------------------
Status      |NEW     |NEEDINFO
InfoProvider|        |omega@online.de

--- Comment #3 from Benjamin Poirier <bpoirier@suse.com> 2012-08-06 21:50:47 UTC ---
(In reply to comment #2)
> Oh, that's a random coincidence - it was not me and I cannot confirm that it works.
Haha, not a problem.
> r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xffffc9001109e000, xx:xx:xx:xx:xx:xx, XID 18000000 IRQ 59
Given your chipset revision (RTL8168b/8111b), the patch linked above is not going to help since it changes the code flow only for RTL8168evl/8111evl. There are some more fixes to r8169 for watchdog timeouts though. Firstly, could you please try the KMP that I created for bnc#770760? It contains the 3.5 driver backported to the 12.2 3.4 kernel. It is found in the following obs project: https://build.opensuse.org/package/show?package=r8169&project=home%3Abenjamin_poirier%3Abnc770760
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c4 --- Comment #4 from Boris Neubert <omega@online.de> 2012-08-07 04:54:12 UTC ---
> There are some more fixes to r8169 for watchdog timeouts though. Firstly, could you please try the KMP that I created for bnc#770760? It contains the 3.5 driver backported to the 12.2 3.4 kernel. It is found in the following obs project: https://build.opensuse.org/package/show?package=r8169&project=home%3Abenjamin_poirier%3Abnc770760
Sure. I am not familiar with the build service. Thus please allow me one question: do I need to recompile the kernel with the patched r8169.c or is the patched r8169 module already lying around somewhere? Boris
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c5

--- Comment #5 from Benjamin Poirier <bpoirier@suse.com> 2012-08-07 18:52:37 UTC ---
No need to recompile, it's all done for you. You only have to add the repository, install the package and (in this case) make sure the new module is in use. In fact I've created a new OBS project just for this bug which should avoid some confusion if we need to test updated versions of the driver to narrow down which patch fixes the issue you observed.

# zypper ar "http://download.opensuse.org/repositories/home:/benjamin_poirier:/bnc774557/..." home:benjamin_poirier:bnc774557
# zypper search r8169
# uname -r
# zypper in r8169-kmp-desktop # make sure to use the flavor that corresponds to your kernel (default, desktop, pae, xen)

To make sure the new module is in use, the easiest way is to reboot. A faster way is to issue:

# rcnetwork stop
# modprobe -r r8169
# modprobe r8169
# rcnetwork start

Please let me know if the problem you observed goes away with this driver.
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c6

Boris Neubert <omega@online.de> changed:

What        |Removed        |Added
----------------------------------------------------------------------------
Status      |NEEDINFO       |NEW
InfoProvider|omega@online.de|

--- Comment #6 from Boris Neubert <omega@online.de> 2012-08-08 20:25:43 UTC ---
(In reply to comment #5)
> Please let me know if the problem you observed goes away with this driver.
The frequent random lockups continue with this driver. Here are some lines from the log:

...
Aug 8 20:56:01 sauron kernel: [ 2.429027] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Aug 8 20:56:01 sauron kernel: [ 2.429209] r8169 0000:02:00.0: irq 59 for MSI/MSI-X
Aug 8 20:56:01 sauron kernel: [ 2.429357] r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xffffc90000656000, 00:19:66:94:53:d7, XID 18000000 IRQ 59
Aug 8 20:56:01 sauron kernel: [ 2.429472] r8169 0000:02:00.0: eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]
...
Aug 8 20:56:52 sauron network-remotefs[1028]: Setting up (remotefs) network interfaces:
Aug 8 20:57:00 sauron kernel: [ 72.704018] ------------[ cut here ]------------
Aug 8 20:57:00 sauron kernel: [ 72.704025] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.4.6/linux-3.4/net/sched/sch_generic.c:256 dev_watchdog+0x240/0x250()
Aug 8 20:57:00 sauron kernel: [ 72.704027] Hardware name: To Be Filled By O.E.M.
Aug 8 20:57:00 sauron kernel: [ 72.704028] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Aug 8 20:57:00 sauron kernel: [ 72.704030] Modules linked in: snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore usb_storage joydev sr_mod cdrom mperf edac_core k8temp pcspkr edac_mce_amd shpchp pci_hotplug i2c_nforce2 autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc af_packet nouveau r8169(O) ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi processor button thermal_sys scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh ata_generic pata_jmicron
Aug 8 20:57:00 sauron kernel: [ 72.704056] Pid: 0, comm: swapper/1 Tainted: G O 3.4.6-1.1-desktop #1
Aug 8 20:57:00 sauron kernel: [ 72.704057] Call Trace:
Aug 8 20:57:00 sauron kernel: [ 72.704066] [<ffffffff81004648>] dump_trace+0x88/0x300
Aug 8 20:57:00 sauron kernel: [ 72.704071] [<ffffffff8157faae>] dump_stack+0x69/0x6f
Aug 8 20:57:00 sauron kernel: [ 72.704075] [<ffffffff8103fd89>] warn_slowpath_common+0x79/0xc0
Aug 8 20:57:00 sauron kernel: [ 72.704078] [<ffffffff8103fe85>] warn_slowpath_fmt+0x45/0x50
Aug 8 20:57:00 sauron kernel: [ 72.704081] [<ffffffff814b2520>] dev_watchdog+0x240/0x250
Aug 8 20:57:00 sauron kernel: [ 72.704086] [<ffffffff8104e9a7>] run_timer_softirq+0x137/0x3c0
Aug 8 20:57:00 sauron kernel: [ 72.704090] [<ffffffff81046ee6>] __do_softirq+0xb6/0x220
Aug 8 20:57:00 sauron kernel: [ 72.704094] [<ffffffff81593d8c>] call_softirq+0x1c/0x30
Aug 8 20:57:00 sauron kernel: [ 72.704097] [<ffffffff81004585>] do_softirq+0x75/0xb0
Aug 8 20:57:00 sauron kernel: [ 72.704101] [<ffffffff810473a5>] irq_exit+0xb5/0xc0
Aug 8 20:57:00 sauron kernel: [ 72.704105] [<ffffffff81021b38>] smp_apic_timer_interrupt+0x68/0xa0
Aug 8 20:57:00 sauron kernel: [ 72.704109] [<ffffffff8159343a>] apic_timer_interrupt+0x6a/0x70
Aug 8 20:57:00 sauron kernel: [ 72.704114] [<ffffffff8102cb42>] native_safe_halt+0x2/0x10
Aug 8 20:57:00 sauron kernel: [ 72.704118] [<ffffffff8100af97>] default_idle+0x47/0x280
Aug 8 20:57:00 sauron kernel: [ 72.704121] [<ffffffff8100b220>] amd_e400_idle+0x50/0x110
Aug 8 20:57:00 sauron kernel: [ 72.704125] [<ffffffff8100bdb6>] cpu_idle+0xa6/0xe0
Aug 8 20:57:00 sauron kernel: [ 72.704129] [<ffffffff8157871f>] start_secondary+0x217/0x21c
Aug 8 20:57:00 sauron kernel: [ 72.704131] ---[ end trace ae2a1bc58f5bd036 ]---
Aug 8 20:57:00 sauron kernel: [ 72.704326] r8169 0000:02:00.0: eth0: link up
Aug 8 20:57:06 sauron kernel: [ 78.720197] r8169 0000:02:00.0: eth0: link up
Aug 8 20:57:07 sauron network-remotefs[1028]: Setting up service (remotefs) network . . . . . . . . . ...done
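When comparing driver builds it can help to pull the watchdog events out of a log automatically rather than scanning backtraces by eye. A minimal sketch follows, assuming the message format shown in the log above; `find_watchdog_timeouts` is a hypothetical helper written for illustration.

```python
import re

# Message format taken from the log excerpts in this thread.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
# Kernel uptime stamp such as "[ 72.704028]".
STAMP_RE = re.compile(r"\[\s*(?P<secs>\d+\.\d+)\]")


def find_watchdog_timeouts(lines):
    """Return one dict per NETDEV WATCHDOG message found in the given lines."""
    events = []
    for line in lines:
        hit = WATCHDOG_RE.search(line)
        if not hit:
            continue
        stamp = STAMP_RE.search(line)
        events.append({
            "uptime": float(stamp.group("secs")) if stamp else None,
            "iface": hit.group("iface"),
            "driver": hit.group("driver"),
            "queue": int(hit.group("queue")),
        })
    return events


if __name__ == "__main__":
    log = [
        "Aug 8 20:57:00 sauron kernel: [ 72.704028] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out",
        "Aug 8 20:57:00 sauron kernel: [ 72.704326] r8169 0000:02:00.0: eth0: link up",
    ]
    for event in find_watchdog_timeouts(log):
        print(event)
```

The uptime value makes it easy to see how long after loading the module the first stall occurred.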
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c7

Benjamin Poirier <bpoirier@suse.com> changed:

What        |Removed |Added
----------------------------------------------------------------------------
Status      |NEW     |NEEDINFO
InfoProvider|        |omega@online.de

--- Comment #7 from Benjamin Poirier <bpoirier@suse.com> 2012-08-09 18:20:56 UTC ---
Thank you for the testing and report. You've tried the 3.5 driver, let's move on to the 3.6 (net-next) version which prominently contains the bql revert: http://thread.gmane.org/gmane.linux.network/238202

I've updated the kmp in the OBS project. You may update it on your end by running

# zypper ref
# zypper up r8169-kmp-desktop # or whichever -variant corresponds to your kernel

Once again make sure the new module is in use by rebooting or restarting the network as explained in comment 5. Please report the results again with this driver.
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c8

Boris Neubert <omega@online.de> changed:

What        |Removed        |Added
----------------------------------------------------------------------------
Status      |NEEDINFO       |NEW
InfoProvider|omega@online.de|

--- Comment #8 from Boris Neubert <omega@online.de> 2012-08-09 19:43:12 UTC ---
(In reply to comment #7)
> Please report the results again with this driver.
Thank you for providing the new driver. I am sorry but it does not work either. Same problem. Boris
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c9

--- Comment #9 from Benjamin Poirier <bpoirier@suse.com> 2012-08-10 21:54:52 UTC ---
Thanks again for all this testing. The last thing I'd like to ask you is to try the 3.6-rc1 kernel as a whole in case I made some error when generating the packages.

First uninstall the KMP that we've been using to run the tests so far:

zypper rm r8169-kmp-desktop
zypper rr home:benjamin_poirier:bnc774557

Then install and reboot into the freshly available 3.6-rc1 kernel using the instructions here: http://kernel.opensuse.org/packages/vanilla

If the problem persists (as we would expect), it'll be time to report it upstream to the r8169 maintainer.
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c10

--- Comment #10 from Boris Neubert <omega@online.de> 2012-08-12 10:21:39 UTC ---
(In reply to comment #9)

It occurred to me that I made a mistake when testing your amended kernel module: My test system had NFS root. NFS root in openSUSE 12.2 RC1 has some issue (https://bugzilla.novell.com/show_bug.cgi?id=774548) that makes initrd creation and bootloader installation fail. Thus, although I had your kernel module in /lib/modules, it was not used because the module was loaded from the original initrd. Unloading/reloading the module while running with NFS root was not an option.

I thus repartitioned my workstation's disk and installed openSUSE 12.2 RC1 there. I then played around with it for some time and made sure that the dropouts could be observed with this configuration. Next I substituted the original module with your module from the https://build.opensuse.org/package/show?package=r8169&project=home%3Abenjamin_poirier%3Abnc770760 repo.

I spent several hours working with the system and stressing it with NFS file transfers and iperf checks, and I did not experience any dropouts nor did I see the watchdog barking at a stalled driver. I will now continue working with openSUSE 12.2 RC1 for the time being to see if this situation prevails. Hopefully the issue is solved.

Boris
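The stale-initrd trap described above can be caught by comparing the build identifier of the module that is actually loaded with the one installed under /lib/modules. The sketch below assumes that the loaded module exposes /sys/module/r8169/srcversion and that `modinfo -F srcversion` reports the on-disk module; both exist on typical Linux systems, but not every module carries a srcversion. The helper names are hypothetical. Note that this modinfo checksum is unrelated to the commit id labelled "srcversion" that the later test builds in this thread print.

```python
import subprocess
from pathlib import Path


def loaded_srcversion(module):
    """Read the srcversion of the currently loaded module from sysfs, if present."""
    path = Path("/sys/module") / module / "srcversion"
    try:
        return path.read_text().strip()
    except OSError:
        return None  # module not loaded, or no srcversion attribute


def installed_srcversion(module):
    """Ask modinfo for the srcversion of the module modprobe would load from disk."""
    try:
        result = subprocess.run(
            ["modinfo", "-F", "srcversion", module],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def same_build(loaded, installed):
    """True/False when both values are known, None when either is missing."""
    if not loaded or not installed:
        return None
    return loaded == installed


if __name__ == "__main__":
    verdict = same_build(loaded_srcversion("r8169"), installed_srcversion("r8169"))
    print({True: "loaded module matches the installed one",
           False: "loaded module differs (stale initrd?)",
           None: "could not determine"}[verdict])
```

A mismatch right after boot would suggest the module came from an initrd that was built before the KMP was installed.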
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c11 --- Comment #11 from Benjamin Poirier <bpoirier@suse.com> 2012-08-13 15:14:08 UTC --- This is good news. Congratulations for thinking about this and putting in the effort. Let me know how things go. In bnc#774552 another user with a RTL8168b/8111b (like yours) had troubles with the v3.5 driver (which is essentially what's in the kmp from home:benjamin_poirier:bnc770760 at the moment). He reported that his problems were fixed with one further patch that is now part of -stable: 17bcb68 r8169: revert "add byte queue limit support" It will be interesting to see if your problems go away without even a need for that patch.
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c12 --- Comment #12 from Boris Neubert <omega@online.de> 2012-08-13 19:54:47 UTC --- (In reply to comment #11)
> It will be interesting to see if your problems go away without even a need for that patch.
I spent some more time doing productive work with the updated driver (as at 11 Aug 2012) and the problems did not come back to me. As the dropouts were quite frequent before I would like to conclude that the problem is solved and this ticket can be closed. Thank you! Boris

Personal note: I am a user of SuSE distros since version 6. I was quite dissatisfied with more and more problems that I encountered in the latest versions 11 and 12 and had it not been for the efforts to learn to deal with a new distro I already had switched to Ubuntu or whatever. Then I stumbled on the self-reflection of the openSUSE team on quality and that more testing was needed. And I decided to give it a try and start reporting all the things that annoyed me before the final release gets out. It was good to see with what professionalism issues were handled here and that encouraged me to report more issues in order to contribute to keep openSUSE my favorite distro.
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c13 --- Comment #13 from Benjamin Poirier <bpoirier@suse.com> 2012-08-14 18:37:29 UTC --- (In reply to comment #12)
> (In reply to comment #11)
> > It will be interesting to see if your problems go away without even a need for that patch.
> I spent some more time doing productive work with the updated driver (as at 11 Aug 2012) and the problems did not come back to me. As the dropouts were quite frequent before I would like to conclude that the problem is solved and this ticket can be closed.
In order to close this bug entry it would be great if we could actually narrow down exactly which of the 11 commits to r8169 between v3.4 and v3.5 fixed the issue you experienced. This way we can include just this bug fix in 12.2 instead of blindly putting in everything. I went over the commits and identified the most likely candidate, namely: 7dbb491 r8169: avoid NAPI scheduling delay. Would you mind running another test? I've updated the kmp one more time. It now contains only this single patch over the 3.4 driver. It's the one in the OBS project for this bug: http://download.opensuse.org/repositories/home:/benjamin_poirier:/bnc774557/... I think you'll have to remove the other repository and package as all the packages have the same name.
> Personal note: I am a user of SuSE distros since version 6. I was quite dissatisfied with more and more problems that I encountered in the latest versions 11 and 12 and had it not been for the efforts to learn to deal with a new distro I already had switched to Ubuntu or whatever. Then I stumbled on the self-reflection of the openSUSE team on quality and that more testing was needed. And I decided to give it a try and start reporting all the things that annoyed me before the final release gets out. It was good to see with what professionalism issues were handled here and that encouraged me to report more issues in order to contribute to keep openSUSE my favorite distro.
Thank you, I'll pass on this feedback to others working on openSUSE.
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c14 --- Comment #14 from Boris Neubert <omega@online.de> 2012-08-14 19:17:59 UTC --- (In reply to comment #13)
> Would you mind running another test? I've updated the kmp one more time. It now contains only this single patch over the 3.4 driver. It's the one in the OBS project for this bug: http://download.opensuse.org/repositories/home:/benjamin_poirier:/bnc774557/...
I am sorry. 450 seconds after inserting the latest stripped-down module, the watchdog bit me...

[ 6729.928419] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 6729.928513] r8169 0000:02:00.0: irq 59 for MSI/MSI-X
[ 6729.928762] r8169 0000:02:00.0: eth0: RTL8168b/8111b at 0xffffc90000678000, 00:19:66:94:53:d7, XID 18000000 IRQ 59
[ 6729.928765] r8169 0000:02:00.0: eth0: jumbo features [frames: 4080 bytes, tx checksumming: ko]
[ 6740.159683] r8169 0000:02:00.0: eth0: link down
[ 6740.159707] r8169 0000:02:00.0: eth0: link down
[ 6740.160290] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 6742.725272] r8169 0000:02:00.0: eth0: link up
[ 6742.725689] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 7198.720014] ------------[ cut here ]------------
[ 7198.720022] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.4.6/linux-3.4/net/sched/sch_generic.c:256 dev_watchdog+0x240/0x250()
[ 7198.720025] Hardware name: To Be Filled By O.E.M.
[ 7198.720026] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 7198.720028] Modules linked in: r8169(O) fuse vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nfs lockd fscache auth_rpcgss nfs_acl sunrpc shpchp pci_hotplug sr_mod edac_core edac_mce_amd joydev mperf snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq snd_timer snd_seq_device snd k8temp i2c_nforce2 pcspkr soundcore cdrom autofs4 nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi button processor thermal_sys scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ata_generic pata_jmicron [last unloaded: r8169]
[ 7198.720057] Pid: 0, comm: swapper/1 Tainted: G O 3.4.6-1.1-desktop #1
[ 7198.720059] Call Trace:
[ 7198.720069] [<ffffffff81004648>] dump_trace+0x88/0x300
[ 7198.720074] [<ffffffff8157faae>] dump_stack+0x69/0x6f
[ 7198.720078] [<ffffffff8103fd89>] warn_slowpath_common+0x79/0xc0
[ 7198.720082] [<ffffffff8103fe85>] warn_slowpath_fmt+0x45/0x50
[ 7198.720085] [<ffffffff814b2520>] dev_watchdog+0x240/0x250
[ 7198.720090] [<ffffffff8104e9a7>] run_timer_softirq+0x137/0x3c0
[ 7198.720094] [<ffffffff81046ee6>] __do_softirq+0xb6/0x220
[ 7198.720098] [<ffffffff81593d8c>] call_softirq+0x1c/0x30
[ 7198.720101] [<ffffffff81004585>] do_softirq+0x75/0xb0
[ 7198.720104] [<ffffffff810473a5>] irq_exit+0xb5/0xc0
[ 7198.720109] [<ffffffff81021b38>] smp_apic_timer_interrupt+0x68/0xa0
[ 7198.720113] [<ffffffff8159343a>] apic_timer_interrupt+0x6a/0x70
[ 7198.720118] [<ffffffff8102cb42>] native_safe_halt+0x2/0x10
[ 7198.720122] [<ffffffff8100af97>] default_idle+0x47/0x280
[ 7198.720125] [<ffffffff8100b220>] amd_e400_idle+0x50/0x110
[ 7198.720128] [<ffffffff8100bdb6>] cpu_idle+0xa6/0xe0
[ 7198.720132] [<ffffffff8157871f>] start_secondary+0x217/0x21c
[ 7198.720135] ---[ end trace f97d5f5010a19c9b ]---
[ 7198.720334] r8169 0000:02:00.0: eth0: link up

Boris
https://bugzilla.novell.com/show_bug.cgi?id=774557
https://bugzilla.novell.com/show_bug.cgi?id=774557#c15

--- Comment #15 from Benjamin Poirier <bpoirier@suse.com> 2012-08-17 21:25:13 UTC ---
That was my best guess, how annoying! I've gone over the patches again (including the one I had forgotten to include in the package all along). Out of the few that affect RTL8168b/8111b (mac version 12 or 17) I don't see which other one could include the solution:

a9d7e79 r8169.c: fix comment typo
    no code change
e1593bb r8169: Support the get_ts_info ethtool method.
    timestamping related, does not work on 3.4
851e602 r8169: Config1 is read-only on 8168c and later.
    affects MAC_VER_18 and up
d387b42 r8169: 8168c and later require bit 0x20 to be set in Config2 for PME signaling.
    affects MAC_VER_18 and up
0004299 r8169: modify pll power function
    affects MAC_VER_07, 08, 09, 10, 16, 27, 28, 29, 30, 31, 37, 39
beb1fe1 r8169: add device specific CSI access helpers.
    code reorganisation, no functional effect (I think)
7e18dca r8169: support the new RTL8402 chip.
    affects MAC_VER_37
5f886e0 r8169: adjust some functions of 8111f
    affects MAC_VER_35, 36
b3d7b2f r8169: support the new RTL8411 chip.
    affects MAC_VER_38
ad1be8d r8169: call netif_napi_del at errpaths and at driver unload
    affects unload and error path (not included in confirmed working driver)
7dbb491 r8169: avoid NAPI scheduling delay.
    bnc#774557c14: does not fix the problem
eb2dc35 r8169: RxConfig hack for the 8168evl.
    affects MAC_VER_34
Beyond here, bnc#774557c10: v3.5 driver fixes the problem

Would you be willing to test about four more packages to bisect the solution?

$ git bisect start v3.5 v3.4 -- drivers/net/ethernet/realtek/r8169.c
Bisecting: 7 revisions left to test after this (roughly 3 steps)

If so, I've updated the OBS package once more with the first bisection step. Please report the results and I'll generate the next step accordingly. dmesg should also include an extra line just after the usual backtrace, i.e. after "---[ end trace f97d5f5010a19c9b ]---", that contains "unmasked XID". Please include this line in your report. Thanks.

http://download.opensuse.org/repositories/home:/benjamin_poirier:/bnc774557/...
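The bisection proposed above is a binary search for the first commit whose driver no longer stalls, which is why a dozen candidates need only three or four test packages. The sketch below illustrates the idea only; it is not what git does internally, and the commit names `c1` to `c12` and the `is_fixed` predicate are placeholders rather than results from this bug.

```python
def first_fixed(commits, is_fixed):
    """Find the first fixed commit in a list ordered oldest to newest.

    Assumes every commit before the fix is broken, every commit from the
    fix onwards works, and the newest commit is known to work.
    Returns the commit found and the list of commits that had to be tested.
    """
    lo, hi = 0, len(commits) - 1  # the answer always lies within [lo, hi]
    tested = []
    while lo < hi:
        mid = (lo + hi) // 2
        tested.append(commits[mid])
        if is_fixed(commits[mid]):
            hi = mid          # fix is at mid or earlier
        else:
            lo = mid + 1      # fix is strictly after mid
    return commits[lo], tested


if __name__ == "__main__":
    # Placeholder history: pretend the stall disappears from c9 onwards.
    history = ["c%d" % n for n in range(1, 13)]
    found, tested = first_fixed(history, lambda c: int(c[1:]) >= 9)
    print("first fixed commit:", found)
    print("packages tested:", tested)
```

Each call to `is_fixed` corresponds to one round of building a package, installing it, and waiting to see whether the watchdog fires, so keeping the number of calls small is what makes the approach practical.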
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c16 --- Comment #16 from Boris Neubert <omega@online.de> 2012-08-18 10:26:34 UTC --- (In reply to comment #15)
> Would you be willing to test about four more packages to bisect the solution?
yes, although I will be away most of next week and we will probably finish testing not before the end of next week.
> http://download.opensuse.org/repositories/home:/benjamin_poirier:/bnc774557/...
Does not work.

[ 185.712016] ------------[ cut here ]------------
[ 185.712024] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.4.6/linux-3.4/net/sched/sch_generic.c:256 dev_watchdog+0x240/0x250()
[ 185.712026] Hardware name: To Be Filled By O.E.M.
[ 185.712028] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 185.712029] Modules linked in: fuse vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nfs lockd fscache auth_rpcgss nfs_acl sunrpc r8169(O) sr_mod shpchp pci_hotplug joydev pcspkr k8temp i2c_nforce2 edac_core edac_mce_amd cdrom snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq mperf snd_timer snd_seq_device snd soundcore autofs4 nouveau ttm processor drm_kms_helper drm i2c_algo_bit mxm_wmi video thermal_sys wmi button scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ata_generic pata_jmicron
[ 185.712058] Pid: 0, comm: swapper/1 Tainted: G O 3.4.6-1.1-desktop #1
[ 185.712059] Call Trace:
[ 185.712069] [<ffffffff81004648>] dump_trace+0x88/0x300
[ 185.712073] [<ffffffff8157faae>] dump_stack+0x69/0x6f
[ 185.712077] [<ffffffff8103fd89>] warn_slowpath_common+0x79/0xc0
[ 185.712081] [<ffffffff8103fe85>] warn_slowpath_fmt+0x45/0x50
[ 185.712084] [<ffffffff814b2520>] dev_watchdog+0x240/0x250
[ 185.712089] [<ffffffff8104e9a7>] run_timer_softirq+0x137/0x3c0
[ 185.712093] [<ffffffff81046ee6>] __do_softirq+0xb6/0x220
[ 185.712097] [<ffffffff81593d8c>] call_softirq+0x1c/0x30
[ 185.712100] [<ffffffff81004585>] do_softirq+0x75/0xb0
[ 185.712104] [<ffffffff810473a5>] irq_exit+0xb5/0xc0
[ 185.712108] [<ffffffff81021b38>] smp_apic_timer_interrupt+0x68/0xa0
[ 185.712112] [<ffffffff8159343a>] apic_timer_interrupt+0x6a/0x70
[ 185.712117] [<ffffffff8102cb42>] native_safe_halt+0x2/0x10
[ 185.712121] [<ffffffff8100af97>] default_idle+0x47/0x280
[ 185.712124] [<ffffffff8100b220>] amd_e400_idle+0x50/0x110
[ 185.712127] [<ffffffff8100bdb6>] cpu_idle+0xa6/0xe0
[ 185.712131] [<ffffffff8157871f>] start_secondary+0x217/0x21c
[ 185.712134] ---[ end trace a15f109f3f6626ed ]---
[ 185.712138] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 5f886e0
[ 185.712334] r8169 0000:02:00.0: eth0: link up
[ 455.712017] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 5f886e0
[ 455.712184] r8169 0000:02:00.0: eth0: link up
[ 953.712024] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 5f886e0
[ 953.712215] r8169 0000:02:00.0: eth0: link up
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c17 --- Comment #17 from Benjamin Poirier <bpoirier@suse.com> 2012-08-20 19:42:24 UTC --- (In reply to comment #16)
> (In reply to comment #15)
> > Would you be willing to test about four more packages to bisect the solution?
> yes, although I will be away most of next week and we will probably finish testing not before the end of next week.
Not a problem. It's kind enough of you to run these tests.
> > http://download.opensuse.org/repositories/home:/benjamin_poirier:/bnc774557/...
> Does not work.
I've updated the package with the next bisection step:

Bisecting: 2 revisions left to test after this (roughly 2 steps)

I've also updated the "srcversion" printed in the XID line accordingly and I've made it so it gets printed at probe time as well, on the "r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded" line. Please make sure that when you test the updated package it prints srcversion e8650a0.

The most recent code upstream agrees that MAC_VER_12 is the right one for XID 3b000600.
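The relation between the unmasked XID 3b000600 and the XID 18000000 in the probe line can be checked with a little bit arithmetic. The constants below are recalled from the upstream r8169 source of that era and are an assumption to be verified against the tree under test: a display mask of 0x9cf0f8ff applied to the TxConfig register for the probe line, and the 8168B-family rows of the chip detection table. The helper names are hypothetical.

```python
# Assumed from memory of upstream r8169.c; verify against the actual source.
XID_PRINT_MASK = 0x9CF0F8FF

# (mask, expected value, MAC version), checked in order; 8168B family only.
MAC_VER_TABLE = [
    (0x7CF00000, 0x38000000, "RTL_GIGA_MAC_VER_12"),
    (0x7CF00000, 0x38500000, "RTL_GIGA_MAC_VER_17"),
    (0x7C800000, 0x38000000, "RTL_GIGA_MAC_VER_17"),
]


def displayed_xid(raw):
    """Value the probe line would show for a given unmasked register value."""
    return raw & XID_PRINT_MASK


def mac_version(raw):
    """First table row whose masked value matches, or None."""
    for mask, value, name in MAC_VER_TABLE:
        if (raw & mask) == value:
            return name
    return None


if __name__ == "__main__":
    raw = 0x3B000600  # unmasked XID reported in comment 16
    print("displayed XID: %08x" % displayed_xid(raw))
    print("MAC version:  ", mac_version(raw))
```

Under these assumptions the unmasked value reduces to the XID seen in the earlier probe lines and selects the MAC_VER_12 row, which is consistent with the statement above.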
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c18 --- Comment #18 from Boris Neubert <omega@online.de> 2012-08-24 13:31:13 UTC --- (In reply to comment #17)
(In reply to comment #16)
(In reply to comment #15)
I've also updated the "srcversion" printed in the XID line accordingly and I've made it so it gets printed at probe time as well, on the "r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded" line. Please make sure that when you test the updated package it prints srcversion e8650a0.
The most recent code upstream agrees that MAC_VER_12 is the right one for XID 3b000600.
This driver version does not work either.
Aug 24 15:27:35 sauron kernel: [ 477.712017] ------------[ cut here ]------------
Aug 24 15:27:35 sauron kernel: [ 477.712027] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.4.6/linux-3.4/net/sched/sch_generic.c:256 dev_watchdog+0x240/0x250()
Aug 24 15:27:35 sauron kernel: [ 477.712029] Hardware name: To Be Filled By O.E.M.
Aug 24 15:27:35 sauron kernel: [ 477.712030] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Aug 24 15:27:35 sauron kernel: [ 477.712032] Modules linked in: r8169(O) fuse vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) nfs lockd fscache auth_rpcgss nfs_acl sunrpc shpchp edac_core snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq snd_timer snd_seq_device snd joydev pci_hotplug edac_mce_amd sr_mod i2c_nforce2 cdrom soundcore mperf k8temp pcspkr autofs4 nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi processor thermal_sys button scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ata_generic pata_jmicron [last unloaded: r8169]
Aug 24 15:27:35 sauron kernel: [ 477.712061] Pid: 0, comm: swapper/1 Tainted: G O 3.4.6-1.1-desktop #1
Aug 24 15:27:35 sauron kernel: [ 477.712063] Call Trace:
Aug 24 15:27:35 sauron kernel: [ 477.712072] [<ffffffff81004648>] dump_trace+0x88/0x300
Aug 24 15:27:35 sauron kernel: [ 477.712077] [<ffffffff8157faae>] dump_stack+0x69/0x6f
Aug 24 15:27:35 sauron kernel: [ 477.712081] [<ffffffff8103fd89>] warn_slowpath_common+0x79/0xc0
Aug 24 15:27:35 sauron kernel: [ 477.712084] [<ffffffff8103fe85>] warn_slowpath_fmt+0x45/0x50
Aug 24 15:27:35 sauron kernel: [ 477.712087] [<ffffffff814b2520>] dev_watchdog+0x240/0x250
Aug 24 15:27:35 sauron kernel: [ 477.712092] [<ffffffff8104e9a7>] run_timer_softirq+0x137/0x3c0
Aug 24 15:27:35 sauron kernel: [ 477.712097] [<ffffffff81046ee6>] __do_softirq+0xb6/0x220
Aug 24 15:27:35 sauron kernel: [ 477.712100] [<ffffffff81593d8c>] call_softirq+0x1c/0x30
Aug 24 15:27:35 sauron kernel: [ 477.712103] [<ffffffff81004585>] do_softirq+0x75/0xb0
Aug 24 15:27:35 sauron kernel: [ 477.712107] [<ffffffff810473a5>] irq_exit+0xb5/0xc0
Aug 24 15:27:35 sauron kernel: [ 477.712111] [<ffffffff81021b38>] smp_apic_timer_interrupt+0x68/0xa0
Aug 24 15:27:35 sauron kernel: [ 477.712116] [<ffffffff8159343a>] apic_timer_interrupt+0x6a/0x70
Aug 24 15:27:35 sauron kernel: [ 477.712121] [<ffffffff8102cb42>] native_safe_halt+0x2/0x10
Aug 24 15:27:35 sauron kernel: [ 477.712124] [<ffffffff8100af97>] default_idle+0x47/0x280
Aug 24 15:27:35 sauron kernel: [ 477.712128] [<ffffffff8100b220>] amd_e400_idle+0x50/0x110
Aug 24 15:27:35 sauron kernel: [ 477.712131] [<ffffffff8100bdb6>] cpu_idle+0xa6/0xe0
Aug 24 15:27:35 sauron kernel: [ 477.712135] [<ffffffff8157871f>] start_secondary+0x217/0x21c
Aug 24 15:27:35 sauron kernel: [ 477.712138] ---[ end trace 8dc62fc4a811db6b ]---
Aug 24 15:27:35 sauron kernel: [ 477.712142] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion e8650a0
Aug 24 15:27:35 sauron kernel: [ 477.712308] r8169 0000:02:00.0: eth0: link up
Kind regards
Boris
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c19 --- Comment #19 from Benjamin Poirier <bpoirier@suse.com> 2012-08-27 16:38:17 UTC --- (In reply to comment #18) [...]
This driver version does not work either.
Thank you. I've updated the package with the next bisection step.
Bisecting: 1 revision left to test after this (roughly 1 step)
The srcversion is now "ad1be8d".
In the meantime I've noticed that this commit has been added to the upstream stable kernel v3.4.5.
Please report test results.
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c20 --- Comment #20 from Boris Neubert <omega@online.de> 2012-08-27 17:10:42 UTC --- (In reply to comment #19)
(In reply to comment #18) [...] The srcversion is now "ad1be8d".
This driver has the same problem:
Aug 27 19:08:42 sauron kernel: [43409.712016] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion ad1be8d
Aug 27 19:08:42 sauron kernel: [43409.712176] r8169 0000:02:00.0: eth0: link up
Regards,
Boris
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c21 --- Comment #21 from Benjamin Poirier <bpoirier@suse.com> 2012-08-27 19:12:08 UTC ---
Thank you. I've updated the package once more, for the final bisection step:
Bisecting: 0 revisions left to test after this (roughly 0 steps)
srcversion is now 7dbb491
Please report results.
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c22 --- Comment #22 from Boris Neubert <omega@online.de> 2012-08-28 08:16:51 UTC --- (In reply to comment #21)
Thank you. I've updated the package once more, for the final bisection step: Bisecting: 0 revisions left to test after this (roughly 0 steps)
srcversion is now 7dbb491
Please report results.
This driver also has frequent lockups:
Aug 28 10:11:42 sauron kernel: [55590.720020] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 7dbb491
Aug 28 10:11:42 sauron kernel: [55590.720208] r8169 0000:02:00.0: eth0: link up
Aug 28 10:12:30 sauron kernel: [55638.720025] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 7dbb491
Aug 28 10:12:30 sauron kernel: [55638.721175] r8169 0000:02:00.0: eth0: link up
Aug 28 10:12:54 sauron kernel: [55662.720020] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 7dbb491
Aug 28 10:12:54 sauron kernel: [55662.721178] r8169 0000:02:00.0: eth0: link up
Strange, shouldn't we have found a working one by now?
In between the tests I used a driver from the early tests that is stable:
# strings r8169.ko | grep srcversion
srcversion=5E0D51589ACA097D2A990F7
What now?
Kind regards
Boris
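How frequent the lockups are can be read off the kernel timestamps. The sketch below assumes that each "unmasked XID" line marks one driver reset, as in the logs of this thread, and computes the seconds between consecutive resets; the helper name `reset_intervals` is hypothetical.

```python
import re

# Kernel timestamp in square brackets, e.g. "[55590.720020]" or "[ 477.712142]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def reset_intervals(log_lines):
    """Seconds between consecutive 'unmasked XID' lines, rounded to 0.1 s."""
    stamps = []
    for line in log_lines:
        if "unmasked XID" not in line:
            continue  # skip "link up" and unrelated lines
        match = TIMESTAMP.search(line)
        if match:
            stamps.append(float(match.group(1)))
    return [round(later - earlier, 1) for earlier, later in zip(stamps, stamps[1:])]


# Lines copied from comment #22.
log = [
    "Aug 28 10:11:42 sauron kernel: [55590.720020] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 7dbb491",
    "Aug 28 10:11:42 sauron kernel: [55590.720208] r8169 0000:02:00.0: eth0: link up",
    "Aug 28 10:12:30 sauron kernel: [55638.720025] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 7dbb491",
    "Aug 28 10:12:30 sauron kernel: [55638.721175] r8169 0000:02:00.0: eth0: link up",
    "Aug 28 10:12:54 sauron kernel: [55662.720020] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion 7dbb491",
    "Aug 28 10:12:54 sauron kernel: [55662.721178] r8169 0000:02:00.0: eth0: link up",
]
print(reset_intervals(log))  # [48.0, 24.0]
```

For the log in comment #22 this gives resets 48 and 24 seconds apart.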
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c23 --- Comment #23 from Benjamin Poirier <bpoirier@suse.com> 2012-09-06 20:18:41 UTC --- (In reply to comment #22)
This driver also has frequent lockups: [...]
Strange, shouldn't we have found a working one by now?
That was the last bisection step, given that I had specified the v3.5 version as working, following comment 10. These results seem to say that "eb2dc35 r8169: RxConfig hack for the 8168evl." is the commit that fixes your issue. Although this is the patch you had asked for in comment 0, it's completely unexpected to me. That patch affects only RTL_GIGA_MAC_VER_34 whereas you have a RTL_GIGA_MAC_VER_12. I've updated the KMP once again to include this patch. srcversion is now eb2dc35.
In between the tests I used a driver from the early tests that is stable:
# strings r8169.ko | grep srcversion
srcversion=5E0D51589ACA097D2A990F7
This is the module generated from r2 or r3 of the OBS project. It contains the v3.6-rc1 driver version (minus one missing patch, my mistake).
What now?
Can you please test the updated KMP, once again?
If it does work:
1) I'll be extremely surprised
2) We'll do a final test with the current 3.4 version plus only the eb2dc35 patch
If it turns out that it does not work, we'll keep bisecting between 3.5..3.6-rc1
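The reasoning in this comment follows from how a bisection for a fix works: the newest endpoint is taken as working on trust and is never tested itself, so if every tested revision fails, the search ends on that endpoint. The following is a simplified sketch over a linear list of commits, not git's actual algorithm; the commit names and the `is_fixed` callback are hypothetical.

```python
def bisect_first_fixed(commits, is_fixed):
    """Find the first commit for which is_fixed(commit) is true.

    commits is ordered oldest to newest. The state before commits[0] is
    assumed broken and commits[-1] is assumed fixed; the last commit is
    never tested. Returns (commit, tested), where tested lists the
    commits that were actually tried, in order.
    """
    lo, hi = 0, len(commits) - 1
    tested = []
    while lo < hi:
        mid = (lo + hi) // 2
        tested.append(commits[mid])
        if is_fixed(commits[mid]):
            hi = mid        # fix is at mid or earlier
        else:
            lo = mid + 1    # fix must come after mid
    return commits[lo], tested


# Hypothetical history of eight commits, c1 (oldest) to c8 (newest).
commits = ["c%d" % i for i in range(1, 9)]

# Every test fails, as in this thread: the search lands on the untested endpoint.
all_bad_result, all_bad_tested = bisect_first_fixed(commits, lambda c: False)
print(all_bad_result, all_bad_tested)  # c8 ['c4', 'c6', 'c7']

# For comparison, a history where c3 introduces the fix.
c3_result, c3_tested = bisect_first_fixed(commits, lambda c: commits.index(c) >= 2)
print(c3_result, c3_tested)  # c3 ['c4', 'c2', 'c3']
```

In the all-bad case the result is only as reliable as the assumption that the endpoint works, which is why a direct test of the final candidate is the natural next step.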
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c24 --- Comment #24 from Boris Neubert <omega@online.de> 2012-09-08 07:10:09 UTC ---
I tested with the latest driver
Sep 8 09:00:59 sauron kernel: [153667.712018] r8169 0000:02:00.0: eth0: unmasked XID 3b000600 srcversion eb2dc35
It shows the same symptoms/lockups.
I will now update the system to 12.2 final (new install). Then I am ready to continue testing with 3.5..3.6-rc1 bisecting.
https://bugzilla.novell.com/show_bug.cgi?id=774557 https://bugzilla.novell.com/show_bug.cgi?id=774557#c25
Jeff Mahoney <jeffm@suse.com> changed:
      What    |Removed |Added
----------------------------------------------------------------------------
        Status|NEW     |RESOLVED
    Resolution|        |WONTFIX
--- Comment #25 from Jeff Mahoney <jeffm@suse.com> 2014-08-08 16:29:03 EDT ---
This report is against openSUSE 12.2, which is no longer under maintenance. If you are able to reproduce it with openSUSE 13.1 or openSUSE Factory, please re-open and reset the "Product" field to the appropriate release.