[Bug 1104824] New: Bug in xen-netfront.c, concerning replacement of MAX_SKB_FRAGS with XEN_NETIF_NR_SLOTS_MIN
http://bugzilla.suse.com/show_bug.cgi?id=1104824

        Bug ID: 1104824
       Summary: Bug in xen-netfront.c, concerning replacement of
                MAX_SKB_FRAGS with XEN_NETIF_NR_SLOTS_MIN
Classification: openSUSE
       Product: openSUSE Distribution
       Version: Leap 15.0
      Hardware: x86-64
            OS: Linux
        Status: NEW
      Severity: Normal
      Priority: P5 - None
     Component: Kernel
      Assignee: kernel-maintainers@forge.provo.novell.com
      Reporter: Manfred.Haertel@rz-online.de
    QA Contact: qa-bugs@suse.de
      Found By: ---
       Blocker: ---

After a recent kernel upgrade in a DomU running on a system with an older kernel in Dom0, the DomU crashed under heavy incoming network load. So I filed https://bugzilla.suse.com/show_bug.cgi?id=1102997, and later invalidated my own bug report because I could not reproduce the bug with a current Dom0 kernel.

However, after some debugging and trying to understand how the driver works, I still think that the change where MAX_SKB_FRAGS was replaced by XEN_NETIF_NR_SLOTS_MIN is not completely correct. I hope my wording is correct (or at least understandable) as I try to explain what I think is wrong.

The variable max in xennet_get_responses() can now become larger by one than before.

BEFORE the patch, the variable was 17 or 18, depending on rx->status <= RX_COPY_THRESHOLD, i.e. on the size of the first fragment. If the first fragment was larger than the threshold, the variable became 17, and 18 otherwise.

So, if an (ethernet GSO) frame with 18 fragments was received AND the first fragment was "large" (larger than the threshold), the condition slots > max in xennet_get_responses() was met, and the whole frame was discarded (and the infamous message "Too many slots" appeared).

I don't think the Xen network code allows a frame with 19 fragments, so the condition slots > max could not be met at all when the first fragment was "small". So, in xennet_fill_frags() we could be sure that if we had a frame with 18 fragments, the first fragment was "small". In THIS case, if the condition shinfo->nr_frags == MAX_SKB_FRAGS was met (meaning we have 18 fragments), we knew that the first fragment was small, and so __pskb_pull_tail() was able to decrement nr_frags by one. So BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS) could never lead to a crash, because nr_frags was now smaller than MAX_SKB_FRAGS.

NOW the variable max in xennet_get_responses() can become 18 or 19, again depending on the size of the first fragment. Again, as (I think) a frame with 19 or even 20 fragments is not possible, the condition slots > max is NEVER met, and the message "Too many slots" is actually impossible.

However, in xennet_fill_frags() things are different. A frame with 18 fragments is now possible even if the first fragment is "large" (larger than RX_COPY_THRESHOLD). The problem is that in this case __pskb_pull_tail() is NOT able to decrement shinfo->nr_frags by one. I THINK this is coded in the following lines in xennet_poll():

    NETFRONT_SKB_CB(skb)->pull_to = rx->status;
    if (NETFRONT_SKB_CB(skb)->pull_to > RX_COPY_THRESHOLD)
        NETFRONT_SKB_CB(skb)->pull_to = RX_COPY_THRESHOLD;

(I admit that I do not really understand what this actually does.)

So, with a "large" first fragment, shinfo->nr_frags is *still* MAX_SKB_FRAGS (18) after the pull, and so

    BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);

causes a crash.

It seems that the crash does NOT happen with a recent Dom0 kernel. I also tried to understand why.
It seems as if the xen-netback driver in recent kernels sends a maximum of only *16* fragments (at least, I could not provoke it to send more). So, of course, the crash can never happen there.

On the other hand, I think the code in xen-netfront is not correct in general. A Dom0 kernel that sends 18 fragments with a "large" first fragment is able to crash the DomU. Older Linux kernels in Dom0 might do this, for example, though I do not know when the change in netback to a smaller number of fragments happened.

I also do not know what the universally correct solution would be.
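To make the accounting above concrete, here is a minimal userspace sketch (not the driver code) of the slots > max check before and after the patch. It assumes MAX_SKB_FRAGS == 17 and XEN_NETIF_NR_SLOTS_MIN == 18, the usual values for x86-64 kernels of this era; neither constant is spelled out in the report itself.

    /*
     * Minimal userspace sketch of the slot check in
     * xennet_get_responses(), as described in the report above.
     * Not the real driver code; constants are assumed values.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_SKB_FRAGS          17  /* assumed: 65536/PAGE_SIZE + 1 */
    #define XEN_NETIF_NR_SLOTS_MIN 18  /* assumed Xen netif constant */
    #define RX_COPY_THRESHOLD      256

    /* "max" as the report describes it: one extra slot is allowed when
     * the first fragment is small enough to be pulled into the linear
     * area later (rx->status <= RX_COPY_THRESHOLD). */
    static int slot_limit(int first_frag_len, bool patched)
    {
        int base = patched ? XEN_NETIF_NR_SLOTS_MIN : MAX_SKB_FRAGS;
        return base + (first_frag_len <= RX_COPY_THRESHOLD ? 1 : 0);
    }

    int main(void)
    {
        int slots = 18;                      /* fragments in the frame */
        int large = RX_COPY_THRESHOLD + 1;   /* "large" first fragment */

        /* Before the patch: max == 17, so slots > max and the frame
         * was dropped with "Too many slots". */
        printf("old: max=%d -> %s\n", slot_limit(large, false),
               slots > slot_limit(large, false) ? "dropped" : "accepted");

        /* After the patch: max == 18, the frame is accepted, and
         * xennet_fill_frags() later hits
         * BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS), because the
         * pull_to cap means a large first fragment is never consumed
         * entirely by __pskb_pull_tail(). */
        printf("new: max=%d -> %s\n", slot_limit(large, true),
               slots > slot_limit(large, true) ? "dropped" : "accepted");
        return 0;
    }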
http://bugzilla.suse.com/show_bug.cgi?id=1104824#c1
--- Comment #1 from Manfred Härtel
As the problem only occurs when the first fragment is larger than RX_COPY_THRESHOLD, I had the idea of enlarging RX_COPY_THRESHOLD to the maximum size of a fragment:

    #define RX_COPY_THRESHOLD 4096

There is no explanation in the source code of why RX_COPY_THRESHOLD is currently only 256. Maybe someone can explain.

However, my first tests with the enlarged value of 4096 were successful and free of side effects.

http://bugzilla.suse.com/show_bug.cgi?id=1104824#c2
--- Comment #2 from Jürgen Groß
I guess the main problem with setting RX_COPY_THRESHOLD to 4096 is that each RX buffer then needs more than one page of memory. Only with RX_COPY_THRESHOLD smaller than 3776 bytes (PAGE_SIZE - sizeof(struct skb_shared_info)) will one page be able to hold the needed data.
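A quick way to see where the 3776 comes from (a minimal sketch; the 320-byte size of struct skb_shared_info is an assumption, derived here only from 4096 - 3776, not a value quoted in this thread):

    /* Sanity check of the 3776-byte limit mentioned above.  The
     * skb_shared_info size is an assumed x86-64 value. */
    #include <stdio.h>

    #define PAGE_SIZE_BYTES      4096
    #define SKB_SHARED_INFO_SIZE 320  /* assumed sizeof(struct skb_shared_info) */

    int main(void)
    {
        /* Largest RX_COPY_THRESHOLD for which the linear data plus the
         * trailing skb_shared_info still fit in a single page. */
        printf("%d\n", PAGE_SIZE_BYTES - SKB_SHARED_INFO_SIZE);  /* 3776 */
        return 0;
    }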
I guess with some memory pressure and long system uptimes the picture might change. I'm looking at two possible approaches at the moment. Either I set the max number of slots depending on the backend version (one less for old backends than for new ones), but I guess this is a rather fragile solution. What I would like better is to replace the BUG_ON() in xennet_fill_frags() with a way to fail the receive operation gracefully. The failure could then be made less probable by raising RX_COPY_THRESHOLD to a value like the above-mentioned 3776.
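A minimal userspace sketch of that second approach (illustrative only, not the actual patch; the real driver tracks fragments in skb_shared_info and frees a slot with __pskb_pull_tail(), which is only modeled here):

    /*
     * Userspace model of failing the receive gracefully instead of
     * BUG(): when the pull cannot free a slot (large first fragment)
     * and more slots remain, report failure so the caller can drop
     * the frame.  Simplified; not the real xennet_fill_frags().
     */
    #include <stdio.h>

    #define MAX_SKB_FRAGS     17  /* assumed value */
    #define RX_COPY_THRESHOLD 256

    /* Returns 0 on success, -1 if the frame must be dropped (this is
     * where the current code would hit the BUG_ON() instead). */
    static int fill_frags(int slots, int first_frag_len)
    {
        int nr_frags = 0;

        for (int s = 0; s < slots; s++) {
            if (nr_frags == MAX_SKB_FRAGS) {
                if (first_frag_len <= RX_COPY_THRESHOLD)
                    nr_frags--;   /* pull absorbed the small first frag */
                else
                    return -1;    /* graceful drop instead of BUG() */
            }
            nr_frags++;
        }
        return 0;
    }

    int main(void)
    {
        printf("18 slots, small first frag: %s\n",
               fill_frags(18, 128) ? "dropped" : "delivered");
        printf("18 slots, large first frag: %s\n",
               fill_frags(18, 1024) ? "dropped" : "delivered");
        return 0;
    }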
http://bugzilla.suse.com/show_bug.cgi?id=1104824#c7
--- Comment #7 from Jeffrey Cheung
No side effects of your patch until now, as expected.
Thank YOU for your competent reactions and for the patch.
I close the bug report as everything looks good now.