Comment # 35 on bug 959230 from Mel Gorman

(In reply to Oliver Neukum from comment #33)
> Mel, it looks like we are seeing an issue with MM acting up on machines with
> a lot of RAM in 42.3. I don't suppose we can take a patch that alters MM so
> fundamentally. Any possible work arounds? Creative gfp flags?

Note that I'm not very active in MM at the moment due to other
responsibilities. However, I think the fact that it bisects to this particular
commit may be partly a coincidence. The error messages appear to come from
usb_control_msg failing when it is called from the driver. usb_control_msg
calls kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO), so what is likely
happening is that the kmalloc fails because it is not able to reclaim many
pages in that GFP context: GFP_NOIO forbids reclaim from starting I/O, so
pages that would have to be written out first cannot be freed by the caller.

The commit in question has the side-effect of waking kswapd to reclaim from
zones earlier than it would without the patch. Specifically, without the
patch, zones are used evenly until kswapd is required. With the patch, the
Normal zone in this case fills first and wakes kswapd, then lower zones are
used, and so on. As a side-effect, the early reclaim means memory is freed
earlier, and a GFP_NOIO call is more likely to complete successfully on an x86
system. Changing the amount of memory so that there is a Normal zone could
coincidentally alter when kswapd wakes, which may be why the problem becomes
visible.

Applying the patch is not without consequences. The patch does not work in
isolation; it only works properly if all the dependent patches that move the
LRU lists to the node are included. That is a non-trivial backport which would
cause KABI issues if included, forcing it to be 42.3-specific. If the patch is
included on its own, it will introduce page age inversion issues and, with
them, regressions that are easy to detect but very difficult to trace back to
this root cause.

Any tuning option from the MM side requires other patches. However, increasing
min_free_kbytes *may* mitigate the problem, because it raises the zone
watermarks so kswapd wakes earlier and a larger reserve is kept. It is worth
trying, but it may also just delay the problem. The other option is a
42.3-specific hack, which should not be forward-ported, that specifies
__GFP_HIGH in usb_control_msg, since a failure to allocate there can cause the
driver to fail completely. That would allow the drivers to dip into the page
allocator reserves, similar to what can happen in IRQ context (which is not
the context here), in the hope that kswapd catches up before the reserve is
depleted. It might be lower risk overall.
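Both options act on the zone watermarks. As a rough userspace sketch (not kernel code), assuming 4 KiB pages and a 4.4-era kernel, the hypothetical zone_wmarks() approximates how setup_per_zone_wmarks() derives WMARK_MIN/LOW/HIGH from min_free_kbytes. The hypothetical watermark_ok() approximates the order-0 check in __zone_watermark_ok(), where __GFP_HIGH (ALLOC_HIGH) halves the required minimum in the allocator slow path. Highmem, lowmem_reserve and the high-atomic reserve are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-zone watermarks, in pages. */
struct wmarks {
	int64_t min;  /* slow-path allocations are checked against this */
	int64_t low;  /* fast-path allocations below this enter the slow path,
	               * which wakes kswapd */
	int64_t high; /* kswapd reclaims until free pages reach this */
};

/*
 * Approximation of setup_per_zone_wmarks(): the zone's share of
 * min_free_kbytes (as 4 KiB pages) becomes WMARK_MIN, and LOW/HIGH sit
 * 25% and 50% above it.
 */
static struct wmarks zone_wmarks(int64_t min_free_kbytes, int64_t zone_pages,
				 int64_t lowmem_pages)
{
	assert(zone_pages > 0 && lowmem_pages >= zone_pages);
	int64_t pages_min = min_free_kbytes / 4;             /* KiB -> pages */
	int64_t tmp = pages_min * zone_pages / lowmem_pages; /* zone's share */
	struct wmarks w = { tmp, tmp + tmp / 4, tmp + tmp / 2 };
	return w;
}

/*
 * Approximation of the order-0 check in __zone_watermark_ok(): with
 * __GFP_HIGH the caller may dip below the mark, down to half of it.
 */
static bool watermark_ok(int64_t free_pages, int64_t mark, bool gfp_high)
{
	int64_t min = mark;

	if (gfp_high)
		min -= min / 2;
	return free_pages > min;
}
```

In this model, raising min_free_kbytes raises all three watermarks, so kswapd is woken earlier and more headroom remains above WMARK_MIN. Passing GFP_NOIO | __GFP_HIGH to the kmalloc in usb_control_msg would instead let that one caller succeed with free pages down to roughly half of WMARK_MIN.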