Re: [opensuse-kernel] "SLAB: Unable to allocate memory" vs. vm.swappiness & vm/vfs_cache_pressure?

28 Feb 2013

      On 2/28/13 2:00 PM, chrysippus@operamail.com wrote:
...
On a 4GB RAM edge server running 12.2 w/ kernel 3.4.28/x86_64 I'm seeing
an increasing frequency of "SLAB: Unable to allocate memory on node 0"
(http://pastebin.com/xDPgtut8) errors in syslog, e.g.:
cat /var/log/messages
      ...
      Feb 28 09:24:30 core kernel: [777611.883853] The
      following is only an harmless informational message.
      Feb 28 09:24:30 core kernel: [777611.883860] Unless you
      get a _continuous_flood_ of these messages it means
      Feb 28 09:24:30 core kernel: [777611.883864] everything
      is working fine. Allocations from irqs cannot be
      Feb 28 09:24:30 core kernel: [777611.883868] perfectly
      reliable and the kernel is designed to handle that.
      Feb 28 09:24:30 core kernel: [777611.883873] swapper/2:
      page allocation failure. order:4, mode:0x20
      Feb 28 09:24:30 core kernel: [777611.883880] Pid: 0,
      comm: swapper/2 Tainted: P        W  O
      3.4.28-2.20-default #1
      Feb 28 09:24:30 core kernel: [777611.883885] Call Trace:
      Feb 28 09:24:30 core kernel: [777611.883908] 
      [<ffffffff81004598>] dump_trace+0x78/0x2c0
      Feb 28 09:24:30 core kernel: [777611.883921] 
      [<ffffffff81532e6a>] dump_stack+0x69/0x6f
      Feb 28 09:24:30 core kernel: [777611.883933] 
      [<ffffffff810fdf02>] warn_alloc_failed+0x102/0x1b0
      Feb 28 09:24:30 core kernel: [777611.883946] 
      [<ffffffff81101cfb>] __alloc_pages_nodemask+0x68b/0x9a0
      Feb 28 09:24:30 core kernel: [777611.883959] 
      [<ffffffff811401e2>] kmem_getpages+0x62/0x190
      Feb 28 09:24:30 core kernel: [777611.883969] 
      [<ffffffff81140ddf>] fallback_alloc+0x21f/0x270
      Feb 28 09:24:30 core kernel: [777611.883981] 
      [<ffffffff8114153b>] __kmalloc+0x14b/0x1d0
      Feb 28 09:24:30 core kernel: [777611.883994] 
      [<ffffffff8143e87c>] pskb_expand_head+0x8c/0x310
      Feb 28 09:24:30 core kernel: [777611.884006] 
      [<ffffffff814d9355>] ip6_forward+0x515/0x770
      Feb 28 09:24:30 core kernel: [777611.884027] 
      [<ffffffff81449712>] __netif_receive_skb+0x532/0x600
      Feb 28 09:24:30 core kernel: [777611.884039] 
      [<ffffffff81449881>] process_backlog+0xa1/0x180
      Feb 28 09:24:30 core kernel: [777611.884049] 
      [<ffffffff8144a280>] net_rx_action+0x140/0x230
      Feb 28 09:24:30 core kernel: [777611.884062] 
      [<ffffffff810459be>] __do_softirq+0xae/0x1c0
      Feb 28 09:24:30 core kernel: [777611.884073] 
      [<ffffffff8154724c>] call_softirq+0x1c/0x30
      Feb 28 09:24:30 core kernel: [777611.884082] 
      [<ffffffff810044e5>] do_softirq+0x75/0xb0
      Feb 28 09:24:30 core kernel: [777611.884094] 
      [<ffffffff81045d55>] irq_exit+0xa5/0xb0
      Feb 28 09:24:30 core kernel: [777611.884105] 
      [<ffffffff8100413b>] do_IRQ+0x5b/0xd0
      Feb 28 09:24:30 core kernel: [777611.884115] 
      [<ffffffff8153e5ea>] common_interrupt+0x6a/0x6a
      Feb 28 09:24:30 core kernel: [777611.884129] 
      [<ffffffff8131861c>] intel_idle+0xec/0x160
      Feb 28 09:24:30 core kernel: [777611.884141] 
      [<ffffffff81412bfd>] cpuidle_idle_call+0x9d/0x240
      Feb 28 09:24:30 core kernel: [777611.884152] 
      [<ffffffff8100b5f5>] cpu_idle+0x65/0xd0
      Feb 28 09:24:30 core kernel: [777611.884163] 
      [<ffffffff8152be3d>] start_secondary+0x208/0x20d
      Feb 28 09:24:30 core kernel: [777611.884171] SLAB:
      Unable to allocate memory on node 0 (gfp=0x20)
      Feb 28 09:24:30 core kernel: [777611.884176]   cache:
      size-65536, object size: 65536, order: 4
      Feb 28 09:24:30 core kernel: [777611.884184]   node 0:
      slabs: 19/19, objs: 19/19, free: 0
      Feb 28 09:24:30 core kernel: [777611.886007] The
      following is only an harmless informational message.
      Feb 28 09:24:30 core kernel: [777611.886015] Unless you
      get a _continuous_flood_ of these messages it means
      Feb 28 09:24:30 core kernel: [777611.886018] everything
      is working fine. Allocations from irqs cannot be
      Feb 28 09:24:30 core kernel: [777611.886022] perfectly
      reliable and the kernel is designed to handle that.
      Feb 28 09:24:30 core kernel: [777611.886027] kswapd0:
      page allocation failure. order:4, mode:0x20
      Feb 28 09:24:30 core kernel: [777611.886034] Pid: 35,
      comm: kswapd0 Tainted: P        W  O 3.4.28-2.20-default
      #1
      Feb 28 09:24:30 core kernel: [777611.886038] Call Trace:
      Feb 28 09:24:30 core kernel: [777611.886061] 
      [<ffffffff81004598>] dump_trace+0x78/0x2c0
      Feb 28 09:24:30 core kernel: [777611.886073] 
      [<ffffffff81532e6a>] dump_stack+0x69/0x6f
      Feb 28 09:24:30 core kernel: [777611.886085] 
      [<ffffffff810fdf02>] warn_alloc_failed+0x102/0x1b0
      Feb 28 09:24:30 core kernel: [777611.886097] 
      [<ffffffff81101cfb>] __alloc_pages_nodemask+0x68b/0x9a0
      Feb 28 09:24:30 core kernel: [777611.886110] 
      [<ffffffff811401e2>] kmem_getpages+0x62/0x190
      Feb 28 09:24:30 core kernel: [777611.886120] 
      [<ffffffff81140ddf>] fallback_alloc+0x21f/0x270
      Feb 28 09:24:30 core kernel: [777611.886132] 
      [<ffffffff8114153b>] __kmalloc+0x14b/0x1d0
      Feb 28 09:24:30 core kernel: [777611.886145] 
      [<ffffffff8143e87c>] pskb_expand_head+0x8c/0x310
      Feb 28 09:24:30 core kernel: [777611.886157] 
      [<ffffffff814d9355>] ip6_forward+0x515/0x770
      Feb 28 09:24:30 core kernel: [777611.886178] 
      [<ffffffff81449712>] __netif_receive_skb+0x532/0x600
      Feb 28 09:24:30 core kernel: [777611.886194] 
      [<ffffffff81449881>] process_backlog+0xa1/0x180
      Feb 28 09:24:30 core kernel: [777611.886210] 
      [<ffffffff8144a280>] net_rx_action+0x140/0x230
      Feb 28 09:24:30 core kernel: [777611.886229] 
      [<ffffffff810459be>] __do_softirq+0xae/0x1c0
      Feb 28 09:24:30 core kernel: [777611.886245] 
      [<ffffffff8154724c>] call_softirq+0x1c/0x30
      Feb 28 09:24:30 core kernel: [777611.886259] 
      [<ffffffff810044e5>] do_softirq+0x75/0xb0
      Feb 28 09:24:30 core kernel: [777611.886276] 
      [<ffffffff81045d55>] irq_exit+0xa5/0xb0
      Feb 28 09:24:30 core kernel: [777611.886291] 
      [<ffffffff8100413b>] do_IRQ+0x5b/0xd0
      Feb 28 09:24:30 core kernel: [777611.886304] 
      [<ffffffff8153e5ea>] common_interrupt+0x6a/0x6a
      Feb 28 09:24:30 core kernel: [777611.886319] 
      [<ffffffff8153e1d9>]
      _raw_spin_unlock_irqrestore+0x9/0x30
      Feb 28 09:24:30 core kernel: [777611.886329] 
      [<ffffffff8111a1b8>] compaction_alloc+0x288/0x2b0
      Feb 28 09:24:30 core kernel: [777611.886342] 
      [<ffffffff81144d83>] migrate_pages+0xb3/0x500
      Feb 28 09:24:30 core kernel: [777611.886352] 
      [<ffffffff8111ad47>] compact_zone+0x1d7/0x320
      Feb 28 09:24:30 core kernel: [777611.886364] 
      [<ffffffff8111af3b>] __compact_pgdat+0xab/0x170
      Feb 28 09:24:30 core kernel: [777611.886374] 
      [<ffffffff8111b237>] compact_pgdat+0x27/0x30
      Feb 28 09:24:30 core kernel: [777611.886384] 
      [<ffffffff8110dc8c>] balance_pgdat+0x6cc/0x710
      Feb 28 09:24:30 core kernel: [777611.886396] 
      [<ffffffff8110de18>] kswapd+0x148/0x370
      Feb 28 09:24:30 core kernel: [777611.886407] 
      [<ffffffff8105ec55>] kthread+0x85/0x90
      Feb 28 09:24:30 core kernel: [777611.886417] 
      [<ffffffff81547154>] kernel_thread_helper+0x4/0x10
      Feb 28 09:24:30 core kernel: [777611.886426] SLAB:
      Unable to allocate memory on node 0 (gfp=0x20)
      Feb 28 09:24:30 core kernel: [777611.886430]   cache:
      size-65536, object size: 65536, order: 4
      Feb 28 09:24:30 core kernel: [777611.886439]   node 0:
      slabs: 15/19, objs: 15/19, free: 4
      Feb 28 09:24:38 core kernel: [777620.441021] The
      following is only an harmless informational message.
      Feb 28 09:24:38 core kernel: [777620.441029] Unless you
      get a _continuous_flood_ of these messages it means
      Feb 28 09:24:38 core kernel: [777620.441033] everything
      is working fine. Allocations from irqs cannot be
      Feb 28 09:24:38 core kernel: [777620.441036] perfectly
      reliable and the kernel is designed to handle that.
      Feb 28 09:24:38 core kernel: [777620.441042] swapper/1:
      page allocation failure. order:4, mode:0x20
      Feb 28 09:24:38 core kernel: [777620.441048] Pid: 0,
      comm: swapper/1 Tainted: P        W  O
      3.4.28-2.20-default #1
      Feb 28 09:24:38 core kernel: [777620.441053] Call Trace:
      Feb 28 09:24:38 core kernel: [777620.441076] 
      [<ffffffff81004598>] dump_trace+0x78/0x2c0
      Feb 28 09:24:38 core kernel: [777620.441089] 
      [<ffffffff81532e6a>] dump_stack+0x69/0x6f
      ...
Atm, `slabtop` shows 99% usage for "ext4_inode_cache":
slabtop
       Active / Total Objects (% used)    : 1275578 / 1419483
       (89.9%)
       Active / Total Slabs (% used)      : 187764 / 187786
       (100.0%)
       Active / Total Caches (% used)     : 109 / 232 (47.0%)
       Active / Total Size (% used)       : 639586.50K /
       660294.86K (96.9%)
       Minimum / Average / Maximum Object : 0.02K / 0.46K /
       4096.00K
OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE
        NAME                  
      606252 601064  99%    0.85K 151563        4    606252K
      ext4_inode_cache
      380397 283799  74%    0.10K  10281       37     41124K
      buffer_head
      241600 231794  95%    0.19K  12080       20     48320K
      dentry
       34986  34973  99%    0.54K   4998        7     19992K
       inode_cache
       27435  12674  46%    0.06K    465       59      1860K
       size-64
       21952  21115  96%    0.03K    196      112       784K
       size-32
       18734  18690  99%    0.11K    551       34      2204K
       sysfs_dir_cache
       16086  14359  89%    0.55K   2298        7      9192K
       radix_tree_node
       13475  11758  87%    0.05K    175       77       700K
       jbd2_inode
      ...
This is with default values of:
cat /proc/sys/vm/vfs_cache_pressure
      100
  cat /proc/sys/vm/swappiness 
      60
Reading @
http://doc.opensuse.org/documentation/html/openSUSE/opensuse-tuning/cha.tuni...,
and taking a random stab at changing
echo 200 > /proc/sys/vm/vfs_cache_pressure
  echo 25  > /proc/sys/vm/swappiness
Seems to reduce -- thought not completely eliminate -- the syslog
errors.  `slabtop` shows no change.
After dropping cache
sync && echo 3 > /proc/sys/vm/drop_caches
`slabtop` usage for "ext4-inode_cache" seems to have dropped to 74%,
although other items are still ~99%:
slabtop
      Active / Total Objects (% used)    : 130790 / 216154 (60.5%)
       Active / Total Slabs (% used)      : 12482 / 12506 (99.8%)
       Active / Total Caches (% used)     : 109 / 232 (47.0%)
       Active / Total Size (% used)       : 40954.73K / 54620.83K
       (75.0%)
       Minimum / Average / Maximum Object : 0.02K / 0.25K /
       4096.00K
OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME  
       60740  22518  37%    0.19K   3037       20     12148K
       dentry
       23423   8006  34%    0.06K    397       59      1588K
       size-64
!!           21952  21119  96%    0.03K    196      112       784K
size-32
!!           19397  19126  98%    0.54K   2771        7     11084K
inode_cache
!!           18734  18690  99%    0.11K    551       34      2204K
sysfs_dir_cache
       16687   4514  27%    0.10K    451       37      1804K
       buffer_head
        4543   2173  47%    0.05K     59       77       236K
        anon_vma_chain
        4200   3013  71%    0.12K    140       30       560K
        size-128
!!            4088   4072  99%    0.13K    146       28       584K
ext4_groupinfo_4k
!!            4084   4079  99%    1.00K   1021        4      4084K
size-1024
        3784   2956  78%    0.17K    172       22       688K
        vm_area_struct
        3283    962  29%    0.55K    469        7      1876K
        radix_tree_node
        2596   1484  57%    0.06K     44       59       176K
        anon_vma  
        2160   1798  83%    0.02K     15      144        60K
        dm_target_io
        2116   1820  86%    0.04K     23       92        92K dm_io
        2020   1687  83%    0.19K    101       20       404K
        size-192
        1925     45   2%    0.05K     25       77       100K
        jbd2_inode
        1845   1108  60%    0.25K    123       15       492K filp
        1431   1344  93%    0.07K     27       53       108K
        Acpi-Operand
-->           1408   1042  74%    0.85K    352        4      1408K
ext4_inode_cache
      ...
With this config, post clear, for ~ 20 minutes, no more SLAB errors in
syslog.
My understanding of the effect/beahvior of Memory Subsytem tuning is
rudimentary at best.
Chatting in #opensuse-server, IIUC, it seems that it's not leaking, but
not automatically freeing up enough memory.
(1) Are the SLAB errors I'm seeing indicative of a problem in the sysctl
setting for swappiness & vfs_cache_pressure, particularly for server
use?  Or something else?
No. The problem you're seeing is due to memory fragmentation, not lack
of memory. The allocation request is order-4, which means it needs 16
physically contiguous pages to be satisfied. In a system that has been
running for a while, these allocations can be hard to fill. They're
typically avoided entirely. In this case, the request must also be
fulfilled without scheduling which limits the options significantly.

You were able to work around the problem by adjusting vfs_cache_pressure
because it essentially dropped all of the VFS cache (see
ext4_inode_cache go from ~ 600MB to 1.4MB) which freed a bunch of pages,
allowing the requests to go forward -- assuming they were still being
made. I'm not a networking expert, but it looks as though the request
was to fulfill a request to receive a large (32k+) frame.
...
(2) What, if any, specific/initial guidance or recommendation exists for
these values for Opensuse server, not desktop, usage?
Well, you have other problems. You have the 'W' (WARN_ON), 'P'
(Proprietary), and 'O' (out-of-tree) bits set. Something further up in
the log has gone wrong and it's not part of your paste. We'll also want
to see, as Greg said, if you can still reproduce it without the
closed-source driver loaded.

-Jeff

-- 
Jeff Mahoney
SUSE Labs