On 2/28/13 2:00 PM, chrysippus@operamail.com wrote:
On a 4GB RAM edge server running 12.2 w/ kernel 3.4.28/x86_64 I'm seeing an increasing frequency of "SLAB: Unable to allocate memory on node 0" (http://pastebin.com/xDPgtut8) errors in syslog, e.g.:
cat /var/log/messages ... Feb 28 09:24:30 core kernel: [777611.883853] The following is only an harmless informational message. Feb 28 09:24:30 core kernel: [777611.883860] Unless you get a _continuous_flood_ of these messages it means Feb 28 09:24:30 core kernel: [777611.883864] everything is working fine. Allocations from irqs cannot be Feb 28 09:24:30 core kernel: [777611.883868] perfectly reliable and the kernel is designed to handle that. Feb 28 09:24:30 core kernel: [777611.883873] swapper/2: page allocation failure. order:4, mode:0x20 Feb 28 09:24:30 core kernel: [777611.883880] Pid: 0, comm: swapper/2 Tainted: P W O 3.4.28-2.20-default #1 Feb 28 09:24:30 core kernel: [777611.883885] Call Trace: Feb 28 09:24:30 core kernel: [777611.883908] [<ffffffff81004598>] dump_trace+0x78/0x2c0 Feb 28 09:24:30 core kernel: [777611.883921] [<ffffffff81532e6a>] dump_stack+0x69/0x6f Feb 28 09:24:30 core kernel: [777611.883933] [<ffffffff810fdf02>] warn_alloc_failed+0x102/0x1b0 Feb 28 09:24:30 core kernel: [777611.883946] [<ffffffff81101cfb>] __alloc_pages_nodemask+0x68b/0x9a0 Feb 28 09:24:30 core kernel: [777611.883959] [<ffffffff811401e2>] kmem_getpages+0x62/0x190 Feb 28 09:24:30 core kernel: [777611.883969] [<ffffffff81140ddf>] fallback_alloc+0x21f/0x270 Feb 28 09:24:30 core kernel: [777611.883981] [<ffffffff8114153b>] __kmalloc+0x14b/0x1d0 Feb 28 09:24:30 core kernel: [777611.883994] [<ffffffff8143e87c>] pskb_expand_head+0x8c/0x310 Feb 28 09:24:30 core kernel: [777611.884006] [<ffffffff814d9355>] ip6_forward+0x515/0x770 Feb 28 09:24:30 core kernel: [777611.884027] [<ffffffff81449712>] __netif_receive_skb+0x532/0x600 Feb 28 09:24:30 core kernel: [777611.884039] [<ffffffff81449881>] process_backlog+0xa1/0x180 Feb 28 09:24:30 core kernel: [777611.884049] [<ffffffff8144a280>] net_rx_action+0x140/0x230 Feb 28 09:24:30 core kernel: [777611.884062] [<ffffffff810459be>] __do_softirq+0xae/0x1c0 Feb 28 09:24:30 core kernel: [777611.884073] [<ffffffff8154724c>] call_softirq+0x1c/0x30 Feb 28 09:24:30 core kernel: [777611.884082] [<ffffffff810044e5>] do_softirq+0x75/0xb0 Feb 28 09:24:30 core kernel: [777611.884094] [<ffffffff81045d55>] irq_exit+0xa5/0xb0 Feb 28 09:24:30 core kernel: [777611.884105] [<ffffffff8100413b>] do_IRQ+0x5b/0xd0 Feb 28 09:24:30 core kernel: [777611.884115] [<ffffffff8153e5ea>] common_interrupt+0x6a/0x6a Feb 28 09:24:30 core kernel: [777611.884129] [<ffffffff8131861c>] intel_idle+0xec/0x160 Feb 28 09:24:30 core kernel: [777611.884141] [<ffffffff81412bfd>] cpuidle_idle_call+0x9d/0x240 Feb 28 09:24:30 core kernel: [777611.884152] [<ffffffff8100b5f5>] cpu_idle+0x65/0xd0 Feb 28 09:24:30 core kernel: [777611.884163] [<ffffffff8152be3d>] start_secondary+0x208/0x20d Feb 28 09:24:30 core kernel: [777611.884171] SLAB: Unable to allocate memory on node 0 (gfp=0x20) Feb 28 09:24:30 core kernel: [777611.884176] cache: size-65536, object size: 65536, order: 4 Feb 28 09:24:30 core kernel: [777611.884184] node 0: slabs: 19/19, objs: 19/19, free: 0 Feb 28 09:24:30 core kernel: [777611.886007] The following is only an harmless informational message. Feb 28 09:24:30 core kernel: [777611.886015] Unless you get a _continuous_flood_ of these messages it means Feb 28 09:24:30 core kernel: [777611.886018] everything is working fine. Allocations from irqs cannot be Feb 28 09:24:30 core kernel: [777611.886022] perfectly reliable and the kernel is designed to handle that. Feb 28 09:24:30 core kernel: [777611.886027] kswapd0: page allocation failure. order:4, mode:0x20 Feb 28 09:24:30 core kernel: [777611.886034] Pid: 35, comm: kswapd0 Tainted: P W O 3.4.28-2.20-default #1 Feb 28 09:24:30 core kernel: [777611.886038] Call Trace: Feb 28 09:24:30 core kernel: [777611.886061] [<ffffffff81004598>] dump_trace+0x78/0x2c0 Feb 28 09:24:30 core kernel: [777611.886073] [<ffffffff81532e6a>] dump_stack+0x69/0x6f Feb 28 09:24:30 core kernel: [777611.886085] [<ffffffff810fdf02>] warn_alloc_failed+0x102/0x1b0 Feb 28 09:24:30 core kernel: [777611.886097] [<ffffffff81101cfb>] __alloc_pages_nodemask+0x68b/0x9a0 Feb 28 09:24:30 core kernel: [777611.886110] [<ffffffff811401e2>] kmem_getpages+0x62/0x190 Feb 28 09:24:30 core kernel: [777611.886120] [<ffffffff81140ddf>] fallback_alloc+0x21f/0x270 Feb 28 09:24:30 core kernel: [777611.886132] [<ffffffff8114153b>] __kmalloc+0x14b/0x1d0 Feb 28 09:24:30 core kernel: [777611.886145] [<ffffffff8143e87c>] pskb_expand_head+0x8c/0x310 Feb 28 09:24:30 core kernel: [777611.886157] [<ffffffff814d9355>] ip6_forward+0x515/0x770 Feb 28 09:24:30 core kernel: [777611.886178] [<ffffffff81449712>] __netif_receive_skb+0x532/0x600 Feb 28 09:24:30 core kernel: [777611.886194] [<ffffffff81449881>] process_backlog+0xa1/0x180 Feb 28 09:24:30 core kernel: [777611.886210] [<ffffffff8144a280>] net_rx_action+0x140/0x230 Feb 28 09:24:30 core kernel: [777611.886229] [<ffffffff810459be>] __do_softirq+0xae/0x1c0 Feb 28 09:24:30 core kernel: [777611.886245] [<ffffffff8154724c>] call_softirq+0x1c/0x30 Feb 28 09:24:30 core kernel: [777611.886259] [<ffffffff810044e5>] do_softirq+0x75/0xb0 Feb 28 09:24:30 core kernel: [777611.886276] [<ffffffff81045d55>] irq_exit+0xa5/0xb0 Feb 28 09:24:30 core kernel: [777611.886291] [<ffffffff8100413b>] do_IRQ+0x5b/0xd0 Feb 28 09:24:30 core kernel: [777611.886304] [<ffffffff8153e5ea>] common_interrupt+0x6a/0x6a Feb 28 09:24:30 core kernel: [777611.886319] [<ffffffff8153e1d9>] _raw_spin_unlock_irqrestore+0x9/0x30 Feb 28 09:24:30 core kernel: [777611.886329] [<ffffffff8111a1b8>] compaction_alloc+0x288/0x2b0 Feb 28 09:24:30 core kernel: [777611.886342] [<ffffffff81144d83>] migrate_pages+0xb3/0x500 Feb 28 09:24:30 core kernel: [777611.886352] [<ffffffff8111ad47>] compact_zone+0x1d7/0x320 Feb 28 09:24:30 core kernel: [777611.886364] [<ffffffff8111af3b>] __compact_pgdat+0xab/0x170 Feb 28 09:24:30 core kernel: [777611.886374] [<ffffffff8111b237>] compact_pgdat+0x27/0x30 Feb 28 09:24:30 core kernel: [777611.886384] [<ffffffff8110dc8c>] balance_pgdat+0x6cc/0x710 Feb 28 09:24:30 core kernel: [777611.886396] [<ffffffff8110de18>] kswapd+0x148/0x370 Feb 28 09:24:30 core kernel: [777611.886407] [<ffffffff8105ec55>] kthread+0x85/0x90 Feb 28 09:24:30 core kernel: [777611.886417] [<ffffffff81547154>] kernel_thread_helper+0x4/0x10 Feb 28 09:24:30 core kernel: [777611.886426] SLAB: Unable to allocate memory on node 0 (gfp=0x20) Feb 28 09:24:30 core kernel: [777611.886430] cache: size-65536, object size: 65536, order: 4 Feb 28 09:24:30 core kernel: [777611.886439] node 0: slabs: 15/19, objs: 15/19, free: 4 Feb 28 09:24:38 core kernel: [777620.441021] The following is only an harmless informational message. Feb 28 09:24:38 core kernel: [777620.441029] Unless you get a _continuous_flood_ of these messages it means Feb 28 09:24:38 core kernel: [777620.441033] everything is working fine. Allocations from irqs cannot be Feb 28 09:24:38 core kernel: [777620.441036] perfectly reliable and the kernel is designed to handle that. Feb 28 09:24:38 core kernel: [777620.441042] swapper/1: page allocation failure. order:4, mode:0x20 Feb 28 09:24:38 core kernel: [777620.441048] Pid: 0, comm: swapper/1 Tainted: P W O 3.4.28-2.20-default #1 Feb 28 09:24:38 core kernel: [777620.441053] Call Trace: Feb 28 09:24:38 core kernel: [777620.441076] [<ffffffff81004598>] dump_trace+0x78/0x2c0 Feb 28 09:24:38 core kernel: [777620.441089] [<ffffffff81532e6a>] dump_stack+0x69/0x6f ...
Atm, `slabtop` shows 99% usage for "ext4_inode_cache":
slabtop Active / Total Objects (% used) : 1275578 / 1419483 (89.9%) Active / Total Slabs (% used) : 187764 / 187786 (100.0%) Active / Total Caches (% used) : 109 / 232 (47.0%) Active / Total Size (% used) : 639586.50K / 660294.86K (96.9%) Minimum / Average / Maximum Object : 0.02K / 0.46K / 4096.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 606252 601064 99% 0.85K 151563 4 606252K ext4_inode_cache 380397 283799 74% 0.10K 10281 37 41124K buffer_head 241600 231794 95% 0.19K 12080 20 48320K dentry 34986 34973 99% 0.54K 4998 7 19992K inode_cache 27435 12674 46% 0.06K 465 59 1860K size-64 21952 21115 96% 0.03K 196 112 784K size-32 18734 18690 99% 0.11K 551 34 2204K sysfs_dir_cache 16086 14359 89% 0.55K 2298 7 9192K radix_tree_node 13475 11758 87% 0.05K 175 77 700K jbd2_inode ...
This is with default values of:
cat /proc/sys/vm/vfs_cache_pressure 100 cat /proc/sys/vm/swappiness 60
Reading @ http://doc.opensuse.org/documentation/html/openSUSE/opensuse-tuning/cha.tuni..., and taking a random stab at changing
echo 200 > /proc/sys/vm/vfs_cache_pressure echo 25 > /proc/sys/vm/swappiness
Seems to reduce -- thought not completely eliminate -- the syslog errors. `slabtop` shows no change.
After dropping cache
sync && echo 3 > /proc/sys/vm/drop_caches
`slabtop` usage for "ext4-inode_cache" seems to have dropped to 74%, although other items are still ~99%:
slabtop Active / Total Objects (% used) : 130790 / 216154 (60.5%) Active / Total Slabs (% used) : 12482 / 12506 (99.8%) Active / Total Caches (% used) : 109 / 232 (47.0%) Active / Total Size (% used) : 40954.73K / 54620.83K (75.0%) Minimum / Average / Maximum Object : 0.02K / 0.25K / 4096.00K
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 60740 22518 37% 0.19K 3037 20 12148K dentry 23423 8006 34% 0.06K 397 59 1588K size-64 !! 21952 21119 96% 0.03K 196 112 784K size-32 !! 19397 19126 98% 0.54K 2771 7 11084K inode_cache !! 18734 18690 99% 0.11K 551 34 2204K sysfs_dir_cache 16687 4514 27% 0.10K 451 37 1804K buffer_head 4543 2173 47% 0.05K 59 77 236K anon_vma_chain 4200 3013 71% 0.12K 140 30 560K size-128 !! 4088 4072 99% 0.13K 146 28 584K ext4_groupinfo_4k !! 4084 4079 99% 1.00K 1021 4 4084K size-1024 3784 2956 78% 0.17K 172 22 688K vm_area_struct 3283 962 29% 0.55K 469 7 1876K radix_tree_node 2596 1484 57% 0.06K 44 59 176K anon_vma 2160 1798 83% 0.02K 15 144 60K dm_target_io 2116 1820 86% 0.04K 23 92 92K dm_io 2020 1687 83% 0.19K 101 20 404K size-192 1925 45 2% 0.05K 25 77 100K jbd2_inode 1845 1108 60% 0.25K 123 15 492K filp 1431 1344 93% 0.07K 27 53 108K Acpi-Operand --> 1408 1042 74% 0.85K 352 4 1408K ext4_inode_cache ...
With this config, post clear, for ~ 20 minutes, no more SLAB errors in syslog.
My understanding of the effect/beahvior of Memory Subsytem tuning is rudimentary at best.
Chatting in #opensuse-server, IIUC, it seems that it's not leaking, but not automatically freeing up enough memory.
(1) Are the SLAB errors I'm seeing indicative of a problem in the sysctl setting for swappiness & vfs_cache_pressure, particularly for server use? Or something else?
No. The problem you're seeing is due to memory fragmentation, not lack of memory. The allocation request is order-4, which means it needs 16 physically contiguous pages to be satisfied. In a system that has been running for a while, these allocations can be hard to fill. They're typically avoided entirely. In this case, the request must also be fulfilled without scheduling which limits the options significantly. You were able to work around the problem by adjusting vfs_cache_pressure because it essentially dropped all of the VFS cache (see ext4_inode_cache go from ~ 600MB to 1.4MB) which freed a bunch of pages, allowing the requests to go forward -- assuming they were still being made. I'm not a networking expert, but it looks as though the request was to fulfill a request to receive a large (32k+) frame.
(2) What, if any, specific/initial guidance or recommendation exists for these values for Opensuse server, not desktop, usage?
Well, you have other problems. You have the 'W' (WARN_ON), 'P' (Proprietary), and 'O' (out-of-tree) bits set. Something further up in the log has gone wrong and it's not part of your paste. We'll also want to see, as Greg said, if you can still reproduce it without the closed-source driver loaded. -Jeff -- Jeff Mahoney SUSE Labs