[opensuse-kernel] kmalloc-2048 memleak in 4.19rc kernels
Hi,

I think I'm seeing a continuously rising usage of kmalloc-2048 slabs in /proc/slabinfo and slabtop. Before I rebooted from 4.19rc3 to 4.19rc5, I had about 4.5GB of kmalloc-2048 after about 13d8h uptime.

Now, after almost 10 minutes uptime, it has already grown to 4MB:

    2166 2166 100% 2.00K 1083 2 4332K kmalloc-2048

How can I debug this? The usage started to increase immediately after boot into an idle XFCE desktop, running a small X11 application (gkrellm system monitor) via SSH on a remote machine.

After disabling networking in NetworkManager, kmalloc-2048 usage went down (from about 2500 objects to 2388 active objects right now), now on a stable, lightly fluctuating level.

This is a Lenovo Thinkpad T420, Core i5-2520M CPU, 8GB RAM, Intel Centrino Advanced-N 6205 [Taylor Peak] WIFI module, Ethernet unused.

Once I know how to find out more details, I'm ready to report this upstream ;-)

Thanks,
Stefan
-- 
Stefan Seyfried
"For a successful technology, reality must take precedence over public relations, for nature cannot be fooled." -- Richard Feynman
-- 
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-kernel+owner@opensuse.org
On 9/28/18 2:52 PM, Stefan Seyfried wrote:
> How can I debug this? The usage started to increase immediately after boot into an idle XFCE desktop, running a small X11 application (gkrellm system monitor) via SSH on a remote machine
> [...]
> Once I know how to find out more details, I'm ready to report this upstream ;-)
The easiest way to find out what is causing this memory leak is to configure a kernel with KMEMLEAK enabled. The critical configuration parameters are

    CONFIG_DEBUG_KMEMLEAK=y
    CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=2000

If you get a log entry that kmemleak has been aborted when the kernel is started, then you may need to increase the early log size parameter. Once the system is running, do the following as root:

    echo scan > /sys/kernel/debug/kmemleak
    cat /sys/kernel/debug/kmemleak

The resulting list will show you what component has leaked, and the address where it occurred. You can then use gdb to pinpoint the source line in the code.

Larry
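A kmemleak report can run to many records, so a first pass that tallies them by allocation size may help to see whether 2048-byte objects stand out. The following is a minimal sketch and not part of the instructions above: kmemleak_summary is a hypothetical helper name, and it assumes each record starts with a line of the form "unreferenced object 0x... (size N):".

```shell
# Summarize a kmemleak report read from stdin: print "count size" pairs,
# most frequent size first.
# Assumption: every leaked object is introduced by a header line such as
#   unreferenced object 0xffff88810a2b0000 (size 2048):
# kmemleak_summary is a hypothetical name used for illustration only.
kmemleak_summary() {
    awk '
        /^unreferenced object/ {
            size = $5                 # fifth field looks like "2048):"
            sub(/\).*/, "", size)     # strip the trailing "):"
            count[size]++
        }
        END {
            for (s in count) printf "%d %s\n", count[s], s
        }
    ' | sort -k1,1nr -k2,2n
}
```

As root, and after the scan has been triggered, it could be fed from the debugfs file with: kmemleak_summary < /sys/kernel/debug/kmemleak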
On 9/28/18 9:52 PM, Stefan Seyfried wrote:
> How can I debug this? The usage started to increase immediately after boot into an idle XFCE desktop, running a small X11 application (gkrellm system monitor) via SSH on a remote machine
Tracing should help here, without recompiling the kernel, install trace-cmd and run:

    trace-cmd record -T -e kmalloc -f bytes_alloc==2048

And after a while (while kmalloc-2048 grows), stop it and check the output of trace-cmd report. The leaking allocations should dominate, although there will also be other non-leaking ones. Maybe the process names will also be related to NetworkManager. If in doubt, send the produced trace.dat.
On 9/29/18 7:53 AM, Vlastimil Babka wrote:
> Tracing should help here, without recompiling the kernel, install trace-cmd and run:
>
>     trace-cmd record -T -e kmalloc -f bytes_alloc==2048
>
> And after a while (while kmalloc-2048 grows), stop it and check the output of trace-cmd report.
Actually replace "trace-cmd report" with "trace-cmd hist" which nicely groups the events by processes and allocation sites and makes it obvious which were the frequent ones.
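The recording step can be wrapped so that it stops by itself after a fixed time. This is only a sketch: trace_kmalloc and the DRY_RUN switch are hypothetical names, the trace-cmd options are the ones given above, and timeout(1) from coreutils is assumed to be installed. SIGINT is used because trace-cmd record is normally stopped with Ctrl-C.

```shell
# Record kmalloc events of one allocation size for a fixed number of seconds.
# Usage: trace_kmalloc [SIZE] [SECONDS]   (defaults: 2048 bytes, 30 seconds)
# trace_kmalloc and DRY_RUN are hypothetical names for this illustration.
# With DRY_RUN=1 the command is only printed, which allows checking it
# without root privileges.
trace_kmalloc() {
    size=${1:-2048}
    seconds=${2:-30}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "timeout -s INT $seconds trace-cmd record -T -e kmalloc -f bytes_alloc==$size"
    else
        # timeout sends SIGINT so that trace-cmd finishes writing trace.dat
        timeout -s INT "$seconds" trace-cmd record -T -e kmalloc -f "bytes_alloc==$size"
    fi
}
```

Run as root, for example "trace_kmalloc 2048 30", and then look at the result with "trace-cmd hist" in the same directory.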
Hi Vlastimil,

On 29.09.18 at 08:04, Vlastimil Babka wrote:
> Actually replace "trace-cmd report" with "trace-cmd hist" which nicely groups the events by processes and allocation sites and makes it obvious which were the frequent ones.
Thanks, this was really useful (why didn't I know about trace-cmd before? ;-)

    strolchi:~ # timeout -s 2 30 trace-cmd record -T -e kmalloc -f bytes_alloc==2048
    Hit Ctrl^C to stop recording
    CPU0 data recorded at offset=0x5e6000
        24576 bytes in size
    CPU1 data recorded at offset=0x5ec000
        0 bytes in size
    CPU2 data recorded at offset=0x5ec000
        4096 bytes in size
    CPU3 data recorded at offset=0x5ed000
        0 bytes in size

And I think I have a winner:

    %92.90  (1950) gkrellm kmalloc #144
             |
             --- *kmalloc*
                 |
                 |--%100.00-- kmem_cache_alloc_trace # 144
                             |
                             |--%60.42-- vmstat_start # 87
                             |           seq_read
                             |           proc_reg_read
                             |           __vfs_read
                             |           vfs_read
                             |           ksys_read
                             |           do_syscall_64
                             |           entry_SYSCALL_64_after_hwframe
                             |
                             |--%39.58-- cfg80211_sinfo_alloc_tid_stats # 57
                                         sta_set_sinfo
                                         ieee80211_get_station
                                         |
                                         |--%50.88-- cfg80211_wireless_stats # 29
                                         |           wireless_dev_seq_show
                                         |           seq_read
                                         |           proc_reg_read
                                         |           __vfs_read
                                         |           vfs_read
                                         |           ksys_read
                                         |           do_syscall_64
                                         |           entry_SYSCALL_64_after_hwframe
                                         |
                                         |--%49.12-- cfg80211_wext_giwrate # 28
                                                     ioctl_standard_call
                                                     wext_handle_ioctl
                                                     sock_ioctl
                                                     do_vfs_ioctl
                                                     ksys_ioctl
                                                     __x64_sys_ioctl
                                                     do_syscall_64
                                                     entry_SYSCALL_64_after_hwframe

At least /proc/net/wireless (used by gkrellm-wifi plugin) is leaking:

    seife@strolchi:~> sudo grep ^kmalloc-2048 /proc/slabinfo ;\
      for i in `seq 1 10000`; do cat /proc/net/wireless > /dev/null ; done;\
      sudo grep ^kmalloc-2048 /proc/slabinfo
    kmalloc-2048 98046 98046 2048 2 1 : tunables 24 12 8 : slabdata 49023 49023 0
    kmalloc-2048 108002 108002 2048 2 1 : tunables 24 12 8 : slabdata 54001 54001 0

The same with 1000 "iwconfig" runs:

    kmalloc-2048 112020 112020 2048 2 1 : tunables 24 12 8 : slabdata 56010 56010 0
    kmalloc-2048 114012 114012 2048 2 1 : tunables 24 12 8 : slabdata 57006 57006 0

I'll report that upstream.

-- 
Stefan Seyfried
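The before/after slabinfo lines can be turned into a leak rate with a little arithmetic. The sketch below is an illustration only: slab_active and leak_per_iteration are hypothetical helper names, and it assumes the second column of a /proc/slabinfo line is the active object count, as in the output above.

```shell
# Print the active-object count (second field) of one cache, reading
# /proc/slabinfo-formatted text from stdin.
# slab_active and leak_per_iteration are hypothetical helper names.
slab_active() {
    awk -v cache="$1" '$1 == cache { print $2 }'
}

# leak_per_iteration BEFORE AFTER ITERATIONS OBJSIZE
# Prints objects gained, bytes gained and objects gained per iteration.
leak_per_iteration() {
    awk -v b="$1" -v a="$2" -v n="$3" -v sz="$4" 'BEGIN {
        d = a - b
        printf "%d objects, %d bytes, %.2f per iteration\n", d, d * sz, d / n
    }'
}
```

With the numbers from the message, "leak_per_iteration 98046 108002 10000 2048" prints "9956 objects, 20389888 bytes, 1.00 per iteration", and "leak_per_iteration 112020 114012 1000 2048" prints "1992 objects, 4079616 bytes, 1.99 per iteration". In these measurements that is roughly one 2048-byte object per read of /proc/net/wireless and roughly two per iwconfig run; other activity on the system may account for small deviations.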
On 29.09.18 at 09:17, Stefan Seyfried wrote:
> I'll report that upstream.
Done. I also found small self-contained reproducers: http://paste.opensuse.org/75377254 (C code) or even easier:

    #!/bin/bash
    for ((i=0; i<10000; i++)); do
        while read line; do :; done < /proc/net/wireless
    done

=>

    %99.28  (15289) leak.sh kmalloc #10001
             |
             --- *kmalloc*
                 kmem_cache_alloc_trace
                 cfg80211_sinfo_alloc_tid_stats
                 sta_set_sinfo
                 ieee80211_get_station
                 cfg80211_wireless_stats
                 wireless_dev_seq_show
                 seq_read
                 proc_reg_read
                 __vfs_read
                 vfs_read
                 ksys_read
                 do_syscall_64
                 entry_SYSCALL_64_after_hwframe

-- 
Stefan Seyfried
On 28.09.18 at 21:52, Stefan Seyfried wrote:
> I think I'm seeing a continuously rising usage of kmalloc-2048 slabs in /proc/slabinfo and slabtop. Before I rebooted from 4.19rc3 to 4.19rc5, I had about 4.5GB of kmalloc-2048 after about 13d8h uptime.

Found, reported upstream and posted a proposed fix. Leap (kernel 4.12) is not affected.

Patch attached for reference, but I think it (or a similar fix) should appear in mainline soon, hopefully before 4.19 is final.

-- 
Stefan Seyfried
On 9/30/18 1:01 PM, Stefan Seyfried wrote:
> Found, reported upstream and posted a proposed fix. Leap (kernel 4.12) is not affected.
>
> Patch attached for reference, but I think it (or a similar fix) should appear in mainline soon, hopefully before 4.19 is final.
Great, thanks! Glad the ftrace suggestion helped.

Vlastimil
Hi,

JFTR, the fix has been incorporated in 4.19-rc7, everything is fine again, tumbleweed will never see this bug (except people like me who always run Kernel:HEAD kernels :-), but finding such bugs is actually the reason I do that).

Thanks again,

On 28.09.18 at 21:52, Stefan Seyfried wrote:
> I think I'm seeing a continuously rising usage of kmalloc-2048 slabs in /proc/slabinfo and slabtop. Before I rebooted from 4.19rc3 to 4.19rc5, I had about 4.5GB of kmalloc-2048 after about 13d8h uptime.

-- 
Stefan Seyfried
On 10/10/18 8:02 AM, Stefan Seyfried wrote:
> JFTR, the fix has been incorporated in 4.19-rc7, everything is fine again, tumbleweed will never see this bug (except people like me who always run Kernel:HEAD kernels :-), but finding such bugs is actually the reason I do that).
Very good :) Thanks!

Vlastimil
participants (3)
- Larry Finger
- Stefan Seyfried
- Vlastimil Babka