New subject: [Bug 1195311] Apparent memory leak in radeon (and amdgpu) or ttm

30 Jan 2022

http://bugzilla.opensuse.org/show_bug.cgi?id=1195311


            Bug ID: 1195311
           Summary: Apparent memory leak in radeon (and amdgpu) or ttm
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: openSUSE Tumbleweed
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-bugs@opensuse.org
          Reporter: aaronpuchert@alice-dsl.net
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

Created attachment 855726
  --> http://bugzilla.opensuse.org/attachment.cgi?id=855726&action=edit
Last output of bcc-tools' memleak after process exit

Kernel: 5.16.2-1-default
CPU: AMD A10-5750M APU (Piledriver, Family 15h)
GPU: Builtin Radeon HD 8650G ("Richland", ARUBA, Northern Islands), used here.
     Discrete Radeon HD 8650M ("Sun Pro", HAINAN, Southern Islands), inactive.

With some rendering applications (such as openSUSE:Factory/xonotic) I'm
observing a massive increase in used memory that isn't released when the
application is closed. I couldn't observe this with other 3D games, but since
the memory stays in use after the process has exited, it seems to necessarily
be a kernel issue.

In fact diffing /proc/meminfo before and after gives me no idea where the
memory might be:

--- meminfo.before
+++ meminfo.after
@@ -1,44 +1,44 @@
 MemTotal:        7307764 kB
-MemFree:         5424968 kB
+MemFree:         1809372 kB
-MemAvailable:    6022788 kB
+MemAvailable:    2647392 kB
-Buffers:          145620 kB
+Buffers:          147420 kB
-Cached:           654312 kB
+Cached:           891696 kB
 SwapCached:            0 kB
-Active:           371420 kB
+Active:           415308 kB
-Inactive:        1031956 kB
+Inactive:        1261828 kB
 Active(anon):       1072 kB
-Inactive(anon):   625400 kB
+Inactive(anon):   659872 kB
-Active(file):     370348 kB
+Active(file):     414236 kB
-Inactive(file):   406556 kB
+Inactive(file):   601956 kB
 Unevictable:         132 kB
 Mlocked:             132 kB
 SwapTotal:       2097148 kB
 SwapFree:        2097148 kB
-Dirty:                44 kB
+Dirty:              1536 kB
 Writeback:             0 kB
-AnonPages:        592320 kB
+AnonPages:        592880 kB
-Mapped:           310056 kB
+Mapped:           310476 kB
-Shmem:             23028 kB
+Shmem:             22924 kB
-KReclaimable:      81092 kB
+KReclaimable:      82920 kB
-Slab:             150960 kB
+Slab:             154736 kB
-SReclaimable:      81092 kB
+SReclaimable:      82920 kB
-SUnreclaim:        69868 kB
+SUnreclaim:        71816 kB
-KernelStack:        5120 kB
+KernelStack:        5184 kB
-PageTables:        13176 kB
+PageTables:        13220 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:     5751028 kB
-Committed_AS:    2195056 kB
+Committed_AS:    2190264 kB
 VmallocTotal:   34359738367 kB
-VmallocUsed:       42340 kB
+VmallocUsed:       42356 kB
 VmallocChunk:          0 kB
 Percpu:             2800 kB
 HardwareCorrupted:     0 kB
-AnonHugePages:    249856 kB
+AnonHugePages:    202752 kB
 ShmemHugePages:        0 kB
 ShmemPmdMapped:        0 kB
-FileHugePages:         0 kB
+FileHugePages:      2048 kB
 FilePmdMapped:         0 kB
 CmaTotal:              0 kB
 CmaFree:               0 kB
@@ -48,6 +48,6 @@
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
 Hugetlb:               0 kB
-DirectMap4k:      481020 kB
+DirectMap4k:     3833596 kB
-DirectMap2M:     6023168 kB
+DirectMap2M:     3719168 kB
-DirectMap1G:     1048576 kB
+DirectMap1G:           0 kB

MemFree is down by roughly 3.5G (MemAvailable by roughly 3.2G), but neither in
kernel nor in user space is there an increase that might justify it. I also
diffed the output of "grep . /proc/[0-9]*/statm" before and after, and the
difference in resident set sizes is rather unremarkable (on the order of a
couple hundred pages in total).
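
For reference, the deltas in the diff above can be tallied with a short script.
This is only a sketch: the helper names parse_meminfo and meminfo_deltas are
made up for illustration, and the embedded snapshots contain just a handful of
the fields from the diff rather than the full files.

```python
import re


def parse_meminfo(text):
    """Return {field: value} for lines such as 'MemFree:  5424968 kB'."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:\s]+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def meminfo_deltas(before_text, after_text):
    """Return {field: after - before} for fields whose value changed."""
    before = parse_meminfo(before_text)
    after = parse_meminfo(after_text)
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }


# Excerpts of the two snapshots, values copied from the diff above.
BEFORE = """\
MemFree:         5424968 kB
MemAvailable:    6022788 kB
Cached:           654312 kB
AnonPages:        592320 kB
Slab:             150960 kB
"""

AFTER = """\
MemFree:         1809372 kB
MemAvailable:    2647392 kB
Cached:           891696 kB
AnonPages:        592880 kB
Slab:             154736 kB
"""

deltas = meminfo_deltas(BEFORE, AFTER)
# Print the largest decreases first.
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:<14} {delta:>+10} kB")
```

On these excerpts MemFree drops by 3615596 kB, while AnonPages and Slab each
grow by less than 4 MB, which is why the usual counters don't explain the loss.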

So after a reboot I ran /usr/share/bcc/tools/memleak from bcc-tools. After
closing the game (and making sure the process is indeed no longer there), the
top 10 "leaks" end with the stack below. (The script just traces allocations
that haven't been freed yet, so these are not necessarily leaks.)

    3681759232 bytes in 4055 allocations from stack
        __alloc_pages+0x178 [kernel]
        __alloc_pages+0x178 [kernel]
        ttm_pool_alloc+0x24a [ttm]
        ttm_tt_populate+0x9f [ttm]
        ttm_bo_handle_move_mem+0x152 [ttm]
        ttm_bo_validate+0xc1 [ttm]
        ttm_bo_init_reserved+0x1d1 [ttm]
        ttm_bo_init+0x5a [ttm]
        radeon_bo_create+0x150 [radeon]
        radeon_gem_object_create+0xb0 [radeon]
        radeon_gem_create_ioctl+0x68 [radeon]
        drm_ioctl_kernel+0xb0 [drm]
        drm_ioctl+0x220 [drm]
        radeon_drm_ioctl+0x49 [radeon]
        __x64_sys_ioctl+0x82 [kernel]
        do_syscall_64+0x5c [kernel]
        entry_SYSCALL_64_after_hwframe+0x44 [kernel]

Line info shouldn't be necessary; at least I could easily follow the call stack
just from the function names. I'll attach the full top 10 for reference, but
the next stack only accounts for ~200M, so it's probably not that important.
(It also looks to me like page cache being filled, which is probably
intentionally not freed.)
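
As a rough sanity check, the size of the dominant stack above can be compared
with the MemFree drop from the meminfo diff. This is only a sketch: the two
measurements come from different boots, so the comparison is approximate, and
the 4096-byte page size is an assumption (the usual value on x86-64).

```python
# Numbers copied from the memleak output and the meminfo diff in this report.
LEAKED_BYTES = 3681759232      # "3681759232 bytes in 4055 allocations"
ALLOCATIONS = 4055
MEMFREE_BEFORE_KB = 5424968
MEMFREE_AFTER_KB = 1809372
PAGE_SIZE = 4096               # assumption: standard x86-64 page size

leaked_kb = LEAKED_BYTES // 1024
leaked_pages = LEAKED_BYTES // PAGE_SIZE
memfree_drop_kb = MEMFREE_BEFORE_KB - MEMFREE_AFTER_KB
share = leaked_kb / memfree_drop_kb

print(f"outstanding in ttm_pool_alloc stack: {leaked_kb} kB ({leaked_pages} pages)")
print(f"average per allocation:              {leaked_pages / ALLOCATIONS:.1f} pages")
print(f"MemFree drop (other boot):           {memfree_drop_kb} kB")
print(f"stack size / MemFree drop:           {share:.1%}")
```

Under these assumptions the single ttm_pool_alloc stack is about the same size
as the whole MemFree drop, which fits with it being the place the memory went.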

The title claims that this also affects amdgpu because we tried another machine
that has a (single) desktop GPU, an R9 270X (PITCAIRN, also Southern Islands),
run with amdgpu via radeon.si_support=0 amdgpu.si_support=1. There we observe
the same call stack with memleak, except with radeon replaced by amdgpu.
(Presumably they just copied that over.) So the problem is either common to
both drivers or somewhere else in the stack.

-- 
You are receiving this mail because:
You are the assignee for the bug.

    
