Bug ID 1117095
Summary vc4: Failed to allocate from CMA, graphics freezes
Classification openSUSE
Product openSUSE Tumbleweed
Version Current
Hardware aarch64
OS openSUSE Factory
Status NEW
Severity Normal
Priority P5 - None
Component Kernel
Assignee kernel-maintainers@forge.provo.novell.com
Reporter jimc@math.ucla.edu
QA Contact qa-bugs@suse.de
Found By ---
Blocker ---

This occurs on a Raspberry Pi 3B (not the 3B+) running openSUSE Tumbleweed,
openSUSE-release-20181101-934.1.aarch64, with
kernel-default-4.18.15-1.2.aarch64.  Kernel command line (/proc/cmdline):
    loglevel=3 splash=silent plymouth.enable=0 swiotlb=512 cma=300M 
    console=ttyS1,115200n8 console=tty resume=/dev/mmcblk0p3
/boot/efi/config.txt (minus comments):
    dtoverlay=upstream   +upstream-mmc  +upstream-aux-interrupt
    include ubootconfig.txt
    include extraconfig.txt
        dtoverlay=vc4-kms-v3d  (similar symptom with vc4-fkms-v3d)
/etc/X11/xorg.conf.d/20-kms.conf says:
Section "Device"
    Identifier "kms gfx"
    Driver "modesetting"
    #Option "AccelMethod" "none" [Commented out]
EndSection
/var/log/Xorg.0.log says:
    modeset(0): [DRI2]   DRI driver: vc4
    AIGLX: Loaded and initialized vc4
    GLX: Initialized DRI2 GL provider for screen 0

In this configuration, glmark2-0.0+git.20180608-1.1.aarch64
runs without freezing or crashing and gets an overall score of 74, 
whereas with software rendering the score is 17, so GPU acceleration is
really happening.  

From the LightDM greeter I log in and start the default XFCE desktop. 
I start various programs and eventually hit the freeze described below.
In the simplest case I start one xterm and one "xload -update 2" (2-second
updates), with xscreensaver-5.37-4.3.aarch64 active, set to blank the screen
only and to turn the display off via DPMS after 20 minutes.  I then leave
it running overnight.

At the start, CmaTotal (from /proc/meminfo) was 307200 kB and CmaFree
was 206856 kB; CmaFree rose gradually to 241684 kB by the time the
screensaver shut off video (DPMS).  
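
For anyone who wants to track the same counters, here is a minimal sketch
of how CmaTotal and CmaFree could be sampled from /proc/meminfo.  The
function names (parse_cma, read_cma, log_cma) and the 60-second interval
are illustrative assumptions, not part of any existing tool, and this is
not necessarily how the figures above were collected.

```python
import re
import time

def parse_cma(meminfo_text):
    """Extract CmaTotal and CmaFree (in kB) from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(CmaTotal|CmaFree):\s+(\d+)\s*kB", line)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values

def read_cma(path="/proc/meminfo"):
    """Read the live counters; the keys are absent if the kernel lacks CMA."""
    with open(path) as f:
        return parse_cma(f.read())

def log_cma(samples, interval_s=60):
    """Print one timestamped line per sample (hypothetical helper)."""
    for _ in range(samples):
        cma = read_cma()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, cma.get("CmaTotal"), cma.get("CmaFree"), flush=True)
        time.sleep(interval_s)
```

Calling log_cma(600) would record ten hours of samples at the assumed
interval.  As a sanity check, cma=300M on the kernel command line
corresponds to 300 * 1024 = 307200 kB, which matches the CmaTotal value
reported above.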

After 5 hours CmaFree was static at 234252 kB.  With no change in CmaFree
this message appeared in syslog:
Nov 22 01:15:25 orion kernel: [34890.524661] [drm:vc4_bo_create [vc4]] 
    *ERROR* Failed to allocate from CMA:
Nov 22 01:15:25 orion kernel: [34890.524683] [drm]     kernel:  8100kb BOs (1)
Nov 22 01:15:25 orion kernel: [34890.524691] [drm]        V3D: 26904kb BOs
Nov 22 01:15:25 orion kernel: [34890.524699] [drm] V3D shader:   272kb BOs (65)
Nov 22 01:15:25 orion kernel: [34890.524706] [drm]       dumb:    48kb BOs (3)
Nov 22 01:15:25 orion kernel: [34890.524713] [drm]     binner: 16384kb BOs (1)
Nov 22 01:15:25 orion kernel: [34890.524721] [drm] total purged BO: 8kb BOs (2)
Nov 22 01:15:25 orion kernel: [34890.524741] vc4_v3d 3fc00000.v3d: Failed to 
    allocate memory for tile binning: -12. You may need to enable CMA or 
    give it more memory.
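
To compare the buffer-object (BO) figures in this dump against CmaFree,
the per-category lines can be tallied.  The following is a rough sketch
that assumes the line format shown above; parse_bo_stats and live_bo_kb
are hypothetical helper names, and treating the "purged" category as no
longer holding CMA memory is an assumption on my part.

```python
import re

# Matches lines such as "[drm]     kernel:  8100kb BOs (1)".
# The "(N)" count is optional because the V3D line above lacks one.
BO_LINE = re.compile(
    r"\[drm\]\s+(?P<label>[^:]+):\s*(?P<kb>\d+)kb BOs(?:\s*\((?P<count>\d+)\))?"
)

def parse_bo_stats(log_text):
    """Map each BO category label to a (kilobytes, count or None) tuple."""
    stats = {}
    for line in log_text.splitlines():
        m = BO_LINE.search(line)
        if m:
            count = m.group("count")
            stats[m.group("label").strip()] = (
                int(m.group("kb")),
                int(count) if count else None,
            )
    return stats

def live_bo_kb(stats):
    """Sum kB over all categories except purged ones (assumption, see above)."""
    return sum(kb for label, (kb, _) in stats.items() if "purged" not in label)
```

Applied to the dump above, this gives 8100 + 26904 + 272 + 48 + 16384 =
51708 kB in live BOs, well below the 307200 kB CMA pool.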

In other tests this message appears at the same time that graphics freezes.

When I woke up the screensaver, video came on, but the screen was black,
except the cursor was visible, confined within the screensaver's 
authentication box.  In other tests the screen stays frozen, showing its
content at the time of the freeze, but the cursor still changes shape
according to what it is over, and stops changing shape over a window whose
owning program (e.g. xterm) has been killed.  Keystrokes directed to an
xterm are received and executed, with no visible effect on the screen:
e.g. after "echo Test File > /tmp/testfile" the file appears.  I can do 
"DISPLAY=:0 XAUTHORITY=/run/lightdm/root/:0 xwd -root > image.xwd" 
and the image will be complete and will show the current windows, not 
those at the time of freezing.  

The same symptoms can be elicited more quickly if I run Firefox or Chromium. 
Heavy work in the browser did not seem to make the failure happen earlier;
the two tests (one after the other) were to scroll quickly through 1.16 MB
of text/html (no JavaScript or images), then through 221 JPEG images in
simple HTML pages.  The freeze typically happens when I am doing nothing on
the RPi and am writing up notes on another machine.  With either web browser,
but not in the simple test case, CmaFree declined in non-reproducible 
patterns until the freeze occurred, and then continued to decline to near
zero (about 3000 kB).  I believe that this "death spiral" behavior is
consequential damage from something freezing up, not the actual cause of
the freeze.  

This is a known bug, though the exact symptoms seem to change with small
variations in the test conditions, and with one or another kernel commit
being excluded.  
https://github.com/raspberrypi/linux/issues/2680 (2018-09-12, OP cbxbiker61)
He reports that it began for him with approximately kernel 4.14.62, and
someone else reports that it is still there in 4.18.11.  I see it in 4.18.15.
Other forum and bug posters in various distros (Arch, Red Hat) report 
various similar-sounding problems, starting around 2018-09-xx.  

Could the SUSE distro managers please identify a combination of commits
that gives the best results in the openSUSE context and push out that
kernel, and keep an eye on progress in finding and killing the actual 
bug that is causing these freezes?  Thank you.

I'm going to try to do the same thing, and I'll report back if I succeed,
which is not a sure thing given my limited skills with git.
