[Bug 1112963] New: AST Video Kernel Mode Driver is broken
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963

Bug ID: 1112963
Summary: AST Video Kernel Mode Driver is broken
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 15.0
Hardware: x86-64
OS: Other
Status: NEW
Severity: Normal
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: evanevery@digitalintelligence.com
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Created attachment 786905
  --> http://bugzilla.opensuse.org/attachment.cgi?id=786905&action=edit
Xorg.0.log file (default post-install)

It appears that the "ast" KMS driver is broken in LEAP15. It will not start under any circumstances. Details of the support thread can be found here:

https://forums.opensuse.org/showthread.php/533522-AST2500-Video-Driver-faili...

This fails on multiple Dual-Xeon Server Motherboards with onboard ast2500 video from major manufacturers (SuperMicro X11DPH-T and Asus WS C621 Sage).

The KMS driver is being loaded but is not being used:

[ 50.733] (EE) ast: The PCI device 0x2000 at 02@00:00:0 has a kernel module claiming it.
[ 50.733] (EE) ast: This driver cannot operate until it has been unloaded.

Full Xorg.0.log attached (for default start up).

I have lots of other Xorg.0.log files for various manipulations (nomodeset, manual conf settings, etc.) from attempts to get this working. (Please contact me directly if necessary.)

The support community is pretty sure this is a kernel mode driver problem and suggests this bug be posted (see forum link above).

-- You are receiving this mail because: You are on the CC list for the bug.
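The two "(EE)" lines quoted in the report are the key symptom. As a rough illustration, such lines can be pulled out of an Xorg log with a few lines of Python. This is only a sketch: the function name and the regular expression are assumptions derived from the excerpt in this report, not part of any X.org tool.

```python
import re

# Matches log lines shaped like "[    50.733] (EE) message".
# The pattern is an assumption based on the excerpt quoted in this report.
_EE_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\(EE\)\s+(.*)$")


def find_xorg_errors(log_text):
    """Return (timestamp, message) tuples for every (EE) line in log_text."""
    errors = []
    for line in log_text.splitlines():
        match = _EE_LINE.match(line.strip())
        if match:
            errors.append((float(match.group(1)), match.group(2)))
    return errors


if __name__ == "__main__":
    # Excerpt taken from the report; a real run would read the full log file.
    sample = (
        "[    50.733] (EE) ast: The PCI device 0x2000 at 02@00:00:0 "
        "has a kernel module claiming it.\n"
        "[    50.733] (EE) ast: This driver cannot operate until it "
        "has been unloaded.\n"
    )
    for stamp, message in find_xorg_errors(sample):
        print(stamp, message)
```

Running the same function over each of the alternative Xorg.0.log files (nomodeset, manual configuration, and so on) would make it easier to compare which errors persist across configurations.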
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963
Felix Miata
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c1
--- Comment #1 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c11
Felix Miata
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c28
--- Comment #28 from Thomas Zimmermann
Thomas, modeset depends on glamor for OpenGL acceleration, correct? What makes glamor initialization fail (comment #1)?
It's the other way round. Glamor leverages OpenGL acceleration for its own rendering. [1] Unfortunately, acceleration is not supported by the kernel's AST driver. It just provides graphics buffers to copy data in and out. So in any case, this will be slow to repaint. As Michal said, 2 seconds for a redraw appears somewhat broken though.

[1] https://www.freedesktop.org/wiki/Software/Glamor/
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c30
--- Comment #30 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c31
--- Comment #31 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c32
--- Comment #32 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c33
--- Comment #33 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c34
Michal Srb
Here are some records in the latest Xorg.0.log file which might explain the relatively poor X11 performance. Please note we are still seeing an error posted by the ast driver, glamor is not initializing, and we are apparently using software rendering.
All of those log lines and software rendering are expected on an Aspeed GPU. I tried to explain it in comment 26. You cannot get anything better with this GPU, but it should not be slow. Your CPU is quite a high-end Xeon and you are rendering just a single 1280x1024 output. I can get smooth software rendering at the same resolution in a virtual machine with just one core of a weaker CPU... How is your CPU utilization during the rendering? I expect it to be low. You are not the first person with this problem. So far it seems that it started happening after some kernel update; we are still trying to pinpoint which one. Could you check whether kernel-of-the-day makes any difference for you? https://en.opensuse.org/openSUSE:Kernel_of_the_day
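One way to answer the CPU utilization question is to sample the aggregate "cpu" line of /proc/stat before and after a redraw and compare the counters. The following is a minimal sketch under that assumption; the function names are illustrative, and it only considers the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Keep user, nice, system, idle, iowait, irq, softirq, steal only;
    # guest time is already accounted for in user time.
    return [int(value) for value in fields[1:9]]


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 3 is idle, index 4 is iowait.
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return (total - idle) / total


def sample_busy_fraction(interval=1.0, path="/proc/stat"):
    """Measure system-wide CPU utilization over `interval` seconds (Linux only)."""
    def read():
        with open(path) as handle:
            return parse_cpu_line(handle.readline())

    before = read()
    time.sleep(interval)
    after = read()
    return busy_fraction(before, after)
```

Calling `sample_busy_fraction(2.0)` while the login screen is being painted would give a rough system-wide figure. A low value during a slow redraw would suggest that the time is not being spent in software rendering itself.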
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c35
--- Comment #35 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c36
--- Comment #36 from Thomas Zimmermann
I'm also now getting an error message "failed to start setup virtual console" which is displayed just before the X login. "systemctl status systemd-vconsole-setup.service" reports the following:
● systemd-vconsole-setup.service - Setup Virtual Console
   Loaded: loaded (/usr/lib/systemd/system/systemd-vconsole-setup.service; static; vendor preset: disabled)
   Active: inactive (dead) since Thu 2018-10-25 09:12:27 CDT; 1min 34s ago
     Docs: man:systemd-vconsole-setup.service(8)
           man:vconsole.conf(5)
 Main PID: 1270 (code=exited, status=0/SUCCESS)

Oct 25 09:12:26 linux-x29a systemd[1]: Starting Setup Virtual Console...
Oct 25 09:12:27 linux-x29a systemd-vconsole-setup[1270]: KD_FONT_OP_SET failed, fonts will not be copied to tty7: Invalid argument
Oct 25 09:12:27 linux-x29a systemd[1]: Started Setup Virtual Console.
I had this problem this morning and 'fixed' it by updating via

  sudo zypper dup

That upgraded several core Gnome packages and the login worked again. There was a mouse pointer shown on the screen, so X must have been running. Something in the desktop code or login manager was probably broken.
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c70
Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c71
--- Comment #71 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c72
Jean Delvare
None of the Dual Xeon server boards we have tested (Supermicro and Asus) have built in GPU's. They all seem to be using the AST chipset for integrated video.
I meant that future systems would have the GPU integrated in the CPU, not the server board. Anyway... I think I finally managed to reproduce the performance regression bug on my old test system. I am currently installing all the kernel development tools on it and copying the SLE-12-SP3 git kernel tree to it, and then I will start a bisection. Don't hold your breath, it will take a lot of time. I'll post my findings (hopefully) once the bisection is over.
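For readers unfamiliar with bisection, the idea is a binary search over an ordered list of commits between a known-good and a known-bad kernel. The sketch below illustrates only the principle. It is not how git implements it: it assumes a linear history in which every commit after the culprit is also bad, and `is_bad` is a hypothetical stand-in for "build this kernel, boot it, and check for the slowdown".

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an oldest-to-newest commit list for the first bad one.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (culprit, number_of_kernels_tested).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1  # each step stands for one build-and-boot cycle
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], steps


if __name__ == "__main__":
    # Hypothetical history of 1000 commits where commit 637 introduced the bug.
    history = list(range(1000))
    culprit, tested = first_bad_commit(history, lambda commit: commit >= 637)
    print(culprit, tested)
```

Because the search interval halves at every step, about ten test kernels are enough to cover a thousand commits. Each step still means a full kernel build and reboot, which is why a bisection takes a lot of time.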
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c73
--- Comment #73 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c74
--- Comment #74 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c75
--- Comment #75 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c76
--- Comment #76 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c77
--- Comment #77 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c78
--- Comment #78 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c79
--- Comment #79 from Jean Delvare
Note that this slow refresh problem could be seen on any screen change (...)
Please define "screen change". Do you mean switching between X.org and a text console, or something else?
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c80
--- Comment #80 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c81
--- Comment #81 from Jean Delvare
"Screen Change": when a major portion of the screen needs to be repainted or refreshed. Generally under X11. As specifically identified, easily seen when the initial login screen is "painted".
That one I can't see, not even with the Leap 15.0 kernel. May be specific to the AST2500 chipset, or depend on the Leap 15.0 user-space.
Also saw really slow text mode screen updates (as when listing a long file) outside of X11 (runlevel 3), but that is not what I've been using as my "benchmark".
That's the one I see. For completeness, could you please boot Takashi's test kernel from comment #52 and confirm that it at least solves the console (non-X11) slowness? This would confirm that we are chasing 2 different bugs.
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c82
--- Comment #82 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c83
--- Comment #83 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c84
--- Comment #84 from Jean Delvare
OK, Here is what the video corruption looks like (attached).
That I can reproduce. It seems to be a resolution disagreement. By default I get a 1024x768 resolution in Plasma, which looks ugly. The best my AST2050 chipset can do is 1600x900, which is not screen-native, but at least matches the 16:9 ratio of my screen, so I used "xrandr -s 1600x900" to get that. Some time after, I called "xrandr" without argument, and the corruption started immediately. The output of xrandr at that moment was different from what it is usually (different list of resolutions and refresh frequencies).

Note that SDDM does use the best resolution by default: 1600x900. And for example IceWM sticks to that. So I think the problem happens when Plasma tries to change the resolution when the user logs in, to something different from what SDDM had.

When the corruption happens, I am able to get things back to a correct state by switching to a text console (Ctrl+Alt+F2) and then back to X.org (Ctrl+Alt+F7).

Note that I can also get the "wrong" xrandr output, but without the corruption, under IceWM. So it is not Plasma-specific. The problem is that 1600x900 is not in the wrong list, so once it happens, I'm stuck with lower resolutions.

I'll do some more tests and post my findings here. At any rate this seems to be a separate issue from the performance issue(s) reported before.
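To compare the "usual" and the "wrong" xrandr mode lists more systematically, the output can be parsed and checked for a given resolution. The sketch below assumes the common xrandr text layout, where mode lines are indented and start with WIDTHxHEIGHT. The sample output is hypothetical and was not captured from the affected system.

```python
import re
from math import gcd

# Mode lines in xrandr output are indented and start with "WIDTHxHEIGHT".
_MODE_LINE = re.compile(r"^\s+(\d+)x(\d+)\s")


def list_modes(xrandr_text):
    """Return the (width, height) pairs listed in xrandr output."""
    modes = []
    for line in xrandr_text.splitlines():
        match = _MODE_LINE.match(line)
        if match:
            modes.append((int(match.group(1)), int(match.group(2))))
    return modes


def aspect_ratio(width, height):
    """Reduce a resolution to its smallest integer ratio, e.g. (16, 9)."""
    divisor = gcd(width, height)
    return (width // divisor, height // divisor)


# Hypothetical "wrong" output in which 1600x900 is missing from the list.
SAMPLE = """\
Screen 0: minimum 320 x 200, current 1024 x 768, maximum 8192 x 8192
VGA-1 connected 1024x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1024x768      60.00*
   800x600       60.32
   640x480       59.94
"""

if __name__ == "__main__":
    modes = list_modes(SAMPLE)
    print((1600, 900) in modes)
    print(aspect_ratio(1600, 900), aspect_ratio(1024, 768))
```

Saving the xrandr output before and after the corruption starts and running both through `list_modes` would show exactly which modes disappear.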
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c88
--- Comment #88 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c90
--- Comment #90 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c91
--- Comment #91 from Jean Delvare
Just to clarify, WHICH problem is this supposed to resolve: 1) Slow Refresh or 2) Corrupted Video?
The slow refresh.
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c92
--- Comment #92 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c93
--- Comment #93 from Thomas Zimmermann
Is there anything else you would like me to test while I have this environment still in place?
Not from my side. I'll make sure that the patches land in SUSE and in the upstream kernel soon. Just one more question: what happened to the screen corruption? Do you still see it with the patch?
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c94
--- Comment #94 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c95
--- Comment #95 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c97
--- Comment #97 from Thomas Zimmermann
I have not tested for the screen corruption. Unfortunately it cannot be generated on demand. It's a matter of just working in the environment long enough to see if it occurs. It can take a couple of hours or it can take a couple of days. And, of course, there is no way to ensure it has been tested "long enough" if we can't generate the problem on demand.
ASPEED believes it has something to do with switching resolutions (although not by user request). It just happens all at once for no apparent reason during some otherwise innocuous operation (like opening an xterm or something). ASPEED has even given me a new driver to integrate in the kernel and test, but I have had to fall back to getting more productive things done and have not had any time to test it.
Ultimately, the screen corruption does appear to be related somewhat to the WM and DM in use. In our production image we are forced to switch away from sddm/plasma as we can't run remote xclients (on Windows (Xming) or other Suse platforms) due to xdmcp not working properly. We fought this issue long and hard both with Novell and the Xming developer and never did get it resolved. Lots of finger pointing and blaming of changes to gnome. Back then it was issues when running gdm/gnome and right now I can confirm remote X11 issues with sddm/plasma. ANYWAY, lightdm/lxde continues to meet our user interface requirements and seems to work fine with remote X11 clients so we use that configuration. Once we switched away from sddm/plasma to lightdm/lxde we have not seen the video corruption issue appear again.
This is not to say there is a problem with sddm/plasma, just that particular environment may do things in a manner which precipitates the issue...
Although I haven't had time to test the updated driver offered by ASPEED, I would happily put you in touch with the developer if you want to "refresh" the driver in the kernel... Note that this is newer than the latest driver posted on their website. I believe it is a beta driver built specifically to try and resolve this issue. As they sent it to me in a password-protected archive I will not pass it on, but I expect they will work directly with you if desired. Please let me know!
I don't think I can easily help you with this, as I don't even have a manual for the AST hardware. It sounds like the problem could be anywhere. If you like, we can use this bug report to keep track of the problem.

(In reply to Edward Van Every from comment #95)
...also, can someone please be sure to post a note to this thread when the fix to the kernel goes live? Thanks!
No problem.
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c98
--- Comment #98 from Edward Van Every
I tried the test kernel and the scrolling performance is the same as adding "video=vesafb:mtrr:3" in boot option.
Tested-by: Y.C. Chen(yc_chen@aspeedtech.com)
YC, can you provide the kernel devs an updated video driver (for corruption issue) so it may be worked into the next release?
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c99
--- Comment #99 from YC Chen
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c100
--- Comment #100 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c101
Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c102
--- Comment #102 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c103
--- Comment #103 from YC Chen
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c104
--- Comment #104 from Jean Delvare
(This is a custom kernel, right?)
I don't know what you call a "customer kernel". This is the latest (not yet released) openSUSE Leap 15.0 kernel with 2 ast driver patches on top. No config change. If you are worried that it contains too many differences compared to the latest released kernel, then I suppose I could create a different kernel package based on the latest released openSUSE Leap 15.0 kernel instead. For testing purposes, I did not think that it would matter much, as the other changes will be included in the next kernel anyway.
Since our customer environment appears stable now (using the "video=vesafb:mtrr:3" parameter), I don't want to be pushing out forked kernels unless absolutely necessary.
I'm puzzled. In comments #66 and #67 you stated that option "video=vesafb:mtrr:3" was not safe to use. Now you think it is better to run with this unsafe option than to run a kernel which contains what is believed to be the proper fix to the problem? Also I have no idea what you mean with "forked kernel".
The video option has bought us some time to wait for an updated "published" kernel in this case I think. (via online software update)
Anyone know how long that might take?
It will never happen if you don't test it first. You are the only user reporting this problem. The fix is not even upstream. My plan was to let you test the fix, and if it works for you, ping upstream for inclusion. From there, the fix can be backported to openSUSE Leap 15.0 and other products.
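When switching between a test kernel and the "video=vesafb:mtrr:3" workaround, it helps to confirm which configuration a given boot actually used. A minimal sketch, assuming a Linux system that exposes the boot parameters in /proc/cmdline; the helper names and the sample command line in the comment are illustrative only.

```python
def has_kernel_param(cmdline, param):
    """Return True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # A hypothetical command line could look like:
    #   BOOT_IMAGE=/boot/vmlinuz root=/dev/sda2 video=vesafb:mtrr:3 quiet
    workaround = "video=vesafb:mtrr:3"
    print(has_kernel_param(read_cmdline(), workaround))
```

A test report for the patched kernel is only meaningful if this check prints False, that is, if the workaround parameter has been removed for that boot.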
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c105
--- Comment #105 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c106
--- Comment #106 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c107
--- Comment #107 from Jean Delvare
You did see that YC Chen (Aspeedtech) already tested 4.12.14-lp150.170 with positive feedback, right? (Comment #103). Not sure why you think I'm "the only user" reporting this problem or otherwise able to test associated kernels... Based on YC Chen's feedback, I would have expected you would be moving forward with the associated updates.
Yes, I've seen it, and thank you very much YC Chen for taking the time to test and report. However, Chen is not "a user", he is a developer and the author of the candidate fix. He presumably tested his fix already before sending it upstream 1.5 months ago. As such this is not really new information. If I can tell upstream that I have a real user affected by the problem who tested the fix successfully, this will have much more weight to convince upstream to finally accept Chen's patch. I can see that several other ast driver fixes have been ignored as well, which is really sad. I'll test and review what I can and ping upstream.
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c108
--- Comment #108 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963
Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c110
Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c111
--- Comment #111 from Thomas Zimmermann
That's great, I'll backport the fix to SLE15 (from where it will be automatically merged into Leap 15.0) and also add the upstream reference to the other patch.
It actually is in SLE11-SP4, SLE12-SP3, SLE15 and SLE15-SP1. I'm waiting for these kernel updates to be released. I don't know if our tools update these internal patches automatically with references to upstream. You're welcome to help, otherwise I'll fix up the internal patches at some point later.
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c112
--- Comment #112 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c113
--- Comment #113 from Thomas Zimmermann
Sorry for the confusion. Yes, your fix is in these trees already, however the other fix (from YC Chen in comment #99) was also accepted upstream and should be backported too. That's what I'm working on at the moment.
Oh, right. I saw it on the ML. Sorry for the noise here. :D
And no, our tools can't match new upstream commits with patches in our trees. That has to be done manually, and I'm working on that too.
I see. I'll keep an eye on this.
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c114
--- Comment #114 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963
Stefan Dirsch
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c117
--- Comment #117 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c118
--- Comment #118 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c119
--- Comment #119 from Jean Delvare
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c120
--- Comment #120 from Edward Van Every
http://bugzilla.opensuse.org/show_bug.cgi?id=1112963#c135
Jean Delvare