[Bug 215937] New: Inestability and 3D slownes in 10.1 related to driver NVIDIA-Linux-x86-1.0-8776-pkg1.run
https://bugzilla.novell.com/show_bug.cgi?id=215937

Summary: Inestability and 3D slownes in 10.1 related to driver NVIDIA-Linux-x86-1.0-8776-pkg1.run
Product: SUSE Linux 10.1
Version: Final
Platform: PC
OS/Version: SuSE Linux 10.1
Status: NEW
Severity: Major
Priority: P5 - None
Component: X11 3rd Party
AssignedTo: bnc-team-screening@forge.provo.novell.com
ReportedBy: robin.listas@telefonica.net
QAContact: sndirsch@novell.com

Several reports of slow 3D-accelerated display with the proprietary nvidia driver in SuSE 10.1 came up recently on the suse-linux-e@suse.com mailing list, on different cards. We have been talking with Matthias Hopf from SuSE; some of the testing I did was suggested by him, and I'm filing this report at his request. (This report is mostly what I wrote in http://lists.suse.com/archive/suse-linux-e/2006-Oct/3883.html.)

I have experienced what I thought was a low frame rate in glxgears, but that is no longer so clear in my case. The wheels do turn very slowly, though. I have also experienced jerkiness in 3D games like "planet penguin racer" and "FlightGear".

Much more worrisome is that I could consistently crash X just by resizing columns in konqueror or by opening zapping-0.10cvs6-1 (a gnome TV app). X crashed, freezing the keyboard and display but not the mouse. Logging in over ssh, I saw "X" maxing out at around 99% CPU.

One of the crashes was hard, and I lost the rpm database and the gnome desktop configuration. I recovered them, but I can't tell when I applied the qt YOU update (announced last Wed, 25 Oct 2006). I mention this because I think the results I had a few days back and those from the early hours of today differ somewhat.

After two or three days of testing (and crashing) I decided to reinstall SuSE 9.3 on another partition and try the same 1.0-8776 driver there. It worked perfectly: no noticeable slowness, no jerkiness, no crashes. I will attach my report later.
The next day (the early hours of today) I tried again with 10.1, with the idea of getting a verbose log (the type the NVidia folks want). I could not crash it as easily. Something had changed. Konqueror did not crash it. The game "planet penguin racer" showed only a few jerks, which is not consistent with what I had experienced three days earlier; I think the qt update came in between. And it did not crash in an hour or two.

But as soon as I tried to start zapping-0.10cvs6-1 (which displayed a pink screen with audio but no video), very strange things happened: my home XFS partition shut down straight away, while the other partitions were unaffected. I had to reboot, no data was lost, and I was able to run "nvidia-bug-report.sh" successfully (file attached later). This is a log excerpt:

/var/log/warn:
Oct 27 01:03:53 nimrodel gconfd (cer-6078): Could not open saved state file '/home/cer/.gconfd/saved_state.tmp' for writing: Input/output error

/var/log/kernel:
Oct 27 01:03:27 nimrodel kernel: xfs_iunlink_remove: xfs_itobp() returned an error 990 on hdd8. Returning error.
Oct 27 01:03:27 nimrodel kernel: xfs_inactive: xfs_ifree() returned an error = 990 on hdd8
Oct 27 01:03:27 nimrodel kernel: xfs_force_shutdown(hdd8,0x1) called from line 1762 of file fs/xfs/xfs_vnodeops.c. Return address = 0xf92d9bcb
Oct 27 01:03:27 nimrodel kernel: Filesystem "hdd8": I/O Error Detected. Shutting down filesystem: hdd8
Oct 27 01:03:27 nimrodel kernel: Please umount the filesystem, and rectify the problem(s)
Oct 27 01:09:36 nimrodel kernel: xfs_force_shutdown(hdd8,0x1) called from line 338 of file fs/xfs/xfs_rw.c. Return address = 0xf92d9bcb
Oct 27 01:09:43 nimrodel kernel: audit(1161904183.600:7): audit_pid=0 old=4155 by auid=4294967295
Oct 27 01:09:45 nimrodel kernel: pnp: Device 00:0d disabled.
Oct 27 01:09:45 nimrodel kernel: gameport: kgameportd exiting
Oct 27 01:09:47 nimrodel kernel: device eth0 left promiscuous mode
Oct 27 01:09:50 nimrodel kernel: Kernel logging (proc) stopped.
Oct 27 01:09:50 nimrodel kernel: Kernel log daemon terminating.

I rebooted and tried again. Thinking that the "zapping" I tried is a cvs version (though it works fine with the open nv driver), I tried with kdetv... and it crashed instantly, but differently (with the same pink display, I think). The symptom was that the command prompt did not return in any xterm. I had also logged in from outside over ssh in advance, as a safeguard to be able to run commands after the expected 'X' crash, but that session also crashed when I exited the "top" I had running there. I was able to run "nvidia-bug-report.sh" in an xterm I fortunately had running as root (file attached later). After rebooting, I saw 12 Oopses in the kernel log, right at the instant of the crash; I will attach the kernel log excerpts later.

It is very suspicious and worrisome that two different TV apps can crash the system in such a way, affecting kernel space. I know that the NVidia driver taints the kernel, but... 9.3 is not affected, therefore something has changed in our camp. See what you can do/think. Request more data if you need it, but I'm a bit afraid of crashing the system so many times: one of these times I will not come out unscathed.

Hardware: Pentium IV @ 1800 MHz, circa 2001. 1 GiB RAM. Three hard disks. Video card: NVidia GeForce2 MX/MX 400

I will attach logs in an hour or two.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #1 from robin.listas@telefonica.net 2006-10-27 15:32 MST -------
Created an attachment (id=102902)
 --> (https://bugzilla.novell.com/attachment.cgi?id=102902&action=view)
Nvidia configuration data and checks done in 9.3, for comparison
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #2 from robin.listas@telefonica.net 2006-10-27 15:34 MST -------
Created an attachment (id=102903)
 --> (https://bugzilla.novell.com/attachment.cgi?id=102903&action=view)
NVidia configuration data and some checks done in 10.1
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #3 from robin.listas@telefonica.net 2006-10-27 15:46 MST -------
Created an attachment (id=102905)
 --> (https://bugzilla.novell.com/attachment.cgi?id=102905&action=view)
output log of nvidia-bug-report command right after first crash reported above

This file contains the data collected by the script nvidia-bug-report.sh right after the crash triggered by "zapping". The X session was started from a console in runlevel 3 with the command "startx gnome -- -logverbose 5", as per nvidia recommendations.

The attached file contains the X logs; the important one should be "Xorg.0.log". It also contains the output of the dmesg command (search for "xfs_iunlink_remove" to find the section of the kernel messages related to the XFS failure).

I also keep the "Xorg.0.log" and the kernel logs as separate files; ask for them if you need them.
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #4 from robin.listas@telefonica.net 2006-10-27 15:54 MST -------
Created an attachment (id=102907)
 --> (https://bugzilla.novell.com/attachment.cgi?id=102907&action=view)
output log of nvidia-bug-report command right after second crash reported above

This file contains the data collected by the script nvidia-bug-report.sh right after the crash triggered by "kdetv". The system was in runlevel 5 this time, running gdm. The session type was "gnome".

As before, the file contains the X logs (look for "/var/log/Xorg.0.log") and the output of dmesg (look for Oops). I also keep those files if you want them.
https://bugzilla.novell.com/show_bug.cgi?id=215937

mhopf@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|mhopf@novell.com            |lfriedman@nvidia.com

------- Comment #5 from mhopf@novell.com 2006-10-30 10:34 MST -------
Lonnie, any ideas what this could result from, or what else to check? AFAICS all reports of jerky OpenGL came from low-end cards (4mx and 5200 AFAIR). I'm still hoping the other guys post their findings here as well.
https://bugzilla.novell.com/show_bug.cgi?id=215937

lfriedman@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lfriedman@nvidia.com
             Status|NEW                         |NEEDINFO
      Info Provider|                            |robin.listas@telefonica.net

------- Comment #6 from lfriedman@nvidia.com 2006-10-30 15:27 MST -------
I've read through the referenced email thread, and I'm not sure that there are many, if any, others experiencing the same problems as discussed in this bug. The only person consistently talking about these problems is the one who submitted this bug. The others seem to be talking about symptoms of problems which are similar but not quite the same.

Anyway, there seem to be potentially two unrelated problems being reported here. One is a performance problem, and the other is a stability problem. It might be useful to split them into separate bugs.

I've looked through the attached logs, and I'm not really seeing anything that suggests that there's an NVIDIA driver bug. What I am seeing are signs of a bttv driver bug (which would explain why the problem isn't present in older versions of SuSE, and also why it is only triggered when the TV app(s) are in use). I did some Googling on this error from the bug report:

bttv0: timeout: drop=11 irq=2481/2481, risc=16eda09c, bits: HSYNC OFLOW FDSR

It turned up quite a few hits. These two links seem to match up nicely with the problems being described here in this bug:
http://lists.kde.org/?l=kwintv&m=107541549725874&w=2
http://www.nerdylorrin.net/wiki/Wiki.jsp?page=Bttv

I'll also note that the backtrace in the Oops in the bug report doesn't go through any nvidia kernel module paths, so that further suggests that whatever is going wrong doesn't involve the nvidia driver.

As for why this doesn't happen with the nv X driver, my best guess is that the nvidia driver forces the kernel to allocate resources differently than the 'nv' driver.

It probably can't hurt to give the 1.0-9626 driver a try to see if things are any more stable:
http://www.nvidia.com/object/linux_display_ia32_1.0-9626.html
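
Editorial sketch: comment #6 rests partly on the observation that the Oops backtrace does not pass through any nvidia module code. The check below is one hedged way to repeat that inspection on a saved kernel log. It assumes that each Oops begins at a line containing "Oops:" and that backtrace frames from loadable modules look like "[<address>] symbol+0x.../0x... [module]". Both patterns are assumptions about the log layout, the function name is hypothetical, and the sample lines are made up for illustration rather than taken from the attachments.

```python
import re
from typing import List, Set

# Assumption: an Oops report starts at a line containing "Oops:".
OOPS_START = re.compile(r"\bOops:")
# Assumption: a backtrace frame from a module ends with "[module_name]".
TRACE_FRAME = re.compile(r"\[<[0-9a-fA-F]+>\].*\[([\w-]+)\]\s*$")


def modules_in_backtraces(log_text: str) -> List[Set[str]]:
    """Return, for each Oops in the log, the module names seen in its frames."""
    per_oops: List[Set[str]] = []
    for line in log_text.splitlines():
        if OOPS_START.search(line):
            per_oops.append(set())          # a new Oops report begins here
        elif per_oops:
            match = TRACE_FRAME.search(line)
            if match:
                per_oops[-1].add(match.group(1))
    return per_oops


if __name__ == "__main__":
    # Made-up sample, for illustration only.
    sample = "\n".join([
        "kernel: Oops: 0000 [#1]",
        "kernel: Modules linked in: nvidia bttv",
        "kernel:  [<c0123456>] some_function+0x10/0x20",
        "kernel: Oops: 0000 [#2]",
        "kernel:  [<f8abcdef>] other_function+0x5/0x40 [example_module]",
    ])
    for index, modules in enumerate(modules_in_backtraces(sample), start=1):
        verdict = "nvidia frames found" if "nvidia" in modules else "no nvidia frames"
        print(f"Oops {index}: {sorted(modules)} ({verdict})")
```

Only backtrace frames are inspected, because a loaded-modules list names every loaded module, including nvidia, whether or not it took part in the crash.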
https://bugzilla.novell.com/show_bug.cgi?id=215937

robin.listas@telefonica.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|robin.listas@telefonica.net |

------- Comment #7 from robin.listas@telefonica.net 2006-10-30 18:28 MST -------
First, thank you all for your interest. Let me say that I am what was once termed a "power user"; I'm not an expert in the internals, except here and there, perhaps. I can only try to explain what I see.

Let me clarify a few points from my long post.

I said little about the perceived slowness of 3D in 10.1 because, when I went looking for trouble, I did not find hard proof. It did seem slower to me three days before, though, and I mentioned that I think qt was updated in the meantime (I'm not sure, because the rpm database was destroyed in one of the crashes). That might have had an effect.

While it is true that the TV apps crashed the system immediately, I also mentioned that in the initial tests (no logs) resizing columns in konqueror would crash the X server immediately (with the nvidia driver), freezing the display and keyboard (no LED change) but not the mouse. I tried a second time with the same result, so it was reproducible. But when I tried again 3 days later, after asking on the list and searching for advice, with all logs enabled, it did not happen. I suspect the update in between spoiled it. Sorry.

As to the "HSYNC OFLOW FDSR" error, I just searched my logs and found it as far back as 2004 (8 occurrences until 2006-04-01, and 73 from that point until today). My tests the other day in SuSE 9.3 also produced it. But I have never seen my system crash in a similar way that I can apparently relate to opening the TV programs. And the NVidia driver version x86-1.0-8776 works fine in SuSE 9.3.

As to the kernel Oopses, there were 12, but the dmesg output only includes 4. As I can't know whether there is interesting info for you in them or not, I will add the full log later.

Time permitting, I will try the beta driver, 9626, and cross my fingers. I can't say that it is a bug in the driver, only that the bug, wherever it is, reveals itself in that case. But you will understand that, with the crashes being so hard as to cause some loss of data, I have been reluctant to try a beta driver so far.
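
Editorial sketch: the occurrence counts quoted in comment #7 come from searching saved logs for the "HSYNC OFLOW FDSR" marker. A minimal way to repeat that kind of count is shown below. The function names and the idea of passing log paths on the command line are assumptions made for illustration; the sample line is the bttv message quoted in comment #6.

```python
import sys
from pathlib import Path
from typing import Dict, Iterable

# Marker text taken from the bttv timeout message quoted in this bug.
MARKER = "HSYNC OFLOW FDSR"


def count_marker(lines: Iterable[str], marker: str = MARKER) -> int:
    """Count the lines that contain the marker text."""
    return sum(1 for line in lines if marker in line)


def count_per_file(paths: Iterable[Path], marker: str = MARKER) -> Dict[str, int]:
    """Map each log file path to the number of marker lines it contains."""
    counts: Dict[str, int] = {}
    for path in paths:
        # errors="replace" keeps the scan going if a log holds odd bytes.
        with path.open(errors="replace") as handle:
            counts[str(path)] = count_marker(handle, marker)
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage (hypothetical): python count_bttv.py <log file> [<log file> ...]
        for name, total in count_per_file(Path(arg) for arg in sys.argv[1:]).items():
            print(f"{name}: {total}")
    else:
        sample = [
            "bttv0: timeout: drop=11 irq=2481/2481, risc=16eda09c, bits: HSYNC OFLOW FDSR",
            "kernel: some unrelated message",
        ]
        print(count_marker(sample))
```

Counting per file rather than per date sidesteps the fact that the syslog timestamps shown in this report carry no year, so splitting the totals around a date such as 2006-04-01 would need the rotation dates of the log files.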
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #8 from robin.listas@telefonica.net 2006-10-30 18:32 MST -------
Created an attachment (id=103121)
 --> (https://bugzilla.novell.com/attachment.cgi?id=103121&action=view)
Full kernel log during the 2nd crash, containing the 12 Oops.
https://bugzilla.novell.com/show_bug.cgi?id=215937

lfriedman@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|lfriedman@nvidia.com        |bnc-team-screening@forge.provo.novell.com
             Status|ASSIGNED                    |NEW

------- Comment #9 from lfriedman@nvidia.com 2006-10-31 12:36 MST -------
Thanks for clarifying a lot of these points. I looked over the kernel log that you attached, and there's nothing in there pointing to the nvidia kernel module. I basically see the same Oops over and over again. I'm going to assign this back to Matthias at this point, since the problem looks to be somewhere in the kernel.
https://bugzilla.novell.com/show_bug.cgi?id=215937

sndirsch@novell.com changed:

           What    |Removed                                    |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.provo.novell.com  |mhopf@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=215937

mhopf@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME

------- Comment #10 from mhopf@novell.com 2006-11-14 10:40 MST -------
As much as I hate it, I have to close this bug now as WORKSFORME, because I cannot reproduce it here, and I just *know* that our kernel developers will not even look at a tainted kernel.

Sorry for the extra work, but I'm out of ideas, and the Oopses don't really point to the graphics driver (maybe broken memory?) - though your findings (nv driver, older SUSE versions) are a somewhat strong indication.

If there's additional information that points towards possible reproduction, feel free to reopen.
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #11 from mhopf@novell.com 2006-11-22 11:30 MST -------
Something found out by John D Lamb:
I've found what was slowing nVidia for me: a file called .nvidia-settings-rc in my home directory. I deleted it, restarted X and glxgears now runs 20 times as fast as it did before.
Maybe this helps WRT the slowdown issue.
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #12 from robin.listas@telefonica.net 2006-11-22 17:30 MST -------
I have been testing the new driver, NVIDIA-Linux-x86-1.0-9629-pkg1.run, for two weeks now. The stability problem is solved; it hasn't crashed yet. I was waiting for it to crash before reporting back ;-)

So somebody did do something that solved the problem; thank him, whoever he was.

As to the .nvidia-settings-rc file, yes, somebody told me that, too. The file is there; I haven't touched it yet.

I did have an issue, but I haven't analyzed it yet. I'll just mention it here briefly. I had a power outage in this period. I switched to a text console and suspended the system to disk before the battery gave out. On restore, it came back fine, but as soon as I pressed ctrl-alt-f7, the keyboard and display froze completely. I was able to log in from another PC; the X process was at 95% CPU. I killed it, and my external ssh session died. I had to power off to recover.

I do have the nvidia-bug-report file from that crash saved; if someone is interested, let me know. I heard somewhere that I have to remove a certain kernel module before suspending; I have to investigate that.
https://bugzilla.novell.com/show_bug.cgi?id=215937

------- Comment #13 from mhopf@novell.com 2006-11-23 04:35 MST -------
This is very good news!

If you want to report the crash after suspend, please create another bug report. But AFAIK suspend with binary drivers is not supported in our products, so if it only occurred once, I wouldn't bother with it. Except if Lonni is interested, of course!