[Bug 213356] New: nvidia driver hard lockup on x86_64 smp system
https://bugzilla.novell.com/show_bug.cgi?id=213356

           Summary: nvidia driver hard lockup on x86_64 smp system
           Product: openSUSE 10.2
           Version: Alpha 5 plus
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: X11 3rd Party
        AssignedTo: sndirsch@novell.com
        ReportedBy: ro@novell.com
         QAContact: sndirsch@novell.com

apart from some occasional lockups on my desktop, there is one 100% reproducible hard lock of the whole system (no sysrq, not even on serial console).

run ww2d (from ww2d.org), a java/gl application somewhat similar to google-earth. click on the "E" button (for "export picture to file") and see the system lock up hard at the time where it would normally map the dialog window.

attaching nvidia-bug-report.log

from asking around I get the impression that the combination of 64bit and SMP still is a bit unstable here. Using TwinView or not does not seem to change the problem.

--
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #1 from ro@novell.com 2006-10-18 09:46 MST -------
Created an attachment (id=101922)
 --> (https://bugzilla.novell.com/attachment.cgi?id=101922&action=view)
output of nvidia-bug-report.sh
https://bugzilla.novell.com/show_bug.cgi?id=213356

lfriedman@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lfriedman@nvidia.com
             Status|NEW                         |NEEDINFO
      Info Provider|                            |ro@novell.com

------- Comment #2 from lfriedman@nvidia.com 2006-10-18 10:18 MST -------
Ruediger,
I have a few questions:

0) Does this problem persist with the 1.0-9626 driver as well?
   http://www.nvidia.com/object/linux_display_amd64_1.0-9626.html
1) What (if anything) appears on your serial console when the system hangs?
2) If you set NvAGP to 0 in xorg.conf, does that have any impact on the problem?

Thanks,
Lonni
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #3 from ro@novell.com 2006-10-18 17:39 MST -------
hi ....

0) with the 9626 driver all I get is a black screen, the logfile ends with "Enabling TwinView" and then the X-server is unkillable (does not react to kill -9 and a "cat /proc/driver/nvidia/cards/0" hangs as well).

1) with the 8774 version, when the hang happens there is no output to the serial console at all (even with kernel-loglevel up to 7). (already tried different memory configurations and even a new power-supply and several bios releases to rule these possibilities out).

2) will try as soon as I get back to the machine in the morning
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #4 from lfriedman@nvidia.com 2006-10-18 17:41 MST -------
Can you generate and attach a bug report from the hang with 1.0-9626?

thanks.
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #5 from ro@novell.com 2006-10-19 04:10 MST -------
ok, using "NvAGP" "0" as option in xorg.conf did not help.

played with the bios settings in the agp area as well:

options:
AGPrate         4x/8x   4x
ApertureSize    128M    256M    ....
FastWrite       Enable  Disable

the AGPrate does not change anything, neither does changing the ApertureSize. Disabling FastWrite gives no video after BIOS-POST (LCD panels go to sleep mode before bootloader start).

will try to generate bug report from 9626 driver next
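For readers unfamiliar with the setting tested in comment #5: NvAGP is an option of the NVIDIA X driver, set in the Device section of xorg.conf, and a value of 0 disables AGP support. A minimal sketch of such a section follows; the identifier "nvidia0" is a hypothetical placeholder and is not taken from the reporter's configuration.

```
Section "Device"
    # "nvidia0" is an assumed name; use the identifier from your own xorg.conf
    Identifier "nvidia0"
    Driver     "nvidia"
    # 0 = disable AGP support in the NVIDIA driver
    Option     "NvAGP" "0"
EndSection
```

The X server has to be restarted for the change to take effect.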
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #6 from ro@novell.com 2006-10-19 04:20 MST -------
had to modify the script a bit, since both

  cat /proc/driver/nvidia/cards/0
  cat /proc/driver/nvidia/agp/status

will hang indefinitely, so I disabled these two. attaching logfile next.
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #7 from ro@novell.com 2006-10-19 04:21 MST -------
Created an attachment (id=102018)
 --> (https://bugzilla.novell.com/attachment.cgi?id=102018&action=view)
log output for 9626 driver (without cards/0 and agp/status)
https://bugzilla.novell.com/show_bug.cgi?id=213356

ro@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|ro@novell.com               |

------- Comment #8 from ro@novell.com 2006-10-19 04:23 MST -------
remove needinfo
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #9 from ro@novell.com 2006-10-19 05:06 MST -------
BTW: tried 9625, same result as 9626
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #10 from ro@novell.com 2006-10-19 05:43 MST -------
ok, 9625 and 9626 are working now after applying the patch from
http://www.nvnews.net/vbulletin/showpost.php?p=996233&postcount=20
(eeprom is loaded here)
https://bugzilla.novell.com/show_bug.cgi?id=213356

lfriedman@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |ro@novell.com

------- Comment #11 from lfriedman@nvidia.com 2006-10-19 08:48 MST -------
Ruediger,
Are you stating that the originally reported bug is no longer present, or just that you can start X successfully?
https://bugzilla.novell.com/show_bug.cgi?id=213356

ro@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|ro@novell.com               |

------- Comment #12 from ro@novell.com 2006-10-19 08:53 MST -------
unfortunately only the latter. I can still reproduce the original problem (with all mentioned driver versions)
https://bugzilla.novell.com/show_bug.cgi?id=213356

lfriedman@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |ro@novell.com

------- Comment #13 from lfriedman@nvidia.com 2006-10-20 12:01 MST -------
I'm encountering some significant problems getting ww2d to run. It's crashing with assorted Java exceptions, or occasionally, just crashing with no errors at all. I haven't yet been able to run it far enough to attempt reproduction of the reported bug. Did you need any special steps or packages to run ww2d?

Also, you stated that "I get the impression that the combination of 64bit and SMP still is a bit unstable here". Does this imply that this hang with ww2d does not occur unless you're using a 64bit OS with an SMP kernel?

thanks,
Lonni
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #14 from ro@novell.com 2006-10-22 16:51 MST -------
that's the java version I'm running it with:
java-1.5.0-sun-1.5.0_07
and I think I remember having to use the 32bit java version to run it successfully.

for the second part: no, I have not done any tests in that direction yet, will do so. It's just from asking around that all colleagues with either 32bit hardware (UP or SMP) have not reported any stability problems and neither have the ones running 64bit UP machines.

Stefan (cc'ed in this bug) said he had seen occasional lockups on 64bit smp but without a clean pattern to reproduce.
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #15 from ro@novell.com 2006-10-23 04:18 MST -------
no way to boot the system in 64bit with "nosmp" or "maxcpus=0" (IRQ routing problem in UP mode), problem is still there when booting with "maxcpus=1"
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #16 from ro@novell.com 2006-10-23 09:45 MST -------
installed the box with an x86 installation, problem reproduced, so this is definitely not 64bit specific.

Current guess: a chipset problem, this one has a TYAN K8W mainboard, Stefan Fent's box is a K8W-E (the pci-express variant).
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #17 from lfriedman@nvidia.com 2006-10-23 12:44 MST -------
I tracked down a Tyan S2885 motherboard with a GeForce 6800, and was able to get WW2D running. Using the default dataset (Earth map), I'm not seeing any instability when clicking the E button. Does this require a specific dataset to reproduce? Was this problem also present in SuSE-10.1 (or any other stable SuSE release)?

thanks,
Lonni
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #18 from ro@novell.com 2006-10-23 15:09 MST -------
yes, I used the default dataset and the problem was present also on 10.1. I'll try to reproduce on a different machine with the same mainboard tomorrow.
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #19 from ro@novell.com 2006-10-24 07:20 MST -------
ok, tried on AJ's box (same board, identical bios version). WW2D runs without any problem there. I'm ordering a new mainboard now ...

I think we can ignore this unless I can really reproduce that on another machine or with the next mainboard. Sorry for the noise.
https://bugzilla.novell.com/show_bug.cgi?id=213356

ro@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|ro@novell.com               |
         Resolution|                            |WORKSFORME

------- Comment #20 from ro@novell.com 2006-10-27 08:06 MST -------
closing until it happens again with another mainboard, new one is ordered.
https://bugzilla.novell.com/show_bug.cgi?id=213356

ro@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mhopf@novell.com
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |

------- Comment #21 from ro@novell.com 2006-12-14 08:05 MST -------
well ... it did.

up to now:
- changed mainboard (and the CPUs again) and the memory ... -> problem still happens
- reproduced crash on AJ's machine (reger) when using a NV43 card
- found out that all tests work ok when using NV35 or NV36 cards
- reproduced crash in all variants:
  - 2 monitors hooked up via DVI (xinerama and mergedfb)
  - 2 monitors hooked up via VGA (DVI->analog plugs)
  - 1 monitor (VGA) in 1680x1050 and 1024x768 resolutions
https://bugzilla.novell.com/show_bug.cgi?id=213356

lfriedman@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |NEEDINFO
      Info Provider|                            |ro@novell.com

------- Comment #22 from lfriedman@nvidia.com 2006-12-14 08:51 MST -------
I'm not sure that I understand what you're reporting. This problem is reproducing again on your system, but still doesn't reproduce on someone else's identical system?
https://bugzilla.novell.com/show_bug.cgi?id=213356

ro@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED
      Info Provider|ro@novell.com               |

------- Comment #23 from ro@novell.com 2006-12-15 07:09 MST -------
no, ... sorry for writing confusing things.

I can reproduce the crash on my system and on another one when using the NV43 card (GeForce 6800) (both K8W mainboards)

It works perfectly (= there is no hangup or crash) on my system (and on two others) when using NV35 or NV36 graphics cards.

The monitor setup does not matter at all, I've tried various configurations as stated in comment #21

Does this clarify the situation?
https://bugzilla.novell.com/show_bug.cgi?id=213356

lfriedman@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO
      Info Provider|                            |ro@novell.com

------- Comment #24 from lfriedman@nvidia.com 2006-12-15 08:17 MST -------
I'm still confused. Previously you stated that this didn't reproduce on someone else's system. Were you not testing with the GeForce 6800 on the other person's identical system in the past?

FWIW, I'm still not able to reproduce this problem here.
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #25 from ro@novell.com 2006-12-17 16:21 MST -------
when I stated previously that I couldn't reproduce it on another system (in comment #19), that system had this card installed:
"nVidia Corporation NV36 [GeForce FX 5700LE] (rev a1)"
this time (when I reproduced the hang) I installed the NV43 card in that system.

Now I've installed this card in my machine:
"nVidia Corporation NV35GL [Quadro FX 3000] (rev a1)"
and it runs without problems. My NV43 is in sndirsch's machine at the moment.
https://bugzilla.novell.com/show_bug.cgi?id=213356

------- Comment #26 from lfriedman@nvidia.com 2006-12-18 10:48 MST -------
Thanks for clarifying. Unfortunately, I'm still not able to reproduce this problem here using the same motherboard + GeForce 6800. If you can ship me a system which reproduces the problem, I can investigate further, but barring that, there isn't much else that I can do right now. If you're interested in shipping me the system, please email me directly, and I'll provide you with the address.

thanks.
https://bugzilla.novell.com/show_bug.cgi?id=213356

ro@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|ro@novell.com               |
         Resolution|                            |WORKSFORME

------- Comment #27 from ro@novell.com 2006-12-18 18:24 MST -------
well, I can't send that machine away (it's my main workstation at work ...) I'll try to find more things special to the setup as time permits.

for now, I'll set this one back to "WORKSFORME" ...

thanks for your help investigating this, sorry for not being able to describe enough machine details to make it reproducible, I'll play around with other K8W-based boxes if I get some into my hands ;)
https://bugzilla.novell.com/show_bug.cgi?id=213356#c28

Stefan Fent <stefan.fent@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jplack@novell.com

--- Comment #28 from Stefan Fent <stefan.fent@novell.com> 2007-07-17 08:54:23 MST ---
I can reproduce this here with a K8WE board + nvidia 7300GS, but _not_ with a 6600 (nv43)