[opensuse] display fault on Dell laptop, nVidia driver
My main machine's on the blink (quite literally, sometimes). Probably not an openSUSE-specific problem, but can't be sure. In the worst case, it could be a hardware fault brought about by a rogue driver, but I'm probably just being paranoid there. Can anybody recognise the following symptoms and tell me what's likely at fault?
(openSUSE 13.1, 64-bit. KDE 4.11)
I have a Dell Latitude D630 laptop, usually hooked up to a docking station, at which times I use a TV/monitor on the dock's VGA port, running at the same scaled down resolution to match the laptop display. It has an nVidia Quadro NVS 135M for graphics, and I've always used the nVidia proprietary driver with no problems.
I keep the nVidia driver updated via the official repo, but in the last couple of months or so, I've noticed increasing bouts of flickering (momentary glitches), something that never used to occur. Now in the last few days, I'm getting major graphics issues, whereby the screen will suddenly become corrupted and the machine locks up. My only escape is a reboot with the Magic SysRq keys. At these times the display often shows vertical lines which cycle through some colour changes for a few seconds (always openSUSE greens for some reason) before coming to a rest. I've taken a photo, here: http://susepaste.org/74388053
This corruption sometimes persists on a cold/warm reboot, leading me to believe it's faulty hardware, but other times I might typically reach the KDE login screen, and the moment I hit a key it will go corrupted.
To try and diagnose some more, I removed the laptop from the dock, removed the battery and plugged it in separately with no other peripherals. I also took out half the 4GB RAM and let the Dell BIOS diagnostics perform some lengthy testing, most of it on the remaining 2GB RAM (which was new from Crucial last year). After letting that finish a few hours later reporting no problems, I booted the machine in that state and it failed again.
If this is a hardware fault, anybody know which part exactly is likely failing? Most parts on this machine are replaceable though it may not be economical to do so. Is it remotely possible that some bug in the display driver is corrupting either the graphics or main memory?
For the moment I'm typing on the faulty machine but this week it's rapidly becoming less and less reliable; I fear my remaining keystrokes are limited.
Peter
-- 
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On 11/27/2014 11:39 AM, Peter wrote:
> If this is a hardware fault, anybody know which part exactly is likely failing? Most parts on this machine are replaceable though it may not be economical to do so. Is it remotely possible that some bug in the display driver is corrupting either the graphics or main memory?
I've seen this on other machines with failing video cards. In some cases I've been able to remove the video card and simply re-seat it; in others I've squeezed every socketed chip on the video card and solved the problem. Laptop, not so easy.
It might be a ribbon cable connector somewhere that needs a bit of re-seating, or maybe a frayed ribbon where it traverses the hinge on the laptop lid.
This does look more like the video chip-set is crapping out. If it often happens when you type, I'd look at what is immediately below the keyboard in the inside of the machine. What's down there?
Is it hot? Overheating parts due to deteriorated grease under the heatsink (a common Dell problem), or clogged fans?
Parts: http://www.parts-people.com/index.php?action=item&id=6437
Cool video: http://www.parts-people.com/index.php?action=item&id=5391#video
There is also a BIOS upgrade issue: http://www.downloadplex.com/Drivers/BIOS-System-Updates/Dell/dell-latitude-d...
And there was a warranty issue with many of these D630s: http://forums.anandtech.com/showpost.php?p=27865565&postcount=5
-- 
After all is said and done, more is said than done.
On 27/11/14 21:16, John Andersen wrote:
> It might be a ribbon cable connector somewhere that needs a bit of re-seating, or maybe a frayed ribbon where it traverses the hinge on the laptop lid.
I was wondering about that when I had a peek inside yesterday. But the latter situation presumably wouldn't account for the same graphics problems showing on the TV/monitor attached to the dock. Also, the laptop hasn't really been moved in the time since these problems began occurring, so it would be bizarre if something had become displaced.
> This does look more like the video chip-set is crapping out. If it often happens when you type, I'd look at what is immediately below the keyboard in the inside of the machine. What's down there?
> Is it hot? Overheating parts due to deteriorated grease under the heatsink (a common Dell problem), or clogged fans?
This is where I get worried. I purchased the laptop and dock secondhand last year, and it came with Ubuntu preloaded. All ran fine and quiet, but after toying around with the way of the Shuttlecock for a couple of weeks out of curiosity, I wiped it and installed openSUSE. Initially, it ran perfectly silent at forty-something degrees Celsius there too. I installed an SSD and updated to 13.1, but at some indeterminable point thereafter it began to run hotter and noisier. These days the dual CPUs tend to run around the sixty-something mark, sometimes venturing into the seventies Celsius, and there is the accompanying fan noise. Currently it's at a steady 71° whilst playing a video in VLC. I don't know what temperature is deemed too much for a CPU / GPU.
> Parts: http://www.parts-people.com/index.php?action=item&id=6437
> Cool video: http://www.parts-people.com/index.php?action=item&id=5391#video
> There is also a bios upgrade issue http://www.downloadplex.com/Drivers/BIOS-System-Updates/Dell/dell-latitude-d...
> And there was a warranty issue with many of these D630s http://forums.anandtech.com/showpost.php?p=27865565&postcount=5
Thanks for all the info and links, and to Felix also. Quite a bit for me to get my head round when I have time. As regards the BIOS, I have the A17 update which should negate the issues in your third link. I see there is an A19 update described merely as having 'enhanced system security' but I don't think I need further NSA intrusions, besides which it requires Windows/DOS to install. So it should be fine on that front.
I suppose my running twin displays doesn't help keep the GPU activity down but the temperature increases started long before I set up the second screen anyway.
Peter
On 2014-11-28 01:23, Peter wrote:
> Currently it's at a steady 71° whilst playing a video in VLC. I don't know what temperature is deemed too much for a CPU / GPU.
I think it is too hot for that relatively mild load, unless the thing is too old, or it is decoding in software. What temperature is too much depends on the CPU.
Are you using the proprietary nvidia driver? I had to go back to 331.79, the updates made it worse. It was unstable.
-- 
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
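For anyone wanting to watch temperatures while reproducing the fault, here is a minimal sketch. It assumes the kernel exposes thermal zones under /sys/class/thermal with readings in millidegrees Celsius; whether the D630 exposes its sensors there is an assumption, and the function name read_temps is made up for illustration. It does not read the GPU's own sensor.

```shell
# Print one line per thermal zone found under the given directory
# (default: /sys/class/thermal), converting millidegrees to whole degrees.
read_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (also covers an unmatched glob).
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        printf '%s: %s C\n' "${zone##*/}" "$((milli / 1000))"
    done
}
```

Calling read_temps with no argument reads the live system; running it every few seconds under a steady load such as video playback should show whether the reading keeps climbing or settles.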
On 28/11/14 02:33, Carlos E. R. wrote:
> Are you using the proprietary nvidia driver? I had to go back to 331.79, the updates made it worse. It was unstable.
I don't know how I would be able to roll back to an earlier revision, since the only one in the official nVidia repo is the current one.
* Peter
> On 28/11/14 02:33, Carlos E. R. wrote:
>> Are you using the proprietary nvidia driver? I had to go back to 331.79, the updates made it worse. It was unstable.
> I don't know how I would be able to roll back to an earlier revision, since the only one in the official nVidia repo is the current one.
ftp://download.nvidia.com/XFree86/Linux-x86_64/ contains a long list of previous and current versions of the 64-bit driver. 32-bit versions are also available.
In runlevel 3: sh ./NVidia.....run -aqs
-- 
(paka)Patrick Shanahan    Plainfield, Indiana, USA    @ptilopteri
http://en.opensuse.org    openSUSE Community Member    facebook/ptilopteri
http://wahoo.no-ip.org    Photo Album: http://wahoo.no-ip.org/gallery2
Registered Linux User #207535    @ http://linuxcounter.net
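After installing an older driver by hand like this, it may be worth confirming which version the kernel module actually reports. The following is a rough sketch only: it assumes modinfo is available and that the module is named nvidia, and both function names are hypothetical helpers rather than anything shipped with the driver.

```shell
# Print the version of the nvidia kernel module as reported by modinfo,
# or nothing if no such module is installed.
installed_nvidia_version() {
    modinfo -F version nvidia 2>/dev/null
}

# Compare the version you meant to install ($1) with the one found ($2).
check_driver_version() {
    wanted="$1"
    found="$2"
    if [ -z "$found" ]; then
        echo "no nvidia module found"
    elif [ "$found" = "$wanted" ]; then
        echo "ok: $found"
    else
        echo "mismatch: wanted $wanted, found $found"
    fi
}
```

For example, check_driver_version 331.79 "$(installed_nvidia_version)" would report a mismatch if a later repo update had replaced the hand-installed version.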
On 11/27/2014 07:23 PM, Peter wrote:
> Also, the laptop hasn't really been moved in the time since these problems began occurring, so it would be bizarre if something had become displaced.
My experience is that chips, at least, seem to work themselves loose from sockets for no apparent reason. Maybe it's micro-vibrations, the cooling fans, who knows what, but it seems to happen over time. Hence flat soldered chips and retaining clips.
Oh, then there's oxidization ... Having electrical current flowing tends to encourage that, for some reason, despite what the adverts about 'electronic' car body anti-rust devices claim.
Don't ever discount this. For some reason removing things that can be unplugged, wiping them down, possibly with an air-brush, and pushing them back home firmly seems to cure an amazing number of problems :-)
-- 
/"\
\ /  ASCII Ribbon Campaign
 X   Against HTML Mail
/ \
On 2014-11-28 02:49, Anton Aylward wrote:
> Oh, then there's oxidization ... Having electrical current flowing tends to encourage that, for some reason, despite what the adverts about 'electronic' car body anti-rust devices claim.
Current flow causes corrosion because it is the same effect as in batteries. In the right direction and at the right voltage, the current impedes corrosion instead; or you can fit a "sacrificial anode", connect it to a calculated voltage, and it is that anode which corrodes, not the ship.
The metals used in electronics contacts are chosen to avoid this. The best is gold plating, for that and other reasons.
-- 
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
On 11/28/2014 09:26 AM, Carlos E. R. wrote:
> On 2014-11-28 02:49, Anton Aylward wrote:
>> Oh, then there's oxidization ... Having electrical current flowing tends to encourage that, for some reason, despite what the adverts about 'electronic' car body anti-rust devices claim.
> Current flow causes corrosion because it is the same effect as in batteries. In the right direction and voltage it impedes it - or you can place a "sacrifice anode", connect it to a calculated voltage, and it is that anode which corrodes, not the ship.
> The metals used in electronics contacts are calculated to avoid this - the best is using gold plating. For that and other reasons.
True, but there are two provisos to that.
Not all 'gold' contacts are fingers on circuit boards that plug in. Even the DIN two-part connectors have one end soldered in to the circuit board, and solder joints corrode. In fact solder joints are the #1 'terminal' failure mode in electronics; they always have been, ever since the days of valves! That has been the greatest motivation for microelectronics and LSI. However, that is slightly off topic. Wiggling contacts may make an oxidizing solder joint fail but that's another matter.
There's another 'evil' of electrical current in that it seems to attract charged particles such as dust. Even the gold-plated two-part DIN connectors seem to attract dirt.
But let's face it, not all contacts are gold plated. The legs on the centipedal chips are very rarely gold plated, even though some of the internal wiring might be. Yes, all my memory boards have gold fingers, and the mobos have retaining clips to hold them in firmly, but how often do we see problems fixed by taking those (or other) boards out and wiping the contacts clean and airbrushing the mobo sockets?
Yes, there's gold, and then there's 'dross', aka dirt, dust, grime, oxidized finger oils and more. And in my computers the #1 problem is CAT HAIR.
-- 
A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting frowned upon?
On 11/27/2014 07:23 PM, Peter wrote:
> but at some indeterminable point thereafter it began to run more hot and noisy.
The "noisy" might be the real issue. Fans running close to limits, and that limit may be !FAILURE!, are noisy.
Or in my case the fans become over-stressed due to an accumulation of cat hair, which was also impeding air flow and making the laptop run hot.
Use an un-bent paper clip to hold the fan blade still while applying a vacuum cleaner. You did turn the laptop off first before turning it upside down and introducing it to the vacuum cleaner, didn't you? :-)
-- 
/"\
\ /  ASCII Ribbon Campaign
 X   Against HTML Mail
/ \
On 28/11/14 02:54, Anton Aylward wrote:
> On 11/27/2014 07:23 PM, Peter wrote:
>> but at some indeterminable point thereafter it began to run more hot and noisy.
> The "noisy" might be the real issue. Fans running close to limits, and that limit may be !FAILURE!, are noisy.
> Or in my case the fans become over-stressed due to an accumulation of cat hair, which was also impeding air flow and making the laptop run hot.
> Use an un-bent paper clip to hold the fan blade still while applying a vacuum cleaner. You did turn the laptop off first before turning it upside down and introducing it to the vacuum cleaner, didn't you? :-)
Well, my absence of replies or updates here is partly due to long and horrible working hours, but mainly due to the laptop reaching the state of rarely working at all anymore. It's just decided to come on now at 5 in the morning, helpfully, so I'm disabling all power saving and hoping I can make it just stay on.
I spent much of earlier today taking it to pieces for the first time, unseating all the cables connected to the display and replugging any other connectors I could find. I also put a bit more thermal grease on the CPU. I thought the fan was clean and fine, since that's how it appears from above when I've removed just the keyboard previously, but when I took out the heatsink, hidden under protective sheeting, there was indeed a load of fluff blocking that, just the same as used to happen in my old Sony Vaio. In that old machine I used to take the vacuum cleaner to it, but in the end I sourced and replaced its noisy fan. My Dell is now sitting in the forty-something Celsius range, back where it used to be when I got it.
However, the fault remains. There were no other obvious signs of problems to be found, although my tugging at the screen cables at the point where they enter the tube under the screen (which itself seems to be completely inaccessible, were I wanting to check the cable inside) probably didn't help matters.
Starting up the machine afterwards I immediately had the same lines on the screen. Sometimes there's some hard disk activity, sometimes if I press F2 for setup I get a couple of warning beeps but can't actually get to the BIOS, and other times there's no activity at all. Often a single press on the power button switches it off, or it goes off itself after half a minute or so. I can either Magic Key or Ctrl-Alt-Delete to reboot, and once in a blue moon it suddenly comes up. Even on this current occasion, I could still see some trace of the display corruption overlaid over the boot sequence.
Alas I have no idea what part I could replace.
Since the corruption also shows on my monitor plugged into the dock it probably can't be the screen itself that is faulty, more likely the graphics chip, but that is completely gummed onto the board and I guess can't be replaced without being an extremely dab hand at soldering. I'd have to replace the entire main board.
Pff, just as I thought I'd have a weekend free to turn my attentions back to my second machine, which also has problems I posted about on here a few weeks ago, now this happens. Computer hardware seems to just get suckier.
Peter
Peter composed on 2014-11-29 06:01 (UTC+0100):
> ...more likely the graphics chip...
Did you miss these parts of the first link I sent you?
"D630...'08...close to a 100% failure rate..."
Unless your chip is newer than '08, how could you expect your problem to be anything different? Sorry to say, but those guys do tell it like it is.
-- 
"The wise are known for their understanding, and pleasant words are persuasive." Proverbs 16:21 (New Living Translation)
Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!
Felix Miata *** http://fm.no-ip.com/
Peter composed on 2014-11-27 20:39 (UTC+0100):
> My main machine's on the blink (quite literally, sometimes). Probably not an openSUSE-specific problem, but can't be sure. In the worst case, it could be a hardware fault brought about by a rogue driver, but I'm probably just being paranoid there. Can anybody recognise the following symptoms and tell me what's likely at fault?
> (openSUSE 13.1, 64-bit. KDE 4.11)
> I have a Dell Latitude D630 laptop...
http://www.badcaps.net/forum/showthread.php?t=19719
http://www.badcaps.net/forum/showthread.php?t=27394
-- 
"The wise are known for their understanding, and pleasant words are persuasive." Proverbs 16:21 (New Living Translation)
Team OS/2 ** Reg. Linux User #211409 ** a11y rocks!
Felix Miata *** http://fm.no-ip.com/
participants (6)
- Anton Aylward
- Carlos E. R.
- Felix Miata
- John Andersen
- Patrick Shanahan
- Peter