Anders Johansson wrote:
On Friday 04 April 2008 17:15:45 David C. Rankin wrote:
You seem to be misunderstanding what "mce" is. A machine check exception is the hardware itself telling you that something has gone badly wrong. There is no interpretation involved in the software. The software just logs the message.
If the mce says it is a hardware problem, you can count on its being a hardware problem.
Anders
Jan, Anders, List:

The more I read, and the more I test, the more I am concerned that there may be a simmering issue with the x86_64 code. I installed a plain-Jane PCIe ATI card running with the open source driver. Just as with the nvidia 8600GT card (using the open source "nv" driver), the system still gives occasional MCEs. Just as with the 8600, the MCEs do not have any effect on the system. If I weren't logging them with mcelog, I would never know they were occurring.

Reading the tech docs, it is readily apparent that an MCE doesn't necessarily mean hardware. Software is more than capable of causing them:

AMD64 Architecture Programmer's Manual Volume 2: System Programming
2.6.6 New Exception Conditions

"The AMD64 architecture defines a number of new conditions that can cause an exception to occur when the processor is running in long mode. Many of the conditions occur when software attempts to use an address that is not in canonical form. See “Vectors” on page 208 for information on the new exception conditions that can occur in long mode."

See: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/2459...

See Also:
AMD64 - http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_7044,00...
Opteron Specific - http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_9003,00...

My question is, "What type of additional logging or data capture should I be doing in hopes of catching or narrowing down what the real cause of the MCE is?"

I'm capturing the MCEs with mcelog, which runs every minute under cron to ensure the buffers never get filled. Beyond that, I'm not doing any other special logging. The only hardware I haven't changed is the motherboard, and that tests fine. What else could I run/log/set that would give me the best chance of finding the real culprit?

Any help is much appreciated.

--
David C. Rankin, J.D., P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
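
One possible way to extend the cron-driven mcelog capture described above is to stamp each batch of records with the time and some system context, so that MCEs can later be lined up against load or kernel version. The following is only a rough sketch, not a tested diagnostic tool. It assumes that running `mcelog` with no arguments prints any pending records to standard output; the log path `/var/log/mce-context.log` and the function names `format_entry` and `capture` are hypothetical, chosen purely for illustration.

```python
#!/usr/bin/env python3
"""Sketch: wrap a periodic mcelog run and add time/load context to its output."""
import datetime
import os
import subprocess


def format_entry(mce_text, now, loadavg, kernel):
    """Build one log entry; return '' when mcelog reported nothing."""
    if not mce_text.strip():
        return ""
    header = "=== {} load={:.2f},{:.2f},{:.2f} kernel={} ===".format(
        now.isoformat(timespec="seconds"), *loadavg, kernel)
    return header + "\n" + mce_text.rstrip("\n") + "\n"


def capture(command=("mcelog",), log_path="/var/log/mce-context.log"):
    """Run the (assumed) mcelog command once and append any output to log_path.

    Both defaults are assumptions for illustration; adjust them to the
    local installation.
    """
    result = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    entry = format_entry(
        result.stdout,
        datetime.datetime.now(),
        os.getloadavg(),       # 1, 5 and 15 minute load averages
        os.uname().release,    # running kernel version
    )
    if entry:
        # Append, so earlier records are never overwritten.
        with open(log_path, "a") as log:
            log.write(entry)
    return entry
```

Calling `capture()` at the bottom of the script and scheduling that script from cron, in place of the bare mcelog call, would leave one timestamped block per minute in which records were found. Whether the extra context is enough to narrow down a cause is an open question; it only makes it easier to compare MCE times against other logs.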