https://bugzilla.novell.com/show_bug.cgi?id=376165

Summary: AMD Multicore - Lockup/Reboot with MCE errors After nvidia kernel module load/install
Product: openSUSE 10.3
Version: Final
Platform: x86-64
OS/Version: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
AssignedTo: kernel-maintainers@forge.provo.novell.com
ReportedBy: drankinatty@suddenlinkmail.com
QAContact: qa@suse.de
Found By: ---

System: Tyan Tomcat 8KE S2865ANRF
datasheet: ftp://ftp.tyan.com/datasheets/d_s2865_100.pdf
Processor: Opteron 180 (Ver. 2.0)
Memory: 2G OCZ Platinum PC3200 (Timings 2-3-2)(certified OK)
Video Card: MSI Nvidia 8600GT Twin Turbo
Power Supply: Corsair 550W

The Problem:

The system will lock up or reboot randomly but frequently with the nvidia kernel module loaded. When a lockup occurs, the keyboard's Caps Lock and Scroll Lock lights flash at approximately 1 second intervals. This occurs with any simple load applied to the system (zypper refresh, running kwrite, grep, etc.). Without a load, the system will idle for days.

The nvidia module was installed via 1-click install, which provided the packages nvidia-gfxG01-kmp-default-169.12_2.6.22.17_0.1-0.1 and x11-video-nvidiaG01-169.12-0.1.

The lockups produce machine check events stating that it is a hardware problem. However, the problem is caused by the "inclusion" of the nvidia kernel module. It may *not* be the module itself; it may be caused by some address map or similar issue brought into play by loading the module. Judging from the list, this problem seems to affect 10.3 x86_64 installs with multicore AMD processors on some motherboard chipsets/architectures.

The workaround (in this case):

Unload and blacklist the nvidia kernel module and pass "acpi_use_timer_override" to the kernel at boot.
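
For anyone trying to confirm that the workaround is actually in effect on a test box, here is a minimal sketch. It assumes a Linux system where the kernel command line is readable from /proc/cmdline and the loaded modules from /proc/modules. The function names are hypothetical and are not part of any tool mentioned in this report.

```python
def has_boot_option(cmdline: str, option: str = "acpi_use_timer_override") -> bool:
    """Return True if `option` appears as a whole token on the kernel command line."""
    return option in cmdline.split()


def module_loaded(proc_modules: str, name: str = "nvidia") -> bool:
    """Return True if `name` is the first field of any line of /proc/modules text."""
    for line in proc_modules.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def workaround_active(cmdline: str, proc_modules: str) -> bool:
    """The workaround described above: boot option present AND nvidia module absent."""
    return has_boot_option(cmdline) and not module_loaded(proc_modules)
```

To use it, read the contents of /proc/cmdline and /proc/modules (for example with `open(path).read()`) and pass both strings to `workaround_active`. It only reports the current state; it does not unload or blacklist anything.
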
The basic "nv" driver is used after unloading. There have been no lockups and no further MCE activity since, even while simultaneously running mprime, running XP in VirtualBox installing updates, and deleting /var/cache/zypp/zypp.db and forcing a zypper refresh. Of course, operating without the nvidia kernel module cripples the graphics system performance.

Discussion:

After performing a fresh 10.3 install on the system, without any install problems and with md5sum-verified media, the machine began experiencing frequent lockups. At the time, I saw no correlation between the nvidia graphics driver install and the lockups. (I may have installed the driver late, along with updates, and then the lockups started sometime the next day.) With the nvidia driver installed, the machine will "idle" for days at a time until any load is applied; then the lockups occur.

RAM, thermal and motherboard hardware are all eliminated as possibilities.

RAM: memtested, plus physically shipped to OCZ and verified OK.

Thermal: The box is an Antec P182 case w/3 120mm fans. Core 1 temps idle at 30 degrees C and are 37-38 under load. Core 2 temps idle at 23 and average 30 under load. The BIOS PC Health and lm-sensors temps match almost exactly. All are well short of the 74 degree safe operating limit and well short of the 85 degree shutdown temp.

Motherboard Hardware: The motherboard has a BIOS code window (LCD) on it, and I have caught all the BIOS codes (including all self tests); all the codes say everything is OK. I have gone through the BIOS and turned off anything unnecessary (Ser 1, Ser 2, Parallel, AC97 sound, etc.). It doesn't make any difference.

The problem here lies with the Tyan S2865 (and other manufacturers') board/chipset architecture running multicore AMD processors and the apparent "hardware" failure caused by loading the nvidia kernel module. Again, the lockup is perhaps caused not by the module itself but by a result of its loading, whether that be address mapping, address space, etc.
That is the core issue. The driver itself worked great until the system would freeze.

Included Files:

I am including the hwinfo, the dmesg (with and without "acpi_use_timer_override" applied), the mcelog (note the comment I placed at the end of the file to mark when I unloaded the nvidia driver) and the syslog (complete from the initial install on 3/8/08).

Close:

As with all these problems, I will provide any additional information or do any testing that you want. Just ask. This may be a good one to take care of before 11.0 comes out, as the use of multicore AMD processors and boards is on the rise.

Thanks