http://bugzilla.novell.com/show_bug.cgi?id=623303

http://bugzilla.novell.com/show_bug.cgi?id=623303#c3

--- Comment #3 from Samuel Kvasnica <samuel.kvasnica@ims.co.at> 2010-07-25 15:48:00 UTC ---

So, in the meantime I've got a solution and a stable system. This summary might be interesting:

- A more detailed inspection revealed a growing mcelog when running in Dom0. I got several of these (fake?) messages along with each crash:

  HARDWARE ERROR. This is *NOT* a software problem!
  Please contact your hardware vendor
  MCE 1
  CPU 1 BANK 5 MISC 7ffe ADDR 2c48019700b
  TIME 1279652142 Tue Jul 20 20:55:42 2010
  MCG status:
  MCi status:
  Error overflow
  Uncorrected error
  Error enabled
  MCi_MISC register valid
  MCi_ADDR register valid
  Processor context corrupt
  MCA: Internal Timer error
  STATUS fe00000000800400 MCGSTATUS 0
  MCGCAP 1809 APICID 12 SOCKETID 1
  CPUID Vendor Intel Family 6 Model 26

  => I'm not sure if these are real MCEs or some Xen artifact; I have never seen them when running a native kernel on this system.

- More thorough tests using a native kernel (>10TB transfers) revealed that it crashes as well (but no MCEs here!)

  => so it is not a Xen issue, but the Xen environment seems much more sensitive.

- Looking at /proc/interrupts revealed that several devices, including megasas and 2 mpt channels, were routed onto IRQ16 by the BIOS. There are plenty of free IRQs, but the BIOS decided to put everything onto IRQ16. It seems the Supermicro X8DTi board has a broken BIOS; I'm in contact with them to solve this.

- In the meantime, I solved the problem by enabling MSI interrupts on the mptbase driver and on the Intel/LSI megasas driver. For megasas, this involved patching the driver because MSI is not supported there.

  => the system feels pretty stable now using MSI interrupts.
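For anyone who wants to check their own box for the same IRQ sharing, here is a rough Python sketch of the /proc/interrupts check described above. It is not the tool used in this report, just one way to do it. The function name `shared_irqs` and the sample table are made up for illustration (the counts are not from the affected machine), and the parser assumes the common layout where the per-CPU counters are followed by the interrupt controller name and a comma-separated device list.

```python
import os

# Illustrative sample only; the counts and device names are invented.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 16:     912345          0   IO-APIC-fasteoi   megasas, ioc0, ioc1
 17:       1234          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0          0   Non-maskable interrupts
"""


def shared_irqs(text):
    """Return {irq_number: [devices]} for numbered IRQs with more than one device.

    Assumption: each numbered row looks like
    '<irq>: <count per CPU>... <controller> <dev>[, <dev>...]'.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        tail = " ".join(rest.split()[ncpu:])  # drop the per-CPU counters
        parts = [p.strip() for p in tail.split(",")]
        first = parts[0].split()
        if len(first) < 2:
            continue  # controller name only, no device attached
        # The last token before the first comma is the first device name.
        devices = [first[-1]] + [p for p in parts[1:] if p]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared


if __name__ == "__main__":
    path = "/proc/interrupts"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    for irq, devs in sorted(shared_irqs(text).items()):
        print("IRQ %d shared by: %s" % (irq, ", ".join(devs)))
```

Once a driver has been switched to MSI, its device should show up on its own row (typically with a PCI-MSI controller type) instead of in the shared list, so running the script before and after the change is a quick sanity check.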