9.2 Pro installation fails on dual Opteron
Hi, we've been running SUSE 9.1 Pro on a dual Opteron HP ProLiant server (6 GB RAM, 3-disk RAID), but when we try to install 9.2 it fails miserably. No matter what I select (safe mode, no ACPI, normal, whatever) I get the same result: the server deadlocks with no video at the end of the kernel load, before showing the language selection dialog (there is no way to talk to the server except power-cycling it). I really can't see why previously supported hardware suddenly isn't functional. Does anyone have any ideas on how to get the 9.2 version installed? Is it any use to try the Enterprise Server version, or should we just switch back to Red Hat? (We must have a stable and functional server within a week or so, and the .local domain issue of 9.1 makes it unusable.) Best regards // Robert
Robert Brodén wrote:
Hi
we've been running SUSE 9.1 Pro on a dual Opteron HP ProLiant server (6 GB RAM, 3-disk RAID), but when we try to install 9.2 it fails miserably. No matter what I select (safe mode, no ACPI, normal, whatever) I get the same result: the server deadlocks with no video at the end of the kernel load, before showing the language selection dialog (there is no way to talk to the server except power-cycling it).
I have 8 GB of RAM and I had problems installing 9.2. I used mem=2048m (I think that was it) as a kernel option during installation. After the install I did a kernel upgrade via YaST and rebooted. Then I removed mem=2048m from the boot loader configuration via YaST and rebooted. Now it sees all 8 GB of RAM and boots perfectly. You might try this. Mark
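Editor's note: after removing the option again, it can be worth confirming that the kernel really sees all the memory. The following is a minimal sketch, not something from the thread: the function names are made up for illustration, and it assumes the usual layout of /proc/cmdline and /proc/meminfo.

```python
import re

def parse_mem_limit(cmdline):
    """Return the mem= limit in bytes from a kernel command line, or None."""
    match = re.search(r"(?:^|\s)mem=(\d+)([kKmMgG]?)(?=\s|$)", cmdline)
    if match is None:
        return None
    scale = {"": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30}
    return int(match.group(1)) * scale[match.group(2).lower()]

def parse_mem_total(meminfo):
    """Return MemTotal in bytes from the text of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    return int(match.group(1)) * 1024

def report(cmdline, meminfo):
    """Summarise whether a mem= cap is still active (hypothetical helper)."""
    limit = parse_mem_limit(cmdline)
    total_mb = parse_mem_total(meminfo) >> 20
    if limit is None:
        return "no mem= option, kernel sees %d MB" % total_mb
    return "mem= caps RAM at %d MB, kernel sees %d MB" % (limit >> 20, total_mb)
```

On the installed system, pass the contents of /proc/cmdline and /proc/meminfo to report(), for example report(open("/proc/cmdline").read(), open("/proc/meminfo").read()).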
Mark Horton wrote:
Robert Brodén wrote:
Hi
we've been running SUSE 9.1 Pro on a dual Opteron HP ProLiant server (6 GB RAM, 3-disk RAID), but when we try to install 9.2 it fails miserably. No matter what I select (safe mode, no ACPI, normal, whatever) I get the same result: the server deadlocks with no video at the end of the kernel load, before showing the language selection dialog (there is no way to talk to the server except power-cycling it).
I have 8 GB of RAM and I had problems installing 9.2. I used mem=2048m (I think that was it) as a kernel option during installation. After the install I did a kernel upgrade via YaST and rebooted. Then I removed mem=2048m from the boot loader configuration via YaST and rebooted.
Now it sees all 8 GB of RAM and boots perfectly. You might try this.
Another workaround is to use the web edition of SUSE 9.2. The
2.6.8-24 kernel has issues with some configs*. The web edition uses
2.6.8-24.10, which seems to have no issues.
*I believe it's related to iommu and some SCSI drivers (aacraid and
3w-9xx). The same configs worked fine with IDE drives. I'm not entirely
sure what the issue was, as I stopped debugging once I discovered the
newer kernels fixed it.
--
Even more disturbing than this never-ending torrent of junk mail
is the fact that, apparently, they must actually work once in a while.
Sam Flory
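Editor's note: to tell whether a machine is already running the newer kernel Sam mentions, the release string can be compared against 2.6.8-24.10. This is a rough sketch under the assumption that SUSE release strings look like "2.6.8-24.10-smp" and compare numerically; kernel_key and has_fixed_kernel are illustrative names, not part of any SUSE tool.

```python
import re

def kernel_key(release):
    """Turn a release such as '2.6.8-24.10-smp' into a comparable tuple."""
    match = re.match(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError("unrecognised release string: %r" % release)
    version = tuple(int(p) for p in match.group(1).split("."))
    build = tuple(int(p) for p in match.group(2).split("."))
    return version, build

def has_fixed_kernel(release, fixed="2.6.8-24.10"):
    """True if release is at least the build reported as working in the thread."""
    return kernel_key(release) >= kernel_key(fixed)

# Flavour suffixes such as -smp or -default are ignored by the comparison.
for candidate in ("2.6.8-24-smp", "2.6.8-24.10-smp"):
    print(candidate, has_fixed_kernel(candidate))
```

On a running system the release string is available from platform.release() in the standard library.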
On Thu, Feb 10, 2005 at 09:48:32PM +0100, Robert Brodén wrote:
Hi
we've been running SUSE 9.1 Pro on a dual Opteron HP ProLiant server (6 GB RAM, 3-disk RAID), but when we try to install 9.2 it fails miserably. No matter what I select (safe mode, no ACPI, normal, whatever) I get the same result: the server deadlocks with no video at the end of the kernel load, before showing the language selection dialog (there is no way to talk to the server except power-cycling it).
Can you boot with mem=2G and then update the kernel with YOU (YaST Online Update)? Alternatively, disable USB in the BIOS and then update the kernel anyway. -Andi
I've just announced a patch CD that might help in your case, Andreas -- Andreas Jaeger, aj@suse.de, http://www.suse.de/~aj SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
Andreas Wahlert
Andreas Jaeger wrote:
I've just announced a patch CD that might help in your case, Andreas
You remember me??
Same problem on the FSC V810. When is this CD available?? We need it!
It's on ftp.suse.com, so it should be on all the mirrors soon... Download it e.g. from: ftp://ftp.gwdg.de/pub/linux/suse/ftp.suse.com/people/aj/PatchCD The FSC V810 has also another problem - a BIOS problem. This one cannot be fixed with the new kernel. As soon as I know more, I'll tell here, Andreas -- Andreas Jaeger, aj@suse.de, http://www.suse.de/~aj SUSE Linux Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
Andreas Jaeger
Same problem on the FSC V810. When is this CD available?? We need it!
It's on ftp.suse.com, so it should be on all the mirrors soon...
Download it e.g. from:
ftp://ftp.gwdg.de/pub/linux/suse/ftp.suse.com/people/aj/PatchCD
The FSC V810 has also another problem - a BIOS problem. This one cannot be fixed with the new kernel. As soon as I know more, I'll tell here,
Ok, I tried that patch CD on a Fujitsu Siemens CELSIUS V810 and the kernel panic is still there:

mtrr: v2.0 (20020519)
general protection fault: 0000 [1] SMP
CPU 1
Modules linked in:
Pid: 0, comm: swapper Tainted: MG (2.6.8-24.10-smp SL92_BRANCH-200412221154270000)
RIP: 0010:[<ffffffff8011a93e>] <ffffffff8011a93e>{generic_set_all+318}
RSP: 0018:000001007ffabf48 EFLAGS: 00010006
RAX: 000000001e1e1e1e RBX: 0000000000000000 RCX: 0000000000000250
RDX: 000000001e1e1e1e RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000008 R08: 0000000006060606 R09: ffffffff8045df88
R10: 0000000000000000 R11: 0000000006060606 R12: 0000000000000000
R13: 0000000000000008 R14: 00000100cfeec7c0 R15: 0000000000000c00
FS: 0000000000000000(0000) GS:ffffffff804e2e80(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 00000000c005003b
CR2: 0000000000000000 CR3: 00000000cff02000 CR4: 0000000000000060
Process swapper (pid: 0, threadinfo 000001007ffa2000, task 0000010037e0b030)
Stack: ffffffff804e3c10 00000100cff01ed8 0000000000000001 0000000000000000
0000000000000000 0000000000000000 0000000000000000 ffffffff8011947b
0000000000000000 0000000000000006
Call Trace:<IRQ> <ffffffff8011947b>{ipi_handler+75} <ffffffff8011c940>{smp_call_function_interrupt+64}
<ffffffff8010f5c0>{default_idle+0} <ffffffff80110f2f>{call_function_interrupt+99}
<EOI> <ffffffff8010f5e0>{default_idle+32} <ffffffff8010f9ea>{cpu_idle+26}
Code: 0f 30 41 ba 01 00 00 00 31 ff 8d 8f 58 02 00 00 0f 32 41 89
RIP <ffffffff8011a93e>{generic_set_all+318} RSP <000001007ffabf48>
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU 0
Modules linked in:
Pid: 1, comm: swapper Tainted: MG (2.6.8-24.10-smp SL92_BRANCH-200412221154270000)
RIP: 0010:[<ffffffff80119580>] <ffffffff80119580>{set_mtrr+208}
RSP: 0000:00000100cff01ec8 EFLAGS: 00000002
RAX: 0000000000000001 RBX: 00000000ffffffff RCX: 0000000000000002
RDX: 0000ffff0000ffff RSI: 0000000000000000 RDI: ffffffff8045e110
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff804e2e00(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo 00000100cff00000, task 0000010037e0b7e0)
Stack: ffffffff8040e080 0000000000000246 0000000100000001 0000000000000000
0000000000000000 00000100ffffffff 0000000000000000 0000000000000008
0000000000000000 0000000000000000
Call Trace:<ffffffff804f11a0>{mtrr_init+352} <ffffffff8010c2f2>{init+514}
<ffffffff8011129f>{child_rip+8} <ffffffff8010c0f0>{init+0}
<ffffffff80111297>{child_rip+0}
Code: f3 90 8b 44 24 10 85 c0 75 f6 be 08 00 00 00 48 c7 c7 10 e1
console shuts up ...

I still wonder how the BIOS is supposed to be involved in that. Since the problem also occurs on other distributions, I tried it out on Fedora Core 3 and installed kernels from kernel.org. Doing this, the panic occurs with kernel 2.6.9 from kernel.org; a vanilla 2.6.8 still boots on FC3, but with some Oopses. The weird thing is that I don't see much that changed in the sources of the MTRR routines, and I really don't see a difference that could cause the trouble here. So I guess there is a change somewhere else that affects our system. It really looks weird: kernels up to 2.6.8 (vanilla) boot on FC3, so it's difficult to explain to our customers that this should be a BIOS issue.
Do you have an idea how we can get closer to that bug? If it's really a BIOS issue I would love to tell the BIOS developer, but at the moment I don't have enough hard data to tell him things like "Kernel is issuing a BIOS call, BIOS is messing up" or something else.
Regards
Rainer
--
Dipl.-Inf. (FH) Rainer Koenig Project Manager Linux Fujitsu Siemens Computers VP BC E SW OS Phone: +49-821-804-3321 Fax: +49-821-804-2131
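Editor's note: some hard data can be read straight out of the CPU 1 register dump. The sketch below is an interpretation, not something the thread confirms. It assumes the Code: bytes start at the faulting instruction, in which case 0f 30 is WRMSR, which writes EDX:EAX to the MSR selected by ECX. MSR 0x250 is the first fixed-range MTRR (eight 64K ranges covering 0x00000-0x7FFFF), and each byte of the value is the memory type for one range. The helper names are made up for illustration.

```python
# Architectural memory type encodings used by x86 MTRRs.
MTRR_TYPES = {0x00: "UC", 0x01: "WC", 0x04: "WT", 0x05: "WP", 0x06: "WB"}
FIX64K_MSR = 0x250  # first fixed-range MTRR

def decode_fixed_mtrr(eax, edx):
    """Split the EDX:EAX value of a fixed-range MTRR write into 8 type bytes."""
    value = ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
    return [(value >> (8 * i)) & 0xFF for i in range(8)]

def describe(type_byte):
    """Name a type byte, or flag it if it is not a plain memory type code."""
    name = MTRR_TYPES.get(type_byte)
    return name if name else "not a plain memory type (0x%02x)" % type_byte

# Register values copied from the CPU 1 oops in this message.
rcx, rax, rdx = 0x250, 0x1E1E1E1E, 0x1E1E1E1E
if rcx == FIX64K_MSR:
    for index, type_byte in enumerate(decode_fixed_mtrr(rax, rdx)):
        print("64K range %d: %s" % (index, describe(type_byte)))
```

Under those assumptions the value being written contains 0x1e in every byte, which is none of the five plain type codes. What the extra bits mean on this CPU, and where the value came from, is beyond what the thread establishes.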
mtrr: v2.0 (20020519) general protection fault: 0000 [1] SMP CPU 1 Modules linked in: Pid: 0, comm: swapper Tainted: MG (2.6.8-24.10-smp SL92_BRANCH-200412221154270000)
That's the BIOS bug. BIOS sets bad MTRR. Update the BIOS. I will do a workaround to not panic in this case, but the MTRRs will still be broken and the workaround will only be in the next version. -Andi
Andi Kleen wrote:
mtrr: v2.0 (20020519) general protection fault: 0000 [1] SMP CPU 1 Modules linked in: Pid: 0, comm: swapper Tainted: MG (2.6.8-24.10-smp SL92_BRANCH-200412221154270000)
That's the BIOS bug. BIOS sets bad MTRR. Update the BIOS.
I will do a workaround to not panic in this case, but the MTRRs will still be broken and the workaround will only be in the next version.
-Andi
See my message from the 17th (December). A new Tyan BIOS worked fine on the machine where I previously got exactly this error message during boot. I tried out the newest FSC BIOS this week (not sure if the date of the file is correct: 22.09.04, I just downloaded the latest from the FSC homepage) and it hung again on mtrr errors.

Message (17.12.):
___________________________
I could reproduce it here, too: oops at exactly the same function:
RIP: 0010:[<ffffffff8011a8fe>] <ffffffff8011a8fe>{generic_set_all+318}
The oops does not appear on a SLES9-SP1 installation.
However, I could solve the problem by installing a current BIOS version. It's a Tyan S2885 board. The previous BIOS version was from 01.2004, the new one (v. 2885_202) is from 19.05.2004. For what the update fixes (some mtrr enhancements are mentioned as well) see: http://www.tyan.com/support/html/b_s2885.html
I got some SCSI controller problems on SLES9 (9.2 boots properly) now after the update. I cannot say anything on this for now; I will investigate further on Monday.
Thomas
_____________________________

BTW: The SCSI problems vanished ...

Conclusion:
FSC BIOS:
SLES9-SP1 -> good
SL 9.2 -> mtrr error
current kernels -> mtrr error
Tyan BIOS:
SLES9-SP1 -> good
SL 9.2 -> good
current kernels -> I don't know, probably good.

However, I am not involved in mtrr kernel/BIOS things and how this could be solved, but it seems clearly to be the BIOS.
Thomas
Andi Kleen
mtrr: v2.0 (20020519) general protection fault: 0000 [1] SMP CPU 1 Modules linked in: Pid: 0, comm: swapper Tainted: MG (2.6.8-24.10-smp SL92_BRANCH-200412221154270000)
That's the BIOS bug. BIOS sets bad MTRR. Update the BIOS.
If the MTRR is so bad then why is it working in SuSE 9.1 and up to vanilla kernel 2.6.8? Anyway, since it looks like TYAN-BIOS from May 2004 is working I will go back to an older FSC-BIOS to check what's happening then. Our history was like that:
BIOS 1.01.1692 was bad due to MTRR settings, so no 3D graphics.
BIOS 1.02.1692 fixed this and worked fine with SuSE 9.1.
BIOS 1.04.1692 (released in June 2004 and should have the fixes from TYAN) showed the same errors as BIOS 1.01. So we asked the BIOS developer to fix this and got
BIOS 1.05.1692, which worked fine again with SuSE 9.1.
BIOS 1.06.1692 is current and works fine with older 2.6 kernels, but panics with current kernels.
Ok, on my machine there's a compilation running since I found a nice little difference in arch/x86_64/kernel/nmi.c between 2.6.8 and 2.6.9 which could be related to our problem. So I'll try that out, and then anyway I'll try out the old BIOS 1.04, even if we never recommended that because of the problems with older kernels.
I will do a workaround to not panic in this case, but the MTRRs will still be broken and the workaround will only be in the next version.
Wait with that until I checked the things I told you above. And to be of help for the customers we need to find a way that the customer can install the 9.2 distribution. Regards Rainer -- Dipl.-Inf. (FH) Rainer Koenig Project Manager Linux Fujitsu Siemens Computers VP BC E SW OS Phone: +49-821-804-3321 Fax: +49-821-804-2131
If the MTRR is so bad then why is it working in SuSE 9.1 and up to vanilla kernel 2.6.8? Anyway, since it looks like TYAN-BIOS from May 2004 is working I will go back to an older FSC-BIOS to check what's happening then. Our history was like that:
The SUSE 9.2 MTRR driver is identical to the vanilla 2.6.8 driver. But it's strange that it didn't trigger with the older versions, agreed. Perhaps some race condition is involved.
Wait with that until I checked the things I told you above. And to be of help for the customers we need to find a way that the customer can install the 9.2 distribution.
Updating the BIOS? There is unfortunately no command line option to turn MTRR syncing off, although I'm considering adding one now. -Andi
Rainer Koenig
Ok, on my machine there's a compilation running since I found a nice little difference in arch/x86_64/kernel/nmi.c between 2.6.8 and 2.6.9 which could be related to our problem. So I'll try that out, and then anyway I'll try out the old BIOS 1.04, even if we never recommended that because of the problems with older kernels.
Ok, I tried both things out: Changing nmi.c didn't help at all, the kernel is still crashing. Changing the BIOS to the 1.04 version didn't help either. So whatever Tyan did in their BIOS, it didn't find its way into our BIOS. Anyway, my next try is with an original BIOS from Tyan, even if that means that I probably need to program the flash memory chip directly. But that I'll try out next week...
Regards
Rainer
--
Dipl.-Inf. (FH) Rainer Koenig Project Manager Linux Fujitsu Siemens Computers VP BC E SW OS Phone: +49-821-804-3321 Fax: +49-821-804-2131
Ok, I tried both things out:
Changing nmi.c didn't help at all, the kernel is still crashing. Changing the BIOS to the 1.04 version didn't help either.
So whatever Tyan did in their BIOS, it didn't find its way into our BIOS. Anyway, my next try is with an original BIOS from Tyan, even if that means that I probably need to program the flash memory chip directly. But that I'll try out next week...
When you know e.g. 2.6.8 works and 2.6.9 is crashing you can download -bk* snapshots from kernel.org and do a binary search. -Andi
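Editor's note: the binary search Andi suggests can be sketched as follows. This is only an illustration: the snapshot names are hypothetical placeholders rather than the real kernel.org file names, and the crashes callback stands in for "build this snapshot, boot it on the V810 and report whether it panicked". It assumes the first entry is known good, the last is known bad, and the crash does not disappear again once introduced.

```python
def first_bad(snapshots, crashes):
    """Return the first snapshot for which crashes(snapshot) is True.

    snapshots must be in release order, with snapshots[0] known good
    and snapshots[-1] known bad.
    """
    good, bad = 0, len(snapshots) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if crashes(snapshots[mid]):
            bad = mid    # the breakage is at mid or earlier
        else:
            good = mid   # the breakage is after mid
    return snapshots[bad]

# Hypothetical snapshot series between the known-good and known-bad kernels.
snapshots = ["2.6.8"] + ["2.6.8-bk%d" % n for n in range(1, 11)] + ["2.6.9"]

# Simulated test result for demonstration: pretend everything from the
# eighth entry onwards panics. In practice this is a manual boot test.
tested = []
def simulated_crash(name):
    tested.append(name)
    return snapshots.index(name) >= 7

culprit = first_bad(snapshots, simulated_crash)
print("first bad snapshot:", culprit, "after", len(tested), "test boots")
```

With twelve entries this needs four test boots instead of ten, which matters when every step means compiling and rebooting a kernel.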
Andi Kleen wrote:
Ok, I tried both things out:
Changing nmi.c didn't help at all, the kernel is still crashing. Changing the BIOS to the 1.04 version didn't help either.
So whatever Tyan did in their BIOS, it didn't find its way into our BIOS. Anyway, my next try is with an original BIOS from Tyan, even if that means that I probably need to program the flash memory chip directly. But that I'll try out next week...
When you know e.g. 2.6.8 works and 2.6.9 is crashing you can download -bk* snapshots from kernel.org and do a binary search.
Tell me if I can help testing new BIOS/kernel versions ... Thomas
participants (8)
- Andi Kleen
- Andreas Jaeger
- Andreas Wahlert
- Mark Horton
- Rainer Koenig
- Robert Brodén
- Samuel Flory
- Thomas Renninger