IOMMU...PCI memory hole...MTRR...I give up.

I have an MSI K8D-Master-F motherboard with two Opteron 244's. It recently got retasked as a desktop machine due to another system's failure...I thought the little ATI Rage built into the motherboard would make more than a sufficient 2d desktop type framebuffer, and as far as other stuff inside this box... this should be a pretty kick ass, high performance workstation.

But I can literally watch the screen redraw if I switch virtual desktops. Painfully slow. Thinking this was just some crippled built-in video bug, I threw in an old nVidia card we had lying around in one of the regular PCI slots. Slightly slower, if you can believe it.

I think this:

  mtrr: type mismatch for e5000000,1000000 old: uncachable new: write-combining

And of course this:

  $ cat /proc/mtrr
  reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
  reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1

Has something to do with it (and may also explain why I'm getting less than stellar performance out of other PCI devices, such as the RAID array that has a 128MB memory region the driver directly reads and writes on).

The above is with the nVidia PCI card inserted, the built-in ATI card would also emit a similar mtrr rejection message, just the address is different, and the system would only list 512MB uncachable without the nVidia.

Also notice that there's 4GB installed in this machine, in 2 2GB DIMMs installed in one bank together. And yet the BIOS map:

  BIOS-provided physical RAM map:
  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
  BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
  BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
  BIOS-e820: 00000000ff7c0000 - 0000000100000000 (reserved)

Lists nothing beyond the 4GB mark where that memory might reasonably have been moved to.

Assuming this is just a neglectful BIOS, I tried setting 'mem=4G' and 'memmap=1G@4G' kernel options, to define the 1GB 'hole' I presume is being placed here, and which got me this added line to dmesg:

  user: 0000000100000000 - 0000000140000000 (usable)

But still only 3GB available.

To add even more insult to rising injury, I noticed while looking through dmesg that the IOMMU is being disabled. It appears the value being advertised by the CPU's:

  CPU 0: aperture @ 1b80000000 size 128 MB

Is being rejected because it lies (WELL) above the 4GB mark.

I've been going through archives and I'm not finding anything that looks like actionable advice. The latest BIOS MSI's webpage lists for this motherboard is 1.1: already installed. It has no option for 'pci hole: software'. The only options I'm finding are memory bank interleaving (which I tried disabling), and 'Disabled/Best-Fit/Absolute' settings for the IOMMU, as well as its aperture size (128M, which I might increase if it would be used rather than ignored...).

One post in the archive that matched on motherboard make and model looks confused to me, because he refers to his BIOS as AWARD 2.0, but the only BIOS MSI puts out for this board is AMI (versioned 1.1).

In summary what I'd like to know is:

1) Where is the 'go fast' button? re: pci video that's slower than even isa video should be (and presumably similar performance problems on other PCI devices).
2) How do I get the last 25% of my memory back?
3) How do I get the system to put the IOMMU somewhere in the range it stole under 4G (so Linux can use it)? Can Linux move this on its own accord?

Thanks in advance for any and all help.

-- 
David W. Hankins          "If you don't do it right the first time,
Software Engineer                  you'll just have to do it again."
Internet Systems Consortium, Inc.          -- Jack T. Hankins
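As an illustration of the "only 3GB available" observation (this sketch is not part of the original post), the usable ranges in the BIOS e820 map quoted above can be added up in a few lines of Python. The function names are made up for this example; the map text is copied from the dmesg output in the post.

```python
import re

# e820 map as quoted in the post above.
E820_MAP = """\
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
BIOS-e820: 00000000ff7c0000 - 0000000100000000 (reserved)
"""

LINE_RE = re.compile(r"BIOS-e820:\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\((.+)\)")
FOUR_GB = 1 << 32


def parse_e820(text):
    """Return a list of (start, end, kind) tuples, end exclusive."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return entries


def usable_bytes(entries):
    """Total size of all ranges the BIOS marked as usable RAM."""
    return sum(end - start for start, end, kind in entries if kind == "usable")


def usable_above(entries, boundary=FOUR_GB):
    """Usable ranges that extend past the given boundary (default 4GB)."""
    return [(s, e) for s, e, kind in entries if kind == "usable" and e > boundary]


if __name__ == "__main__":
    entries = parse_e820(E820_MAP)
    print("usable: %.1f MB" % (usable_bytes(entries) / 2**20))
    print("usable ranges above 4GB:", usable_above(entries))
```

With the map from the post this comes to a little under 3072 MB, and the list of usable ranges above 4GB is empty, which matches the complaint that the last gigabyte was not remapped anywhere.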

I have an MSI K8D-Master-F motherboard with two Opteron 244's. It recently got retasked as a desktop machine due to another system's failure...I thought the little ATI Rage built into the motherboard would make more than a sufficient 2d desktop type framebuffer, and as far as other stuff inside this box... this should be a pretty kick ass, high performance workstation.
But I can literally watch the screen redraw if I switch virtual desktops. Painfully slow. Thinking this was just some crippled built-in video bug, I threw in an old nVidia card we had lying around in one of the regular PCI slots. Slightly slower, if you can believe it.
I think this:
mtrr: type mismatch for e5000000,1000000 old: uncachable new: write-combining
And of course this:
$ cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
Yes, that's set up wrong. The IOMMU and AGP spaces can't be uncachable, since uncachable overrides all other cache settings. You should look for a memory settings option which specifies "discrete" or "continuous", and change the value. Hopefully, that will return some of your memory to you. You should also look to see if there are any memory hoisting options available.
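To make the "uncachable overrides all other cache settings" point concrete, here is a small Python sketch (an editorial illustration, not something posted in the thread). It parses /proc/mtrr text in the format shown above and applies the overlap rule for variable-range MTRRs: uncachable wins over everything, write-through wins over write-back, and other mixed overlaps are treated as undefined. The function names are invented for this example, and the parser only handles sizes printed in MB.

```python
import re

# /proc/mtrr contents as quoted in the thread.
PROC_MTRR = """\
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
"""

REG_RE = re.compile(
    r"reg(\d+): base=0x([0-9a-f]+) \(\s*\d+MB\), size=\s*(\d+)MB: ([\w-]+), count=\d+"
)


def parse_mtrr(text):
    """Return a list of (register, base, size_in_bytes, type) tuples."""
    regs = []
    for line in text.splitlines():
        m = REG_RE.match(line.strip())
        if m:
            regs.append((int(m.group(1)), int(m.group(2), 16),
                         int(m.group(3)) << 20, m.group(4)))
    return regs


def effective_type(regs, addr):
    """Memory type that results for addr when the listed ranges overlap."""
    types = {t for _, base, size, t in regs if base <= addr < base + size}
    if not types:
        return "default"          # no variable range matches
    if "uncachable" in types:
        return "uncachable"       # UC takes precedence over everything
    if len(types) == 1:
        return next(iter(types))
    if types == {"write-through", "write-back"}:
        return "write-through"    # WT takes precedence over WB
    return "undefined"            # other mixed overlaps


if __name__ == "__main__":
    regs = parse_mtrr(PROC_MTRR)
    # Frame buffer address from the "type mismatch" message.
    print(hex(0xe5000000), effective_type(regs, 0xe5000000))
```

The frame buffer at e5000000 lies inside the 1024MB uncachable register, so it stays uncachable no matter what the write-back register says, which is why the write-combining request is refused.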
3) How do I get the system to put the IOMMU somewhere in the range it stole under 4G (so Linux can use it)? Can Linux move this on its own accord?
Linux can move this of its own accord, but until you fix the MTRRs (which Linux can't currently do), it isn't going to help. If you're feeling adventurous, you could check to see if there are any patches available to enable PAT. I know Andi was/is working on them, but I'm not sure how far along he is.

-Mark Langsdorf
AMD, Inc.

On Thu, Aug 10, 2006 at 03:34:12PM -0500, Langsdorf, Mark wrote:
$ cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
Yes, that's set up wrong. The IOMMU and AGP spaces can't be uncachable, since uncachable overrides all other cache settings.
You should look for a memory settings option which specifies "discrete" or "continuous", and change the value. Hopefully, that will return some of your memory to you.
You should also look to see if there are any memory hoisting options available.
Nothing like that in the BIOS menu. The best I seem to be able to do is adjust interleaving.
If you're feeling adventurous, you check to see if there are any patches available to enable PAT. I know Andi was/is working on them, but I'm not sure how far along he is.
Got a URL? All I'm finding from Google is the odd linux-kernel post. There's an initial patch, but it looks highly theoretical (I don't see any /proc support I might reasonably use).

Thanks for taking the time, Mark.

-- 
David W. Hankins

$ cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
Yes, that's set up wrong. The IOMMU and AGP spaces can't be uncachable, since uncachable overrides all other cache settings.
You should look for a memory settings option which specifies "discrete" or "continuous", and change the value. Hopefully, that will return some of your memory to you.
You should also look to see if there are any memory hoisting options available.
Nothing like that in the BIOS menu.
It's been three years since AMD required those options; I don't believe your motherboard had been EOL'd at that point.
The best I seem to be able to do is adjust interleaving.
That won't help, as I'm sure you've noticed.
If you're feeling adventurous, you check to see if there are any patches available to enable PAT. I know Andi was/is working on them, but I'm not sure how far along he is.
Got a URL? All I'm finding from google are the odd linux-kernel post. There's an initial patch, but it looks highly theoretical (I don't see any /proc support I might reasonably use).
Again, I know Andi is working on it, but I'm not sure of the status. You may want to check with him, or ping me again in October.

-Mark Langsdorf
AMD, Inc.

On Thu, Aug 10, 2006 at 04:13:57PM -0500, Langsdorf, Mark wrote:
It's been three years since AMD required those options; I don't believe your motherboard have been EOL'd at that point.
The RELNOTES on their last BIOS update (which I've been running since 2004 sometime) for this board, according to their website:

  1. This is AMI BIOS, second formal release. 2003/7/4. version 1.1
  2. Added Feature:
     (a) Adds Adaptec 2120S Support.
  3. Problem Solved
     (a) Fixes Wake on Lan failed on S1 state.

2003/7/4...close shave?

I'd have to be pretty dense to miss them from navigating the BIOS menu, but that's happened before so I went through it again anyway, even looking in every seemingly unrelated submenu (sadly, it was not hidden in the IDE configuration section). No dice, I guess I'm SOL.
That won't help, as I'm sure you've noticed.
Indeed. Thanks again Mark. I've thrown more stuff at their tech support web form, but I rather wonder how effective that will be.

Hope you don't mind me quoting you: "three years since AMD required those options".

-- 
David W. Hankins

On Thu, Aug 10, 2006 at 02:49:36PM -0700, David W. Hankins wrote:
1. This is AMI BIOS, second formal release. 2003/7/4. version 1.1
Correction, in an amazing 14 minute turnaround, MSI gave me an unpublished update:

  MS-9131 V1.2 BIOS Release
  1. This is AMI BIOS, third formal release. 2003/11/28. version 1.2
  2. Added Feature:
     (a)Add support for all series of AMD Opteron(up to 248)
     (b)Updated AMD memory reference
        ^^^^^^^^^^^^^^^^^^^^
  3. Problem Solved
     (a) Fixed the issue that SuSe-Linux 64Bit BOOT with USB-CDROM will hang.
     (b) Fixed the LSI-22915 U160 SCSI card will hang on POST.
     (c) WakeUp from S1 fail if pressed KBD/moved Mouse during it entering StandBy.

Now all I have to do is find a floppy drive.

-- 
David W. Hankins

No such luck on MSI's v1.2 bios for this K8 Master F.

The only tangible difference seems to be that it steals memory in smaller chunks (now I got ~3768MB until I re-enabled the IOMMU and gave it 256MB).

No new options in the BIOS menu...in fact several options seem to have disappeared.

PCI bus 1 appears to have been renumbered 4.

I also don't remember seeing these in the lspci output before:

  00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
  00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
  00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
  00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
  00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
  00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
  00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
  00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control

They sound ominously relevant. But a fat lot of good it does me.

And they've redone all the ACPI stuff so now I seem to be able to monitor the cpu temperature etc:

  ACPI: Processor [CPU1] (supports 8 throttling states)
  ACPI: Thermal Zone [THRM] (50 C)
  ACPI: Fan [FN00] (on)
  ACPI: Fan [FN01] (on)

That's new.

But effectively no change to mtrr:

  reg00: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
  reg01: base=0xd8000000 (3456MB), size= 128MB: uncachable, count=1
  reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1

That's sized a little better, but still overlaps the video cards' frame buffers.

  mtrr: type mismatch for f5000000,1000000 old: uncachable new: write-combining

And the IOMMU aperture is still at an impossibly high address:

  Checking aperture...
  CPU 0: aperture @ 1b00000000 size 256 MB
  Aperture from northbridge cpu 0 beyond 4GB. Ignoring.

At this point I'm mostly writing for the benefit of any others who might google this in the archives.

-- 
David W. Hankins

* David W. Hankins <David_Hankins@isc.org> [060811 01:25]:
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: Thermal Zone [THRM] (50 C)
ACPI: Fan [FN00] (on)
ACPI: Fan [FN01] (on)
That's new.
But effectively no change to mtrr:
reg00: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg01: base=0xd8000000 (3456MB), size= 128MB: uncachable, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
That's sized a little better, but still overlaps the video cards' frame buffers.
mtrr: type mismatch for f5000000,1000000 old: uncachable new: write-combining
And the IOMMU aperture is still at an impossibly high address:
Checking aperture...
CPU 0: aperture @ 1b00000000 size 256 MB
Aperture from northbridge cpu 0 beyond 4GB. Ignoring.
At this point I'm mostly writing for the benefit of any others who might google this in the archives.
Just to make sure: Did you reset the CMOS with the reset jumper on the mainboard?
Stefan
-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5    Mail: sf@suse.de
D-90409 Nuernberg                          Phone: +49-911-740 53 - 0
GPG 1024D/91614BBC B226 E3DA 37B0 2170 7403 D19C 18AF E579 9161 4BBC

On Fri, Aug 11, 2006 at 10:30:55AM +0200, Stefan Fent wrote:
Just to make sure: Did you reset the CMOS with the reset jumper on the mainboard?
No, I hadn't thought to try that. The BIOS indicated the CMOS checksum was bad, which I took to mean that the new version automatically threw out the old version's dataset. I then did a 'load optimal defaults', which oddly enough did not enable bank or node interleaving, and searched through menu by menu for new options.

But I'll try the CMOS jumper just to be sure before I throw in the towel on this. I'm back at 3776MB (since the IOMMU is at a useless address, I disabled it), so at least I'm a little better off than when I started.

I also had another funny idea I might try sometime today: disabling hardware acceleration on the ATI. These slow page flips might be the result of copying in between different parts of video memory. So if the frame buffer became write-only, I'm wondering if it would speed up any. That would at least make the symptom go away.

Kind of weird to think that turning off acceleration would speed up performance.

-- 
David W. Hankins

Hello,

I have the following problem: when I insert a new SATA disk, it moves "in front" of my existing SATA disks and the machine cannot boot anymore.

Here, the details: My box has two SATA disks, one to boot from and one with data. I want to insert a third one and migrate the data SATA disk to the new SATA disk (the boot disk is untouched). Somehow the new disk is faster to negotiate with the motherboard than the others and always becomes sda. This messes up the configuration of grub, which now tries to boot from the new disk. The order to plug them into the motherboard has no effect. No matter how I plug them in, the new disk is always first.

My question: Is there a way to give persistent names to SATA disks which are already used during boot? In that fashion, I could give persistent names and configure grub to boot from my SATA boot disk, no matter what other SATA disks there are.

regards,
einar

P.S.: I remember having had such problems under Windows ten years ago. Where a new disk made a mess out of your drive letters. Funny, how old problems are still not solved... :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: opensuse-amd64+unsubscribe@opensuse.org
For additional commands, e-mail: opensuse-amd64+help@opensuse.org

Hello,

I found a solution: It is possible to give the file systems labels. These labels are tied to the file system itself. They can be used in grub and fstab. The new SATA HD still presses in and becomes /dev/sda but since grub and fstab follow the label, it does not matter.

For completeness: e2label to give ext3 file systems a label and mkswap -L for swap systems. There are similar commands for other file system types.

Grub then looks:

  kernel /boot/vmlinuz root=LABEL=root bla bla bla

and fstab:

  LABEL=root / ext3 defaults, bla bla bla

Mind, the label given has to be unique.

regards,
einar

einar_linux wrote:
Hello,
I have the following problem: when I insert a new SATA disk, it moves "in front" of my existing SATA disks and the machine cannot boot anymore.
Here, the details: My box has two SATA disks, one to boots from and one with data. I want to insert a third one and migrate the data SATA disk to the new SATA disk (the boot disk is untouched). Somehow the new disk is faster to negotiate with the motherboard than the others and always becomes sda. This messes up the configuration of grub, which now tries to boot from the new disk. The order to plug them into the motherboard has no effect. No matter how I plug them in, the new disk is always first.
My question: Is there a way to give persistent names to SATA disks which are already used during boot? In that fashion, I could give persistent names and configure grub to boot from my SATA boot disk, no matter what other SATA disks there are.
regards,
einar
P.S.: I remember having had such problems under windows ten years ago. Where a new disk made a mess out of your drive letters. Funny, how old problems are still not solved... :-)
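The label approach from einar's follow-up can be sketched as a small fstab rewrite in Python (an editorial illustration only). The device names, mount points and label names below are hypothetical examples, and the labels themselves would still have to be written to the file systems first with e2label or mkswap -L as described above.

```python
# Hypothetical fstab that still refers to kernel device names.
FSTAB = """\
/dev/sda1 swap swap defaults 0 0
/dev/sda2 / ext3 defaults 1 1
/dev/sdb1 /data ext3 defaults 1 2
"""

# Hypothetical mapping from current device name to the label given to it.
LABELS = {"/dev/sda1": "swap", "/dev/sda2": "root", "/dev/sdb1": "data"}


def use_labels(fstab_text, labels):
    """Rewrite the device field of matching fstab lines to LABEL=<label>."""
    values = list(labels.values())
    if len(values) != len(set(values)):
        # Two file systems with the same label would be ambiguous at boot.
        raise ValueError("filesystem labels must be unique")
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and fields[0] in labels:
            fields[0] = "LABEL=" + labels[fields[0]]
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(use_labels(FSTAB, LABELS))
```

Once every entry is addressed by label, it no longer matters whether the boot disk shows up as sda or sdb after a new disk is added.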

The mtrr type mismatch messages are relatively harmless by themselves -- they happen even on perfectly working machines. In particular uncachable->write-combining is a standard transition that happens often -- the BIOS left an uncachable memory region and the X server or the frame buffer decides to turn it WC for better performance.
3) How do I get the system to put the IOMMU somewhere in the range it stole under 4G (so Linux can use it)? Can Linux move this on its own accord?
The IOMMU should work, you just lose 128MB of memory because its aperture will be mapped over memory as fallback. I don't think that is your problem.
Linux can move this on it's own accord, but until you fix the MTRRs (which Linux can't currently do),
What Linux can't do is to fix it up automatically. But it should be possible to do it manually after boot (although the interface is a bit weird). See /usr/src/linux/Documentation/mtrr.txt. The MTRR driver should be able to change any MTRRs using this.
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
Looking at your log it might be enough to change reg01 to end at 3072MB
it isn't going to help.
If you're feeling adventurous, you check to see if there are any patches available to enable PAT. I know Andi was/is working on them, but I'm not sure how far along he is.
PAT is still work in progress, I'm sorry. Even after it's done I'm not sure it would help in this particular case -- at least not without patching the X server.

Norm

Another option to get the machine working quickly might be to take out 1 or 2GB of memory. Most of these problems only happen because many BIOSes are not very good at handling memory bumping into the PCI hole.

-Andi
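Regarding the suggestion to make the write-back register end at 3072MB: a variable-range MTRR has to cover a power-of-two size with its base aligned to that size, so 0-3072MB cannot be one register. The Python sketch below (an editorial illustration, not code from the thread) splits a range into such blocks and formats them in the "base=... size=... type=..." form that Documentation/mtrr.txt describes for writing to /proc/mtrr. The function names are invented here, and the sketch only prints strings; it does not touch /proc/mtrr, since changing the register that covers RAM on a running system is risky.

```python
def mtrr_blocks(start, end):
    """Split [start, end) into power-of-two sized, size-aligned blocks."""
    blocks = []
    while start < end:
        remaining = end - start
        # Largest power of two that fits into what is left.
        size = 1 << (remaining.bit_length() - 1)
        if start:
            # The base must be aligned to the block size, so the lowest
            # set bit of the base is an upper bound for the size.
            size = min(size, start & -start)
        blocks.append((start, size))
        start += size
    return blocks


def mtrr_lines(start, end, mem_type="write-back"):
    """Format the blocks in the style accepted by /proc/mtrr."""
    return ["base=0x%08x size=0x%x type=%s" % (base, size, mem_type)
            for base, size in mtrr_blocks(start, end)]


if __name__ == "__main__":
    # Write-back coverage that stops at 3072MB instead of 4096MB.
    for line in mtrr_lines(0, 3072 << 20):
        print(line)
```

For 0-3072MB this yields a 2048MB block at 0 and a 1024MB block at 2048MB, so the single 4096MB write-back register would have to become two. The same splitting applied to 0xd8000000-4GB gives a 128MB block and a 512MB block, the shape of the two uncachable registers the v1.2 BIOS sets up.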

On Fri, Aug 11, 2006 at 11:50:31AM +0200, Andi Kleen wrote:
The mtrr type mismatch messages are relatively harmless by itself -- they happen even on perfectly working machines.
In particular uncachable->write combing is a standard transition that happens often -- the BIOS left a uncachable memory region and the X server or the frame buffer decides to turn it WC for better performance.
Well, they aren't 'taking', at least not in so far as /proc/mtrr displays, nor in how X performs.
3) How do I get the system to put the IOMMU somewhere in the range it stole under 4G (so Linux can use it)? Can Linux move this on its own accord?
The IOMMU should work, you just lose 128MB of memory because its aperture will be mapped over memory as fallback. I don't think that is your problem.
I tried commenting out the bit of code in the kernel that detects apertures addressed over 4G and ignores them. This removed the line from dmesg that said: "Aperture from northbridge cpu 0 is greater than 4G. Ignoring." But this remained:

  PCI-DMA: Disabling IOMMU.

I did not look into the PCI code to see why it's choosing to do this, but I assumed it was unprepared to use a 64-bit IOMMU.
Linux can move this on it's own accord, but until you fix the MTRRs (which Linux can't currently do),
What Linux can't do is to fix it up automatically.
But it should be possible to do it manually after boot (although the interface is a bit weird). See /usr/src/linux/Documentation/mtrr.txt
The MTRR driver should be able to change any MTRRs using this.
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
Looking at your log it might be enough to change reg01 to end at 3072MB
Documentation/mtrr.txt doesn't tell me how to do that. If I delete register 1, the system will crash (hard hang). If I try to add any region that overlaps with register 1 (but doesn't overlap with register zero - these get errored out and appear to make no change), the count on register one gets incremented and nothing changes. I've toyed with the idea of hacking into the kernel mtrr code - when the mtrr is detected/read at boot time, when hopefully one might disable interrupts and initialize the mtrr from scratch without anything else going on. Do you think that tactic would bear fruit?
Another option to get the machine working quickly might be to take out 1 or 2GB of memory. Most of these problems only happen because many BIOS are not very good at handling memory bumping into the PCI hole.
Right, I thought of that last night and wished I'd thought of it yesterday, I'll give this a shot.

-- 
David W. Hankins

On Friday 11 August 2006 18:15, David W. Hankins wrote:
On Fri, Aug 11, 2006 at 11:50:31AM +0200, Andi Kleen wrote:
The mtrr type mismatch messages are relatively harmless by itself -- they happen even on perfectly working machines.
In particular uncachable->write combing is a standard transition that happens often -- the BIOS left a uncachable memory region and the X server or the frame buffer decides to turn it WC for better performance.
Well, they aren't 'taking', at least not in so far as /proc/mtrr displays, nor in how X performs.
Maybe the particular combination set up by the BIOS manages to violate the priority rules of the CPUs (you can look them up in the architecture manual, they're pretty complicated).
I tried commenting out the bit of code in the kernel that detects apertures addressed over 4G and ignores them. This removed the line from dmesg that said: "Aperture from northbridge cpu 0 is greater than 4G. Ignoring." But this remained:
PCI-DMA: Disabling IOMMU.
I did not look into the PCI code to see why it's choosing to do this, but I assumed it was unprepared to use a 64-bit IOMMU.
The whole point of the GART IOMMU is to be below 4GB to handle IO to devices that can't access more than 32 bits worth of address space. Putting it above 4GB is totally useless.
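The arithmetic behind that statement can be sketched in Python (an editorial illustration; this is not the kernel's actual check, and the function names are invented): a device limited to 32-bit addresses can only reach the aperture if the whole window ends at or below 4GB.

```python
import re

APERTURE_RE = re.compile(r"aperture @ ([0-9a-f]+) size (\d+) MB")
FOUR_GB = 1 << 32


def parse_aperture(line):
    """Extract (base, size_in_bytes) from a 'CPU n: aperture @ ...' line."""
    m = APERTURE_RE.search(line)
    if not m:
        raise ValueError("no aperture information in: %r" % line)
    return int(m.group(1), 16), int(m.group(2)) << 20


def reachable_by_32bit_dma(base, size):
    """True if the whole aperture lies below the 4GB boundary."""
    return base + size <= FOUR_GB


if __name__ == "__main__":
    # Line as reported in dmesg earlier in the thread.
    base, size = parse_aperture("CPU 0: aperture @ 1b80000000 size 128 MB")
    print("aperture at %d GB, usable for 32-bit DMA: %s"
          % (base >> 30, reachable_by_32bit_dma(base, size)))
```

The BIOS-assigned base of 1b80000000 is far above the 4GB boundary, so no 32-bit device could ever address it, which is why the kernel ignores the aperture.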
Linux can move this on it's own accord, but until you fix the MTRRs (which Linux can't currently do),
What Linux can't do is to fix it up automatically.
But it should be possible to do it manually after boot (although the interface is a bit weird). See /usr/src/linux/Documentation/mtrr.txt
The MTRR driver should be able to change any MTRRs using this.
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
Looking at your log it might be enough to change reg01 to end at 3072MB
Documentation/mtrr.txt doesn't tell me how to do that. If I delete register 1, the system will crash (hard hang). If I try to add any region that overlaps with register 1 (but doesn't overlap with register zero - these get errored out and appear to make no change), the count on register one gets incremented and nothing changes.
Hmm, maybe if you change the default MTRR first?
I've toyed with the idea of hacking into the kernel mtrr code - when the mtrr is detected/read at boot time, when hopefully one might disable interrupts and initialize the mtrr from scratch without anything else going on.
Do you think that tactic would bear fruit?
Maybe.

-Andi
participants (5)
- Andi Kleen
- David W. Hankins
- einar_linux
- Langsdorf, Mark
- Stefan Fent