Memory Problem: 3Gb instead of 4Gb, TYAN K8WE s2895, 2x252 Opteron
SUSE Linux Enterprise Server 9, TYAN K8WE S2895, BIOS v1.01, 2 x Opteron 252 "Troy", 4 x OCZ 1024MB DDR ECC PC3200 (Server Series) memory. The BIOS memory tests pass. However, SLES 9 finds only 3 GB instead of 4 GB. All BIOS settings are default; the installation is default. Thanks!
Fuad Efendi wrote:
SUSE Linux Enterprise Server 9, TYAN K8WE S2895, BIOS v.1.01 2 x Opteron 252 Troy. 4 x OCZ 1024MB DDR ECC PC3200 (Server Series) of memory. BIOS tests are Ok.
However, SLES 9 finds only 3 Gb instead of 4 Gb. All BIOS settings are default; installation is default.
Thanks!
Does the BIOS see all the memory? It sounds like the AGP aperture is taking up some system memory. There may be some memory configuration options in the BIOS that you can tweak, but they may make the system unstable. See my thread: http://lists.suse.com/archive/suse-amd64/2005-Jan/0104.html

-Ken
--
Ken Siersma, Software Engineer
EKK, Inc.
phone: (248) 624-9957  fax: (248) 624-7158
http://www.ekkinc.com
--
"Our lives begin to end the day we become silent about things that matter." -MLK Jr.
Thanks Ken, so the AGP or something similar reserves some memory... I am not familiar with Linux at the engineering level, sorry... Maybe it is for I/O on the second CPU, or symmetric on both. I received a reply also on [suse-sles-e]:
This is normal behavior. The x86 architecture reserves an amount of memory for PCI cards and other memory segments. If you put 4GB on such a board you will see a maximum of 3.5GB, depending on which devices are on the board and the PCI bus. Oliver Antwerpen
-----Original Message-----
From: Ken Siersma [mailto:siersmak@ekkinc.com]
Sent: Monday, July 11, 2005 10:39 AM
Cc: suse-amd64@suse.com
Subject: Re: [suse-amd64] Memory Problem: 3Gb instead of 4Gb, TYAN K8WE s2895, 2x252 Opteron

Fuad Efendi wrote:
SUSE Linux Enterprise Server 9, TYAN K8WE S2895, BIOS v.1.01 2 x Opteron 252 Troy. 4 x OCZ 1024MB DDR ECC PC3200 (Server Series) of memory. BIOS tests are Ok.
However, SLES 9 finds only 3 Gb instead of 4 Gb. All BIOS settings are default; installation is default.
Thanks!
Does the BIOS see all the memory? Sounds like the AGP Aperture is taking up some system memory. There may be some memory configuration options in the BIOS that you can tweak, but they may make the system unstable. See my thread - http://lists.suse.com/archive/suse-amd64/2005-Jan/0104.html -Ken -- Ken Siersma, Software Engineer EKK, Inc. phone: (248) 624-9957 fax: (248) 624-7158 http://www.ekkinc.com -- "Our lives begin to end the day we become silent about things that matter." -MLK Jr. -- Check the List-Unsubscribe header to unsubscribe For additional commands, email: suse-amd64-help@suse.com
Same issue here with my Tyan K8WE s2885's. This is a known motherboard/BIOS issue that Tyan currently does not have a fix for. We have 4 GB of physical RAM in our machines, but SUSE only sees 3.4. Thanks, John

"Fuad Efendi" <fuad@efendi.ca> wrote:
SUSE Linux Enterprise Server 9, TYAN K8WE S2895, BIOS v.1.01 2 x Opteron 252 Troy. 4 x OCZ 1024MB DDR ECC PC3200 (Server Series) of memory. BIOS tests are Ok. However, SLES 9 finds only 3 Gb instead of 4 Gb. All BIOS settings are default; installation is default. Thanks!
Hi, I have an S2895 too. I was able to make the OS see all the system memory (8 GB) by enabling "memory node interleave" and "memory hole mapping: software" in the BIOS. That will probably penalize your performance a bit... I guess the missing memory is what is called the PCI memory hole... Dima. --- Fuad Efendi <fuad@efendi.ca> wrote:
SUSE Linux Enterprise Server 9, TYAN K8WE S2895, BIOS v.1.01 2 x Opteron 252 Troy. 4 x OCZ 1024MB DDR ECC PC3200 (Server Series) of memory. BIOS tests are Ok.
However, SLES 9 finds only 3 Gb instead of 4 Gb. All BIOS settings are default; installation is default.
Thanks!
Thanks guys, interesting... What I found on http://www.tyan.com/support/html/f_tg_mp.html:

"AMD chipset architecture requires memory above 3.5GB to be reserved for PCI devices. You will typically see 3.6-3.8GB available."

Sounds like it is not a problem! I found similar problems with Intel: DELL Precision & Red Hat & 64-bit Pentium, and with XEON too. Sounds like DELL uses the same BIOS/architecture/motherboard/Phoenix BIOS/etc.
http://www.linuxquestions.org/questions/showthread.php?s=&threadid=322525
And even 64-bit Windows: http://www.planetamd64.com/lofiversion/index.php/t8663.html

However, can I really have 12-16 GB on a TYAN S2895? It's written on their site that it "theoretically supports up to 16GB, but not tested". Have fun, guys!

"Tyan Computer Corporation, founded in 1989 by long-time Intel and IBM executive, Dr. T. Symon Chang..."
http://www.opendrivers.com/company/2139/tyan-free-driver-download.html
http://www.opendrivers.com/categorycompany/14/1139/bios-and-system-update-tyan-free-driver-download.html
Funny...

-----Original Message-----
From: Dima [mailto:bryga66@yahoo.com]
Sent: Monday, July 11, 2005 11:06 AM
To: suse-amd64@suse.com
Subject: Re: [suse-amd64] Memory Problem: 3Gb instead of 4Gb, TYAN K8WE s2895, 2x252 Opteron

Hi, I have S2895 too. I was able to make OS see all the system memory (8 Gb) enabling "memory node interleave" and "memory hole mapping: software" in BIOS. That probably will penalize your performance a bit .. I guess the missing memory is what is called PCI memory hole ... Dima. --- Fuad Efendi <fuad@efendi.ca> wrote:
SUSE Linux Enterprise Server 9, TYAN K8WE S2895, BIOS v.1.01 2 x Opteron 252 Troy. 4 x OCZ 1024MB DDR ECC PC3200 (Server Series) of memory. BIOS tests are Ok.
However, SLES 9 finds only 3 Gb instead of 4 Gb. All BIOS settings are default; installation is default.
Thanks!
Dima <bryga66@yahoo.com> wrote:
I have S2895 too. I was able to make OS see all the system memory (8 Gb) enabling "memory node interleave" and "memory hole mapping: software" in BIOS. That probably will penalize your performance a bit .. I guess the missing memory is what is called PCI memory hole ...
I have an S2885, and with both of the BIOS versions where "memory hole remapping: software" was available, memory copy bandwidth in the bottom 4 GB (including the remapped portion) was considerably decreased, from about 1.7 GB/sec in the non-remapped case to about 1.2 GB/sec. YMMV, but it seems there is either a performance issue with software remapping in general or a bug in the implementation. I've informed Tyan of the issue. (My server has 8 GB physically installed, but only 7170-ish MB is available; with remapping it puts the memory between 3 & 4 GB up at the end of the address space, so physical memory appears to go up to the 9 GB mark.)

Note that for the newest Opterons available (including the dual-core chips), there is a new "hardware remapping" feature which may not have the performance hit. I haven't been able to test it, but I bet it doesn't have that problem.

Some people seem to know what the "hole before 4GB" is, and others do not, so here is a short tutorial (which those who know can tune out):

-- Hardware with RAM or memory-mapped I/O needs to be able to map it somewhere.
-- 32-bit OSes *and* 32-bit addressable hardware (which is the majority of the PCI hardware out there right now) both need to be able to work for the general case.

So, a "hole" at 4GB which grows downward depending on the total size of the addresses used is punched. It consists of (usually more or less in this order growing downward from 4GB):

-- x86 ROM/BIOS (this really is required to be aligned against the top of the 4GB boundary)
-- IO APIC
-- PCI/AGP devices as mapped in PCI config space numerically (but not required to be in that order). NOTE: each of the PCI domains is required to be aligned to its native size.
-- AGP aperture (NOTE: the AGP aperture is required to be aligned to its native size.)
The point of the "native size alignment" comments above is that, for example, if you have, say, a video card with a 256MB video buffer, then at minimum you'll have at least a 512MB "PCI hole", since the BIOS needs to be the last part before 4GB and the video buffer must be aligned to a 256MB (i.e. native size) boundary. If it were a 512MB video buffer, you'd be guaranteed to lose at least 1GB. So, it's easier than you'd imagine to lose 0.5GB or even a full 1GB in the PCI hole with some high-resource add-in adapters. My server box loses about 870MB without memory remapping enabled, though I guess I asked for it with that 256MB video card and the other add-in adapters I use.

This is not a Linux vs. MS-Windows nor an AMD vs. Intel thing; it's just a fact of how the PC architecture works. Modern Intel and AMD server hardware both support remapping the extra memory to the high end of the physical address space, but I've never tried the Intel version, so I don't know if it has any caveats like the performance hit for the "software" version.

-- Erich Stefan Boleyn <erich@uruk.org> http://www.uruk.org/
"Reality is truly stranger than fiction; Probably why fiction is so popular"
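[Editor's note] The "native size alignment" effect Erich describes can be sketched with a toy top-down allocator. This is an illustration of the rule, not chipset documentation; the region sizes (a 2 MB BIOS flash, a 256 MB video BAR) are made-up examples:

```python
# Hypothetical top-down allocator illustrating the "native size alignment"
# rule: each region must start on a multiple of its own (power-of-2) size,
# and the BIOS sits flush against the 4 GB boundary.
GB = 1 << 30
MB = 1 << 20

def allocate_down(regions, top=4 * GB):
    """Place (name, size) regions below `top`, in order, each rounded
    down to a boundary aligned to its own size. Returns the start of
    the resulting hole and the placements."""
    cursor = top
    placements = []
    for name, size in regions:
        cursor -= size
        cursor -= cursor % size  # round down to a size-aligned boundary
        placements.append((name, cursor))
    return cursor, placements

# BIOS flash at the very top, then a 256 MB video BAR below it:
hole_start, layout = allocate_down([("BIOS", 2 * MB), ("video BAR", 256 * MB)])
print(f"PCI hole starts at {hole_start / GB:.2f} GB "
      f"-> {(4 * GB - hole_start) / MB:.0f} MB lost")
```

With just those two regions the video BAR gets pushed down to the 3.5 GB boundary, so the hole is already 512 MB, matching Erich's example of a 256 MB card costing at least half a gigabyte.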
Thanks for the useful description, Erich. If there was a FAQ for this mailing list I would propose to put this in because it's really asked a lot. Some comments: On Mon, Jul 11, 2005 at 09:06:06AM -0700, Erich Boleyn wrote:
I have an S2885, and with both the BIOS versions where the "memory hole remapping: software" was available, memory copy bandwidth in the bottom 4 GB (including the remapped portion) was considerably decreased, from about 1.7GB/sec in the non-remapped case, to about 1.2GB/sec. YMMV, but it seems there is either a performance issue with the software remapping in general or a bug with the implementation. I've informed Tyan of the issue. (My server has 8GB physically installed, but only 7170-ish MB is available, with remapping it puts the memory between 3 & 4GB up at the end of the address space, so physical memory appears to go up to the 9GB mark)
Are you sure it works that way in the old pre-E-stepping world? My understanding was that just the next DIMM is mapped above 4GB, e.g. if you have 2 2GB DIMMs, then the first DIMM would be from 0 to 2GB and the next one from 4GB to 6GB with remapping on, leaving a 2GB hole. One theory for the slowdown was broken MTRR entries, but I don't think they can explain your case. If the MTRR is not set to write-back for memory, it's either very slow or normal performance, not a slight slowdown like you see.
So, a "hole" at 4GB which grows downward depending on the total size of the addresses used is punched. It consists of (usually more or less in this order growing downward from 4GB):
-- x86 ROM/BIOS (this really is required to be aligned against the top of the 4GB boundary)
-- IO APIC
-- PCI/AGP devices as mapped in PCI config space numerically (but not required to be in that order). NOTE: each of the PCI domains is required to be aligned to its native size.
-- AGP aperture (NOTE: the AGP aperture is required to be aligned to its native size.)
There also needs to be some free space, e.g. with PCI hotplug or CardBus in laptops the OS needs to assign I/O space later, after boot. It will also do this if the BIOS "forgot" to assign some I/O mappings in devices. It's tricky because the BIOS doesn't know how much is needed later. This is a continuous source of problems on Linux too. I tried to map into unused space in this case (space not reported as used in the e820 map), but that also leads to mysterious failures on some boards.
The point of the "native size alignment" comments above is that, for example, if you have, say a video card with a 256MB video buffer, then at minimum, you'll have at least a 512MB "PCI hole", since the BIOS needs to be the last part before 4GB and the video buffer must be aligned to a 256MB (i.e. native size) boundary. If it was a 512MB video buffer, then you'd be guaranteed to lose at least 1GB.
The aperture must be aligned this way, but I am not sure about the video buffer. It might be a secondary requirement to make MTRR assignment easier ("MTRRs - the bane of the x86 world")
Modern Intel and AMD server hardware both support remapping the extra memory to the high end of the physical address space, but I've never tried the Intel version, so I don't know if it has any caveats like the performance hit for the "software" version.
No reports of this problem on Intel hardware, so I presume it is usually enabled by default and works without visible side effects. -Andi
Andi Kleen <ak@suse.de> wrote:
Thanks for the useful description, Erich. If there was a FAQ for this mailing list I would propose to put this in because it's really asked a lot.
Some comments:
On Mon, Jul 11, 2005 at 09:06:06AM -0700, Erich Boleyn wrote:
I have an S2885, and with both the BIOS versions where the "memory hole remapping: software" was available, memory copy bandwidth in the bottom 4 GB (including the remapped portion) was considerably decreased, from about 1.7GB/sec in the non-remapped case, to about 1.2GB/sec. YMMV, but it seems there is either a performance issue with the software remapping in general or a bug with the implementation. I've informed Tyan of the issue. (My server has 8GB physically installed, but only 7170-ish MB is available, with remapping it puts the memory between 3 & 4GB up at the end of the address space, so physical memory appears to go up to the 9GB mark)
Are you sure it works that way in old pre E stepping world?
My understanding was that just the next DIMM is mapped above 4GB. e.g. if you have 2 2GB DIMMs, then first DIMM would be from 0 to 2GB and the next one from 4GB to 6GB with remapping on, leaving a 2GB hole.
As the saying goes, we're both right. I said between 3 & 4GB, but I forgot to mention my system is loaded with 1GB DIMMs. Though I *thought* it was a bit different. My understanding of the way it was supposed to work was:

-- Take the last *bank* of the DIMMs, and map that high, but across the full width of the controller.
-- Most large DIMMs are dual-banked, and in the case I have, I have 4 dual-banked 1GB DIMMs. So, for each DIMM, the last bank is 0.5GB, and remapping between 3 & 4GB would make sense.

For the Rev E world, my understanding is that it remaps just the addresses you need in hardware without this memory controller hack, so I think it should fix any (perhaps inadvertent) slowdown.
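[Editor's note] Andi's whole-DIMM description can be put in a few lines of arithmetic. This is a toy model of his example, an assumption rather than chipset documentation (real hoisting works at the memory-controller level, and per Erich may move banks rather than whole DIMMs):

```python
# Toy model of whole-DIMM hoisting as Andi describes it (an assumption,
# not chipset documentation): DIMMs are mapped sequentially from 0, and
# any DIMM that would overlap the PCI hole is hoisted above 4 GB.
GB = 1 << 30

def map_dimms(dimm_sizes, hole_start=3 * GB + GB // 2, hole_end=4 * GB):
    """Return (index, start, end) placements for each DIMM."""
    low, high = 0, hole_end
    layout = []
    for i, size in enumerate(dimm_sizes):
        if low + size <= hole_start:
            layout.append((i, low, low + size))
            low += size
        else:
            layout.append((i, high, high + size))
            high += size
    return layout

# Two 2 GB DIMMs: the second lands at 4-6 GB, leaving a 2 GB gap.
for i, start, end in map_dimms([2 * GB, 2 * GB]):
    print(f"DIMM {i}: {start / GB:.1f}-{end / GB:.1f} GB")
```

Run with two 2 GB DIMMs this reproduces Andi's example (first DIMM at 0-2GB, second at 4-6GB, a 2 GB hole in between), whereas a bank-granular scheme as Erich describes could hoist only 0.5 GB per DIMM.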
One theory for the slowdown was broken MTRR entries, but I don't think they can explain your case. If the MTRR is not set to write back for memory it's either very slow or normal performance, not a slight slow down like you see.
If I misunderstood, or there is a bug in what Tyan did, that would explain the slowdown. If they can only remap one DIMM, then you have to ungang the link and not run it in interleaved fashion, so you lose maximum memory bandwidth on all the DIMMs on the Opteron Northbridge involved. I'm going to see if we can fix that with Tyan. Worst case, you could probably get them to remap both of the 2 whole DIMMs up if you had 1GB DIMMs fully populated (i.e. with 4 1GB DIMMs, it would remap 2 of them high and therefore preserve the interleaved state of the memory controller).
So, a "hole" at 4GB which grows downward depending on the total size of the addresses used is punched. It consists of (usually more or less in this order growing downward from 4GB):
-- x86 ROM/BIOS (this really is required to be aligned against the top of the 4GB boundary)
-- IO APIC
-- PCI/AGP devices as mapped in PCI config space numerically (but not required to be in that order). NOTE: each of the PCI domains is required to be aligned to its native size.
-- AGP aperture (NOTE: the AGP aperture is required to be aligned to its native size.)
There needs to be also some free space. e.g. with PCI hotplug or Cardbus in laptops the OS needs to assign IO space later after boot. It will also do this if the BIOS "forgot" to assign some IO mappings in devices.
It's tricky because the BIOS doesn't know how much is needed later.
This is a continuous source of problems on Linux too. I tried to map into unused space in this case (space not reported as used in the e820 map), but that also leads to mysterious failures on some boards.
Agreed.
The point of the "native size alignment" comments above is that, for example, if you have, say a video card with a 256MB video buffer, then at minimum, you'll have at least a 512MB "PCI hole", since the BIOS needs to be the last part before 4GB and the video buffer must be aligned to a 256MB (i.e. native size) boundary. If it was a 512MB video buffer, then you'd be guaranteed to lose at least 1GB.
The aperture must be aligned this way, but I am not sure about the video buffer. It might be a secondary requirement to make MTRR assignment easier ("MTRRs - the bane of the x86 world")
Native size alignment is a requirement of both the AGP aperture AND all PCI BARs (Base Address Registers, the mechanism whereby any RAM or memory-mapped I/O is mapped into the physical address space). In the case of PCI, it is not possible to map them in any way but aligned to the size of the mapped area; all address bits below the power-of-2 size of the item being mapped are guaranteed to be masked out in the PCI specification.

In PCI, the official way to determine the size of what a BAR can map is in fact by writing all 1's into the register and seeing which bits were masked out on the bottom, with the requirement that it must be a power of 2. I've always thought that was a bit wacky. For AGP, the specification explicitly states that the aperture must be aligned, but the address register does happen to be separate from the way to set/determine the size of the window.

-- Erich Stefan Boleyn <erich@uruk.org> http://www.uruk.org/
"Reality is truly stranger than fiction; Probably why fiction is so popular"
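[Editor's note] The BAR-sizing handshake Erich describes (write all 1's, read back, see which low bits the device forced to zero) decodes like this. A minimal sketch; the readback value is a made-up example, not from a real device:

```python
# Decode the size implied by a 32-bit PCI BAR readback after software
# wrote 0xFFFFFFFF to it. The device hardwires the address bits below
# its window size to 0, so the lowest settable bit equals the size.
def bar_size(readback, mem_bar=True):
    # Memory BARs use bits 0-3 for type/prefetch flags, I/O BARs bits 0-1;
    # strip those before looking at the address mask.
    mask = readback & (~0xF if mem_bar else ~0x3)
    mask &= 0xFFFFFFFF
    return (~mask & 0xFFFFFFFF) + 1

# A 256 MB frame buffer: the device masks out the low 28 address bits.
print(hex(bar_size(0xF0000000)))  # -> 0x10000000 (256 MB)
```

Because the decoded size is always a power of two and the BAR can only hold addresses aligned to that mask, the "native size alignment" rule falls straight out of the register format.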
Sorry if I jump into this discussion as an absolute newbie to the 64-bit world :-) I've just bought 2 AMD64 systems with Asus A8V boards and 4 1GB DIMMs each. Indeed I stumbled over that memory problem, and now I almost understand what the problem is. But let me ask two things:

1) If that reserved memory is required for PCI devices, why don't 32-bit systems have that problem? On all my systems with 2GB RAM and video cards with 256MB video RAM, the full 2GB is available running Linux. Why do only the 64-bit systems need this explicit reservation between 3.5 and 4GB?

2) The Asus BIOS allows for software *and* hardware memory remapping. In both cases I have almost all 4GB of RAM available. However, I don't have 3D acceleration anymore, because the fglrx driver complains about the uncacheable MTRR space between 3.5 and 4GB. Andi wrote something about that problem in an earlier thread that I found in the archives. What I don't understand is: how can that memory suddenly be available when using the BIOS option to remap the memory hole, if the memory must be reserved for PCI devices? Or is it that it is indeed no longer reserved when doing the remapping, and that's why the fglrx driver cannot access it?

Andi wrote something in that other thread about manually creating a write-combined MTRR space from the uncacheable space (which I didn't try yet), but I don't understand why this should be possible: if I could easily create a write-combined space in the uncacheable space, why can't I just turn all the uncacheable MTRR space into write-combined, and all the problems are gone? What's the relation between that "reserved for PCI" memory and this uncacheable MTRR range between 3.5 and 4G?

Sorry for asking so much, but I would really like to understand the background of these 64-bit secrets :-)

cu, Frank
--
Dipl.-Inform. Frank Steiner   Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax: +49 89 2180-99-4049
* Recursion can only be understood once you have understood recursion. *
1) If that reserved memory is required for PCI devices, why don't 32-bit systems have that problem? On all my systems with 2GB RAM and video cards with 256MB video RAM, the full 2GB is available running Linux. Why do only the 64-bit systems need this explicit reservation between 3.5 and 4GB?
With 2GB RAM there is enough unused space in the 4GB area to place the memory hole without conflicts. -Andi
Hello Frank and Andi, may I ask which version of SuSE you are using and which BIOS revision your A8V has? We have an A8V with 4x1GB DIMMs and SuSE 9.2, which doesn't boot with memory remapping activated. AFAIR there is also no option to choose between software and hardware remapping (what is the difference anyway?). Do you have SCSI controllers in your systems? On our system with memory remapping enabled, the boot fails during the initialization of the SCSI controller. Andi, is there a list of all boards whose memory remapping option works together with SuSE? It doesn't matter whether those are Intel, AMD64 or Opteron based. Ciao Siegbert
Andi, is there a list of all boards whose memory remapping option works together with SuSE? It doesn't matter whether those are Intel, AMD64 or Opteron based.
There isn't. It would be misleading anyway because it depends on the BIOS version. For negative reports you can search the archives of this mailing list. -Andi
Siegbert Baude wrote:
Hello Frank and Andi,
may I ask which version of Suse you are using and which BIOS revision your A8V have?
I'm using SuSE 9.3, and the BIOS is 1013. I recommend you try SuSE 9.3 or perhaps even the latest kernel of the day, because I think there has been a lot of progress for x86_64 between 9.2 and 9.3.
We have A8V with 4x1GB Dimms and Suse 9.2, which don't boot with activated memory remapping. AFAIR there is also no option to choose between software and hardware remapping (what is the difference anyway?).
I don't know about the difference, but the 1013 BIOS does indeed have both options. Maybe it makes a difference in memory access time?
Do you have SCSI-Controllers in your systems? Because during the initialization of the SCSI-controller the boot fails on our system with memory remapping enabled.
Well, likely that's for the same reason the tg3 driver fails with memory remapping enabled, or fglrx does not work (see my other post). So it seems the best choice is to disable memory remapping :-/

cu, Frank
Frank Steiner <fsteiner-mail1@bio.ifi.lmu.de> wrote:
Sorry if I jump in into that discussion as an absolute newbie to the 64bit world :-) I've just bought 2 AMD64 systems with Asus A8V boards and 4 1GB DIMMs each. ... Sorry when I'm asking so much, but I would really like to understand the background of these 64bit secrets :-)
No problem. It's a long and convoluted history that most people don't know much about.
Indeed I stepped over that memory problem and now I almost understand what the problem is. But let me ask two things:
1) If that reserved memory is required for PCI devices, why don't have 32bit systems that problem? On all my systems with 2GB RAM and video cards with 256MB video ram, there is the full 2GB available running linux. Why do only the 64bit systems need this explicit reservation between 3.5 and 4GB?
It's not a 64-bit-specific thing per se (as I think Andi Kleen commented briefly in another response); it's about the 4GB physical addressing limitation of the 32-bit x86 PC architecture. (This isn't quite true for some modern processors, which have 36-bit or even 40-bit physical addressing limits instead of 32-bit, but desktop chipsets still don't generally support more than 32-bit, and the OS support is not there.)

So, the PC architecture, before 64-bit PCs came about, looked like:

  0 ==== 640K - 1GB ======================= (MAXMEM) --- PCI Hole --- 4GB

The problem is that if MAXMEM extends past the start of the PCI hole (4GB minus the hole size), then you lose some RAM; that RAM is just inaccessible. If you had less than (4GB - PCI hole size) of RAM, this is not a problem.

Now, with the number of PCs with 4GB or more RAM increasing, and with 64-bit OSes and drivers available on desktop machines, new features to do memory remapping (sometimes called "hoisting") have come about. The remapping takes a part at the end of the address space and maps it onto the RAM behind the PCI hole. New picture for a machine with 4GB of RAM total:

  0 ==== 640K - 1GB ==== (PCI hole start) --- 4GB === (4GB + PCI hole size)

In the 64-bit world it's the same, except the remapped RAM appears at the end of the normal RAM. For example, my 8GB machine looked like this:

  0 ==== 640K - 1GB ==== (PCI hole start) --- 4GB === 8GB

...and with remapping looks like this:

  0 ==== 640K - 1GB ==== (PCI hole start) --- 4GB === (8GB + PCI hole size)

NOTE: Even on machines running only 64-bit OSes, the PCI hole pretty much has to be in the same place, because many PCI add-in cards only support being mapped into 32-bit addressable locations, so they must be placed under 4GB in the address space. This is also the reason the IOTLB exists: so that RAM above the 4GB limit can be accessed by 32-bit PCI devices.
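[Editor's note] Erich's address maps can be reduced to back-of-the-envelope arithmetic. A sketch under a deliberately simple assumption (the hole simply shadows an equal amount of RAM when remapping is off); the 870 MB hole size is taken from Erich's own estimate:

```python
# Simplified model of the PCI-hole / remapping effect on usable RAM.
GB = 1 << 30
MB = 1 << 20

def visible_ram(total, hole, remap):
    """Return (usable RAM, top physical address) for a given hole size."""
    if total <= 4 * GB - hole:
        return total, total            # RAM fits entirely under the hole
    if not remap:
        return total - hole, min(total, 4 * GB)
    return total, total + hole         # hidden RAM hoisted above the top

usable, top = visible_ram(8 * GB, 870 * MB, remap=False)
print(f"no remap: {usable / MB:.0f} MB usable, top at {top / GB:.1f} GB")
usable, top = visible_ram(8 * GB, 870 * MB, remap=True)
print(f"remap:    {usable / MB:.0f} MB usable, top at {top / GB:.2f} GB")
```

With remapping on, the top of physical memory lands near the 9 GB mark for an 8 GB machine, which is what Erich reported; a 2 GB machine with a 512 MB hole loses nothing, answering Frank's question 1.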
2) The Asus bios allows for software *and* hardware memory remapping. In both cases I have almost all 4GB of RAM available. However, I don't have 3D acceleration anymore because the fglrx driver complains about those uncacheable mtrr space between 3.5 and 4GB. Andi wrote sth. about that problem in an earlier thread that I found in the archives. What I don't understand is: How can that memory be available suddenly, when using the bios option to remap that memory hole, if the memory must be reserved for PCI devices? Or is it that it is indeed no longer reserved when doing the remapping, and that's why the fglrx drivers cannot access it?
NOTE: On Athlon64/Opteron systems, pre-rev-E parts only support software remapping, but rev-E parts support both. I'm not sure exactly what you mean by "suddenly"... it's a BIOS feature that had to be rolled out after lots of testing, so it's not surprising it took a while to appear even though the hardware could do it. The BIOS could be punting and not trying to map the spaces correctly because of the complexity of the MTRR overlap rules. Before, it only had cacheable memory below the PCI hole; now it has it on either side (above and below), and this could be making it punt and just mark them all as uncacheable. I strongly suspect it's just a bug when the remapping is enabled.
Andi wrote sth. in that other thread about manually creating a write- combined mtrr space from the uncacheable space (which I didn't try yet), but I don't understand why this should be possible: If I could easily create a write-combined space in the uncacheable space, why can't I just turn all the uncacheable mtrr space into write-combined and all the problems are gone?
There are multiple MTRRs available (generally) on modern x86 hardware to set memory types. You would want to take one of the available MTRRs and set the range of memory for the video card to be write-combining. Setting all uncacheable memory to write-combining would likely crash your system instantly, or nearly so: "write-combining" means individual writes may not be separable, and most hardware drivers require individual writes, at least to control registers, to be guaranteed as separate operations. It's OK to make a video frame buffer write-combining, because it's generally irrelevant whether you write half a pixel, a whole pixel, or two pixels as one operation. Video drivers are also written to account for this issue; it allows them to get better performance, since multiple writes can be combined into one.
What's the relation between that "reserved for PCI" memory and this uncacheable mtrr range between 3.5 and 4G?
PCI devices in general need their memory addresses to be marked as uncacheable, so I'm sure that's what the BIOS does by default. That's where they are in the physical address space. -- Erich Stefan Boleyn <erich@uruk.org> http://www.uruk.org/ "Reality is truly stranger than fiction; Probably why fiction is so popular"
Erich, this was a very, very helpful explanation! Thanks a lot! I think I've understood most of the issues. Some comments below. Erich Boleyn wrote:
The remapping would take a part at the end of the address space and map that into the RAM behind the PCI hole. New picture for a machine with 4GB of RAM total:
0 ==== 640K - 1GB ==== (PCI hole start) --- 4GB === (4GB + pci hole size)
Yes, that's what I did not understand before. So remapping means just taking the RAM that would be made invisible by the PCI hole and moving it to some other place so that it is accessible. First I thought that the PCI devices were taking some amount from the memory, so that it would always be lost, and I didn't understand why it could be available ("suddenly" :-)) when using remapping.
The BIOS could be punting and not trying to map the spaces correctly because of the complexity of MTRR overlap rules. Before it only had cacheable memory below the PCI hole, now it has it on either size (above and below), and this could be making it punt and just mark them all as uncacheable.
I strongly suspect it's just a bug when the remapping is enabled. ... There are multiple MTRRs available (generally) on modern x86 hardware to set memory types. You would want to take one of the available MTRRs and set the range of memory for the video card to be write-combining. ... PCI devices in general need their memory addresses to be marked as uncacheable, so I'm sure that's what the BIOS does by default. That's where they are in the physical address space.
Ok, so I tried to play around with that a little bit. With memory remapping, /proc/mtrr looks like that: I can disable reg3 and reg2, but trying to create a new write-combining MTRR starting at 3.5GB (and recreating uncacheable ones behind the write-combining one) always fails. The kernel complains

  mtrr: type mismatch for e8000000,8000000 old: write-back new: write-combining

(that also happens when X starts and loads the fglrx driver), and indeed no new MTRR range appears in /proc/mtrr. So it looks like I can't create overlapping MTRRs when write-back MTRRs are set up for the whole memory from 0-4GB.

So I disabled reg0 also (disabling reg1 first crashes the PC immediately) and tried to set up a write-back MTRR at 0 with size 3.5GB, then a write-combining MTRR at 3.5GB with size 128MB (my video RAM), and the rest up to 4GB as uncacheable. But when trying to create the write-back MTRR with

  echo "base=0x00000000 size=0xe0000000 type=write-back" >| /proc/mtrr

the kernel complains that "base(0x0000) is not aligned on a size(0xe0000000) boundary". Do you have an idea what I'm doing wrong here? Even if not, thanks a lot again for your explanation :-)

cu, Frank
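[Editor's note] The alignment error Frank hits follows from the MTRR rules mentioned earlier in the thread: a variable-range MTRR covers a power-of-2-sized, size-aligned range, and 0xe0000000 (3.5 GB) is not a power of two. A sketch of the validity check and of how such a range would have to be split (illustrative code, not kernel source):

```python
# Variable MTRRs must cover a power-of-2-sized range whose base is
# aligned to that size. 0xe0000000 (3.5 GB) is not a power of two,
# so base=0 size=0xe0000000 is rejected; covering 3.5 GB of write-back
# memory takes several registers (2 GB + 1 GB + 0.5 GB).
def mtrr_ok(base, size):
    power_of_2 = size != 0 and (size & (size - 1)) == 0
    return power_of_2 and base % size == 0

def split_into_mtrrs(base, size):
    """Greedily cover [base, base+size) with valid power-of-2 ranges."""
    ranges = []
    while size:
        chunk = 1 << (size.bit_length() - 1)  # largest power of 2 <= size
        if base % chunk:
            chunk = base & -base              # respect base alignment too
        ranges.append((base, chunk))
        base, size = base + chunk, size - chunk
    return ranges

print(mtrr_ok(0x0, 0xE0000000))        # False: size is not a power of 2
print(split_into_mtrrs(0x0, 0xE0000000))
```

So Frank's write-back range would need three `echo` lines into /proc/mtrr (sizes 0x80000000, 0x40000000, 0x20000000) rather than one, quite apart from the overlap/type-mismatch restriction he ran into first.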
Frank Steiner wrote:

Sorry, forgot the cut&paste:
Ok, so I tried to play around with that a little bit. With memory remapping, /proc/mtrr looks like that:
reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg02: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg03: base=0xe0000000 (3584MB), size= 128MB: write-combining, count=1
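The "not aligned" complaint in Frank's earlier message is a general x86 MTRR constraint, not a typo in the echo syntax: each variable-range MTRR must cover a power-of-two-sized region whose base is aligned to its size, and 3.5GB (0xe0000000) is not a power of two, so it can never be a single range. A hedged sketch of a greedy split into valid ranges (it only prints the commands; actually writing them to /proc/mtrr requires root and can destabilize a machine, so this is untested on real hardware):

```shell
#!/bin/sh
# Greedy split of 0-3.5GB into power-of-two, naturally aligned MTRR ranges.
# CAUTION: the generated echo commands are only printed, not executed.
base=0
remaining=$((0xe0000000))      # 3.5GB: not a power of two, must be split
while [ "$remaining" -gt 0 ]; do
    chunk=$((0x80000000))      # start at 2GB, halve until it fits and aligns
    while [ "$chunk" -gt "$remaining" ] || [ $((base % chunk)) -ne 0 ]; do
        chunk=$((chunk / 2))
    done
    printf 'echo "base=0x%08x size=0x%x type=write-back" >| /proc/mtrr\n' \
        "$base" "$chunk"
    base=$((base + chunk))
    remaining=$((remaining - chunk))
done
# Prints three commands: 2GB at 0x0, 1GB at 0x80000000, 512MB at 0xc0000000.
```

The 128MB write-combining entry for the video aperture is already a naturally aligned power of two, so it needs no splitting.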
YMMV, but it seems there is either a performance issue with the software remapping in general or a bug with the implementation. I've informed Tyan of the issue.
Interesting if the HP xw9300 (http://www.hp.com/workstations/pws/xw9300/) has the same problem. Those boxes look like they have exactly the same board. The only difference from the S2895 I found is that the xw9300 has one on-board network interface, not two. The chips and layout look the same, though the BIOS is HP's, and "hp, hp, hp, ..." is printed all over the board.

Dima.

--- Erich Boleyn <erich@uruk.org> wrote:
Dima <bryga66@yahoo.com> wrote:
I have an S2895 too. I was able to make the OS see all the system memory (8GB) by enabling "memory node interleave" and "memory hole mapping: software" in the BIOS. That will probably penalize your performance a bit... I guess the missing memory is what is called the PCI memory hole...
I have an S2885, and with both of the BIOS versions where "memory hole remapping: software" was available, memory copy bandwidth in the bottom 4GB (including the remapped portion) was considerably decreased, from about 1.7GB/sec in the non-remapped case to about 1.2GB/sec. YMMV, but it seems there is either a performance issue with the software remapping in general or a bug in the implementation. I've informed Tyan of the issue. (My server has 8GB physically installed, but only 7170-ish MB is available; with remapping it puts the memory between 3 and 4GB up at the end of the address space, so physical memory appears to go up to the 9GB mark.)
Note that for the newest Opterons available (including the dual-core chips), there is a new "hardware remapping" feature which may not have the performance hit. I haven't been able to test it, but I'd bet it doesn't have that problem.
Some people seem to know what the "hole before 4GB" is, and others do not, so here is a short tutorial (which those who know can tune out):
-- Hardware with RAM or memory-mapped I/O needs to be able to map it somewhere.
-- 32-bit OSes *and* 32-bit addressable hardware (which is the majority of the PCI hardware out there right now) both need to be able to work for the general case.
So a "hole" is punched just below 4GB, growing downward depending on the total size of the address ranges needed. It consists of (usually more or less in this order, growing downward from 4GB):
-- x86 ROM/BIOS (this really is required to be aligned against the top of the 4GB boundary)
-- IO APIC
-- PCI/AGP devices as mapped in PCI config space numerically (but not required to be in that order). NOTE: each of the PCI regions is required to be aligned to its native size.
-- AGP aperture (NOTE: the AGP aperture is required to be aligned to its native size.)
The point of the "native size alignment" comments above is that, for example, if you have, say, a video card with a 256MB video buffer, then at minimum you'll have at least a 512MB "PCI hole", since the BIOS needs to be the last part before 4GB and the video buffer must be aligned to a 256MB (i.e. native size) boundary. If it were a 512MB video buffer, you'd be guaranteed to lose at least 1GB.
So it's easier than you'd imagine to lose 0.5GB or even a full 1GB in the PCI hole with some high-resource add-in adapters. My server box loses about 870MB without memory remapping enabled. Though I guess I asked for it with that 256MB video card and the other add-in adapters I use.
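The alignment arithmetic above can be sketched numerically. This hedged example (the ~1MB ROM/BIOS figure at the top of the 4GB space is an assumption for illustration) computes the minimum hole when a 256MB video BAR must sit below the ROM and on a 256MB boundary:

```shell
#!/bin/sh
# Minimum PCI-hole size for a naturally aligned BAR below the top-of-4GB ROM.
top=$((0x100000000))               # the 4GB boundary
rom=$((1024 * 1024))               # assume ~1MB of ROM/BIOS at the very top
bar=$((256 * 1024 * 1024))         # 256MB video buffer, must be 256MB-aligned
limit=$((top - rom))               # highest address the BAR may reach
# highest 256MB-aligned base that still fits entirely below the ROM:
bar_base=$(( (limit - bar) / bar * bar ))
hole=$((top - bar_base))
echo "hole = $((hole / 1024 / 1024)) MB"    # -> hole = 512 MB
```

Even a tiny ROM region forces the 256MB BAR down a full alignment slot, which is exactly why the hole doubles to 512MB.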
This is not a Linux vs. MS-Windows nor an AMD vs. Intel thing. It's just a fact of how the PC architecture works.
Modern Intel and AMD server hardware both support remapping the extra memory to the high end of the physical address space, but I've never tried the Intel version, so I don't know if it has any caveats like the performance hit for the "software" version.
-- Erich Stefan Boleyn <erich@uruk.org> http://www.uruk.org/ "Reality is truly stranger than fiction; Probably why fiction is so popular"
-- Check the List-Unsubscribe header to unsubscribe For additional commands, email: suse-amd64-help@suse.com
__________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
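For anyone wanting to reproduce the kind of bandwidth comparison Erich describes, here is a crude, hedged probe (streaming zero pages through the kernel is only a rough proxy for a dedicated benchmark such as STREAM, and the 1.7 vs 1.2 GB/sec figures above are Erich's own measurements, not output of this command):

```shell
#!/bin/sh
# Crude memory-copy probe: stream 4GB of zero pages to the null device.
# dd reports an aggregate throughput figure when it finishes; run it once
# with and once without memory hole remapping enabled and compare.
dd if=/dev/zero of=/dev/null bs=1M count=4096
```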
Dima wrote:
YMMV, but it seems there is either a performance issue with the software remapping in general or a bug with the implementation. I've informed Tyan of the issue.
Interesting if the HP xw9300 (http://www.hp.com/workstations/pws/xw9300/) has the same problem. Those boxes look like they have exactly the same board. The only difference from the S2895 I found is that the xw9300 has one on-board network interface, not two. The chips and layout look the same, though the BIOS is HP's, and "hp, hp, hp, ..." is printed all over the board.
The xw9300 I had a chance to struggle with earlier this year has an nForce Professional 2200 chipset.
-- Ken Siersma, Software Engineer EKK, Inc. phone: (248) 624-9957 fax: (248) 624-7158 http://www.ekkinc.com -- "Our lives begin to end the day we become silent about things that matter." -MLK Jr.
In message from Dima <bryga66@yahoo.com> (Mon, 11 Jul 2005 08:05:46 -0700 (PDT)):
Hi, I have S2895 too. I was able to make OS see all the system memory (8 Gb) enabling "memory node interleave"

Switching "ON" memory node interleave on a 2-CPU Opteron server gives substantially worse RAM bandwidth (I really did verify this)!
Yours, Mikhail Kuzminsky
Zelinsky Institute of Organic Chemistry, Moscow
and "memory hole mapping: software" in BIOS. That probably will penalize your performance a bit .. I guess the missing memory is what is called PCI memory hole ...
Dima.
--- Fuad Efendi <fuad@efendi.ca> wrote:
SUSE Linux Enterprise Server 9, TYAN K8WE S2895, BIOS v.1.01 2 x Opteron 252 Troy. 4 x OCZ 1024MB DDR ECC PC3200 (Server Series) of memory. BIOS tests are Ok.
However, SLES 9 finds only 3 Gb instead of 4 Gb. All BIOS settings are default; installation is default.
Thanks!
After reading the TYAN K8WE thread: I have an Opteron running with 2GB on an Asus SK8V board for about a year now without any problems. Last week we tried to upgrade to 4GB and ran into big trouble. I tried to upgrade from 9.2 to 9.3, but it does not work any better (as I already run a 2.6.11 kernel anyway). It seems I have two problems in combination:

First, I learned that Asus boards with a VIA chipset have some serious hardware problems dealing with >= 4GB of memory. The BIOS offered a switch to map any memory in conflict to regions above 4GB, but the newest BIOS update is missing this option, as it did not work stably. So Q1: does it work in any way with 2.6.12 (which I installed now) by setting some iommu=??? option? Why is addressing above 4GB a problem for a 64-bit CPU, anyway? I bought this 64-bit system exactly to get rid of any ugly 4GB limit :-( I got messages like PCI-DMA: "Out of SW-IOMMU space ...", but I'm not sure what iommu setting I used for this try.

Problem 2 seems to be a serious stability problem when I plug in all 4GB (4x1GB Kingston registered ECC modules). My BIOS reports spurious, strange "SYSTEM FAILURE DUE TO CPU OVERCLOCKING" messages, even though I did not change any settings at all. After several attempts to boot, the system even stays black; not even the BIOS comes up. After shorting the NVRAM it works again for some time. With only 2GB it seems to work stably again. Some other discussion threads suggest a weak power supply, but I use a 450W PSU. Is that really too weak to power an Opteron + 4GB of memory? We also examined the board for any popped capacitors, but they all appear to be fine. I'm willing to send the board to the moon and get a new one, but following the current discussion about the Tyan board, I'm not sure how to proceed now. Any suggestions?

-- Dieter Stüken, con terra GmbH, Münster stueken@conterra.de http://www.conterra.de/ (0)251-7474-501
On Mon, Jul 11, 2005 at 06:27:56PM +0200, Dieter Stüken wrote:
First, I learned that Asus boards with a VIA chipset have some serious hardware problems dealing with >= 4GB of memory. The BIOS offered a switch to map any memory in conflict to regions above 4GB, but the newest BIOS update is missing this option, as it did not work stably.
It should still work even without remapping; you just lose some memory where the IO hole is mapped over it. What might not work, and often indeed doesn't, is enabling the mapping option.
So Q1: does it work in any way with 2.6.12 (which I installed now) by setting some iommu=??? option? Why is addressing above 4GB a problem for a 64-bit CPU, anyway?
The CPU has no problems with that, just some devices on your motherboard. If you want to truly escape the 4gb limit you need 64bit DMA capable devices too. e.g. most SCSI controllers support that, most IDE controllers don't.
I bought this 64-bit system exactly to get rid of any ugly 4GB limit :-( I got messages like PCI-DMA: "Out of SW-IOMMU space ...", but I'm not sure what iommu setting I used for this try.
Some device needed more than 64MB of bounce buffer. On VIA, use swiotlb=32768 or swiotlb=65536. For some reason nobody understands, the standard IOMMU doesn't work on VIA, so software bounce buffering is used instead.
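To make Andi's suggestion concrete: swiotlb is a kernel boot parameter, so it goes on the kernel command line via the bootloader. A hedged config sketch for a SLES 9-era GRUB setup (the menu.lst path and the example root device are assumptions; adjust for your bootloader, and note that the boot-message wording varies by kernel version):

```shell
# In /boot/grub/menu.lst, append swiotlb=65536 to the kernel line, e.g.:
#   kernel /boot/vmlinuz root=/dev/sda2 splash=silent swiotlb=65536
# After rebooting, check the boot log for the software IOMMU setup:
dmesg | grep -i swiotlb
```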
Problem 2 seems to be a serious stability problem when I plug in all 4GB (4x1GB Kingston registered ECC modules). My BIOS reports spurious, strange "SYSTEM FAILURE DUE TO CPU OVERCLOCKING" messages, even though I did not change any settings at all. After several attempts to boot, the system even stays black.
Doesn't sound like a Linux related problem. -Andi
participants (10)

- Andi Kleen
- Dieter Stüken
- Dima
- Erich Boleyn
- Frank Steiner
- Fuad Efendi
- john_ozarchuk@goodyear.com
- Ken Siersma
- Mikhail Kuzminsky
- Siegbert Baude