Tyan S2885ANRF AGP framebuffer write slowness?
Andreas and other interested folks,
I just grabbed the 2.4.21-139 kernel, built it and installed it on my Tyan S2885ANRF. It now finds the 8151 AGP controller correctly, and seems to correctly identify both AGP110 (ATI X1) and older AGP v2 cards, go into 4x/8x modes fine and the like.
However, if you take the video card of your choice, map its framebuffer and set the AGP bridge for fast writes, the write bandwidth to the framebuffer is BAD. About a factor of 40 slower than it should be. A 1x card writes at about 18MB/s, a 4x card at 25MB/s and an 8x card at 48MB/s. These numbers should be 256MB/s, 1GB/s and 2GB/s. Under no circumstances (1x, fast writes off) should the bandwidth be less than 256MB/s.
I've tried this on lots of 760MPX machines and get the performance I expect, but I am at a loss as to what could possibly be causing this slowness.
I notice that the Tyan board sets up the MTRR for the whole PCI physical address space to be uncacheable:
linux:~ # cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0xbe000000 (3040MB), size= 32MB: uncachable, count=1
In previous framebuffer drivers, I have set the MTRR for the framebuffer to be write-combining, which does have an effect on performance (though I've never known the effect to be a factor of 40!). I'd like to do the same thing on the Tyan board, but both removing the c0000000-ffffffff uncacheable MTRR and fragmenting it so that everything in that region EXCEPT the framebuffer is uncacheable quickly lead to system crashes. (Why? How could caching the framebuffer hang the system in text mode? Corrupt screen contents, sure. Is this a race while the region is uncacheable?)
It seems there's something strange going on. Could this be some kind of IOMMU interaction? The AGP aperture ends up at e0000000, which I cover with an uncacheable MTRR entry when I try the fragments, so I don't THINK I'm hurting anything.
Any ideas why it could possibly be so slow, or what I might do to get it back up to reasonable write speeds? It's hard to do video when you can only get 2 frames/sec into the card's RAM.
Thanks!
-mcq
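The "factor of 40" can be checked against nominal AGP peak rates. A minimal sketch, assuming the textbook AGP figures (66.67 MHz clock, 32-bit data path, 1/4/8 transfers per clock); these are theoretical peaks rather than measurements, and the observed rates are the ones reported in the message above.

```python
# Nominal AGP peak bandwidth vs. the observed framebuffer write rates.
# Assumption: textbook AGP figures (66.67 MHz clock, 4-byte data path);
# sustained real-world rates are always somewhat below these peaks.
AGP_CLOCK_HZ = 200e6 / 3   # 66.67 MHz
AGP_BUS_BYTES = 4          # 32-bit data path


def agp_peak_mb_s(multiplier: int) -> float:
    """Theoretical peak in MB/s (10**6 bytes/s) for an AGP 1x/2x/4x/8x link."""
    return AGP_CLOCK_HZ * AGP_BUS_BYTES * multiplier / 1e6


def shortfall(multiplier: int, observed_mb_s: float) -> float:
    """How many times slower the observed rate is than the nominal peak."""
    return agp_peak_mb_s(multiplier) / observed_mb_s


# Observed write rates on the S2885ANRF, as reported above (MB/s).
OBSERVED = {1: 18, 4: 25, 8: 48}

if __name__ == "__main__":
    for mult, mb_s in OBSERVED.items():
        print(f"{mult}x: peak ~{agp_peak_mb_s(mult):.0f} MB/s, "
              f"observed {mb_s} MB/s, about {shortfall(mult, mb_s):.0f}x slower")
```

By this measure the 4x and 8x cases sit roughly 43 and 44 times below peak, while the 1x case is closer to 15 times below.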
John McCorquodale <mcq1@viz.cacr.caltech.edu> writes:
Andreas and other interested folks,
I just grabbed the 2.4.21-139 kernel, built it and installed it on my Tyan S2885ANRF. It now finds the 8151 AGP controller correctly, and seems to correctly identify both AGP110 (ATI X1) and older AGP v2 cards, go into 4x/8x modes fine and the like.
That kernel especially contained a fix for the 2885 after we got hold of it a few days earlier.
However, if you take video-card-of-your-choice, map its framebuffer and set the AGP bridge for fast writes, the write bandwidth to the framebuffer is BAD. About a factor of 40 slower than it should be. A 1x card writes at about 18MB/s, a 4x card at 25 and an 8x card at 48MB/s. These numbers should be 256MB/s, 1GB/s and 2GB/s. Under no circumstances (1x, fast writes off) should the bandwidth be less than 256MB/s.
Do you have some simple test program that I can run locally?
I've tried this on lots of 760MPX machines and get the performance I expect, but I am at a loss as to what could possibly be causing this slowness.
I notice that the Tyan board sets up the MTRR for the whole PCI physical address space to be uncacheable:
linux:~ # cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0xbe000000 (3040MB), size= 32MB: uncachable, count=1
Yes, this looks strange. How much memory do you have?
In previous framebuffer drivers, I have set the MTRR for the framebuffer to be write-combining, which does have an effect on performance (though I've never known the effect to be a factor of 40!). I'd like to do the same thing on the Tyan board, but both removing the c0000000-ffffffff uncacheable MTRR and fragmenting it so that everything in that region EXCEPT the framebuffer is uncacheable quickly lead to system crashes. (Why? How could caching the framebuffer hang the system in text mode? Corrupt screen contents, sure. Is this a race while the region is uncacheable?)
It seems there's something strange going on. Could this be some kind of IOMMU interaction? The AGP aperture ends up at e0000000, which I cover with an uncacheable MTRR entry when I try the fragments, so I don't THINK I'm hurting anything.
Any ideas why it could possibly be so slow or what I might do to get it back up to reasonable write speeds? Hard to do video when you can only get 2 frames/sec into the card's RAM.
Andreas
--
Andreas Jaeger, aj@suse.de, http://www.suse.de/~aj
SuSE Linux AG, Deutschherrnstr. 15-19, 90429 Nürnberg, Germany
GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
Do you have some simple test program that I can run locally?
Yeah. Sorry for the delay; it took me a while to find time to clean it up for public consumption. It's only 8k, so I attached it; I hope that doesn't cheese anybody off (and survives the list software). Otherwise, it can be grabbed from here: http://viz.cacr.caltech.edu/dl/fbtest-0.1.tar.bz2
Edit the Makefile to point at your kernel source tree (e.g. /usr/src/linux), type make and then insmod the resulting fbtest.o. It's a module that ALWAYS fails to load, but it runs the benchmark along the way. It needs an ATI video card in the box; any Radeon card >=8500 should work (note that the 8500DV does not support fast writes). It works with 2.4.20 and 2.4.21, and perhaps others.
What you're looking for is the write-combined/fast-write pair, which for a 4x AGP card on other machines achieves 1 GB/s bandwidth, and on the S2885 achieves a sad, depressing 25 MB/s. AGP Pro110 and 8x AGP cards should see even better performance.
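fbtest itself is a kernel module and its source is not reproduced here, so the following is not its code. As a rough user-space illustration of the quantity it reports (bytes written divided by elapsed time, in MiB/s), here is a sketch that times repeated writes into an anonymous mmap. The function names are hypothetical, and because the mapping is ordinary RAM rather than a framebuffer, the numbers say nothing about AGP.

```python
import mmap
import time

MIB = 1 << 20


def mib_per_s(nbytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into MiB/s."""
    return nbytes / seconds / MIB


def time_writes(size: int = 16 * MIB, passes: int = 4) -> float:
    """Fill a mapped region `passes` times and return the write rate in MiB/s.

    Assumption: an anonymous mapping stands in for the framebuffer; a real
    test would map the card's PCI memory region instead.
    """
    chunk = b"\xa5" * MIB
    with mmap.mmap(-1, size) as region:
        start = time.perf_counter()
        for _ in range(passes):
            region.seek(0)
            for _ in range(size // MIB):
                region.write(chunk)  # sequential 1 MiB stores
        elapsed = time.perf_counter() - start
    return mib_per_s(size * passes, elapsed)


if __name__ == "__main__":
    print(f"write bandwidth: {time_writes():.0f} MiB/s")
```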
linux:~ # cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0xbe000000 (3040MB), size= 32MB: uncachable, count=1
Yes, this looks strange. How much memory do you have?
8 x 1GB. I also observe the "missing 1GB" problem that Mike Frenz mentions. (Mike: did you ever get that patch from AMI via Tyan? Any more word on this MTRR strangeness?)
Anyway, following are some dmesg excerpts from the framebuffer write bandwidth tester on a 760MPX box and an S2885ANRF. I hope these are helpful. The goal is to do fast-write/write-combining and get at least 1GB/s write bandwidth.
-mcq
--- 2.4.21 on a 760MPX board with Radeon 9100 ---
Default: 63 MiB/s, write-combining: 266 MiB/s, fast-wr/wr-comb: 1000 MiB/s
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 3459M
agpgart: Detected AMD 760MP chipset
agpgart: AGP aperture is 64M @ 0xec000000
fbtest: Framebuffer bandwidth tester v0.1
fbtest: mapped framebuffer at 0xf8a32000 for 67108864 bytes
fbtest: default AGP/MTRR: framebuffer write bandwidth 64 MiB/s
fbtest: default AGP/write-combining: framebuffer write bandwidth 266 MiB/s
fbtest: Card has AGP capability: 2f000217 00000200
fbtest: Bridge has AGP capability: 0f000217 00000300
fbtest: AGP mode is 4x SidebandAddressing=enabled FastWrites=enabled
fbtest: AGP fast-write: framebuffer write bandwidth 63 MiB/s
fbtest: AGP fast writes/write-combining: framebuffer write bandwidth 1000 MiB/s
fbtest: AGP and MTRR status restored; fbtest done.
--- 2.4.20 on a 760MPX board with Radeon 8500DV ---
Default: 63 MiB/s, write-combining: 266 MiB/s
fbtest: Framebuffer bandwidth tester v0.1
fbtest: mapped framebuffer at 0xf8a0c000 for 67108864 bytes
fbtest: default AGP/MTRR: framebuffer write bandwidth 63 MiB/s
fbtest: default AGP/write-combining: framebuffer write bandwidth 266 MiB/s
fbtest: Card has AGP capability: 2f000207 0f000304
fbtest: Bridge has AGP capability: 0f000217 00000304
fbtest: AGP mode is 4x SidebandAddressing=enabled FastWrites=disabled
fbtest: can't enable AGP fast write: card must be Radeon >=8500 and not 8500DV
fbtest: AGP and MTRR status restored; fbtest done.
--- BROKEN: 2.4.21-139 on S2885ANRF with Radeon X1 ---
Default: 19 MiB/s, fast-writes: 25 MiB/s
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 7956M
agpgart: Detected AMD 8151 chipset
agpgart: AGP aperture is 256M @ 0xe0000000
PCI-DMA: Reserving 128MB of IOMMU area in the AGP aperture
...
fbtest: Framebuffer bandwidth tester v0.1
fbtest: mapped framebuffer at 0xffffff0000229000 for 268435456 bytes
fbtest: default AGP/MTRR: framebuffer write bandwidth 19 MiB/s
mtrr: type mismatch for d0000000,10000000 old: uncachable new: write-combining
fbtest: WARNING: failed to set wrtite-combining in MTRR
fbtest: Card has AGP capability: ff00021b 00000200
fbtest: Bridge has AGP capability: 1f000b7b 00000200
AGP: Found AGPv3 capable device at 4:0:0
AGP: Found AGPv3 capable device at 5:0:0
AGP: Enough AGPv3 devices found, setting up...
AGP: Setting up AGPv3 capable device at 4:0:0
AGP: Setting up AGPv3 capable device at 5:0:0
fbtest: AGP mode is 2x SidebandAddressing=enabled FastWrites=enabled
fbtest: AGP fast-write: framebuffer write bandwidth 25 MiB/s
mtrr: type mismatch for d0000000,10000000 old: uncachable new: write-combining
fbtest: WARNING: failed to set wrtite-combining in MTRR
AGP: Found AGPv3 capable device at 4:0:0
AGP: Found AGPv3 capable device at 5:0:0
AGP: Enough AGPv3 devices found, setting up...
AGP: Setting up AGPv3 capable device at 4:0:0
AGP: Setting up AGPv3 capable device at 5:0:0
fbtest: AGP and MTRR status restored; fbtest done.
Trying to free nonexistent resource <d0000000-dfffffff>
$ cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0xbe000000 (3040MB), size= 32MB: uncachable, count=1
--- BROKEN: 2.4.21-139 on S2885ANRF with Radeon 9100 ---
Default: 19 MiB/s, fast-writes 25 MiB/s
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 7956M
agpgart: Detected AMD 8151 chipset
agpgart: AGP aperture is 256M @ 0xe0000000
PCI-DMA: Reserving 128MB of IOMMU area in the AGP aperture
...
fbtest: Framebuffer bandwidth tester v0.1
fbtest: mapped framebuffer at 0xffffff0000229000 for 67108864 bytes
fbtest: default AGP/MTRR: framebuffer write bandwidth 19 MiB/s
mtrr: type mismatch for d0000000,4000000 old: uncachable new: write-combining
fbtest: WARNING: failed to set wrtite-combining in MTRR
fbtest: Card has AGP capability: 2f000217 00000200
fbtest: Bridge has AGP capability: 1f000b77 00000000
AGP: Found AGPv3 capable device at 4:0:0
AGP: Version 2 AGP device found.
AGP: Only 1 devices found, not enough, trying AGPv2
fbtest: AGP mode is 4x SidebandAddressing=enabled FastWrites=enabled
fbtest: AGP fast-write: framebuffer write bandwidth 25 MiB/s
mtrr: type mismatch for d0000000,4000000 old: uncachable new: write-combining
fbtest: WARNING: failed to set wrtite-combining in MTRR
AGP: Found AGPv3 capable device at 4:0:0
AGP: Version 2 AGP device found.
AGP: Only 1 devices found, not enough, trying AGPv2
fbtest: AGP and MTRR status restored; fbtest done.
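The "type mismatch" lines are consistent with fbtest asking for a write-combining range that lies entirely inside reg00, which is already uncachable; as far as I can tell, the 2.4 MTRR driver refuses a new range enclosed by an existing entry of a different type unless the new type is uncachable. A small sketch that parses /proc/mtrr text in the format shown above and applies that rule. The regular expression is fitted to the lines in this thread and may not match every kernel's output, and the helper names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

MB = 1 << 20

# Fitted to lines like
# "reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1"
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+):\s*base=0x(?P<base>[0-9a-fA-F]+)\s*\(\s*\d+MB\),\s*"
    r"size=\s*(?P<size>\d+)MB:?\s*(?P<type>[\w-]+),\s*count=(?P<count>\d+)"
)


class MtrrEntry(NamedTuple):
    reg: int
    base: int
    size: int       # bytes
    mem_type: str   # e.g. "uncachable", "write-combining"


def parse_mtrr(text: str) -> List[MtrrEntry]:
    """Parse /proc/mtrr-style text into entries."""
    return [
        MtrrEntry(int(m["reg"]), int(m["base"], 16), int(m["size"]) * MB, m["type"])
        for m in MTRR_LINE.finditer(text)
    ]


def enclosing_entry(entries: List[MtrrEntry], base: int, size: int) -> Optional[MtrrEntry]:
    """Return the first entry that wholly contains [base, base + size), if any."""
    for e in entries:
        if e.base <= base and base + size <= e.base + e.size:
            return e
    return None


def would_mismatch(entries: List[MtrrEntry], base: int, size: int, new_type: str) -> bool:
    """Rough model of the 'type mismatch' refusal: a range enclosed by an entry
    of another type is rejected unless the new type is uncachable."""
    e = enclosing_entry(entries, base, size)
    return e is not None and e.mem_type != new_type and new_type != "uncachable"


SAMPLE = """\
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0xbe000000 (3040MB), size= 32MB: uncachable, count=1
"""

if __name__ == "__main__":
    entries = parse_mtrr(SAMPLE)
    # The Radeon 9100 request from the log: base d0000000, size 4000000 (64MB).
    print(enclosing_entry(entries, 0xD0000000, 0x4000000))
    print(would_mismatch(entries, 0xD0000000, 0x4000000, "write-combining"))
```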
--- BROKEN: 2.4.21-139 on S2885ANRF with Radeon 9100 and hacked MTRR entries ---
(Here I have deleted MTRR entry 0 and replaced it with several others that similarly cover all of that 1GB _except_ the framebuffer, so that I can give the framebuffer a write-combining entry.)
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 7956M
agpgart: Detected AMD 8151 chipset
agpgart: AGP aperture is 256M @ 0xe0000000
PCI-DMA: Reserving 128MB of IOMMU area in the AGP aperture
...
fbtest: Framebuffer bandwidth tester v0.1
fbtest: mapped framebuffer at 0xffffff0000229000 for 67108864 bytes
fbtest: default AGP/MTRR: framebuffer write bandwidth 25 MiB/s
fbtest: default AGP/write-combining: framebuffer write bandwidth 13 MiB/s
fbtest: Card has AGP capability: 2f000217 1f000314
fbtest: Bridge has AGP capability: 1f000b77 00000b34
AGP: Found AGPv3 capable device at 4:0:0
AGP: Version 2 AGP device found.
AGP: Only 1 devices found, not enough, trying AGPv2
fbtest: AGP mode is 4x SidebandAddressing=enabled FastWrites=enabled
fbtest: AGP fast-write: framebuffer write bandwidth 13 MiB/s
fbtest: AGP fast writes/write-combining: framebuffer write bandwidth 13 MiB/s
AGP: Found AGPv3 capable device at 4:0:0
AGP: Version 2 AGP device found.
AGP: Only 1 devices found, not enough, trying AGPv2
fbtest: AGP and MTRR status restored; fbtest done.
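The capability words fbtest prints can be decoded by hand. A sketch, assuming the first word is the AGP status register and the second the command register (an inference from the logs, not checked against the fbtest source), and using the bit layout from the AGP specification as reflected in the Linux agpgart headers: request depth in bits 31:24, sideband addressing in bit 9, AGP enable (command register only) in bit 8, fast writes in bit 4, AGP 3.0 mode in bit 3, and a rate field in bits 2:0 that means 1x/2x/4x in AGP 2.0 mode and 4x/8x in 3.0 mode. Bits outside that list are ignored.

```python
from typing import Dict, List

AGP_RQ_DEPTH_SHIFT = 24
AGP_SBA = 1 << 9         # sideband addressing
AGP_ENABLE = 1 << 8      # command register: AGP enabled
AGP_FW = 1 << 4          # fast writes
AGP_MODE_3_0 = 1 << 3    # status register: operating in AGP 3.0 mode
AGP2_RATES = {1 << 0: 1, 1 << 1: 2, 1 << 2: 4}  # AGP 2.0 rate bits -> Nx
AGP3_RATES = {1 << 0: 4, 1 << 1: 8}             # AGP 3.0 rate bits -> Nx


def decode_status(word: int) -> Dict[str, object]:
    """Decode an AGP status register value such as 0x2f000217."""
    agp3 = bool(word & AGP_MODE_3_0)
    table = AGP3_RATES if agp3 else AGP2_RATES
    rates: List[int] = sorted(x for bit, x in table.items() if word & bit)
    return {
        "rq_depth": (word >> AGP_RQ_DEPTH_SHIFT) & 0xFF,
        "sideband": bool(word & AGP_SBA),
        "fast_writes": bool(word & AGP_FW),
        "agp3_mode": agp3,
        "rates": rates,
    }


def decode_command(word: int, agp3: bool = False) -> Dict[str, object]:
    """Decode an AGP command register value such as 0x1f000314."""
    table = AGP3_RATES if agp3 else AGP2_RATES
    rates = sorted(x for bit, x in table.items() if word & bit)
    return {
        "rq_depth": (word >> AGP_RQ_DEPTH_SHIFT) & 0xFF,
        "agp_enabled": bool(word & AGP_ENABLE),
        "sideband": bool(word & AGP_SBA),
        "fast_writes": bool(word & AGP_FW),
        "rate": rates[0] if rates else None,
    }


if __name__ == "__main__":
    for label, word in [("Radeon 9100", 0x2F000217), ("Radeon 8500DV", 0x2F000207),
                        ("Radeon X1", 0xFF00021B)]:
        print(label, decode_status(word))
    print("9100 command", decode_command(0x1F000314))
```

Applied to the logs above, the 8500DV's status word 2f000207 has the fast-write bit clear, which fits fbtest refusing to enable fast writes on that card, and the X1's ff00021b advertises AGP 3.0 mode with 4x and 8x.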
Hello John, we have problems with an S2885 too. With 8GB installed, "free -m" reports just 6986MB total RAM instead of about 8000MB; with 4GB installed it reports 3GB. There is always 1GB missing (BIOS V1.01, SuSE 8.2 beta9 X64). Tyan is working on a BIOS patch, but they are waiting for a fix from AMI (the BIOS manufacturer).
cat /proc/mtrr gives:
reg00: base=0xc0000000 (3072MB), size=1024MB uncacheable, count=1
--
With best regards
PS: Ask about our High Performance Computing 32/64-bit multi-CPU AMD Opteron systems!
Mike D. Frenz (Dipl.-Phys.), Management
MikeRoHard Computersysteme, High Performance Computing
Mail: MikeRoHard Computersysteme, Kärntner Weg 6, D-79111 Freiburg, GERMANY
Tel. +49 (0)761 - 888 66 50
Fax +49 (0)761 - 888 66 52
mailto: mike.frenz@mikerohard.de
Website: http://www.mikerohard.de
In a previous message, I mentioned the strange MTRR setup (which seems to be coming from the BIOS?) on the Tyan S2885:
linux:~ # cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0xbe000000 (3040MB), size= 32MB: uncachable, count=1
If I erase reg00 and replace it with three others:
0xc0000000 size=256MB
0xd8000000 size=128MB
0xe0000000 size=512MB
all uncacheable, I've effectively removed the uncacheable MTRR from the PCI physical addresses on which my video card's framebuffer memory sits. I can then add a write-combining MTRR:
0xd0000000 size=128MB
And things don't speed up. No surprise, since something is broken. The surprise is that my system proceeds to crash quickly! Why on earth would this blow my system away? There shouldn't be anything at d0000000-d7ffffff except framebuffer memory, which might garbage my screen (I'm in text mode) but shouldn't cause spewage indicating vast memory corruption followed quickly by a crash... should it?
Am I missing something important about How It All Works?
Thanks,
-mcq
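Variable-range MTRRs need a power-of-two size and a base aligned to that size, which is why carving a 128MB hole out of reg00 takes three uncacheable entries rather than two. A minimal sketch of that carving using a greedy "largest naturally aligned block" rule; the function names are hypothetical, and the sketch ignores the limited number of variable MTRRs the CPU provides.

```python
from typing import List, Tuple

MB = 1 << 20


def mtrr_blocks(start: int, end: int) -> List[Tuple[int, int]]:
    """Greedily cover [start, end) with power-of-two, naturally aligned blocks."""
    blocks = []
    while start < end:
        # Largest power of two that still fits in what is left...
        size = 1 << ((end - start).bit_length() - 1)
        # ...capped by the largest alignment the current base satisfies.
        if start:
            size = min(size, start & -start)
        blocks.append((start, size))
        start += size
    return blocks


def carve_hole(base: int, size: int, hole_base: int, hole_size: int) -> List[Tuple[int, int]]:
    """Blocks covering [base, base + size) except [hole_base, hole_base + hole_size)."""
    end, hole_end = base + size, hole_base + hole_size
    if not (base <= hole_base and hole_end <= end):
        raise ValueError("hole must lie inside the range")
    return mtrr_blocks(base, hole_base) + mtrr_blocks(hole_end, end)


if __name__ == "__main__":
    # reg00 (0xc0000000, 1GB) minus a 128MB write-combining window at 0xd0000000.
    for b, s in carve_hole(0xC0000000, 1024 * MB, 0xD0000000, 128 * MB):
        print(f"base=0x{b:08x} size={s // MB}MB uncachable")
```

For reg00 with a 128MB window at 0xd0000000 this yields 0xc0000000/256MB, 0xd8000000/128MB and 0xe0000000/512MB, the same three entries listed in the message.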
On Mon, Nov 03, 2003 at 02:44:09PM -0800, John McCorquodale wrote:
In a previous message, I mentioned the strange MTRR setup (which seems to be coming from BIOS?) on the Tyan S2885:
linux:~ # cat /proc/mtrr
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0xbe000000 (3040MB), size= 32MB: uncachable, count=1
If I erase reg00 and replace it with three others:
0xc0000000 size=256MB 0xd8000000 size=128MB 0xe0000000 size=512MB
all uncacheable, I've effectively removed the MTRR from the PCI physical addresses on which my video card's framebuffer memory sits. I can then add a write-combining MTRR:
0xd0000000 size=128MB
And things don't speed up. No surprise, since something is broken. The surprise is that my system proceeds to quickly crash! Why on earth would this blow my system away? There shouldn't be anything at d0000000-d7ffffff except framebuffer memory, which might garbage my screen (I'm in text mode) but shouldn't cause spewage indicating vast memory corruption followed quickly by a crash...should it?
Most likely the aperture is there and you are messing with the IOMMU mappings. Does it go away when you boot with iommu=memaper=2? (This will make your system lose some memory.)
Overall it sounds more like a hardware/BIOS issue than a Linux problem. I would recommend contacting Tyan. You could also try a different video card.
-Andi
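Whether a write-combining window touches the AGP aperture can be checked against the ranges the kernel logs at boot. A sketch, assuming half-open [base, base + size) ranges; the sample values are copied from the S2885 logs earlier in the thread (aperture 256M @ 0xe0000000, window 128MB @ 0xd0000000) and may not match the boot that actually crashed.

```python
MB = 1 << 20


def overlaps(a_base: int, a_size: int, b_base: int, b_size: int) -> bool:
    """True if [a_base, a_base + a_size) and [b_base, b_base + b_size) intersect."""
    return a_base < b_base + b_size and b_base < a_base + a_size


# Values taken from the logs above; a different boot may place them elsewhere.
APERTURE = (0xE0000000, 256 * MB)
WC_WINDOW = (0xD0000000, 128 * MB)

if __name__ == "__main__":
    print("WC window overlaps aperture:", overlaps(*WC_WINDOW, *APERTURE))
```

For those particular logged values the two ranges are disjoint, so testing the hypothesis would need the aperture address from the boot that crashed.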
participants (4)
- Andi Kleen
- Andreas Jaeger
- John McCorquodale
- Mike D. Frenz