Re: [suse-amd64] Installation of SUSE 9.0 AMD 64 Version on latest motherboards
Did some testing for you. Running dga kills the driver, no two ways about it. It doesn't, however, kill the machine. I can ssh in from another machine, kill off X and restart X without any difficulties at all. All you see on the screen are lines, as though the display is out of sync. I tried a few different things (dropping the color depth to 16 bit, turning off the nvidia logo), and none of them made any difference.
Yeah, this is exactly the same behavior I see on ATI cards as well. It seems to be something about that board. I'm still working on something that'll dump out the northbridge, 8131 and 8151 configuration registers so we can see how the BIOS is setting things up and figure out what's going on. Whatever is causing the dga hang is probably related, but it's worth noting that my framebuffer code doesn't hang. It just becomes very slow. I haven't debugged into the X server yet to track down the dga hang; it doesn't seem like a path likely to lead to a solution to my actual problem (bad AGP throughput).
BTW, how did you get the figures you quoted for the AGP throughput? (It may have been using dga, but that's not an option here: even running dga > dga_test and then monitoring the file from the remote machine showed nothing at all. X was definitely hung.)
I wrote a kernel module that maps the framebuffer, sets up write combining and sets AGP fast writes, then does a tight-loop microbenchmark. It's still available at: http://viz.cacr.caltech.edu/dl/fbtest-0.1.tar.bz2

If you're interested enough to grab it and try it, let me know! I'd love to hear that somebody has replicated my experiment.

It's also worth noting that if Andi is correct, the 8151 is behind the 8131, which means it's sitting on an 8-bit HyperTransport link. There's an 8151 erratum where it hangs running HT at 800 MHz, so all the HT links in the S2885 are clocked at 600 MHz (except the Opteron-to-Opteron one, which runs at 800). This means the maximum theoretical bandwidth to the AGP chip is 600 MHz * 2 transfers per clock * 1 byte = 1.2 GB/s. Subtracting 20% for PCI address transmission overhead leaves 960 MB/s as the theoretical maximum one should ever see to the 8151 on the S2885. That's less than the AGP 4x peak bandwidth, so at best the board should be thought of as AGP 3.5x, not the 8x that the AGP connector happens to signal at.

I'd be totally happy if I could get even close to that out of the board, but at the moment the most I've been able to get is 270 MB/s on an AGP v2 card and about 320 MB/s on an AGP v3 card. This is with 2GB RAM installed and the Tyan v1.01 BIOS. If you install 8GB of RAM, your AGP performance will drop to about 10 MB/s and write-combining will fail because of some broken way the BIOS sets things up (which I haven't finished figuring out yet; many hangs lie down that road). Based on a favorable response from Tyan, I think I can get the 8GB problem fixed once I figure out how things are set up and how they ought to be.

I still don't understand why AGP performance is only ~300 MB/s, and figuring that out is a separate question. Feel free to volunteer crazy ideas. I'm only just getting into this whole chip configuration register thing, so everybody should keep telling me things I ought to already know, 'cause I probably don't. :)

-mcq
On Fri, Nov 07, 2003 at 10:02:01PM -0800, John McCorquodale wrote:
Did some testing for you. Running dga kills the driver, no two ways about it. It doesn't, however, kill the machine. I can ssh in from another machine, kill off X and restart X without any difficulties at all. All you see on the screen are lines, as though the display is out of sync. I tried a few different things (dropping the color depth to 16 bit, turning off the nvidia logo), and none of them made any difference.
Yeah, this is exactly the same behavior I see on ATI cards as well. It seems to be something about that board. I'm still working on something that'll dump out the northbridge, 8131 and 8151 configuration registers so we can see
You can just dump them with lspci -vxxx as root.
BTW, how did you get the figures you quoted for the AGP throughput? (It may have been using dga, but that's not an option here: even running dga > dga_test and then monitoring the file from the remote machine showed nothing at all. X was definitely hung.)
I wrote a kernel module that maps the framebuffer, sets up write combining and sets AGP fast writes, then does a tight-loop microbenchmark. It's still available at:
We usually used testgart for this, which just benchmarks the AGP aperture as seen by the CPU. We even have a package for it, but I am not sure it is on the 9.0 DVD. If not, Google should be able to find it. testgart sets the needed WC MTRR by itself.
It's also worth noting that if Andi is correct, the 8151 is behind the 8131, which means it's sitting on an 8-bit HyperTransport link. There's an 8151 erratum where it hangs running HT at 800 MHz, so all the HT links in the S2885 are clocked at 600 MHz (except the Opteron-to-Opteron one, which
Actually it is the 8131 that is limited to 600 MT/s links; the 8151 should be fine.
I'd be totally happy if I could get even close to that out of the board, but at the moment the most I've been able to get is 270 MB/s on an AGP v2 card and about 320 MB/s on an AGP v3 card. This is with 2GB RAM installed and the Tyan v1.01 BIOS. If you install 8GB of RAM, your AGP performance will drop to about 10 MB/s and write-combining will fail because of some broken way the BIOS sets things up (which I haven't finished figuring out yet; many hangs lie down that road).
It could still be some memory attribute issue. Without write-combining, performance will be very bad on anything AGP.
-Andi
I'm still working on something that'll dump out the northbridge, 8131 and 8151 configuration registers
You can just dump them with lspci -vxxx as root.
Yeah, I'm writing something that'll decode the register data into human-readable form so I can learn something from it (a bunch of hex is just not so handy). I'm doing it as a fake kernel module again rather than a user-space program, just because I find that easier and can turn off interrupts when I want to run a benchmark.
I wrote a kernel module that maps the framebuffer, sets up write combining and sets AGP fast writes, then does a tight-loop microbenchmark. It's still available at:
We usually used testgart for this, which just benchmarks the AGP aperture as seen by the CPU. We even have a package for it, but I am not sure it is on the 9.0 DVD. If not, Google should be able to find it. testgart sets the needed WC MTRR by itself.
testgart measures write performance through the GART into AGP memory (which is located on the main memory modules of the machine). The bandwidth I am interested in is the bandwidth from the machine to the AGP card across the AGP connector, which is not what testgart measures. This can be measured as the bandwidth of a DMA from AGP memory to the card (again translated through the GART), or similarly by mapping the card's framebuffer through PCI space and doing programmed writes (AGP fast writes) from the CPU. These numbers should be comparable.

The bandwidth through the AGP connector is the one that affects performance; the bandwidth measured by testgart is not particularly useful to know, other than, as the name says, as a test that the GART itself does address translation as advertised. The kernel module I referred to earlier measures fast-write bandwidth (which is unusually bad on the S2885). The testgart program measures the bandwidth of GART-redirected writes to main memory, which behaves well on the S2885 (2 GB/s).

Adding AGP DMA code to my framebuffer driver is not a hurdle I wanted to tackle right now, but it may be just what I need to do to verify that the bandwidth observed with DMA is the same as the bandwidth of programmed fast writes. If ATI (or nvidia, for that matter) would publicly release specs for their cards this would be easy, but as it is I have to recover the documentation by reading the X DRI drivers (yuck).
Actually it is the 8131 that is limited to 600 MT/s links; the 8151 should be fine.
It could be both. It's at least the 8151. See AMD pub 25912, the AMD-8151 HyperTransport AGP3.0 Graphics Tunnel Revision Guide, page 9: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/2591... The motherboard design guides appear to be available only under NDA, so I don't know if Tyan obeyed the recommended physics hacks in the board design. If you are correct and they have to run the 8131 at 600MHz anyway, there'd be little point in doing the recommended physics hacks.
It could still be some memory attribute issue. Without write-combining, performance will be very bad on anything AGP.
If I pull out 6GB of RAM (leaving me with 2GB), then the BIOS doesn't set up the weird MTRR and I can successfully turn on write combining. Doing this, my AGP v2 4x fast-write bandwidth jumps to 270 MB/s. But that should still be about 1000 MB/s, so something is still quite wrong (I need >400 MB/s for my application). Where did you get the idea that the 8151 is behind the 8131? I still don't have my HT walker to the point where it can dump the HyperTransport graph, and I'm curious to poke around wherever you got that idea for inspiration.

Thanks,
-mcq
Participants (2): Andi Kleen, John McCorquodale