Hi,

I am reading the code for the driver, just to get some idea of how this thing works. But I have a question about the r600_exa.c copy code. This function uses the DI_PT_RECTLIST primitive. I don't think I understand how the overlapping copy code works. I think that if you can't alter the direction in which the address is incremented, you would need an additional buffer to make this work correctly.

The register documentation gives another primitive type called DI_PT_2D_COPY_RECT_LIST_V[0-3]. Obviously I have no idea what this does, but I can guess that these four primitives are copy functions using four different incrementing orders.

Maybe someone with the documentation can answer this question?

regards,
Mark van Doesburg.

--
To unsubscribe, e-mail: radeonhd+unsubscribe@opensuse.org
For additional commands, e-mail: radeonhd+help@opensuse.org
On Mon, Feb 9, 2009 at 4:46 PM, Mark van Doesburg wrote:
> Hi,
>
> I am reading the code for the driver, just to get some idea of how this thing works. But I have a question about the r600_exa.c copy code. This function uses the DI_PT_RECTLIST primitive. I don't think I understand how the overlapping copy code works. I think that if you can't alter the direction in which the address is incremented, you would need an additional buffer to make this work correctly.
The current code works by breaking the overlapping region down into regions the size of the non-overlapping part and copying them over one by one. That way you never have to worry about raster direction, since you are always copying a non-overlapping part.
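The strip decomposition described above can be sketched on a plain CPU framebuffer. This is a minimal illustration under stated assumptions, not the code in r600_exa.c: the linear `uint32_t` framebuffer and the functions `rects_overlap`, `blit_rect` and `overlap_copy_x` are hypothetical, and `blit_rect` stands in for one GPU copy whose raster direction cannot be chosen, so it asserts that its source and destination never overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* True if two w x h rectangles at (ax, ay) and (bx, by) share any pixel. */
static int rects_overlap(int ax, int ay, int bx, int by, int w, int h)
{
    return ax < bx + w && bx < ax + w && ay < by + h && by < ay + h;
}

/* Hypothetical stand-in for a single GPU blit with a fixed raster
 * direction: it is only trusted for non-overlapping rectangles. */
static void blit_rect(uint32_t *fb, int pitch, int sx, int sy,
                      int dx, int dy, int w, int h)
{
    assert(!rects_overlap(sx, sy, dx, dy, w, h));
    for (int row = 0; row < h; row++)
        memcpy(&fb[(dy + row) * pitch + dx], &fb[(sy + row) * pitch + sx],
               (size_t)w * sizeof *fb);
}

/* Overlapping copy built from non-overlapping blits: split the rectangle
 * into vertical strips no wider than the horizontal shift and copy them
 * starting from the leading edge of the move.  Requires dx != sx; a purely
 * vertical move would be split into horizontal strips instead.
 * Returns the number of blits issued. */
static int overlap_copy_x(uint32_t *fb, int pitch, int sx, int sy,
                          int dx, int dy, int w, int h)
{
    int shift = dx - sx;
    int step = shift > 0 ? shift : -shift;
    int blits = 0;

    assert(shift != 0);
    for (int done = 0; done < w; done += step) {
        int sw = (w - done < step) ? w - done : step;   /* strip width */
        /* Moving right: rightmost strip first; moving left: leftmost first.
         * Any source pixels under a strip's destination have therefore
         * already been copied by an earlier strip. */
        int off = (shift > 0) ? w - done - sw : done;
        blit_rect(fb, pitch, sx + off, sy, dx + off, dy, sw, h);
        blits++;
    }
    return blits;
}
```

Because each strip is at most as wide as the horizontal shift, its own source and destination columns are disjoint, whatever the vertical shift is. The number of blits in this sketch is ceil(w / |dx - sx|), which is why small moves of large windows cost the most.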
> The register documentation gives another primitive type called DI_PT_2D_COPY_RECT_LIST_V[0-3]. Obviously I have no idea what this does, but I can guess that these four primitives are copy functions using four different incrementing orders.
>
> Maybe someone with the documentation can answer this question?
Those primitives were designed for that purpose, but they don't work like the other primitives. The VGT and vertex setup is completely different. They were implemented for the 2D emulation in the CP microcode on r6xx chips. They aren't really supposed to be usable by software according to the hw folks.

Alex
> The current code works by breaking the overlapping region down into regions the size of the non-overlapping part and copying them over one by one. That way you never have to worry about raster direction, since you are always copying a non-overlapping part.

That's what the code said; I just didn't want to believe that was actually happening. So dragging a 1000x1000 window by one pixel diagonally results in 1,000,000 operations. No wonder my Matrox G400 felt faster. (And the CPU load reaches 95% when dragging a window.)

> Those primitives were designed for that purpose, but they don't work like the other primitives. The VGT and vertex setup is completely different. They were implemented for the 2D emulation in the CP microcode on r6xx chips. They aren't really supposed to be usable by software according to the hw folks.
>
> Alex

So there was supposed to be a better solution, but it doesn't work. Too bad. Sorry for wasting your time; I guess I'll have to wait until the rest of the documentation is available.

Thanks for the info (and the driver, of course).

Mark.
On Mon, Feb 9, 2009 at 6:01 PM, Mark van Doesburg wrote:
>> The current code works by breaking the overlapping region down into regions the size of the non-overlapping part and copying them over one by one. That way you never have to worry about raster direction, since you are always copying a non-overlapping part.
> That's what the code said; I just didn't want to believe that was actually happening. So dragging a 1000x1000 window by one pixel diagonally results in 1,000,000 operations. No wonder my Matrox G400 felt faster. (And the CPU load reaches 95% when dragging a window.)

Only if you move it by one pixel; more often than not, the region is bigger than one pixel.

>> Those primitives were designed for that purpose, but they don't work like the other primitives. The VGT and vertex setup is completely different. They were implemented for the 2D emulation in the CP microcode on r6xx chips. They aren't really supposed to be usable by software according to the hw folks.
>>
>> Alex
> So there was supposed to be a better solution, but it doesn't work. Too bad. Sorry for wasting your time; I guess I'll have to wait until the rest of the documentation is available.
>
> Thanks for the info (and the driver, of course).
That solution works if you use the CP 2D emulation, but it's only available on r6xx chips, so we can't use it for r7xx. With more and more movement towards composited desktops, this becomes less of an issue.

Alex
2009/2/9 Mark van Doesburg
>> The current code works by breaking the overlapping region down into regions the size of the non-overlapping part and copying them over one by one. That way you never have to worry about raster direction, since you are always copying a non-overlapping part.
> That's what the code said; I just didn't want to believe that was actually happening. So dragging a 1000x1000 window by one pixel diagonally results in 1,000,000 operations. No wonder my Matrox G400 felt faster. (And the CPU load reaches 95% when dragging a window.)
Actually, that would be exactly 1000 + 1 distinct copy operations. The algorithm is O(w), where w is the width of the area being copied. It can actually be optimized further by checking whether horizontal or vertical segmentation is faster.

That said, there's no way to make it more efficient than a hardware implementation.

--
Yang Zhao
http://yangman.ca
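The horizontal-versus-vertical check suggested above can be sketched as a small cost comparison. The helper names are hypothetical, and the rule that a split along an axis is valid whenever the move along that axis is nonzero is this sketch's own assumption (strips no thicker than the shift never overlap themselves), not something taken from the driver. It counts the blits each axis would need and keeps the cheaper one.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of strips needed along one axis of length len when the move on
 * that axis is shift pixels: ceil(len / |shift|), or -1 if shift is 0 and
 * the axis therefore cannot be split. */
static long strips_along(int len, int shift)
{
    int step = abs(shift);

    assert(len > 0);
    if (step == 0)
        return -1;
    return (len + step - 1) / step;
}

enum split_axis { SPLIT_X, SPLIT_Y };

/* Choose the axis whose split needs fewer blits (ties go to x) and store
 * that blit count in *count. */
static enum split_axis choose_axis(int w, int h, int shift_x, int shift_y,
                                   long *count)
{
    long nx = strips_along(w, shift_x);
    long ny = strips_along(h, shift_y);

    assert(nx > 0 || ny > 0);           /* a zero move is not a copy */
    if (nx > 0 && (ny < 0 || nx <= ny)) {
        *count = nx;
        return SPLIT_X;
    }
    *count = ny;
    return SPLIT_Y;
}
```

For example, moving a wide but short area mostly downwards is cheaper to split into horizontal strips, since the count depends on the length of the split axis divided by the shift along it.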
> Actually, that would be exactly 1000 + 1 distinct copy operations. The algorithm is O(w), where w is the width of the area being copied. It can actually be optimized further by checking whether horizontal or vertical segmentation is faster.

Oops, I didn't get that part of the code.

> That said, there's no way to make it more efficient than a hardware implementation.

It's quite usable this way; it's not as if everybody is dragging windows around all the time. But it is kind of sad that my previous (9-year-old) video card was faster.

Having a scratch buffer the size of the frame buffer would make things much simpler and reduce the number of commands the CPU would have to generate. This would waste up to 16MB of memory and double the memory bandwidth required for an overlapping copy. But I wonder if it might still be faster.

Mark.
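The scratch-buffer alternative can be sketched the same way on a CPU framebuffer. This is a minimal illustration, not an implementation for the driver: `scratch_copy` is hypothetical, `malloc` stands in for off-screen video memory, and each `memcpy` pass stands in for one GPU copy. The sketch does not model the synchronisation a GPU would need between the two passes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Two-pass copy through a scratch buffer: pass 1 copies the whole source
 * rectangle off-screen, pass 2 copies it to the destination.  Overlap no
 * longer matters because pass 2 never reads the framebuffer.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int scratch_copy(uint32_t *fb, int pitch, int sx, int sy,
                        int dx, int dy, int w, int h)
{
    size_t row_bytes = (size_t)w * sizeof *fb;
    uint32_t *scratch;

    assert(w > 0 && h > 0);
    scratch = malloc(row_bytes * (size_t)h);   /* w * h * 4 bytes */
    if (scratch == NULL)
        return -1;

    for (int r = 0; r < h; r++)                /* pass 1: fb -> scratch */
        memcpy(&scratch[(size_t)r * w],
               &fb[(size_t)(sy + r) * pitch + sx], row_bytes);
    for (int r = 0; r < h; r++)                /* pass 2: scratch -> fb */
        memcpy(&fb[(size_t)(dy + r) * pitch + dx],
               &scratch[(size_t)r * w], row_bytes);

    free(scratch);
    return 0;
}
```

The trade-off is the one described in the message: two copies regardless of how small the move is, instead of one per strip, at the cost of moving every pixel twice and reserving a scratch area as large as the biggest rectangle to be copied.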
On Feb 10, 09 09:09:41 +0100, Mark van Doesburg wrote:
> Having a scratch buffer the size of the frame buffer would make things much simpler and reduce the number of commands the CPU would have to generate. This would waste up to 16MB of memory and double the memory bandwidth required for an overlapping copy. But I wonder if it might still be faster.
It certainly would, because you actually have to wait for the previous
1-pixel-wide copy to finish in order to continue...
That, and cache line issues kill you here.
It will probably even be worse with tiled framebuffers.
CU
Matthias
--
Matthias Hopf
On Tue, Feb 10, 2009 at 2:13 PM, Mark van Doesburg wrote:
> Matthias Hopf wrote:
>> It certainly would, because you actually have to wait for the previous 1-pixel-wide copy to finish in order to continue...
> That's not what I had in mind. To copy an overlapping area you simply copy the entire area to off-screen video memory. Then copy it from that off-screen memory to the final destination in the frame buffer.
I tried that, but got very strange behavior that I can't explain. If you were to, say, move a window to the right, the desktop background would fill in the exposed area, but the window contents wouldn't show up in the newly covered area. The same thing happened to the temp copy.

Alex
On Feb 10, 09 14:24:44 -0500, Alex Deucher wrote:
>> That's not what I had in mind. To copy an overlapping area you simply copy the entire area to off-screen video memory. Then copy it from that off-screen memory to the final destination in the frame buffer.
Yes, and I explained why this would be beneficial compared to the current solution.
> I tried that, but got very strange behavior that I can't explain. If you were to, say, move a window to the right, the desktop background would fill in the exposed area, but the window contents wouldn't show up in the newly covered area. The same thing happened to the temp copy.
Sounds like either memory management issues, or you didn't let the
engine wait for the destination caches to be flushed. And/or didn't
flush the source caches, on each copy pass.
Matthias
--
Matthias Hopf
On Wed, Feb 11, 2009 at 7:03 AM, Matthias Hopf wrote:
> On Feb 10, 09 14:24:44 -0500, Alex Deucher wrote:
>>> That's not what I had in mind. To copy an overlapping area you simply copy the entire area to off-screen video memory. Then copy it from that off-screen memory to the final destination in the frame buffer.
> Yes, and I explained why this would be beneficial compared to the current solution.
>> I tried that, but got very strange behavior that I can't explain. If you were to, say, move a window to the right, the desktop background would fill in the exposed area, but the window contents wouldn't show up in the newly covered area. The same thing happened to the temp copy.
> Sounds like either memory management issues, or you didn't let the engine wait for the destination caches to be flushed. And/or didn't flush the source caches, on each copy pass.

That's what I thought as well, but the caches are being flushed (they are queued in the command stream, at least). I even tried waiting for the engine to idle between each copy.

Alex
participants (4)
- Alex Deucher
- Mark van Doesburg
- Matthias Hopf
- Yang Zhao