Mailinglist Archive: radeonhd (307 mails)

[radeonhd] [PATCH] r6xx/r7xx overlapping EXA copy optimization
  • From: Yang Zhao <yang@xxxxxxxxxx>
  • Date: Fri, 6 Feb 2009 00:55:09 -0800
  • Message-id: <40a7b1aa0902060055l2c5c8700m1d80f1851a9da285@xxxxxxxxxxxxxx>
I poked around in R600OverlapCopy() and noticed that when the offset
between src and dst is purely horizontal, the simple all-at-once copy
seems to work correctly.
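
For illustration only, the resulting dispatch can be sketched as below.
rects_overlap() and pick_copy_path() are hypothetical names, not driver
functions, and the half-open rectangle test is an assumption about what
is_overlap() in r600_exa.c checks (the real function takes the edges as
separate x1, x2, y1, y2 arguments).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's is_overlap(); assumes half-open
 * rectangles [x, x + w) x [y, y + h) of identical size. */
static bool rects_overlap(int ax, int ay, int bx, int by, int w, int h)
{
    return ax < bx + w && bx < ax + w && ay < by + h && by < ay + h;
}

enum copy_path { COPY_ALL_AT_ONCE, COPY_CHUNKED_VERTICAL };

/* Mirrors the condition used in the patch: only an overlapping copy with a
 * vertical offset takes the careful path; everything else, including a
 * purely horizontal overlap, is done in one pass. */
static enum copy_path pick_copy_path(int srcX, int srcY, int dstX, int dstY,
                                     int w, int h)
{
    assert(w > 0 && h > 0);
    if (rects_overlap(srcX, srcY, dstX, dstY, w, h) && srcY != dstY)
        return COPY_CHUNKED_VERTICAL;
    return COPY_ALL_AT_ONCE;
}
```

Note that a diagonal offset also has srcY != dstY, so under this condition
it takes the chunked path as well.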

For the case where the offset is only vertical, I've modified the
line-by-line copy to instead copy in chunks whose height equals that of
the non-overlapping area, i.e. the vertical offset between src and dst.
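
A minimal CPU-side sketch of why chunking by the vertical offset is safe,
assuming each blit behaves like memcpy (undefined if its source and
destination rows overlap). blit_rows() and chunked_overlap_copy() are
hypothetical helpers operating on a plain byte buffer; they are not the
driver's R600DoPrepareCopy()/R600AppendCopyVertex()/R600DoCopy() sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models one blit of `rows` rows, `w` bytes wide. Like memcpy, it is only
 * safe when the source and destination rows do not overlap. */
static void blit_rows(uint8_t *fb, int pitch, int srcX, int srcY,
                      int dstX, int dstY, int w, int rows)
{
    for (int r = 0; r < rows; r++)
        memcpy(fb + (dstY + r) * pitch + dstX,
               fb + (srcY + r) * pitch + srcX, w);
}

/* Copies an h-row block in chunks of |srcY - dstY| rows, so every chunk's
 * source and destination rows are disjoint. Returns the number of blits. */
static int chunked_overlap_copy(uint8_t *fb, int pitch, int srcX, int srcY,
                                int dstX, int dstY, int w, int h)
{
    int i, chunk, blits = 0;

    assert(srcY != dstY);
    if (srcY > dstY) {              /* moving up: copy top to bottom */
        chunk = srcY - dstY;
        for (i = 0; i < h; i += chunk) {
            if (chunk > h - i)
                chunk = h - i;      /* last chunk may be shorter */
            blit_rows(fb, pitch, srcX, srcY + i, dstX, dstY + i, w, chunk);
            blits++;
        }
    } else {                        /* moving down: copy bottom to top */
        chunk = dstY - srcY;
        for (i = h; i > 0; i -= chunk) {
            if (chunk > i)
                chunk = i;          /* last chunk may be shorter */
            blit_rows(fb, pitch, srcX, srcY + i - chunk,
                      dstX, dstY + i - chunk, w, chunk);
            blits++;
        }
    }
    return blits;
}
```

Moving a 5-row block up by 2 rows takes 3 blits this way instead of 5
single-row blits; the saving grows with the size of the offset.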

The patch has only been tested on an RV770, so I will claim nothing
regarding other GPUs. At least for this particular machine, EXA is now
snappy and corruption-free.

--
Yang Zhao
http://yangman.ca
From 32be7c13b5b1da4c08070751dd9ef8d218c8a832 Mon Sep 17 00:00:00 2001
From: Yang Zhao <yang@xxxxxxxxxx>
Date: Fri, 6 Feb 2009 00:31:39 -0800
Subject: [PATCH] r6xx/r7xx EXA: Optimize overlapping copy

When source and destination blocks are only offset horizontally, it
appears to be unnecessary to perform careful, segment-by-segment copy.
The code path that does this is taken out completely.

For the case where the offset is only vertical, copying is now done in
chunks the height of the non-overlapping area, instead of always
line by line.
---
src/r600_exa.c | 79 ++++++++++++++++++++-----------------------------------
1 files changed, 29 insertions(+), 50 deletions(-)

diff --git a/src/r600_exa.c b/src/r600_exa.c
index 7d7c80d..df17704 100644
--- a/src/r600_exa.c
+++ b/src/r600_exa.c
@@ -719,58 +719,37 @@ R600OverlapCopy(PixmapPtr pDst,
     struct r6xx_accel_state *accel_state = rhdPtr->TwoDPrivate;
     uint32_t dst_pitch = exaGetPixmapPitch(pDst) / (pDst->drawable.bitsPerPixel / 8);
     uint32_t dst_offset = exaGetPixmapOffset(pDst) + rhdPtr->FbIntAddress + rhdPtr->FbScanoutStart;
-    int i;
+    int i, chunk;

     if (is_overlap(srcX, srcX + w, srcY, srcY + h,
-                   dstX, dstX + w, dstY, dstY + h)) {
-        if (srcY == dstY) { // left/right
-            if (srcX < dstX) { // right
-                // copy right to left
-                for (i = w; i > 0; i--) {
-                    R600DoPrepareCopy(pScrn,
-                                      dst_pitch, pDst->drawable.width, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      dst_pitch, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      accel_state->rop, accel_state->planemask);
-                    R600AppendCopyVertex(pScrn, srcX + i - 1, srcY, dstX + i - 1, dstY, 1, h);
-                    R600DoCopy(pScrn);
-                }
-            } else { //left
-                // copy left to right
-                for (i = 0; i < w; i++) {
-                    R600DoPrepareCopy(pScrn,
-                                      dst_pitch, pDst->drawable.width, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      dst_pitch, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      accel_state->rop, accel_state->planemask);
-
-                    R600AppendCopyVertex(pScrn, srcX + i, srcY, dstX + i, dstY, 1, h);
-                    R600DoCopy(pScrn);
-                }
-            }
-        } else { //up/down
-            if (srcY > dstY) { // up
-                // copy top to bottom
-                for (i = 0; i < h; i++) {
-                    R600DoPrepareCopy(pScrn,
-                                      dst_pitch, pDst->drawable.width, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      dst_pitch, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      accel_state->rop, accel_state->planemask);
-
-                    R600AppendCopyVertex(pScrn, srcX, srcY + i, dstX, dstY + i, w, 1);
-                    R600DoCopy(pScrn);
-                }
-            } else { // down
-                // copy bottom to top
-                for (i = h; i > 0; i--) {
-                    R600DoPrepareCopy(pScrn,
-                                      dst_pitch, pDst->drawable.width, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      dst_pitch, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
-                                      accel_state->rop, accel_state->planemask);
-
-                    R600AppendCopyVertex(pScrn, srcX, srcY + i - 1, dstX, dstY + i - 1, w, 1);
-                    R600DoCopy(pScrn);
-                }
-            }
-        }
+                   dstX, dstX + w, dstY, dstY + h) && (srcY != dstY)) {
+        if (srcY > dstY) { // up
+            // copy top to bottom
+            chunk = srcY - dstY;
+            for (i = 0; i < h; i += chunk) {
+                if (chunk > h - i) chunk = h - i;
+                R600DoPrepareCopy(pScrn,
+                                  dst_pitch, pDst->drawable.width, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
+                                  dst_pitch, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
+                                  accel_state->rop, accel_state->planemask);
+
+                R600AppendCopyVertex(pScrn, srcX, srcY + i, dstX, dstY + i, w, chunk);
+                R600DoCopy(pScrn);
+            }
+        } else { // down
+            // copy bottom to top
+            chunk = dstY - srcY;
+            for (i = h; i > 0; i -= chunk) {
+                if (chunk > i) chunk = i;
+                R600DoPrepareCopy(pScrn,
+                                  dst_pitch, pDst->drawable.width, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
+                                  dst_pitch, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
+                                  accel_state->rop, accel_state->planemask);
+
+                R600AppendCopyVertex(pScrn, srcX, srcY + i - chunk, dstX, dstY + i - chunk, w, chunk);
+                R600DoCopy(pScrn);
+            }
+        }
     } else {
         R600DoPrepareCopy(pScrn,
                           dst_pitch, pDst->drawable.width, pDst->drawable.height, dst_offset, pDst->drawable.bitsPerPixel,
--
1.6.0.6
