xf86-video-radeonhd:master: 1 commit(s)
Reporting from xf86-video-radeonhd: Radeon HD video driver
Please visit:
http://gitweb.freedesktop.org/?p=xorg/driver/xf86-video-radeonhd
Or get your own copy by using:
git-clone git://anongit.freedesktop.org/xorg/driver/xf86-video-radeonhd
Commit against master at b00c0a75...:
commit d8329927aaac5f2d4949785951326ebc782bc420
Author: Marc Dietrich
On Sep 08, 09 06:33:52 -0700, Matthias Hopf wrote:
Commit against master at b00c0a75...:
commit d8329927aaac5f2d4949785951326ebc782bc420
Author: Marc Dietrich
Date: Tue Sep 8 15:32:38 2009 +0200

    Fix softlocks on rs690.

    Idle commands have to be flushed to be of any use.
Note that this is probably only the tip of an iceberg - I fixed a few
others; hopefully this is working out correctly.
If anybody is interested in checking for further places where the flush()
is in the wrong place, feel free to send a patch.
Matthias
--
Matthias Hopf
On Tue, Sep 08, 2009 at 06:33:52AM -0700, Matthias Hopf wrote:
Commit against master at b00c0a75...:
commit d8329927aaac5f2d4949785951326ebc782bc420
Author: Marc Dietrich
Date: Tue Sep 8 15:32:38 2009 +0200

    Fix softlocks on rs690.

    Idle commands have to be flushed to be of any use.

diffstat:
 src/r5xx_exa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Hoping that this would finally fix the longstanding bug that was (of course) on my plate until I got laid off, I looked into this code a bit deeper.

What this patch does, in one case, is the following:

-    RHDCSFlush(CS);
     RHDCSIdle(CS);
     R5xx2DIdle(pScrn);
+    RHDCSFlush(CS);

When one looks at this logically, it clearly is not anything one would want. We need the CS queue to be empty here. We therefore flush the queue, wait for it to go idle, and then also wait for the 2D engine to be fully idle afterwards.

Waiting for the CS to be idle and then waiting for the 2D engine to go idle, and only then giving the CS and the 2D engine something to do, is rather wrong. Please check out the code behind the above calls to verify this.

Now, if this "seems" to fix the issue, then we might be barking up the right tree here. It seems to be locked into the DRM call DRM_RADEON_CP_IDLE. I have noticed some things about this call in the past, and this is the comment that made it into the DRMCPIdle function in rhd_cs.c:

/* The DRM CP IDLE call does quite a lot more than just wait for the CP
 * going idle. It waits on the RBBM as well. This number needs to be huge,
 * as the DRM cluelessly uses loops instead of usecs, and this is therefor
 * decreasing rapidly with CPU advancement. */

Translation: fast CPU, slow 2D engine; the loop inside the DRM driver, even though it is marked as "number of usecs", is a plain counter-based loop and times out way too quickly.

So wrong fix, but there is definite progress on nailing this down.

Luc Verhaegen.

--
To unsubscribe, e-mail: radeonhd+unsubscribe@opensuse.org
For additional commands, e-mail: radeonhd+help@opensuse.org
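[Editor's note] The ordering issue Luc describes can be sketched with a toy model. The function names echo the radeonhd calls in the diff, but the state machine itself is invented for illustration and is not code from the driver.

```c
#include <stdbool.h>

/* Toy model: commands sit in a CS queue until flushed to the engine;
 * the engine then stays busy until explicitly waited upon. */
struct toy_cs {
    int  queued;       /* commands waiting in the CS queue */
    bool engine_busy;  /* 2D engine still chewing on work */
};

/* Hand queued commands to the engine (echoes RHDCSFlush). */
static void toy_flush(struct toy_cs *cs)
{
    if (cs->queued) {
        cs->queued = 0;
        cs->engine_busy = true;
    }
}

/* Wait for the 2D engine itself to finish (echoes R5xx2DIdle). */
static void toy_2d_idle(struct toy_cs *cs)
{
    cs->engine_busy = false;
}

/* The sequence Luc defends: flush first, then wait for idle.
 * Returns true when everything has actually quiesced. */
static bool toy_quiesce(struct toy_cs *cs)
{
    toy_flush(cs);    /* push pending commands out */
    toy_2d_idle(cs);  /* only now does waiting for idle mean anything */
    return cs->queued == 0 && !cs->engine_busy;
}

/* The patched ordering: idle first, flush last - queued work is handed
 * to the engine only after we have stopped waiting for it. */
static bool toy_quiesce_patched(struct toy_cs *cs)
{
    toy_2d_idle(cs);
    toy_flush(cs);
    return cs->queued == 0 && !cs->engine_busy;
}
```

With pending commands queued, the defended ordering ends fully idle while the patched ordering leaves the engine busy, which is the logical flaw being pointed out.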
On Tue, Sep 8, 2009 at 11:00 AM, Luc Verhaegen wrote:
On Tue, Sep 08, 2009 at 06:33:52AM -0700, Matthias Hopf wrote:
Commit against master at b00c0a75...:
commit d8329927aaac5f2d4949785951326ebc782bc420
Author: Marc Dietrich
Date: Tue Sep 8 15:32:38 2009 +0200

    Fix softlocks on rs690.

    Idle commands have to be flushed to be of any use.

diffstat:
 src/r5xx_exa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Hoping that this would finally fix the longstanding bug that was (of course) on my plate until I got laid off, I looked into this code a bit deeper.
What this patch does, in one case, is the following:

-    RHDCSFlush(CS);
     RHDCSIdle(CS);
     R5xx2DIdle(pScrn);
+    RHDCSFlush(CS);

When one looks at this logically, it clearly is not anything one would want.
We need the CS queue to be empty here. We therefore flush the queue, wait for it to go idle, and then also wait for the 2D engine to be fully idle afterwards.
Waiting for the CS to be idle and then waiting for the 2D engine to go idle, and only then giving the CS and the 2D engine something to do, is rather wrong. Please check out the code behind the above calls to verify this.
Now, if this "seems" to fix the issue, then we might be barking up the right tree here.
It seems to be locked into the DRM call DRM_RADEON_CP_IDLE. I have noticed some things about this call in the past, and this is the comment that made it into the DRMCPIdle function in rhd_cs.c:

/* The DRM CP IDLE call does quite a lot more than just wait for the CP
 * going idle. It waits on the RBBM as well. This number needs to be huge,
 * as the DRM cluelessly uses loops instead of usecs, and this is therefor
 * decreasing rapidly with CPU advancement. */

Translation: fast CPU, slow 2D engine; the loop inside the DRM driver, even though it is marked as "number of usecs", is a plain counter-based loop and times out way too quickly.
It's not just a simple counter. The code in the DRM calls udelay for each iteration of the loop. From radeon_cp.c:

	for (i = 0; i < dev_priv->usec_timeout; i++) {
		if (!(RADEON_READ(RADEON_RBBM_STATUS) & RADEON_RBBM_ACTIVE)) {
			radeon_do_pixcache_flush(dev_priv);
			return 0;
		}
		DRM_UDELAY(1);
	}

Alex
On Tue, Sep 08, 2009 at 11:41:02AM -0400, Alex Deucher wrote:
It's not just a simple counter. The code in the drm calls udelay for each iteration of the loop. From radeon_cp.c:
	for (i = 0; i < dev_priv->usec_timeout; i++) {
		if (!(RADEON_READ(RADEON_RBBM_STATUS) & RADEON_RBBM_ACTIVE)) {
			radeon_do_pixcache_flush(dev_priv);
			return 0;
		}
		DRM_UDELAY(1);
	}
Alex
Alex,

It seems that you are right, there does seem to have always been a DRM_UDELAY there. I wonder how I missed this then.

I had to massively bump the usec_delay in rhd_dri.c (up to the limit) and then noticed that this still wasn't enough, and therefore massively increased the counter limit in DRMCPIdle in rhd_cs.c for it to succeed and not drop into an engine reset.

Now, what I also remember noticing then is that the code behind DRM_RADEON_CP_IDLE did a whole lot more than just checking whether the CP was idle.

The bit you pasted here is a clear example of that. We are talking about the DRM_RADEON_CP_IDLE ioctl, but you paste code that deals solely with the RB, a completely different part of the engine, one that gets just some of the register writes that the CP outputs queued into it. And one that might become empty when the CP itself is still chewing on P3 commands.

Actually, looking over the same code again now, I am unable to find anything where we wait for the actual CP to go idle. We put some cache flushing commands and such into the CP ring buffer and then flush the ring buffer out. And then we go and wait for the RBBM to go idle.

We never wait for the RPTR to catch up with the WPTR; we never wait for the CP to claim that it is idle. All we do is wait for the RBBM to become idle, and the thing is, it might already become idle while the CP is still busy working off something else.

So with that knowledge, it now seems quite clear that the DRM_RADEON_CP_IDLE command is broken. But it has been this kind of broken for ages, and its behaviour is probably depended upon in many places. If this is to be put straight, then another ioctl has to be created to fix this.

Luc Verhaegen.
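[Editor's note] The check Luc says is missing - the RPTR catching up with the WPTR - reduces to a masked pointer comparison. The helper below is a sketch, not code from the DRM; both pointers wrap around the ring, hence the mask.

```c
#include <stdint.h>
#include <stdbool.h>

/* The ring is only drained once the CP's read pointer (RPTR) has caught
 * up with the driver's write pointer (WPTR). Both offsets wrap around
 * the ring buffer, so compare them masked to the ring size.
 * Hypothetical helper for illustration. */
static bool ring_drained(uint32_t rptr, uint32_t wptr, uint32_t ring_mask)
{
    return (rptr & ring_mask) == (wptr & ring_mask);
}
```

A drained ring is necessary but, as the mail argues, not sufficient: the CP can still be busy executing what it last fetched.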
On Wed, Sep 9, 2009 at 7:39 AM, Luc Verhaegen wrote:
On Tue, Sep 08, 2009 at 11:41:02AM -0400, Alex Deucher wrote:
It's not just a simple counter. The code in the drm calls udelay for each iteration of the loop. From radeon_cp.c:
	for (i = 0; i < dev_priv->usec_timeout; i++) {
		if (!(RADEON_READ(RADEON_RBBM_STATUS) & RADEON_RBBM_ACTIVE)) {
			radeon_do_pixcache_flush(dev_priv);
			return 0;
		}
		DRM_UDELAY(1);
	}
Alex
Alex,
It seems that you are right, there does seem to have always been a DRM_UDELAY there. I wonder how I missed this then.
I had to massively bump the usec_delay in rhd_dri.c (up to the limit) and then noticed that this still wasn't enough, and therefore massively increased the counter limit in DRMCPIdle in rhd_cs.c for it to succeed and not drop into an engine reset.
Now, what I also remember noticing then is that the code behind DRM_RADEON_CP_IDLE did a whole lot more than just checking whether the CP was idle.
Perhaps the naming is a bit too exact (it's a bit more than just CP idle, but that's what we want). It waits for the command fifo to drain and then waits for the entire GUI engine to idle (CP, 2D, and 3D). Bit 31 of RBBM_STATUS will be set if:
- 2D is busy, or
- 3D is busy, or
- the command fifo is not empty, or
- the CP is busy, or
- the CSQ is not empty, or
- the ring buffer is not empty.
You can also poll other bits of that register for the status of individual engine blocks.
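[Editor's note] The composite flag described above can be captured in a small decoder. Only the bit-31 "GUI active" meaning comes from the mail; the per-block bit positions below are placeholders for illustration, not the real RBBM_STATUS layout.

```c
#include <stdint.h>
#include <stdbool.h>

/* Composite "GUI active" flag: set while the CP, 2D, 3D, fifo,
 * CSQ, or ring buffer still has work (per the description above). */
#define RBBM_GUI_ACTIVE  (1u << 31)

/* Individual block bits: positions are made up for illustration;
 * consult the real RBBM_STATUS layout before relying on them. */
#define RBBM_CP_BUSY     (1u << 16)
#define RBBM_2D_BUSY     (1u << 17)
#define RBBM_3D_BUSY     (1u << 18)

/* The whole GUI engine is idle only once the composite bit clears. */
static bool gui_idle(uint32_t rbbm_status)
{
    return !(rbbm_status & RBBM_GUI_ACTIVE);
}
```

This is why polling the one composite bit, as the DRM loop does, covers every block at once, while the finer-grained bits allow per-block status checks.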
The bit you pasted here is a clear example of that. We are talking about the DRM_RADEON_CP_IDLE ioctl, but you paste code that deals solely with the RB, a completely different part of the engine, one that gets just some of the register writes that the CP outputs queued into it. And one that might become empty when the CP itself is still chewing on P3 commands.
This is not a different part of the engine, this is what you want to poll to get the busy status. I pasted the code that actually polls to wait for the engines (CP, 2D, and 3D) to be idle. The function is used in the DRM directly as well as in the CP_IDLE ioctl.
Actually, looking over the same code again now, I am unable to find anything where we wait for the actual CP to go idle. We put some cache flushing commands and such into the CP ring buffer and then flush the ring buffer out. And then we go and wait for the RBBM to go idle.
We never wait for the RPTR to catch up with the WPTR, we never wait for the CP to claim that it is idle. All we do is wait for the RBBM to become idle, and the thing is, it might already become idle while the CP is still busy working off something else.
We do wait for the CP to idle. See above. We could probably add better hang detection by checking whether the RPTR is actually progressing when we time out waiting for the fifo or busy bit to clear. Unfortunately, if one of the engine blocks has hung, the CP may still be fetching stuff until it all falls over.
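[Editor's note] The hang heuristic sketched above - conclude "hung" only when the read pointer stopped advancing across a timeout - reduces to comparing two RPTR samples. The helper is hypothetical, not DRM code.

```c
#include <stdint.h>
#include <stdbool.h>

/* If we timed out waiting for idle but the CP read pointer moved
 * between two samples, the engine is merely slow, not hung. Only a
 * stuck pointer combined with a timeout suggests a real hang.
 * Hypothetical helper for illustration. */
static bool likely_hung(bool timed_out, uint32_t rptr_before, uint32_t rptr_after)
{
    return timed_out && rptr_before == rptr_after;
}
```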
So with that knowledge, it now seems quite clear that the DRM_RADEON_CP_IDLE command is broken. But. It has been this kind of broken for ages, and its behaviour is probably depended upon in many places. If this is to be put straight, then another ioctl has to be created to fix this.
The ioctl is not broken, although perhaps we need some better logic for detecting whether the GPU has actually hung. However, in most cases it has hung if the ioctl fails. Generally this is caused by a bad command stream or a combination of command streams. Unfortunately, sorting that out is hard since the command that actually hung the chip may not be the one currently being processed. The new KMS-enabled DRM adds some debugfs features for dumping IBs and the command fifo, which makes this easier to sort out.

Alex
On Wed, Sep 09, 2009 at 10:48:34AM -0400, Alex Deucher wrote:
On Wed, Sep 9, 2009 at 7:39 AM, Luc Verhaegen wrote:
Now, what I also remember noticing then is that the code behind DRM_RADEON_CP_IDLE did a whole lot more than just checking whether the CP was idle.
Perhaps the naming is a bit too exact (it's a bit more than just CP idle, but that's what we want). It waits for the command fifo to drain and then waits for the entire GUI engine to idle (CP, 2D, and 3D). Bit 31 of RBBM_STATUS will be set if:
- 2D is busy, or
- 3D is busy, or
- the command fifo is not empty, or
- the CP is busy, or
- the CSQ is not empty, or
- the ring buffer is not empty.
You can also poll other bits of that reg for the status of individual engine blocks.
Why is there a reason to wait for all engines to be idle here? Isn't there this Unix mindset which says: do one small thing, but do it well?
This is not a different part of the engine, this is what you want to poll to get the busy status. I pasted the code that actually polls to wait for the engines (CP, 2D, and 3D) to be idle. The function used in the drm directly as well as in the CP_IDLE ioctl.
We do wait for the CP to idle. See above. We could probably add better hang detection by checking whether the RPTR is actually progressing when we time out waiting for the fifo or busy bit to clear. Unfortunately, if one of the engine blocks has hung, the CP may still be fetching stuff until it all falls over.
If the PTRs are still moving, there is no point in checking whether the CP claims to be idle. Check that first; then, when the RPTR reaches the WPTR, start waiting on the CP to become idle. Then return 0, so that the caller can go and granularly check on whichever engine it also requires to be idle, in whatever way it knows.
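[Editor's note] The staged wait proposed here can be written as a small state machine, one polling step at a time. The states and function are invented for illustration; stage one lets the ring drain, stage two polls only the CP's own busy flag, and everything else is left to the caller.

```c
#include <stdint.h>
#include <stdbool.h>

/* Stages of the proposed CP-only wait: first the RPTR must catch the
 * WPTR (ring drained), only then do we ask whether the CP itself is
 * idle. Other engine blocks are deliberately not consulted. */
enum cp_wait_stage { WAIT_RING_DRAIN, WAIT_CP_IDLE, CP_IDLE_DONE };

/* One polling step: given the current stage and freshly sampled
 * register values, return the next stage (hypothetical helper). */
static enum cp_wait_stage cp_wait_step(enum cp_wait_stage stage,
                                       uint32_t rptr, uint32_t wptr,
                                       bool cp_busy)
{
    switch (stage) {
    case WAIT_RING_DRAIN:
        /* keep waiting until the CP has consumed everything queued */
        return (rptr == wptr) ? WAIT_CP_IDLE : WAIT_RING_DRAIN;
    case WAIT_CP_IDLE:
        /* only now does the CP's own idle claim mean anything */
        return cp_busy ? WAIT_CP_IDLE : CP_IDLE_DONE;
    default:
        return CP_IDLE_DONE;
    }
}
```

A driver loop would call this until CP_IDLE_DONE (or a timeout), then separately idle whichever other blocks it cares about.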
The ioctl is not broken although perhaps we need some better logic for detecting whether the GPU has actually hung. However, in most cases it has hung if the ioctl fails. Generally this is caused by a bad command stream or combination of command streams. Unfortunately, sorting that out is hard since the command that actually hung the chip may not be the one currently being processed. The new kms-enabled drm adds some debugfs features for dumping IBs and the command fifo which makes this easier to sort out.
I think that the ioctl is broken. One should care only about the state of the CP here, and not about anything else.

I just threw a tiny bit of money at getting some r5xx hw again, so it is possible that I will take a much closer look at this stuff again somewhere next week.

Luc Verhaegen.
On Wed, Sep 9, 2009 at 12:14 PM, Luc Verhaegen wrote:
On Wed, Sep 09, 2009 at 10:48:34AM -0400, Alex Deucher wrote:
On Wed, Sep 9, 2009 at 7:39 AM, Luc Verhaegen wrote:
Now, what I also remember noticing then is that the code behind DRM_RADEON_CP_IDLE did a whole lot more than just checking whether the CP was idle.
Perhaps the naming is a bit too exact (it's a bit more than just CP idle, but that's what we want). It waits for the command fifo to drain and then waits for the entire GUI engine to idle (CP, 2D, and 3D). Bit 31 of RBBM_STATUS will be set if:
- 2D is busy, or
- 3D is busy, or
- the command fifo is not empty, or
- the CP is busy, or
- the CSQ is not empty, or
- the ring buffer is not empty.
You can also poll other bits of that reg for the status of individual engine blocks.
Why is there this reason to wait for all engines to be idle here?
Isn't there this unix mindset which kind of says: do one small thing, but do it well?
While the ioctl is called DRM_RADEON_CP_IDLE, its purpose is to ensure that the entire engine is idle (CP, 2D, and 3D) and that all data has hit memory. It is used for that purpose in both the ddx and Mesa. So perhaps the name is not ideal.
This is not a different part of the engine, this is what you want to poll to get the busy status. I pasted the code that actually polls to wait for the engines (CP, 2D, and 3D) to be idle. The function used in the drm directly as well as in the CP_IDLE ioctl.
We do wait for the CP to idle. See above. We could probably add better hang detection by checking whether the RPTR is actually progressing when we time out waiting for the fifo or busy bit to clear. Unfortunately, if one of the engine blocks has hung, the CP may still be fetching stuff until it all falls over.
If the PTRs are still moving, there is no point in checking whether the CP claims to be idle. Check that first, then when the RPTR reaches the WPTR, start waiting on the CP to become idle. Then, return 0, so that the caller can then go and granularly check in on whichever engine it also requires to idle in whatever way it knows.
If the fifo is still full or the GUI_ACTIVE bit is still set in RBBM_STATUS, it's the same as checking the pointers. The CP is still active either way. Detecting whether or not it has hung is a different matter. You can still check further in the client if the ioctl returns an error before attempting to reset the engine.
The ioctl is not broken although perhaps we need some better logic for detecting whether the GPU has actually hung. However, in most cases it has hung if the ioctl fails. Generally this is caused by a bad command stream or combination of command streams. Unfortunately, sorting that out is hard since the command that actually hung the chip may not be the one currently being processed. The new kms-enabled drm adds some debugfs features for dumping IBs and the command fifo which makes this easier to sort out.
I think that the ioctl is broken. One should care about only the state of CP here, and not of anything else.
I just threw a tiny bit of money at getting some r5xx hw again, so it is possible that i take a much closer look to this stuff again somewhere next week.
If you only care about the CP, then you will need to add a new ioctl and restructure the ddx. We currently use this ioctl to ensure the hw has quiesced.

Alex
On Wed, Sep 09, 2009 at 12:29:47PM -0400, Alex Deucher wrote:
On Wed, Sep 9, 2009 at 12:14 PM, Luc Verhaegen wrote:
Why is there a reason to wait for all engines to be idle here?
Isn't there this unix mindset which kind of says: do one small thing, but do it well?
While the ioctl is called DRM_RADEON_CP_IDLE, its purpose is to ensure that the entire engine is idle (CP, 2D, and 3D) and that all data has hit memory. It is used for that purpose in both the ddx and Mesa. So perhaps the name is not ideal.
_The_ DDX?

Luc Verhaegen.
participants (4):
- Alex Deucher
- Luc Verhaegen
- Matthias Hopf
- mhopf@kemper.freedesktop.org