On Wed, Sep 9, 2009 at 7:39 AM, Luc Verhaegen wrote:
On Tue, Sep 08, 2009 at 11:41:02AM -0400, Alex Deucher wrote:
It's not just a simple counter. The code in the drm calls udelay for each iteration of the loop. From radeon_cp.c:
    for (i = 0; i < dev_priv->usec_timeout; i++) {
            if (!(RADEON_READ(RADEON_RBBM_STATUS) & RADEON_RBBM_ACTIVE)) {
                    radeon_do_pixcache_flush(dev_priv);
                    return 0;
            }
            DRM_UDELAY(1);
    }
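To make the point about the delay concrete: because every iteration also sleeps for at least 1 µs, usec_timeout behaves as a minimum wait time in microseconds rather than a bare iteration count. The sketch below models that loop in user space. The register read and DRM_UDELAY are replaced by hypothetical stand-ins (fake_rbbm_status(), fake_udelay()) so it runs without hardware, and the -EBUSY return on timeout is this sketch's choice. Only the loop shape and "bit 31 = engine active" come from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit 31 of RBBM_STATUS: set while any part of the GUI engine is busy. */
#define RADEON_RBBM_ACTIVE (1u << 31)

/* Hypothetical stand-ins for RADEON_READ(RADEON_RBBM_STATUS) and
 * DRM_UDELAY(1): the fake engine reports busy for busy_reads reads. */
static int busy_reads;
static unsigned long usecs_slept;

static uint32_t fake_rbbm_status(void)
{
    if (busy_reads > 0) {
        busy_reads--;
        return RADEON_RBBM_ACTIVE;
    }
    return 0;
}

static void fake_udelay(unsigned long usecs)
{
    usecs_slept += usecs; /* a real udelay busy-waits at least this long */
}

/* Same shape as the radeon_cp.c loop: poll, then sleep 1 us, so the
 * total wait before giving up is at least usec_timeout microseconds. */
static int wait_for_idle(int usec_timeout)
{
    assert(usec_timeout > 0);
    for (int i = 0; i < usec_timeout; i++) {
        if (!(fake_rbbm_status() & RADEON_RBBM_ACTIVE))
            return 0; /* the real code flushes the pixel cache here */
        fake_udelay(1);
    }
    return -EBUSY;
}
```

With busy_reads = 3 the call returns 0 after sleeping 3 µs; with an engine that never idles, a timeout of 10 costs 10 µs of delay before -EBUSY.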
Alex
Alex,
It seems that you are right: there does seem to have always been a DRM_UDELAY there. I wonder how I missed this then.
I had to massively bump the usec_delay in rhd_dri.c (up to the limit), then noticed that this still wasn't enough, and therefore massively increased the counter limit in DRMCPIdle in rhd_cs.c for it to succeed and not drop into an engine reset.
Now, what I also remember noticing then is that the code behind DRM_RADEON_CP_IDLE did a whole lot more than just check whether the CP was idle.
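The interaction between the two knobs can be sketched as follows: a user-space helper that retries the idle ioctl up to a counter limit multiplies the kernel's per-call timeout. DRMCPIdle's actual code is not quoted in this thread, so the structure and every name below (mock_cp_idle_ioctl(), mock_engine_reset(), cp_idle_with_retries()) are hypothetical; this only illustrates why raising either limit lengthens the wait before an engine reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mock of the DRM_RADEON_CP_IDLE ioctl: fails with -EBUSY
 * for the first busy_calls calls, each of which would have cost at least
 * usec_timeout microseconds of polling inside the kernel. */
static int busy_calls;
static int resets;

static int mock_cp_idle_ioctl(void)
{
    if (busy_calls > 0) {
        busy_calls--;
        return -EBUSY;
    }
    return 0;
}

static void mock_engine_reset(void)
{
    resets++;
}

/* Assumed shape of a user-space idle helper: retry the ioctl up to
 * counter_limit times and only reset the engine if every attempt fails. */
static bool cp_idle_with_retries(int counter_limit)
{
    assert(counter_limit > 0);
    for (int tries = 0; tries < counter_limit; tries++) {
        if (mock_cp_idle_ioctl() == 0)
            return true;
    }
    mock_engine_reset();
    return false;
}

/* Lower bound on time spent waiting before the reset path is taken. */
static unsigned long long min_wait_before_reset_usecs(int counter_limit,
                                                      int usec_timeout)
{
    return (unsigned long long)counter_limit * (unsigned long long)usec_timeout;
}
```

A counter limit of 3 against a chip that needs 5 attempts ends in a reset; raising it to 10 lets the sixth attempt succeed without one.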
Perhaps the naming is a bit too specific (it covers more than just the CP going idle, but that's what we want). It waits for the command FIFO to drain and then waits for the entire GUI engine (CP, 2D, and 3D) to go idle. Bit 31 of RBBM_STATUS will be set if any of the following is true:
- 2D is busy
- 3D is busy
- the command FIFO is not empty
- the CP is busy
- the CSQ is not empty
- the ring buffer is not empty
You can also poll other bits of that register for the status of individual engine blocks.
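The list above amounts to a logical OR, which is why polling bit 31 alone also covers the CP and the ring. The model below encodes exactly the conditions Alex lists; the struct and its field names are hypothetical, and the real bit positions of the individual per-block status bits are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RBBM_STATUS_GUI_ACTIVE (1u << 31)

/* Hypothetical per-block status flags standing in for the individual
 * status bits of RBBM_STATUS (real bit positions not modelled). */
struct gui_status {
    bool busy_2d;
    bool busy_3d;
    bool cmdfifo_nonempty;
    bool cp_busy;
    bool csq_nonempty;
    bool ring_nonempty;
};

/* Bit 31 is set if any one of the listed conditions holds. */
static uint32_t rbbm_status_bit31(const struct gui_status *s)
{
    bool active = s->busy_2d || s->busy_3d || s->cmdfifo_nonempty ||
                  s->cp_busy || s->csq_nonempty || s->ring_nonempty;
    return active ? RBBM_STATUS_GUI_ACTIVE : 0;
}
```

Under this description, a busy CP alone is enough to keep bit 31 set, so the radeon_cp.c loop cannot exit while the CP reports busy.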
The bit you pasted here is a clear example of that. We are talking about the DRM_RADEON_CP_IDLE ioctl, but you pasted code that deals solely with the RB: a completely different part of the engine, one into which only some of the register writes that the CP outputs get queued, and one that might become empty while the CP itself is still chewing on P3 commands.
This is not a different part of the engine; this is what you want to poll to get the busy status. I pasted the code that actually polls to wait for the engines (CP, 2D, and 3D) to go idle. It is the function used directly in the drm as well as in the CP_IDLE ioctl.
Actually, looking over the same code again now, I am unable to find anything where we wait for the actual CP to go idle. We put some cache-flushing commands and such into the CP ringbuffer and then flush the ringbuffer out. And then we go and wait for the RBBM to go idle.
We never wait for the RPTR to catch up with the WPTR, and we never wait for the CP to claim that it is idle. All we do is wait for the RBBM to become idle, and the thing is, it might become idle while the CP is still busy working off something else.
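Luc's stricter condition (ring drained and engine idle) can be sketched against a mock that deliberately lets RBBM_STATUS report idle independently of the ring, i.e. the scenario he is worried about and Alex disputes. mock_gpu, mock_poll() and wait_idle() are hypothetical; a real implementation would read the CP's RPTR/WPTR registers and handle ring wraparound, which this mock ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define RBBM_ACTIVE (1u << 31)

/* Hypothetical hardware model; pointers are in dwords, no wraparound. */
struct mock_gpu {
    uint32_t rptr, wptr;  /* ring read (CP) and write (driver) pointers */
    int busy_polls;       /* polls for which bit 31 still reads as set */
};

/* One poll: the CP consumes one ring dword, and the busy bit is reported
 * independently of the ring (the situation Luc suspects can happen). */
static uint32_t mock_poll(struct mock_gpu *g)
{
    if (g->rptr != g->wptr)
        g->rptr++;
    if (g->busy_polls > 0) {
        g->busy_polls--;
        return RBBM_ACTIVE;
    }
    return 0;
}

/* Returns the number of polls taken, or -EBUSY on timeout. With
 * check_ring set, idle also requires RPTR to have caught up with WPTR. */
static int wait_idle(struct mock_gpu *g, int max_polls, bool check_ring)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        uint32_t status = mock_poll(g);
        bool ring_drained = g->rptr == g->wptr;
        if (!(status & RBBM_ACTIVE) && (!check_ring || ring_drained))
            return i + 1;
    }
    return -EBUSY;
}
```

With four dwords still queued and the busy bit already clear, the RBBM-only check returns after one poll, while the stricter check waits four polls until RPTR equals WPTR. If Alex is right that bit 31 already includes "ring buffer not empty", the extra RPTR test is redundant but harmless.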
We do wait for the CP to go idle; see above. We could probably add better hang detection by checking whether the RPTR is actually progressing when we time out waiting for the FIFO or the busy bit to clear. Unfortunately, if one of the engine blocks has hung, the CP may still be fetching stuff until it all falls over.
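The suggested hang detection can be sketched as: after each timed-out wait, compare RPTR with the value seen at the previous timeout. If it has not moved, report a likely hang; if it has, keep waiting, but only for a bounded number of rounds, because (as noted above) the CP may keep fetching even when another block has already hung. The hook struct, the mock and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum idle_result { IDLE_OK, IDLE_HUNG, IDLE_GAVE_UP };

/* Hypothetical hooks: one bounded idle wait (like the ioctl) and an RPTR read. */
struct idle_hooks {
    int (*wait_idle)(void *ctx);      /* 0 when idle, nonzero on timeout */
    uint32_t (*read_rptr)(void *ctx);
    void *ctx;
};

/* RPTR not advancing across a timeout suggests a hang; RPTR advancing is
 * not proof of health, so the number of extra rounds is capped. */
static enum idle_result wait_idle_with_hang_check(const struct idle_hooks *h,
                                                  int max_rounds)
{
    assert(max_rounds > 0);
    uint32_t last_rptr = h->read_rptr(h->ctx);
    for (int round = 0; round < max_rounds; round++) {
        if (h->wait_idle(h->ctx) == 0)
            return IDLE_OK;
        uint32_t rptr = h->read_rptr(h->ctx);
        if (rptr == last_rptr)
            return IDLE_HUNG;
        last_rptr = rptr;
    }
    return IDLE_GAVE_UP;
}

/* Mock CP for trying the logic out without hardware. */
struct mock_cp {
    uint32_t rptr;
    int timeouts_left;  /* wait_idle times out this many times */
    int progressing;    /* nonzero: RPTR advances on every timeout */
};

static int mock_wait_idle(void *ctx)
{
    struct mock_cp *m = ctx;
    if (m->timeouts_left == 0)
        return 0;
    m->timeouts_left--;
    if (m->progressing)
        m->rptr += 16;
    return -1;
}

static uint32_t mock_read_rptr(void *ctx)
{
    return ((struct mock_cp *)ctx)->rptr;
}
```

A slow but progressing CP ends in IDLE_OK, a stalled RPTR is reported as IDLE_HUNG on the first timeout, and a CP that keeps fetching without ever going idle ends in IDLE_GAVE_UP.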
So with that knowledge, it now seems quite clear that the DRM_RADEON_CP_IDLE command is broken. But it has been this kind of broken for ages, and its behaviour is probably depended upon in many places. If this is to be put straight, another ioctl has to be created to fix it.
The ioctl is not broken, although perhaps we need better logic for detecting whether the GPU has actually hung. However, in most cases, if the ioctl fails, the chip really has hung. Generally this is caused by a bad command stream or combination of command streams. Unfortunately, sorting that out is hard, since the command that actually hung the chip may not be the one currently being processed. The new KMS-enabled drm adds some debugfs features for dumping IBs and the command FIFO, which makes this easier to sort out.

Alex