http://bugzilla.novell.com/show_bug.cgi?id=482824
User teheo@novell.com added comment
http://bugzilla.novell.com/show_bug.cgi?id=482824#c8
Tejun Heo changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |NEEDINFO
Info Provider| |DOlsson@WEB.de
--- Comment #8 from Tejun Heo 2009-09-02 17:42:17 MDT ---
Hello, sorry about the delay. Somehow it took forever to get triaged. Your
problem is pretty interesting. This is the first time I've seen a system go
through that many host-side errors. What happens is that under higher
transfer load the controller detects a protocol mismatch or a memory access
error on the _HOST_ side bus, which would be PCI-e in this case. From the AHCI
spec:
Host Bus Fatal Error Status (HBFS): Indicates that the HBA encountered a
host bus error that it cannot recover from, such as a bad software pointer.
In PCI, such an indication would be a target or master abort.
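For reference, HBFS is one of several error bits in the per-port interrupt
status register (PxIS). The sketch below decodes those bits from a raw
register value. The bit positions follow the AHCI spec; the function and table
names are made up for illustration and are not taken from the driver.

```python
# Error bits in the AHCI port interrupt status register (PxIS).
# Bit positions follow the AHCI specification; names here are illustrative.
PXIS_ERROR_BITS = {
    30: ("TFES", "task file error"),
    29: ("HBFS", "host bus fatal error"),
    28: ("HBDS", "host bus data error"),
    27: ("IFS", "interface fatal error"),
    26: ("INFS", "interface non-fatal error"),
}


def decode_pxis_errors(pxis):
    """Return the short names of the error bits set in a raw PxIS value,
    highest bit first."""
    names = []
    for bit in sorted(PXIS_ERROR_BITS, reverse=True):
        if pxis & (1 << bit):
            names.append(PXIS_ERROR_BITS[bit][0])
    return names
```

A value with only bit 29 set, such as 0x20000000, decodes to HBFS alone, which
is the host bus fatal error case described here.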
This can be either a genuine hardware problem, where the controller has
difficulty accessing the command structures or DMA buffers in memory, or a
software problem, where the driver programs the wrong addresses into the
command structure. I think it's the former here because when the command gets
retried after EH, the same memory buffers are used, and as ahci is 64-bit
capable I don't think the iommu would be stepping in here. So the controller
ends up with exactly the same command structure, yet it succeeds on the second
try.
One thing to try though: does anything change if you limit the amount of
memory to 4G, i.e. boot with "mem=4G iommu=off"?
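To confirm that both parameters actually reached the kernel after rebooting,
something like the following sketch could be run against the contents of
/proc/cmdline. The helper names are hypothetical and the parsing is
deliberately simple (it does not handle quoted values).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of parameter -> value.
    Parameters without '=' map to None. Quoted values are not handled."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def limits_applied(cmdline):
    """True if both mem=4G and iommu=off are present on the command line."""
    params = parse_cmdline(cmdline)
    return params.get("mem") == "4G" and params.get("iommu") == "off"
```

For example, passing the text read from /proc/cmdline to limits_applied()
tells you whether the test boot really ran with the restricted settings.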
Also, it would be great if you could test SL111. There have been some updates
to the iommu code between your release and that one.
Thanks.