[Bug 241334] New: Failure to start SATA device on an Athlon 64 X2 (HP dc5750)
https://bugzilla.novell.com/show_bug.cgi?id=241334

           Summary: Failure to start SATA device on an Athlon 64 X2 (HP dc5750)
           Product: openSUSE 10.2
           Version: Final
          Platform: x86-64
        OS/Version: Other
            Status: NEW
          Severity: Blocker
          Priority: P5 - None
         Component: Kernel
        AssignedTo: kernel-maintainers@forge.provo.novell.com
        ReportedBy: pablomme@googlemail.com
         QAContact: qa@suse.de

I installed the x86_64 version of openSUSE 10.2, not without issues, alongside the 32-bit version of Windows XP on my computer. After a couple of days using it, I started getting the following during boot-up (typos are possible; I had to copy it by hand):

  ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
  ata2.00: configured for UDMA/100
  ata2: EH complete
  ata2.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2
  ata2.00: irq_stat 0x40000001
  ata2.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x25 data 8 in
  ata2.00: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
  ata2: soft resetting port

This is repeated indefinitely, about once per second. The problem seems to appear somewhat at random: sometimes the system still boots without whingeing at all.

I'm using the kernel distributed with openSUSE 10.2, recompiled after adding a patch from AMD to make the kernel detect and use the second CPU (the original kernel didn't, although it was meant to include SMP support). I didn't get the problem before patching the kernel, but it might have appeared if I had given it time (I suppose).

My computer is an HP dc5750 (Athlon 64 X2, 1.5GB RAM), whose hard drive is a Samsung HD160JJ/P. The SATA controller has vendor/device no. 1002/4380, corresponding to ATI's 'SB600 Non-Raid-5 SATA'.

Is there any way of at least bypassing this problem by passing appropriate kernel boot options? Like '-choke-on-sata=no', but with a flag that the kernel would understand?
-- Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
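
For readers following the log in the report, the SStatus value on the "SATA link up" line can be split into its three fields. The sketch below assumes the field layout from the Serial ATA specification (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that the kernel prints the value in hexadecimal without a prefix. The name `decode_sstatus` and the lookup tables are illustrative only and are not taken from the kernel source.

```python
# Field meanings assumed from the Serial ATA specification (SStatus register).
# Values not listed here are reported as "other/reserved".
DET = {
    0: "no device detected, no PHY communication",
    1: "device present, PHY communication not established",
    3: "device present, PHY communication established",
    4: "PHY offline",
}
SPD = {
    0: "no negotiated speed",
    1: "Gen1 (1.5 Gbps)",
    2: "Gen2 (3.0 Gbps)",
    3: "Gen3 (6.0 Gbps)",
}
IPM = {
    0: "no device / no communication",
    1: "active",
    2: "partial",
    6: "slumber",
}


def decode_sstatus(text):
    """Decode an SStatus value as printed in the log (hex digits, no 0x)."""
    value = int(text, 16)
    det = value & 0xF          # bits 0-3: device detection
    spd = (value >> 4) & 0xF   # bits 4-7: negotiated speed
    ipm = (value >> 8) & 0xF   # bits 8-11: interface power management
    return {
        "det": DET.get(det, "other/reserved"),
        "spd": SPD.get(spd, "other/reserved"),
        "ipm": IPM.get(ipm, "other/reserved"),
    }


if __name__ == "__main__":
    # "113" is the SStatus value from the log in the report.
    for field, meaning in decode_sstatus("113").items():
        print(field, "=", meaning)
```

Under these assumptions, the value 113 from the report is consistent with the "link up 1.5 Gbps" text on the same log line, so the link itself negotiates normally before the exception is raised.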
https://bugzilla.novell.com/show_bug.cgi?id=241334

gregkh@novell.com changed:

           What    |Removed                                   |Added
----------------------------------------------------------------------------
         AssignedTo|kernel-maintainers@forge.provo.novell.com |teheo@novell.com
https://bugzilla.novell.com/show_bug.cgi?id=241334

teheo@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |pablomme@googlemail.com

------- Comment #1 from teheo@novell.com  2007-02-01 23:16 MST -------
Can't say much about the error with the info you provided. Please report...

* The content of file /var/log/boot.msg
* The result of 'dmesg' on failure (not snippet of the failure, the whole result)
* The result of 'hwinfo --all'

Thanks.
https://bugzilla.novell.com/show_bug.cgi?id=241334

------- Comment #2 from pablomme@googlemail.com  2007-02-05 11:40 MST -------
Created an attachment (id=117439)
 --> (https://bugzilla.novell.com/attachment.cgi?id=117439&action=view)
Output of hwinfo --all

I can provide the output of `hwinfo --all`, run when the system fully boots. However, when the problem arises the system just won't boot, and no logs are kept on the disk, as it's the HDD that runs into trouble when starting.

The system has booted successfully lately, but this is the second time the problem has gone away mysteriously, i.e., it may reappear. I'll try to collect more info if this happens. How can I freeze the boot-up message-log screen to be able to copy what's being printed? ('pause' doesn't seem to work)
https://bugzilla.novell.com/show_bug.cgi?id=241334

------- Comment #3 from teheo@novell.com  2007-02-06 07:23 MST -------
To grab boot messages when boot fails, you need a serial console or netconsole, which is quite easy to set up if you have another machine handy. But as that's a chore, let's do other things first.

1. Please post /var/log/boot.msg.

2. The error message is reporting SError 0x800 - SERR_INTERNAL. This is the first time I've seen any controller claim that error condition. According to the specification, it's...

--
Internal error: The host bus adapter experienced an internal error that caused the operation to fail and may have put the host bus adapter into an error state. Host software should reset the interface before re-trying the operation. If the condition persists, the host bus adapter may suffer from a design issue rendering it incompatible with the attached device.
--

libata error handling currently soft resets the channel when that happens, and your controller responds to that with a hard lockup. Maybe we need to hardreset in that case. I'll attach a patch to do that. If you can't compile the kernel and test it, please let me know and I'll prepare a test rpm for you, but I'm gonna be traveling for the next two weeks, so it's gonna take some time.
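
The SError value discussed in comment #3 can be decoded bit by bit. The following is a minimal sketch, assuming the ERR-field bit positions from the Serial ATA specification, where the internal-error bit is bit 11 (0x800, the value libata calls SERR_INTERNAL). The helpers `decode_serror` and `pick_reset` are hypothetical names that only illustrate the idea of choosing a hardreset on an internal error; they do not reproduce the attached patch or libata's actual error-handling code.

```python
# ERR-field bits of the SError register, as assumed from the Serial ATA
# specification. The DIAG field (bits 16 and up) is left out of this sketch.
SERR_BITS = {
    0: "recovered data integrity error",
    1: "recovered communications error",
    8: "transient data integrity error",
    9: "persistent communication or data integrity error",
    10: "protocol error",
    11: "internal error",
}

SERR_INTERNAL = 1 << 11  # 0x800, the value reported in the log


def decode_serror(value):
    """Return the names of the ERR-field bits set in an SError value."""
    return [name for bit, name in sorted(SERR_BITS.items())
            if value & (1 << bit)]


def pick_reset(serror):
    """Hypothetical policy: escalate to a hardreset on an internal error.

    Any other SError value falls back to a softreset here, which is a
    simplification made for this illustration.
    """
    return "hardreset" if serror & SERR_INTERNAL else "softreset"


if __name__ == "__main__":
    serror = 0x800  # SErr value from the reported exception line
    print(decode_serror(serror), pick_reset(serror))
```

The point of the sketch is only the decision in `pick_reset`: the reset method depends on which SError bit the controller raised, rather than always starting with a soft reset.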
https://bugzilla.novell.com/show_bug.cgi?id=241334

------- Comment #4 from teheo@novell.com  2007-02-06 07:25 MST -------
Created an attachment (id=117644)
 --> (https://bugzilla.novell.com/attachment.cgi?id=117644&action=view)
libata-hardreset-on-internal-error
https://bugzilla.novell.com/show_bug.cgi?id=241334

------- Comment #5 from teheo@novell.com  2007-02-21 05:10 MST -------
PING. If you have problems applying the patch, I can prepare a kernel rpm. Please respond.
https://bugzilla.novell.com/show_bug.cgi?id=241334

------- Comment #6 from pablomme@googlemail.com  2007-02-27 07:03 MST -------
Created an attachment (id=121299)
 --> (https://bugzilla.novell.com/attachment.cgi?id=121299&action=view)
boot.msg for a successful boot

PONG. I've patched the kernel, but haven't run into the problem since then (have only rebooted twice). I'll post another boot.msg if I see a hard-reset message.
https://bugzilla.novell.com/show_bug.cgi?id=241334

teheo@novell.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
      Info Provider|pablomme@googlemail.com     |
         Resolution|                            |FIXED

------- Comment #7 from teheo@novell.com  2007-03-13 10:14 MST -------
Forwarding the patch upstream and applying it to kernel CVS, as the patch is useful whether it actually fixes the problem or not. I'm resolving this bug as FIXED for now. Please reopen if you reencounter the problem and hardresetting doesn't help. Thanks.