[Bug 474879] New: Cannot make use of HighPoint RocketRAID 133 (HPT372A/372N, V2.35) controller
https://bugzilla.novell.com/show_bug.cgi?id=474879

           Summary: Cannot make use of HighPoint RocketRAID 133 (HPT372A/372N, V2.35) controller
    Classification: openSUSE
           Product: openSUSE 11.1
           Version: Final
          Platform: x86-64
        OS/Version: openSUSE 11.1
            Status: NEW
          Severity: Critical
          Priority: P5 - None
         Component: Kernel
        AssignedTo: bnc-team-screening@forge.provo.novell.com
        ReportedBy: DOlsson@WEB.de
         QAContact: qa@suse.de
          Found By: ---

User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.0.6) Gecko/2009012700 SUSE/3.0.6-0.1 Firefox/3.0.6

When booting the Rescue system from the openSUSE 11.1 GM x86_64 DVD (downloaded version), it is not possible to make use of my HighPoint RocketRAID 133 (HPT372A/372N, V2.35) controller.

On this controller, I have these discs attached:
  Channel #0, master: <disc attached>
  Channel #1, master: <disc attached>
  Channel #0, slave:  <none>
  Channel #1, slave:  <disc attached>

Reading from and writing to the disc on #0m works nicely. Reading from the disc on #1m also works nicely, but writing to this disc results in a driver crash, followed by detachment of the disc from the controller!

Reproducible: Always

Steps to Reproduce:
I have done the following (for details of each of the tests, see the following comment entries):
- Booting up from the DVD as normal.
- Doing a "fsck -fn <disc #0m>1" => always results in OK.
- Doing a "fsck -f <disc #0m>1" => always results in OK.
- Doing a "fsck -fn <disc #1m>1" => always results in OK.
- Doing a "fsck -f <disc #1m>1" => always results in a driver crash, except for once (see below).

Actual Results:
When the disc attached to channel #1m is accessed for writing, the driver crashes with a kernel message that no one cares about IRQ 19.

Expected Results:
That the "fsck" checks run without causing driver crashes, even on channel #1m.

A common observation for all the driver crashes is that "boot.msg" shows the driver first attaching itself to IRQ 19 and shortly thereafter detaching itself from it again; somewhat later the "pata_hpt3x2n" driver not only registers its interest in IRQ 19, but also announces that it is handling the controller including the discs. I find this very strange and odd! :-)

It also looks as if the three HighPoint drivers really cannot agree upon which one of them is taking care of the HighPoint controller, in that they are strangely intertwined in their attachment to IRQ 19.

When the driver has crashed, with the kernel having detected that no one feels responsible for IRQ 19, the disc in use is detached from the controller (and is thus no longer available or visible).

Another strange thing is that the order in which the discs appear changes "wildly" during the testing with "brokenmodules". Why? Are there any specific reasons for this?

Finally, when trying to do a "reboot" after the driver has crashed, it is no longer possible to actually reboot the PC -- "reboot" does its normal work of shutting everything down, but at the last step, where it should restart the system, "reboot" just hangs -- the only choice is to push the reset button to get the PC to reboot.

-- 
Configure bugmail: https://bugzilla.novell.com/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c1

--- Comment #1 from Dennis Olsson <DOlsson@WEB.de> 2009-02-11 11:30:20 MST ---
Created an attachment (id=271988)
 --> (https://bugzilla.novell.com/attachment.cgi?id=271988)
Output from "lspci -nn"

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c2

--- Comment #2 from Dennis Olsson <DOlsson@WEB.de> 2009-02-11 11:40:10 MST ---
Created an attachment (id=271994)
 --> (https://bugzilla.novell.com/attachment.cgi?id=271994)
Tar (-cjf) file containing: boot.msg lsmod lspci-nn messages partitions-0before partitions-1after

Booting normally, i.e. selecting the Rescue entry and hitting return:
- Everything gets booted and all discs get discovered.
- The kernel seems to have selected the "pata_hpt3x2n" driver as the driver of choice, although the other two drivers ("pata_hpt37x" and "hpt366") also get loaded, and it seems that they are being initialized as well.
- Expected disc order was:
    HighPoint with 3 discs:      sda, sdb, sdc
    Sil controller with 3 discs: sdd, sde, sdf, sdg
    USB stick:                   sdh
  but the order is:
    sil: sd[abcd]
    usb: sde
    hpt: sd[fgh]
  meaning that the "<disc #0m>1" = "/dev/sde1" and "<disc #1m>" = "/dev/sdf1".
  How come this order is used? Needless to say, the order changes when I do not have a USB stick attached... :-)
- Doing a "fsck -fn <disc #0m>1" => always results in OK.
- Doing a "fsck -f <disc #0m>1" => always results in OK.
- Doing a "fsck -fn <disc #1m>1" => always results in OK.
- Doing a "fsck -f <disc #1m>1" => always results in a driver crash.

A common observation for all the driver crashes is that "boot.msg" shows the driver first attaching itself to IRQ 19 and shortly thereafter detaching itself from it again(??). It also looks as if the three HighPoint drivers really cannot agree upon which one of them is taking care of the HighPoint controller, in that they are strangely intertwined in their attachment to IRQ 19.

After "fsck -f <disc #1m>1" has finished and output the line "bootnew: 11/64256 files ...", it stops for ca. 40 secs. before the prompt appears.
During this pause a lot of kernel messages are written to "/var/log/messages" -- especially interesting to observe is that the disc gets detached from the controller/system during the driver crash -- very odd!

Rebooting: same issue => Reset needed.

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c3

--- Comment #3 from Dennis Olsson <DOlsson@WEB.de> 2009-02-11 11:41:26 MST ---
Created an attachment (id=271995)
 --> (https://bugzilla.novell.com/attachment.cgi?id=271995)
Tar (-cjf) file containing: boot.msg lsmod lspci-nn messages partitions-0before partitions-1after

Booting with
    brokenmodules=pata_hpt37x
results in disc order:
    hpt: sd[abc]
    sil: sd[defg]
    usb: sdh
meaning that the "<disc #0m>1" = "/dev/sda1" and "<disc #1m>" = "/dev/sdb1".
How come this order is used?
- Doing a "fsck -fn <disc #0m>1" => always results in OK.
- Doing a "fsck -f <disc #0m>1" => always results in OK.
- Doing a "fsck -fn <disc #1m>1" => always results in OK.
- Doing a "fsck -f <disc #1m>1" => always results in a driver crash.

Rebooting: same issue => Reset needed.

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c4

--- Comment #4 from Dennis Olsson <DOlsson@WEB.de> 2009-02-11 11:44:12 MST ---
Created an attachment (id=271996)
 --> (https://bugzilla.novell.com/attachment.cgi?id=271996)
Tar (-cjf) file containing: boot.msg lsmod lspci-nn messages partitions-0before partitions-1after

Booting with
    brokenmodules=pata_hpt37x,pata_hpt3x2n
results in disc order:
    usb: sda
    sil: sd[bcde]
    hpt: hda, hdc, hdd !!!
meaning that the "<disc #0m>1" = "/dev/hda1" and "<disc #1m>" = "/dev/hdc1".
- Doing a "fsck -fn <disc #0m>1" => always results in OK.
- Doing a "fsck -f <disc #0m>1" => always results in OK.
- Doing a "fsck -fn <disc #1m>1" => always results in OK.
- Doing a "fsck -f <disc #1m>1" => always results in OK!!!

This works without any crashing!! I can even do a
    fsck -f[n] /dev/os103/root
of the LVMs that are striped across the three discs!

But, unfortunately, the discs are now "hd*"s instead of "sd*"s!! I do not know if this is OK -- i.e. if the driver is good for the HighPoint controller and for my purposes!! I am sceptical about using this driver, and it is not "good" for the system I have on the discs that the discs "suddenly" are "hd*"s instead of "sd*"s (although I suppose I could work around this ;-).

Anyway, I find it much more interesting to know what is wrong with these HighPoint drivers, and why they do not work.

Rebooting works here (naturally ;-).

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c5

--- Comment #5 from Dennis Olsson <DOlsson@WEB.de> 2009-02-11 11:45:20 MST ---
Created an attachment (id=271997)
 --> (https://bugzilla.novell.com/attachment.cgi?id=271997)
Tar (-cjf) file containing: boot.msg lsmod lspci-nn messages partitions-0before partitions-1after

Booting with
    brokenmodules=pata_hpt3x2n,hpt366
results in disc order:
    sil: sd[abcd]
    usb: sde
    hpt: sd[fgh]
meaning that the "<disc #0m>1" = "/dev/sdf1" and "<disc #1m>" = "/dev/sdg1".
- Doing a "fsck -fn <disc #0m>1" => always results in OK.
- Doing a "fsck -f <disc #0m>1" => always results in OK.
- Doing a "fsck -fn <disc #1m>1" => always results in OK.
- Doing a "fsck -f <disc #1m>1" => always results in a driver crash.

Rebooting: same issue => Reset needed.

Although "brokenmodules=pata_hpt3x2n,hpt366" has been given, both of these drivers are *still* being loaded and used (see "lsmod" and "boot.msg")!! Why??

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c6

--- Comment #6 from Dennis Olsson <DOlsson@WEB.de> 2009-02-11 11:46:24 MST ---
Created an attachment (id=272000)
 --> (https://bugzilla.novell.com/attachment.cgi?id=272000)
Tar (-cjf) file containing: boot.msg lsmod lspci-nn messages partitions-0before partitions-1after

Booting with
    brokenmodules=pata_hpt37x,hpt366
results in disc order:
    hpt: sd[abc]
    sil: sd[defg]
    usb: sdh
meaning that the "<disc #0m>1" = "/dev/sda1" and "<disc #1m>" = "/dev/sdb1".
How come this order is used?
- Doing a "fsck -fn <disc #0m>1" => always results in OK.
- Doing a "fsck -f <disc #0m>1" => always results in OK.
- Doing a "fsck -fn <disc #1m>1" => always results in OK.
- Doing a "fsck -f <disc #1m>1" => always results in a driver crash.

Rebooting: same issue => Reset needed.

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c7

--- Comment #7 from Dennis Olsson <DOlsson@WEB.de> 2009-02-11 11:48:56 MST ---
Created an attachment (id=272002)
 --> (https://bugzilla.novell.com/attachment.cgi?id=272002)
Tar (-cjf) file containing: boot.msg lsmod lspci-nn messages partitions-0before

Booting with
    irqpoll
as suggested in the error messages found in "/var/log/messages", results in disc order:
    sil: sd[abcd]
    usb: sde
    hpt: sd[fgh]
meaning that the "<disc #0m>1" = "/dev/sdf1" and "<disc #1m>" = "/dev/sdg1".
How come this order is used?
- Doing a "fsck -fn <disc #0m>1" => always results in OK.
- Doing a "fsck -f <disc #0m>1" => always results in OK.
- Doing a "fsck -fn <disc #1m>1" => always results in OK.
- Doing a "fsck -f <disc #1m>1" => always results in total system death!

After the "bootnew: 11/64256 files ..." line has been received from "fsck", the system is completely dead! No reaction to ALT-Fx, RETURN or CTRL-ALT-DEL -- only pushing the "reset" button gets the system running again. And, although a "tail -f /var/log/messages" was running in the background, no messages were displayed after the last line from "fsck" (as was the case during the other crashes).

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c8

--- Comment #8 from Dennis Olsson <DOlsson@WEB.de> 2009-02-23 03:13:46 MST ---
Ping -- pong...

Hello, is anyone looking at this, or will anyone be looking at it in the very near future? I am stuck and would like to know whether I can expect a fix or not.

Cyril Hrubis <chrubis@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|bnc-team-screening@forge.pr |kernel-maintainers@forge.pr
                   |ovo.novell.com              |ovo.novell.com

Greg Kroah-Hartman <gregkh@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|kernel-maintainers@forge.pr |teheo@novell.com
                   |ovo.novell.com              |

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c9

Tejun Heo <teheo@novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO
      Info Provider|                            |DOlsson@WEB.de

--- Comment #9 from Tejun Heo <teheo@novell.com> 2009-02-25 01:13:38 MST ---
Hello, sorry about the delay.

* Highpoint controllers curiously share the same PCI ID for completely different controllers. So, the driver loader (udev in this case) doesn't know in advance which driver to load. Only the drivers themselves can determine, during initialization, whether they can drive the controller, so they take turns until the correct one (pata_hpt3x2n in this case) grabs the controller. And the hpt366 driver is the original IDE driver, which is there only for debugging and as a fallback when the new libata ones don't work, as in your case.

* Device names are not guaranteed to be stable. libata no longer pre-allocates device nodes to specific ports as IDE did (it just can't be done in a reasonable way anymore). Naming generally stays stable with PCI devices, but there is no guarantee. USB probing is done asynchronously, so depending on luck, a USB device can end up anywhere. Plus, with upcoming parallel libata probing, things will get even more dynamic. So, devices should be selected via unique device ID (the default) or filesystem labels. This also allows putting in extra controllers or moving drives around without worrying about device node renames.

* There are still some rough edges on a few libata drivers for old PATA controllers, mostly due to lack of hardware availability and test coverage. Are you interested in shipping the controller to me so that I can work on it? I'll pay for the shipping + replacement controller.

Thanks.
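
The first point in comment #9 (several drivers claim the same PCI ID and take turns until one of them accepts the chip) can be pictured with a small user-space sketch in Python. This is not kernel code and not how udev or libata are implemented: the `PciDevice` and `Driver` types, the ID values, and the revision checks are invented for illustration. Only the two driver names come from this report.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class PciDevice:
    vendor: int
    device: int
    revision: int  # stands in for whatever a driver checks during initialization


@dataclass(frozen=True)
class Driver:
    name: str
    ids: Tuple[Tuple[int, int], ...]    # (vendor, device) pairs the driver claims
    probe: Callable[[PciDevice], bool]  # True if the driver accepts this chip


def bind(dev: PciDevice, drivers: Iterable[Driver]) -> Optional[str]:
    """Return the name of the first driver whose probe accepts dev, else None."""
    for drv in drivers:
        if (dev.vendor, dev.device) not in drv.ids:
            continue  # ID table does not match, so this driver is never tried
        if drv.probe(dev):
            return drv.name  # this driver grabs the controller
    return None


# Hypothetical ID and revision thresholds, NOT the real HighPoint values.
SHARED_ID = ((0xABCD, 0x0001),)
DRIVERS = [
    Driver("pata_hpt37x", SHARED_ID, lambda d: d.revision < 2),
    Driver("pata_hpt3x2n", SHARED_ID, lambda d: d.revision >= 2),
]

if __name__ == "__main__":
    print(bind(PciDevice(0xABCD, 0x0001, revision=2), DRIVERS))
```

Because both drivers list the same ID pair, the ID table alone cannot pick a winner; the decision falls to each driver's own probe, which is why more than one HighPoint module can show up as loaded.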
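
Selecting a disc by filesystem label instead of by device node, as comment #9 recommends, comes down to following a symlink that udev maintains under /dev/disk/by-label (sibling directories such as by-uuid and by-id work the same way). A minimal Python sketch follows; the helper name `resolve_label` is hypothetical, and the label "bootnew" is only assumed here because that string appears in the fsck output quoted in this report.

```python
import os


def resolve_label(label: str, root: str = "/dev/disk/by-label") -> str:
    """Return the device node that the label symlink currently points to."""
    link = os.path.join(root, label)
    if not os.path.islink(link):
        raise FileNotFoundError(f"no filesystem labelled {label!r} under {root}")
    # realpath follows the symlink to whatever node the disc got on this boot
    return os.path.realpath(link)


if __name__ == "__main__":
    try:
        print(resolve_label("bootnew"))
    except FileNotFoundError as err:
        print(err)
```

The result may differ from boot to boot (an sd* node under libata, an hd* node under the old IDE driver), but the label used to look it up stays the same.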

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c10

--- Comment #10 from Dennis Olsson <DOlsson@WEB.de> 2009-03-05 04:09:38 MST ---
Hello Tejun.

Many thanks for your very clear and understandable answers.

I am basically positive and prepared to ship the controller to you, although I would like to try working on the HighPoint drivers myself before I do that, if you do not mind.

Will keep you posted about my progress (or lack thereof ;-).

++CHeers/DO++

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c11

--- Comment #11 from Dennis Olsson <DOlsson@WEB.de> 2009-03-08 13:55:34 MST ---
Hello Tejun.

Well, this driver complex turns out to be a lot more complex than expected. Anyway, I did find a bug in "pata_hpt3x2n.c", line 585, which explains my confusion when reading through the log file:

drivers/ata/pata_hpt3x2n.c:585
    printk(KERN_INFO "pata_hpt37x: bus clock %dMHz, using 66MHz DPLL.\n", pci_mhz);
should read:
    printk(KERN_INFO "pata_hpt3x2n: bus clock %dMHz, using 66MHz DPLL.\n", pci_mhz);

Also, I have tried to boot from the openSUSE 10.3 and openSUSE 11.0 GM DVDs:

openSUSE 10.3:
- During driver probing:
    HighPoint HPT372A/372N
      drivers: pata_hpt37x, pata_hpt3x2n, hpt366
      loading: pata_hpt37x
- lsmod shows all hpt drivers as being loaded
Using the "pata_hpt37x" driver here in openSUSE 10.3, the driver works. Discs are accessible via "/dev/hd[acd]".

openSUSE 11.0:
- During driver probing:
    HighPoint HPT372A/372N
      drivers: pata_hpt37x, pata_hpt3x2n, hpt366
      loading: pata_hpt3x2n
- lsmod shows *only* the "pata_hpt3x2n" driver as being loaded
The system crashes just as under openSUSE 11.1 when using the discs "/dev/sd[efg]" associated with the controller.

openSUSE 11.1:
- During driver probing:
    HighPoint HPT372A/372N
      drivers: pata_hpt37x, pata_hpt3x2n*, hpt366
- lsmod shows all hpt drivers as being loaded

Hope this info is of some help to you.

++CHeers/DO++

User DOlsson@WEB.de added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c12

Dennis Olsson <DOlsson@WEB.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW
      Info Provider|DOlsson@WEB.de              |

--- Comment #12 from Dennis Olsson <DOlsson@WEB.de> 2009-03-08 13:58:07 MST ---
In reply to comment #9:
> * There are still some rough edges on a few libata drivers for old PATA controllers, mostly due to lack of hardware availability and test coverage. Are you interested in shipping the controller to me so that I can work on it? I'll pay for the shipping + replacement controller.

Sure. Suggest that you contact me via email about the details of getting this done.

++CHeers/DO++

User teheo@novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=474879#c13

--- Comment #13 from Tejun Heo <teheo@novell.com> 2009-03-08 18:09:19 MST ---
Alright, just sent.