Thanks in advance for any input.

I have struggled with this annoyance for some time, thinking that it was either hardware or something I was doing wrong. But now, with up to 40 machines all getting the same treatment, between 30 and 60% will hang on initrd after I run mkinitrd. I have been using mostly the Western Digital WD400VE (2.5" notebook drives with an adapter) but have also seen this occur with the WD400JB (the 3.5" variant).

Since all the machines have the exact same build and I do the same thing to all of them, yet some of them work and some don't, I can't believe that I am making a fundamental mistake. On the other hand, the numbers are way too high and too consistent for this to be obviously hardware.

My base system is a SuSE 9.3 distro (although I've seen it with 9.1 to some degree), in which I have included the open-source HighPoint driver, compiled locally. /boot is in the / partition, which is formatted with reiserfs. I have the following modules listed in /etc/sysconfig/kernel for MKINITRD_MODULES: "reiserfs scsi_mod hptmv xfs".

The most consistent fix we have found is the tedious process of tarring up "/" (from a rescue disk), copying it to another machine (usually at this stage the machines are unique enough that we can't lose data), writing zeroes to the first few sectors with dd, re-partitioning the drive, copying the tarball back to the disk and untarring. This requires re-installing grub, which sometimes produces a hang on loading stage1.5.

We are going to move away from the 2.5" drive anyway, but in the meantime (and until we have time to rework the existing ~200 machines) I have to deal with it, and I am concerned since I saw it on the 3.5" drives as well.

Before I push this issue back to the disk manufacturer, has anyone seen this kind of behavior before, or can you offer me any insight into what may be causing this?

Thank you very much,
Patrick W. Freeman
Systems Specialist
DATAllegro
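For anyone comparing a working machine against a hanging one, a quick sanity check is to confirm that each box really lists the same MKINITRD_MODULES. The following is only a minimal sketch: the helper names (`parse_sysconfig_value`, `missing_modules`) are made up for illustration, the sample text simply mirrors the setting quoted above, and the "required" list is an assumption based on the reiserfs root and the hptmv controller driver mentioned in the post.

```python
import shlex


def parse_sysconfig_value(text, key):
    """Return the value assigned to `key` in sysconfig-style text, or None."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            parts = shlex.split(value)  # strips the surrounding quotes
            return parts[0] if parts else ""
    return None


def missing_modules(listed, required):
    """Return the required module names that are absent from `listed`."""
    return [name for name in required if name not in listed]


# Sample text mirroring the setting quoted in the post (not read from disk).
sample = 'MKINITRD_MODULES="reiserfs scsi_mod hptmv xfs"\n'
modules = parse_sysconfig_value(sample, "MKINITRD_MODULES").split()

# Assumption: the root filesystem and the controller driver must be in the initrd.
required = ["reiserfs", "hptmv"]
print("listed:", modules)
print("missing:", missing_modules(modules, required))
```

On a real machine the text would come from /etc/sysconfig/kernel instead of the sample string. If two machines report identical lists, the module configuration is less likely to be what separates the working boxes from the hanging ones.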
On Saturday 2005-12-24 at 14:34 -0500, Patrick Freeman wrote:
The most consistent fix we have found is the tedious process of tarring up "/" (from a rescue disk), copying it to another machine (usually at this stage the machines are unique enough, that we can't lose data), writing zeroes to the first few sectors with dd, re-partitioning the drive, copying the tarball back to the disk and untarring. This requires re-installing grub which sometimes produces a hang on loading stage1.5.
That's the fix... but I don't clearly see what the problem is, or what the symptoms are :-?

-- 
Cheers,
Carlos Robinson
On Saturday 24 December 2005 14:34, Patrick Freeman wrote: <snippage>
...But now with up to 40 machines all getting the same treatment, between 30 and 60 % will hang on initrd after I run mkinitrd.
Does this mean 12 have exhibited this behavior (30% of 40) or 24 (60% of 40)?

If I were going to investigate the drive subsystems, I would:

- verify the quality of the power supplies and the data cables.
- check for poorly designed cases not allowing proper ventilation.
- ensure all drives have the same *current* firmware installed.
- ditto the mainboards and add-in controllers.
- plan on a realistic 2% to 3% field failure rate in the first 30 to 90 days (no OEM can afford to burn in and stress-test every single drive).

FYI: I used to sell between $1 and $2 Million in drives a year into the RISC/UNIX market. I've never purchased a WD drive, nor have I strayed from Seagate. I've never regretted that decision.

Finally, I think you're going to have a real hard time pointing fingers back at WD since you haven't clearly ruled out the HighPoint driver that you're compiling.

HTH & regards,

- Carl
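To put the numbers in this reply side by side, here is a rough sketch of the arithmetic. It assumes the fleet size of 40 from the original post and uses a hypothetical helper, `count_at_rate`, introduced only for illustration.

```python
def count_at_rate(fleet_size, percent):
    """Number of machines corresponding to `percent` of `fleet_size`."""
    return fleet_size * percent / 100


fleet = 40  # machines mentioned in the original post

# Reported hang rate: between 30% and 60% of the fleet.
reported_low = count_at_rate(fleet, 30)
reported_high = count_at_rate(fleet, 60)

# Suggested realistic field failure rate: 2% to 3%.
expected_low = count_at_rate(fleet, 2)
expected_high = count_at_rate(fleet, 3)

print(f"reported hangs: {reported_low:.0f} to {reported_high:.0f} machines")
print(f"expected drive failures: {expected_low:.1f} to {expected_high:.1f} machines")
```

Under these figures the reported hangs are roughly ten or more times what a 2% to 3% field failure rate would produce on 40 machines, which is in line with the suspicion that something other than plain drive failure is involved.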
participants (3)
- Carl Hartung
- Carlos E. R.
- Patrick Freeman