Do you know of any utility that can do for a hard disk what memtest does for RAM?
It seems I have hardware problems and I can't figure out what they are. fsck doesn't report anything.
Symptoms: under heavy load, Linux freezes and the disk is corrupted, but as soon as I reformat it, fsck -c no longer complains, yet the freezes keep occurring. Tested with ext3 and reiserfs :-(
I don't remember having seen bad blocks listed by fsck; the errors were things like the superblock date being in the future, or inode errors (and there had been a crash, so this is not surprising).
The CPU could be running hot, but I fixed the fan (some oil) and it still freezes :-(
Thanks, jdd
-- http://www.dodin.net http://dodin.org/mediawiki/index.php/GPS_Lowrance_GO
-- To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org For additional commands, e-mail: opensuse+help@opensuse.org
On 12/12/06, jdd wrote:
Do you know of any utility that can do for a hard disk what memtest does for RAM?
It seems I have hardware problems and I can't figure out what they are. fsck doesn't report anything.
Symptoms: under heavy load, Linux freezes and the disk is corrupted, but as soon as I reformat it, fsck -c no longer complains, yet the freezes keep occurring. Tested with ext3 and reiserfs :-(
I don't remember having seen bad blocks listed by fsck; the errors were things like the superblock date being in the future, or inode errors (and there had been a crash, so this is not surprising).
The CPU could be running hot, but I fixed the fan (some oil) and it still freezes :-(
Thanks, jdd
Are you getting anything in your dmesg output? If you're lucky it will show up in /var/log/warn. If not, and the freeze is hard enough, you would need to connect a serial console.
If you think it is a disk media failure, you can try a simple "dd if=/dev/hda of=/dev/null bs=4k" from rescue mode on the boot CD/DVD. See /var/log/warn for specific sector failures.
Since you say it happens under load, I'm more inclined to think it is a power issue or a bad driver. Is this PATA or SATA (etc.)? What driver? What kernel?
If it does look like a driver issue, the lkml-ide list is pretty active these days handling bugs, but they are going to be more supportive if you're running at least a 2.6.18 kernel (i.e. the one from SUSE 10.2).
Greg
--
Greg Freemyer
The Norcross Group
Forensics for the 21st Century
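The read test Greg describes can be wrapped in a small helper. This is only a sketch: the function name read_scan is made up for illustration, and /dev/hda is the device from the thread, so substitute your own. It only reads from the source and throws the data away, and it stops at the first read error, just like the plain dd command.

```shell
#!/bin/sh
# Read every block of a device (or file) and discard the data.
# Nothing is written to the source. read_scan is a hypothetical helper name.
read_scan() {
    src=$1
    # dd exits non-zero if the source cannot be opened or a read fails.
    if dd if="$src" of=/dev/null bs=4k 2>/dev/null; then
        echo "read pass completed: $src"
    else
        echo "read pass FAILED: $src"
        return 1
    fi
}

# Example, as root from rescue mode (device name is an assumption):
#   read_scan /dev/hda
```

A clean pass only shows that every sector could be read once; a failed pass should leave the specific sector errors in the kernel log (dmesg or /var/log/warn).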
jdd wrote:
Do you know of any utility that can do for a hard disk what memtest does for RAM?
It seems I have hardware problems and I can't figure out what they are. fsck doesn't report anything.
Symptoms: under heavy load, Linux freezes and the disk is corrupted, but as soon as I reformat it, fsck -c no longer complains, yet the freezes keep occurring. Tested with ext3 and reiserfs :-(
The CPU could be running hot, but I fixed the fan (some oil) and it still freezes :-(
Does the disk support SMART, and is it not SATA? Then smartmontools might help you; you could check the error state of the disk.
Furthermore, you might want to change the disk cable. Quite often, intermittent problems like yours are caused by defective cables. I had one myself last year and recognized it only after I had changed the disk and the same error occurred again.
On a related note, for other readers: in 10.0, smartmontools doesn't work with SATA disks. Does anybody know if this has changed in 10.1 or 10.2, with a more recent kernel?
Joachim
--
Joachim Schrod Email: jschrod@acm.org Roedermark, Germany
On a related note, for other readers: in 10.0, smartmontools doesn't work with SATA disks. Does anybody know if this has changed in 10.1 or 10.2, with a more recent kernel?
Joachim
I believe "smartctl -a -d ata /dev/sda" should work with recent kernels. I run 3ware cards on my systems so I can't test it.
Greg
On Tuesday 12 December 2006 16:48, Greg Freemyer wrote:
On a related note, for other readers: in 10.0, smartmontools doesn't work with SATA disks. Does anybody know if this has changed in 10.1 or 10.2, with a more recent kernel?
Joachim
I believe "smartctl -a -d ata /dev/sda" should work with recent kernels.
I run 3ware cards on my systems so I can't test it.
Greg
That does indeed work well, Greg (openSUSE 10.2 released):

talshiar:/home/peter # smartctl -a -d ata /dev/sda
smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar T7K250 series
Device Model:     HDT722516DLA380
Serial Number:    VDK71GTE0HSVWK
Firmware Version: V43OA96A
User Capacity:    164,696,555,520 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Tue Dec 12 17:30:08 2006 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Cheers
Pete
Greg Freemyer wrote:
On a related note, for other readers: in 10.0, smartmontools doesn't work with SATA disks. Does anybody know if this has changed in 10.1 or 10.2, with a more recent kernel?
Joachim
I believe "smartctl -a -d ata /dev/sda" should work with recent kernels.
Thanks, then I might try an update to a factory kernel and see if that works. Maybe that also resolves my "cdparanoia-causes-thousands-of-syslog-message" problem. ;-)
Joachim
Joachim Schrod wrote:
On a related note, for other readers: in 10.0, smartmontools doesn't work with SATA disks. Does anybody know if this has changed in 10.1 or 10.2, with a more recent kernel?
It must have; it works on 9.3. There was an option in the config file for SATA drives in 9.3.
-- Joe Morris Registered Linux user 231871
Joe Morris (NTM) wrote:
Joachim Schrod wrote:
On a related note, for other readers: in 10.0, smartmontools doesn't work with SATA disks. Does anybody know if this has changed in 10.1 or 10.2, with a more recent kernel?
It must have; it works on 9.3. There was an option in the config file for SATA drives in 9.3.
Hmm, doesn't work here. I assume that by "the config file" you mean smartd.conf. There is a comment there that one should use -d ata for SATA disks. Well:

puma:/etc # smartctl -d ata -i /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

If I use the default for /dev/sda, or make them explicitly SCSI devices:

puma:/etc # smartctl -d scsi -i /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA ST3200822AS Version: 3.01

SATA disks accessed via libata are not currently supported by smartmontools. When libata is given an ATA pass-thru ioctl() then an additional '-d libata' device type will be added to smartmontools.

Well, the promised device type libata doesn't exist yet:

puma:/etc # smartctl -d libata -i /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=======> INVALID ARGUMENT TO -d: libata
=======> VALID ARGUMENTS ARE: ata, scsi, 3ware,N <=======

Use smartctl -h to get a usage summary

Thus, I don't see how I can make it work with smartmontools from the distribution. Am I doing something wrong here?
Joachim
Joachim Schrod wrote:
Joe Morris (NTM) wrote:
Joachim Schrod wrote:
On a related note, for other readers: in 10.0, smartmontools doesn't work with SATA disks. Does anybody know if this has changed in 10.1 or 10.2, with a more recent kernel?
It must have; it works on 9.3. There was an option in the config file for SATA drives in 9.3.
Hmm, doesn't work here.
I assume that you mean smartd.conf with the config file.

Correct. A couple of weeks ago I added 2 SATA disks to 9.3. I only added

/dev/sda -d ata
/dev/sdb -d ata

to my smartd.conf, restarted the daemon, and it worked, i.e.

Dec 12 01:12:12 server smartd[22028]: Device: /dev/sda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 101 to 108
Dec 12 01:12:12 server smartd[22028]: Device: /dev/sdb, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 112 to 113

Thus, I don't see how I can make it work with smartmontools from the distribution.
Am I doing something wrong here?

I'm not sure. Maybe I see now.

Dec 13 22:30:18 server smartd[9016]: Configuration file /etc/smartd.conf parsed.
Dec 13 22:30:18 server smartd[9016]: Device: /dev/sda, opened
Dec 13 22:30:18 server smartd[9016]: Device: /dev/sda, not found in smartd database.
Dec 13 22:30:18 server smartd[9016]: Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Dec 13 22:30:18 server smartd[9016]: Device: /dev/sdb, opened
Dec 13 22:30:18 server smartd[9016]: Device: /dev/sdb, not found in smartd database.
Dec 13 22:30:18 server smartd[9016]: Device: /dev/sdb, is SMART capable. Adding to "monitor" list.
Dec 13 22:30:18 server smartd[9016]: Device: /dev/hda, opened
Dec 13 22:30:18 server smartd[9016]: Device: /dev/hda, not found in smartd database.
Dec 13 22:30:18 server smartd[9016]: Device: /dev/hda, enabled SMART Attribute Autosave.
Dec 13 22:30:18 server smartd[9016]: Device: /dev/hda, enabled SMART Automatic Offline Testing.

But when I try your command,

server:/home/joe # smartctl -i /dev/sda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA ST3320620AS Version: 3.AA

SATA disks accessed via libata are not currently supported by smartmontools. When libata is given an ATA pass-thru ioctl() then an additional '-d libata' device type will be added to smartmontools.

So I guess it is not fully functional. I do remember reading something about smartmontools and SATA in the 10.2 release notes. So I guess it is maybe only partially working with SATA in 9.3. Not really sure, and since I plan on updating the server to 10.2 in the near future, I won't worry too much about it.
-- Joe Morris Registered Linux user 231871
Joe Morris (NTM) wrote:
So I guess it is not fully functional. I do remember reading something about smartmontools and SATA in the 10.2 release notes.
We upgrade our systems only once every 18 months, so 10.2 won't be it here. But I'll probably take a test system and try a factory kernel and current smartmontools; I'll see if that works. This won't be something to run on our production systems, but in case of an error it might come in handy to have it available.
Thanks for looking at your configuration and sharing your experience,
Joachim
On Tuesday 12 December 2006 10:19, jdd wrote:
Do you know of any utility that can do for a hard disk what memtest does for RAM?
It seems I have hardware problems and I can't figure out what they are. fsck doesn't report anything.
Symptoms: under heavy load, Linux freezes and the disk is corrupted, but as soon as I reformat it, fsck -c no longer complains, yet the freezes keep occurring. Tested with ext3 and reiserfs :-(
Check your power supply too. Joachim's suggestion about cables is good too.
A useful tool I've found is www.UltimateBootCD.com. It comes with lots of hard drive diagnostic utilities as well as other troubleshooting software. Depending on the drive manufacturer, their utilities may offer both non-destructive and destructive options. I haven't tried all of them, but the ones I've tried offer surface test analysis.
Thanks, jdd
Stan
S Glasoe wrote:
A useful tool I've found is www.UltimateBootCD.com.
I'm downloading this and will try it.
dd reported no errors. smartctl passes. If I understand the logs, the disk is not in good shape (but SMART does not always give relevant information, alas). fsck.reiserfs passes too... I don't really see how I can test the power supply (the fan blows).
This is an old computer: Celeron 600, 4.5 GB HDD, no SATA there :-). I even changed the video card and will complete the tests, because the last freeze was surprising:
KDE and YaST froze (during a software install), not a zen problem, with no mouse at all :-(
ping responded.
ssh responded! I could open an ssh session as root. I could kill all the KDE processes one after the other, but nothing changed on the display, and in ps ax I couldn't identify the basic X system.
I tried init 3: no change. I tried init 1: no change on the screen, still no keyboard and still the KDE screen, but I lost the ssh session, so I bet it did switch to init 1.
On reboot, a lot of journal replaying :-(
I ran the HDD tests after that (no problem seen) and replaced the video card. That's where I am :-)
jdd
On 12/12/06, jdd wrote:
ping responded. ssh responded! I could open an ssh session as root. I could kill all the KDE processes one after the other, but nothing changed on the display, and in ps ax I couldn't identify the basic X system.
What does dmesg show from that ssh session? Paste a copy of the output here.
Greg
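When the console is frozen but ssh still answers, the kernel log is usually long, so it can help to narrow it down before pasting. The filter below is a rough sketch: disk_msgs is a made-up name and the pattern is only a guess at what IDE/SATA trouble tends to look like, not a complete list.

```shell
#!/bin/sh
# Keep only log lines that mention IDE/SATA devices, DMA, timeouts or I/O errors.
# Reads from standard input; the pattern is illustrative, not exhaustive.
disk_msgs() {
    grep -iE 'hd[a-z]|sd[a-z]|ata[0-9]|i/o error|dma|timeout'
}

# From the ssh session, as root:
#   dmesg | disk_msgs
#   disk_msgs < /var/log/warn
```

If the filter prints nothing, the unfiltered dmesg output is still worth posting, since a freeze caused by something other than the disk would not match this pattern.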
Greg Freemyer wrote:
On 12/12/06, jdd wrote:
ping responded. ssh responded! I could open an ssh session as root. I could kill all the KDE processes one after the other, but nothing changed on the display, and in ps ax I couldn't identify the basic X system.
What does dmesg show from that ssh session?
paste a copy of the output here.
Greg
Interesting idea and interesting result:

Dec 12 17:14:17 survivor sshd[3741]: Accepted keyboard-interactive/pam for jdd from 10.3.204.103 port 59661 ssh2
Dec 12 17:14:23 survivor su: (to root) jdd on /dev/pts/3
Dec 12 17:16:04 survivor init: Switching to runlevel: 3
Dec 12 17:22:10 survivor sshd[3966]: Accepted keyboard-interactive/pam for jdd from 10.3.204.103 port 59216 ssh2
Dec 12 17:22:17 survivor su: (to root) jdd on /dev/pts/0
Dec 12 17:22:42 survivor sshd[3994]: Accepted keyboard-interactive/pam for jdd from 10.3.204.103 port 59217 ssh2
Dec 12 17:22:47 survivor su: (to root) jdd on /dev/pts/0
Dec 12 17:23:27 survivor init: Switching to runlevel: 1
Dec 12 17:23:35 survivor auditd[2711]: The audit daemon is exiting.

I had two ssh sessions (true) and the init command was accepted. I should have tried init 0, but did a reset instead (the last line).
I'm now working on this computer (new video card); no freeze so far, but I got a lot of "I can't install that package" (I am trying to replace KDE with GNOME, a really heavy task :-)
I quit the install to test the drive... once again.
jdd
S Glasoe wrote:
A useful tool I've found is www.UltimateBootCD.com. Comes with lots of hard drive diagnostic utilities as well as other troubleshooting software.
Cool, I didn't know that. Great tip, thank you.
Joachim
On Tuesday 12 December 2006 17:19, jdd wrote:
Do you know of any utility that can do for a hard disk what memtest does for RAM?
It seems I have hardware problems and I can't figure out what they are. fsck doesn't report anything.
Symptoms: under heavy load, Linux freezes and the disk is corrupted, but as soon as I reformat it, fsck -c no longer complains, yet the freezes keep occurring. Tested with ext3 and reiserfs :-(
I don't remember having seen bad blocks listed by fsck,
There is a utility called "badblocks". See its manpage for usage. It lives in /sbin/ (of course), and is probably already installed. Maybe this is what you're looking for?
the errors were things like the superblock date being in the future, or inode errors (and there had been a crash, so this is not surprising).
The CPU could be running hot, but I fixed the fan (some oil) and it still freezes :-(
If other disk drives in the same PC do not show this behavior, or if this drive shows the same erroneous behavior in a different PC, those are indications that it might be a faulty drive.
Cheers,
Leen
Leendert Meyer wrote:
I don't remember having seen bad blocks listed by fsck,
There is a utility called "badblocks". See its manpage for usage. It lives in /sbin/ (of course), and is probably already installed.
This is already used by fsck. smartctl works and passes.
If other disk drives in the same PC do not show this behavior, or if this drive shows the same erroneous behavior in a different PC, those are indications that it might be a faulty drive.
Not sure I will go that far...
jdd
On Tuesday 2006-12-12 at 19:41 +0100, jdd wrote:
There is a utility called "badblocks". See its manpage for usage. It lives in /sbin/ (of course), and is probably already installed.
This is already used by fsck.
I don't think so. It is run when the filesystem is created by mkfs, but not by fsck, afaik.
smartctl works and passes.
Yes, but not "-a"; that doesn't really "test". Run "smartctl --test=short /dev/hda", and see the result after the specified time with "smartctl --log=selftest /dev/hda". If it passes, use "--test=long", and wait. You can keep using the system while it runs, but it may take an hour or two.
In the case of Seagate, those two tests are basically what their verification diskette runs.
--
Cheers, Carlos E. R.
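Carlos's sequence (start a test, wait, then read the self-test log) can be sketched as two small helpers. The names run, smart_start and smart_result are invented for this example, and DRY_RUN defaults to 1 so the commands are only printed; set DRY_RUN=0 and run as root to execute them. Only the smartctl options quoted in the message are used.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a self-test on the drive: "short" first, "long" once short has passed.
smart_start() {
    run smartctl --test="${2:-short}" "$1"
}

# Read the self-test log; call this after the time smartctl announced has passed.
smart_result() {
    run smartctl --log=selftest "$1"
}

# Example (device name is an assumption):
#   smart_start /dev/hda
#   smart_result /dev/hda
#   smart_start /dev/hda long
```

The two steps are kept separate on purpose: the drive runs the test on its own, so reading the log straight after starting the test would only show older results.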
On Wednesday 13 December 2006 01:10, Carlos E. R. wrote:
On Tuesday 2006-12-12 at 19:41 +0100, jdd wrote:
There is a utility called "badblocks". See its manpage for usage. It lives in /sbin/ (of course), and is probably already installed.
This is already used by fsck.
I don't think so. It is run when the filesystem is created by mkfs, but not by fsck, afaik.
From the badblocks manual, right before the "Options" header:
--8<--
For this reason, it is strongly recommended that users not run badblocks directly, but rather use the -c option of the e2fsck and mke2fs programs.
--8<--
Cheers,
Leen
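To make the two routes discussed above concrete, here is a small sketch that only prints the command for each case and runs nothing. The helper name badblock_cmd and its "fs"/"raw" modes are invented for this example, and the partition names are assumptions. The manual's recommendation corresponds to the "fs" mode, which applies to ext2/ext3 only; "raw" is the direct read-only badblocks scan for a disk carrying another filesystem such as reiserfs.

```shell
#!/bin/sh
# Print (do not run) a read-only bad-block check for the given device.
# badblock_cmd and its modes are hypothetical names for this sketch.
badblock_cmd() {
    dev=$1
    mode=${2:-fs}
    case $mode in
        # e2fsck -c calls badblocks for a read-only scan; -f forces the check.
        fs)  echo "e2fsck -f -c $dev" ;;
        # badblocks alone: read-only by default, -s shows progress, -v is verbose.
        raw) echo "badblocks -sv $dev" ;;
        *)   echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Examples (run the printed command as root, on an unmounted filesystem):
#   badblock_cmd /dev/hda1        # ext2/ext3 partition
#   badblock_cmd /dev/hda raw     # whole disk, any filesystem
```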
Carlos E. R. wrote:
Yes, but not "-a", that doesn't really "test". Run "smartctl --test=short
My disk doesn't support self-tests :-(
I will stay with UBCD for now, or change the disk.
jdd
On Wednesday 2006-12-13 at 09:21 +0100, jdd wrote:
Carlos E. R. wrote:
Yes, but not "-a", that doesn't really "test". Run "smartctl --test=short
My disk doesn't support self-tests :-(
I will stay with UBCD for now, or change the disk.
Argh! Must be old, then.
You can test it using an external utility. There are several that boot from a diskette. You can download them and write them to a diskette. Seagate has one, for instance, and so do several others, perhaps including the maker of your hard disk.
--
Cheers, Carlos E. R.
Carlos E. R. wrote:
Argh! Must be old, then.
You can test it using an external utility.
This is what UBCD provides (the floppies on a CD); nothing useful for me :-(
I gave up and changed the drive. I just finished the new install; we will see if it still freezes :-)
jdd
Do you know of any utility that can do for a hard disk what memtest does for RAM?
It seems I have hardware problems and I can't figure out what they are. fsck doesn't report anything.
Symptoms: under heavy load, Linux freezes and the disk is corrupted, but as soon as I reformat it, fsck -c no longer complains, yet the freezes keep occurring. Tested with ext3 and reiserfs :-(
I don't remember having seen bad blocks listed by fsck; the errors were things like the superblock date being in the future, or inode errors (and there had been a crash, so this is not surprising).
The CPU could be running hot, but I fixed the fan (some oil) and it still freezes :-(
If this is an older mainboard (3+ years), check the capacitors on the mainboard. The tops of the electrolytic capacitors need to be flat, not bulging like an English bowler hat. This is a known problem with older hardware. It could be a problem inside your power supply too, by the way.
If you still suspect your drive, your disk manufacturer should have test tools available on their site. Some of them (like PowerMax from Maxtor) even repair a disk by doing a low-level format. I have been able to revive drives that had a lot of bad blocks. Of course, you understand that this is destructive for your data.
Thanks, jdd
-- L. de Braal BraHa Systems NL - Terneuzen T +31 115 649333 F +31 115 649444
participants (9)
- Carlos E. R.
- Greg Freemyer
- jdd
- Joachim Schrod
- Joe Morris (NTM)
- Leen de Braal
- Leendert Meyer
- Pete Connolly
- S Glasoe