Re: [SLE] e2fsck command
On 2004-10-01 at 18:54 +0200, Hylton Conacher (ZR1HPC) wrote: (you forgot to email to the list)
You do not need that. You only have to tune your partitions so that they "want" to be fsck-ed oftener. That I did, but reading the fsck man page I see there is no option to check for bad blocks, unlike e2fsck which does have this option. I initially thought along the lines of an alias for fsck as e2fsck but realised that it could be more problematic.
fsck does in fact pass options to fsck.ext3
Remember that suse scripts boot.rootfsck and boot.localfs check the filesystem when required. I did remember that fsck will fsck the system when required, but wanted additional safety nets.
Also, creating the file "/forcefsck" will force fsck of all partitions on every boot. It doesn't need to be on every boot, but rather on a weekly or even daily basis, no matter how many times the machine is rebooted.
Well, you can create that file weekly using a cron job, then delete it after boot.
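A minimal sketch of that idea, assuming the boot scripts honour a "/forcefsck" flag file as described above. The function names, the ROOT_DIR override, the script path and the crontab schedule are all illustrative assumptions, not part of any SuSE tooling:

```shell
#!/bin/sh
# Sketch: request a forced fsck on the next boot by creating the flag file,
# and remove the flag again once the machine is up.
# ROOT_DIR is a hypothetical override so the functions can be tried in a
# scratch directory; on a real system it would be left empty.
ROOT_DIR="${ROOT_DIR:-}"

request_forced_fsck() {
    # Create the (empty) flag file that the boot scripts look for.
    : > "$ROOT_DIR/forcefsck"
}

clear_forced_fsck() {
    # Remove the flag so that later reboots are not forced to check again.
    rm -f "$ROOT_DIR/forcefsck"
}

# Example root crontab entry (assumed schedule: Sundays at 04:00), with the
# functions saved in a hypothetical /usr/local/sbin/forcefsck-flag.sh:
#   0 4 * * 0  . /usr/local/sbin/forcefsck-flag.sh && request_forced_fsck
```

Splitting the request and the cleanup into two functions keeps the cron side trivial; clear_forced_fsck could be called from a late boot script if the distribution does not remove the flag on its own.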
# e2fsck -pcfv </device> > /home/hylton/e2fsckresult
Remember that you can not check a mounted partition, so above you can not check home. mmm, I saw that oops as I sent it. The reason for putting the location in is that I do not know where to put system admin stuff. In retrospect I would probably have the result mailed to a user.
It doesn't matter, you can not email or save it at that time, because the filesystem is mounted read-only at the time of checking - unless you are manually checking some other partition. You have to "see" it, or invent some way to keep it in memory and save it later, kind of what SuSE does with the kernel booting messages.
I know an ext3 fs is 'safe' but it is not the fs I am worried about but the hardware deteriorating. Call me paranoid but rather safer than sorrier. :)
Then, what you need is smartd instead. fsck will only tell about logical errors on the filesystem structure, not about the motor overheating, for example. SMART will, however. mmm, I think that a more substantial backup method is needed. JIC
SMART is quite good. It can quick test your HD at the same time you are using it, any time. It can check it thoroughly, in an hour or so, also transparently. When done, you read the result. It checks many things, including read errors, hardware, interfaces, temperatures, moving parts wear, voltages... read it up (smartctl). -- Regards, Carlos Robinson
Carlos E. R. wrote:
On 2004-10-01 at 18:54 +0200, Hylton Conacher (ZR1HPC) wrote:
(you forgot to email to the list) Tnx I keep forgetting as it is not the same as the many other lists I am on, Tnx.
You do not need that. You only have to tune your partitions so that they "want" to be fsck-ed oftener.
That I did, but reading the fsck man page I see there is no option to check for bad blocks, unlike e2fsck which does have this option. I initially thought along the lines of an alias for fsck as e2fsck but realised that it could be more problematic.
fsck does in fact pass options to fsck.ext3 Yes, but does fsck.ext3 check for bad blocks as that is what I am trying to get done here.
Remember that suse scripts boot.rootfsck and boot.localfs check the filesystem when required.
I did remember that fsck will fsck the system when required, but wanted additional safety nets.
Also, creating the file "/forcefsck" will force fsck of all partitions on every boot.
It doesn't need to be on every boot, but rather on a weekly or even daily basis, no matter how many times the machine is rebooted.
Well, you can create that file weekly using a cron job, then delete it after boot.
# e2fsck -pcfv </device> > /home/hylton/e2fsckresult
Remember that you can not check a mounted partition, so above you can not check home.
mmm, I saw that oops as I sent it. The reason for putting the location in is that I do not know where to put system admin stuff. In retrospect I would probably have the result mailed to a user.
It doesn't matter, you can not email or save it at that time, because the filesystem is mounted read-only at the time of checking - unless you are manually checking some other partition. You have to "see" it, or invent some way to keep it in memory and save it later, kind of what SuSE does with the kernel booting messages.
I know an ext3 fs is 'safe' but it is not the fs I am worried about but the hardware deteriorating. Call me paranoid but rather safer than sorrier. :)
Then, what you need is smartd instead. fsck will only tell about logical errors on the filesystem structure, not about the motor overheating, for example. SMART will, however.
mmm, I think that a more substantial backup method is needed. JIC
SMART is quite good. It can quick test your HD at the same time you are using it, any time. It can check it thoroughly, in an hour or so, also transparently. When done, you read the result. It checks many things, including read errors, hardware, interfaces, temperatures, moving parts wear, voltages... read it up (smartctl). I did a search on man and apropos of smartctl and came up with nothing. Even the SuSE help couldn't find anything in GUI mode. :(
If I remember correctly SMART is a technology implemented by the BIOS and so I shall check there too. Pat, any 'man ' comments from you :) ? -- ======================================================================== Hylton Conacher - Linux user # 229959 at http://counter.li.org Currently using SuSE 9.0 Professional with KDE 3.1 Licenced Windows user ========================================================================
* Hylton Conacher (ZR1HPC)
I did a search on man and apropos of smartctl and came up with nothing. Even the SuSE help couldn't find anything in GUI mode. :(
smartctl is in pkg smartmontools. If it hasn't been installed there would be no man pages, ie: apropos would not find it.
If I remember correctly SMART is a technology implemented by the BIOS and so I shall check there too.
Pat, any 'man ' comments from you :) ?
Above. You asked <grin>. gud luk, -- Patrick Shanahan Registered Linux User #207535 http://wahoo.no-ip.org @ http://counter.li.org HOG # US1244711 Photo Album: http://wahoo.no-ip.org/photos
Patrick Shanahan wrote:
* Hylton Conacher (ZR1HPC)
[10-10-04 02:16]: I did a search on man and apropos of smartctl and came up with nothing. Even the SuSE help couldn't find anything in GUI mode. :(
smartctl is in pkg smartmontools. If it hasn't been installed there would be no man pages, ie: apropos would not find it. It helps markedly if the package is installed :), Tnx Carlos for suggesting the smartctl, and of course Pat for suggesting it may not be installed ie no man pages.
If I remember correctly SMART is a technology implemented by the BIOS and so I shall check there too. Any comments about the difference between the BIOS and software SMART? What if they are both enabled, can one break the other, does one rely on the other, is the BIOS one even used by Linux? Do I need to turn it on in BIOS so that the software SMART can work etc?
Pat, any 'man ' comments from you :) ?
Above. You asked <grin>. Sometimes a man must learn howto fish :)
-- The Little Helper ======================================================================== Hylton Conacher - Linux user # 229959 at http://counter.li.org Currently using SuSE 9.0 Professional with KDE 3.1 Licenced Windows user ========================================================================
The Monday 2004-10-11 at 13:37 +0200, Hylton Conacher (ZR1HPC) wrote:
If I remember correctly SMART is a technology implemented by the BIOS and so I shall check there too. Any comments about the difference between the BIOS and software SMART? What if they are both enabled, can one break the other, does one rely on the other, is the BIOS one even used by Linux? Do I need to turn it on in BIOS so that the software SMART can work etc?
* The computer BIOS does have an entry named SMART. I do have it enabled, but I don't know what it does, and I don't care.
* The SMART test procedures, logs, etc, reside inside the HD and HD firmware. The computer is not involved: not the BIOS, not the CPU, not the software. There is no software smart.
* What you need is just a piece of software to read SMART data, and to start the tests. Once started, the computer does nothing else, and it can continue working normally, except that the HD is slower to respond to requests. When the test finishes, you can use that software to retrieve the test results from the HD itself.
  - Let me repeat: it is not the computer which runs the test: it is the HD which runs it, on its own.
* There are two flavors of such software in the linux package. One, a CLI program. The other is a daemon. You choose, or use both.
* Some HD makers provide a HD test floppy disk, bootable. The Seagate one, for example, can trigger a smart test, or it can replicate it under CPU control; this can be done as Seagate obviously knows how to test Seagate's HD. That is a different thing.

More info, on smartmon package docs. :-)
-- Cheers, Carlos Robinson
Carlos E. R. wrote:
The Monday 2004-10-11 at 13:37 +0200, Hylton Conacher (ZR1HPC) wrote:
If I remember correctly SMART is a technology implemented by the BIOS and so I shall check there too.
Any comments about the difference between the BIOS and software SMART? What if they are both enabled, can one break the other, does one rely on the other, is the BIOS one even used by Linux? Do I need to turn it on in BIOS so that the software SMART can work etc?
* The computer BIOS does have an entry named SMART. I do have it enabled, but I don't know what it does, and I don't care.
* The SMART test procedures, logs, etc, reside inside the HD and HD firmware. The computer is not involved: not the BIOS, not the CPU, not the software. There is no software smart.
* What you need is just a piece of software to read SMART data, and to start the tests. Once started, the computer does nothing else, and it can continue working normally, except that the HD is slower to respond to requests. When the test finishes, you can use that software to retrieve the test results from the HD itself.
  - Let me repeat: it is not the computer which runs the test: it is the HD which runs it, on its own.
* There are two flavors of such software in the linux package. One, a CLI program. The other is a daemon. You choose, or use both.
* Some HD makers provide a HD test floppy disk, bootable. The Seagate one, for example, can trigger a smart test, or it can replicate it under CPU control; this can be done as Seagate obviously knows how to test Seagate's HD. That is a different thing.
More info, on smartmon package docs. :-) Tnx for the info and sorry for the belated reply. I have enabled smartctl to perform checks by issuing the command, as root:
smartctl -s on /dev/hdb and smartctl -o on /dev/hdb and hope that will be partly sufficient. The next step is not to worry about the HDD physical structure but to concentrate on making sure the data structure is error free, ie no bad blocks etc, ie the e2fsck cmd. -- The Little Helper ======================================================================== Hylton Conacher - Linux user # 229959 at http://counter.li.org Currently using SuSE 9.0 Professional with KDE 3.1 Licenced Windows user ========================================================================
Warning: some email lines way longer than 72 chars. The Friday 2004-10-15 at 16:44 +0200, Hylton Conacher (ZR1HPC) wrote:
Tnx for the info and sorry for the belated reply. I have enabled smartctl to perform checks by issuing the command, as root:
smartctl -s on /dev/hdb and smartctl -o on /dev/hdb
and hope that will be partly sufficient. The next step is not to worry about the HDD physical structure but to concentrate on making sure the data structure is error free, ie no bad blocks etc, ie the e2fsck cmd.
I see you are still confused. Ok, I'll try to explain a bit more. The above line is not needed, I have never used it. What I do is:

Manual testing:
-----------------------------

Launching short test:

nimrodel:~ # smartctl --test=short /dev/hda
smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Use smartctl -X to abort test.

After about one minute, I can see the results, using this command:

nimrodel:~ # smartctl --log=selftest /dev/hda
smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log, version number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short off-line      Completed                        00%             5069  -
# 2  Short off-line      Completed                        00%             4424  -
# 3  Short off-line      Completed                        00%             1868  -
# 4  Short off-line      Completed                        00%              345  -

The results #1 are those of the last test performed (notice lifetime column).

To launch the complete test, I do:

nimrodel:~ # smartctl --test=long /dev/hda
smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 62 minutes for test to complete.
Use smartctl -X to abort test.

Notice that I can test simultaneously all my drives, and that I can continue using my computer simultaneously - albeit slower sometimes, when file requests collide with the tests.
Doesn't matter, it works.

I can check the progress of the tests with this command:

nimrodel:~ # smartctl --log=selftest /dev/hda
smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log, version number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short off-line      Completed                        00%             5069  -
# 2  Short off-line      Completed                        00%             4424  -
# 3  Short off-line      Completed                        00%             1868  -
# 4  Short off-line      Completed                        00%              345  -
#21  Extended off-line   Test in progress                 90%             5069  -

Notice that the #21 entry is for the current test, of which only 10% has been done. Of course, the exact text depends on your HD maker.

Finally, I can see the result - I will print here those of my other drive, which had some problems some time ago:

nimrodel:~ # smartctl --log=selftest /dev/hdb
smartctl version 5.1-4 Copyright (C) 2002 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log, version number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short off-line      Completed                        00%             4915  -
# 2  Short off-line      Completed: read failure          90%             4272  0x0170169c
# 3  Short off-line      Completed: read failure          90%             4272  0x0170169c
# 4  Short off-line      Completed                        00%             1909  -
# 5  Extended off-line   Completed: read failure          90%             1902  0x0060da4e
# 6  Short off-line      Completed                        00%             1902  -
# 7  Short off-line      Completed                        00%              400  -
#21  Extended off-line   Test in progress                 90%             4918  -

Notice that it did a read test of the surface and it failed. The HD having space reserved for this eventuality, it relocated the bad records, and I have continued using the same disk for over a year since then. No problem.

To see the long information, you should use option "--all" (The man says: "This is equivalent to '-H -i -c -A -l error -l selftest' (for SCSI, '-H -i -A -l error -l selftest')"). A brief one could be "--health".
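As a rough sketch of how the self-test log above could be scanned automatically: the filter below prints only the entries that recorded an LBA_of_first_error, such as the "read failure" rows of /dev/hdb. The function name is made up for this example, and it assumes the column layout shown in this message, where the last field is "-" when no error address was logged.

```shell
#!/bin/sh
# Sketch: print self-test log entries that recorded an error address.
# Reads `smartctl --log=selftest` output on stdin.
selftests_with_error_lba() {
    # Log entries start with "#"; the last field is "-" when the test
    # logged no LBA_of_first_error.
    awk '/^#/ && $NF != "-" { print }'
}

# Usage on a real system (as root):
#   smartctl --log=selftest /dev/hdb | selftests_with_error_lba
```

Aborted or interrupted tests log no error address, so they would not be reported by this filter; it only answers the narrower question of whether a test hit an unreadable sector.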
An interesting part of the data is the "Vendor Specific SMART Attributes with Thresholds" table. It has to be read with care, it is confusing:

nimrodel:~ # smartctl -A /dev/hda
smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   060   053   025    Pre-fail  Always       -       243624074
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1189
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       39
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       278640814
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7650
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1846
194 Temperature_Celsius     0x0022   031   049   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   100   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

Pick one entry, for example "Raw_Read_Error_Rate". Notice the column "type", it says "Pre-fail". This does NOT mean that my disk is about to fail. It means that if the value is wrong it indicates a pre-failure notice. A wrong value would be below 025 - I think: that is the confusing part. As "health" says it is correct, then it is correct.

Automated testing:
-----------------------------

Enable SuSE service "rcsmartd start". Configuration is done in /etc/smartd.conf. SuSE puts a sample file there.
Comment out the line "DEVICESCAN", and manually list your devices with appropriate lines. For example:

# First (primary) ATA/IDE hard disk. Monitor all attributes, enable
# automatic online data collection, automatic Attribute autosave, and
# do a short self-test every day at 2am, and a long self test
# Saturdays at 3am.
/dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)
/dev/hdb -a -o on -S on -s (S/../.././02|L/../../6/03)

(the above is too verbose for my liking)

That's all :-)
-- Cheers, Carlos Robinson
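The attribute table quoted in this message can also be checked mechanically. The sketch below assumes the rule given in the smartctl documentation, namely that an attribute counts as failed when its normalized VALUE is less than or equal to THRESH; the function name is invented for illustration, and the column positions are taken from the table layout shown above.

```shell
#!/bin/sh
# Sketch: flag attributes whose normalized VALUE has dropped to or below
# THRESH. Reads `smartctl -A` output on stdin.
attributes_at_or_below_threshold() {
    # Attribute rows start with a numeric ID. Columns: 2 = name,
    # 4 = VALUE, 6 = THRESH. Adding 0 forces a numeric comparison,
    # so "060" is read as 60.
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($4 + 0) <= ($6 + 0) { print $2, $4, $6 }'
}

# Usage on a real system (as root):
#   smartctl -A /dev/hda | attributes_at_or_below_threshold
```

Fed the /dev/hda table above, this prints nothing, which agrees with the health check: Raw_Read_Error_Rate, for instance, sits at 060 against a threshold of 025.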
Carlos E. R. wrote: WOW, TNX Carlos for the effort!! I have trimmed most of quotes for brevity.
The Friday 2004-10-15 at 16:44 +0200, Hylton Conacher (ZR1HPC) wrote:
I have enabled smartctl to perform checks by issuing the command, as root:
smartctl -s on /dev/hdb and smartctl -o on /dev/hdb
I see you are still confused. Ok, I'll try to explain a bit more. The above line is not needed, How come, as according to the man page these are listed under the section of SMART FEATURE ENABLE/DISABLE COMMANDS to enable it and allow offline testing? Perhaps it is only needed if the /etc/smartd.conf file is used?
I have never used it. What I do is:
Manual testing: ----------------------------- Aaah, so just because I have SMART enabled doesn't mean it is automatically going to check the devices, unless of course the /etc file is called by a cron job?
Launching short test:
nimrodel:~ # smartctl --test=short /dev/hda Tnx, I was unsure of the command syntax after reading man pages.
After about one minute, I can see the results, using this command:
nimrodel:~ # smartctl --log=selftest /dev/hda Tnx, I was unsure of the command syntax after reading man pages.
[SNIP]
That's all :-)
Nothing else needed, although new brains would be a good idea as mine are getting fried with all the concentrating and remembering they are doing. :) -- The SMARTer Little Helper ======================================================================== Hylton Conacher - Linux user # 229959 at http://counter.li.org Currently using SuSE 9.0 Professional with KDE 3.1 Licenced Windows user ========================================================================
The Thursday 2004-10-21 at 14:30 +0200, Hylton Conacher (ZR1HPC) wrote:
I see you are still confused. Ok, I'll try to explain a bit more. The above line is not needed, How come, as according to the man page these are listed under the section of SMART FEATURE ENABLE/DISABLE COMMANDS to enable it and allow offline testing? Perhaps it is only needed if the /etc/smartd.conf file is used?
I have not used that feature. I think it is to tell the HD that it should monitor smart values (or not), but not to tell you about it.
I have never used it. What I do is:
Manual testing: ----------------------------- Aaah, so just because I have SMART enabled doesn't mean it is automatically going to check the devices, unless of course the /etc file is called by a cron job?
For example. Or activate the smartd daemon, which is better than cron. -- Cheers, Carlos Robinson
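For readers puzzled by the "-s (S/../.././02|L/../../6/03)" schedule in the smartd.conf example earlier in the thread: smartd documents it as an extended regular expression matched against a string of the form T/MM/DD/d/HH (test type, month, day, weekday with Monday = 1, hour). A small sketch of that matching, with a made-up function name and argument order, might look like this:

```shell
#!/bin/sh
# Sketch: check whether a smartd-style schedule expression would select a
# given test type at a given time.
schedule_matches() {
    # $1 = schedule regex, $2 = test type (S or L), $3 = "MM/DD/d/HH"
    # -x requires the whole string to match, -q suppresses output.
    printf '%s\n' "$2/$3" | grep -E -x -q -- "$1"
}

# Usage: would a long test be due in the current hour?
#   schedule_matches '(S/../.././02|L/../../6/03)' L "$(date +%m/%d/%u/%H)"
```

This only imitates the matching step to make the expression readable; smartd does its own scheduling, which is one reason the daemon is more convenient than hand-written cron entries.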
The Saturday 2004-10-09 at 21:17 +0200, Hylton Conacher (ZR1HPC) wrote:
fsck does in fact pass options to fsck.ext3 Yes, but does fsck.ext3 check for bad blocks as that is what I am trying to get done here.
Yes. man fsck.ext3 will tell you how.
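A hedged sketch of what such a bad-block check could look like, combined with the earlier reminder that a mounted partition can not be checked. The function names and the MOUNTS_FILE override are assumptions made for this example; the options are the e2fsck ones (-f forces a check, -c runs a read-only badblocks scan, -v is verbose), and the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch: build a bad-block-checking command, but only for a device that
# is not currently mounted.
# MOUNTS_FILE is a hypothetical override for testing; /proc/mounts is the
# kernel's list of mounted filesystems on Linux.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

is_mounted() {
    # The first field of each /proc/mounts line is the device name.
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit found ? 0 : 1 }' "$MOUNTS_FILE"
}

badblock_check_command() {
    # Prints the command instead of running it, so it can be reviewed first.
    if is_mounted "$1"; then
        echo "refusing: $1 is mounted" >&2
        return 1
    fi
    echo "fsck.ext3 -f -c -v $1"
}

# Usage (device name is only an example):
#   badblock_check_command /dev/hdb1
```

Printing rather than executing keeps the sketch harmless; a surface scan with -c can take a long time on a large partition, so it is worth confirming the device name before running the result as root.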
I did a search on man and apropos of smartctl and came up with nothing. Even the SuSE help couldn't find anything in GUI mode. :(
You did not look hard enough. Obviously, man something or apropos something can not work till you install that something, don't you think?
If I remember correctly SMART is a technology implemented by the BIOS and so I shall check there too.
It is implemented by the HD bios, let's say. It can also be supported or enabled from the Bios config. -- Cheers, Carlos Robinson
participants (3): Carlos E. R., Hylton Conacher (ZR1HPC), Patrick Shanahan