Preventing fs errors with the e2fsck command?
Looking into running a periodic check on my partitions to check for bad sectors. Reading the man pages on fsck and e2fsck I decided that e2fsck was the better of the two for this purpose and therefore will use the command:
#> e2fsck -ccfpvC 0 </dev/hdxx>
The above command will, as per the man page:
-cc This option causes e2fsck to run the badblocks(8) program. If this option is specified twice, then the bad block scan will be done using a non-destructive read-write test.
-f Force checking even if the file system seems clean.
-p Automatically repair ("preen") the file system without any questions.
-v Verbose mode on.
-C 0 This option causes e2fsck to write completion information to the specified file descriptor so that the progress of the filesystem check can be monitored. This option is typically used by programs which are running e2fsck. If the file descriptor specified is 0, e2fsck will print a completion bar as it goes about its business. This requires that e2fsck is running on a video console or terminal.
Do I specify a double option (-c) as above? Is this advised over a single c, i.e. which is better to detect fs errors and repair them?
-- The Little Helper
========================================================================
Hylton Conacher - Linux user # 229959 at http://counter.li.org
Currently using SuSE 9.0 Professional with KDE 3.1
Licenced Windows user
========================================================================
Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Reading the man pages on fsck and e2fsck I decided that e2fsck was the better of the two for this purpose and therefore will use the command:
#> e2fsck -ccfpvC 0 </dev/hdxx>
[snip man page excerpt for -cc, -f, -p, -v and -C 0]
Do I specify a double option(-c) as above? Is this advised over a single c ie which is better to detect fs errors and repair them?
You installed ext2 partitions? EXT2 died since about SuSE 6.x or is this an old distro? If you have reiserfs, don't try e2fsck on it; it gives horrible errors. Thankfully, when I accidentally did it some years back, it didn't do any damage, it just put up strange errors.
Regards Sid.
-- Sid Boyce .... Hamradio G3VBV and Keen Flyer =====ALMOST ALL LINUX USED HERE, Solaris 10 SPARC is just for play=====
Sid Boyce wrote:
Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Reading the man pages on fsck and e2fsck I decided that e2fsck was the better of the two for this purpose and therefore will use the command:
#> e2fsck -ccfpvC 0 </dev/hdxx>
[snip man page excerpt for -cc, -f, -p, -v and -C 0]
Do I specify a double option(-c) as above? Is this advised over a single c ie which is better to detect fs errors and repair them?
You installed ext2 partitions? EXT2 died since about SuSE 6.x or is this an old distro?
Thankfully not, I have all my partitions ext3. Reading the man page I see that fsck is more for ext2 fs while e2fsck is for ext3 ones.
If you have reiserfs, don't try e2fsck on it; it gives horrible errors. Thankfully, when I accidentally did it some years back, it didn't do any damage, it just put up strange errors.
I have the ability for Reiser but happen to be comfortable with ext3, and happen to know it has a longer track record.
-- The e2fsck Little Helper
Hylton Conacher (ZR1HPC) wrote:
Sid Boyce wrote:
Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Reading the man pages on fsck and e2fsck I decided that e2fsck was the better of the two for this purpose and therefore will use the command:
#> e2fsck -ccfpvC 0 </dev/hdxx>
[snip man page excerpt for -cc, -f, -p, -v and -C 0]
Do I specify a double option(-c) as above? Is this advised over a single c ie which is better to detect fs errors and repair them?
You installed ext2 partitions? EXT2 died since about SuSE 6.x or is this an old distro?
Thankfully not, I have all my partitions ext3. Reading the man page I see that fsck is more for ext2 fs while e2fsck is for ext3 ones.
If you have reiserfs, don't try e2fsck on it; it gives horrible errors. Thankfully, when I accidentally did it some years back, it didn't do any damage, it just put up strange errors.
I have the ability for Reiser but happen to be comfortable with ext3, and happen to know it has a longer track record.
See http://www.redhat.com/support/wpapers/redhat/ext3/ for the full writeup. ext3 has not been stable for long, and reiserfs predates it by a fair margin; I think we saw reiserfs in either SuSE 6.x or 7.x. Less than a year ago ext3 was having serious problems, and it's also a slower performer in the published comparison tests.
Regards Sid.
Sid Boyce wrote:
Hylton Conacher (ZR1HPC) wrote:
[snip] [ext3 vs Reiser]
See http://www.redhat.com/support/wpapers/redhat/ext3/ for the full writeup. ext3 has not been stable for long, and reiserfs predates it by a fair margin; I think we saw reiserfs in either SuSE 6.x or 7.x. Less than a year ago ext3 was having serious problems, and it's also a slower performer in the published comparison tests.
Tnx Sid, I'll have a look at the document but it doesn't answer the initial question.
-- The Little Helper
On Monday 10 Jan 2005 05:49, Hylton Conacher (ZR1HPC) wrote:
Sid Boyce wrote:
Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Reading the man pages on fsck and e2fsck I decided that e2fsck was the better of the two for this purpose and therefore will use the command:
#> e2fsck -ccfpvC 0 </dev/hdxx>
[snip man page excerpt for -cc, -f, -p, -v and -C 0]
Do I specify a double option(-c) as above? Is this advised over a single c ie which is better to detect fs errors and repair them?
You installed ext2 partitions? EXT2 died since about SuSE 6.x or is this an old distro?
Thankfully not, I have all my partitions ext3. Reading the man page I see that fsck is more for ext2 fs while e2fsck is for ext3 ones.
If you have reiserfs, don't try e2fsck on it; it gives horrible errors. Thankfully, when I accidentally did it some years back, it didn't do any damage, it just put up strange errors.
I have the ability for Reiser but happen to be comfortable with ext3, and happen to know it has a longer track record.
I think not, by a very long margin. ext3 has recently (in the last year or so) had some serious problems. I have been using Reiser since the early 6.x releases of SuSE, and the only problems I have had were due to failed hardware. Hard drives are not built to last now; they die when left on for months at a time.
Pete
-- Linux user No: 256242 Machine No: 139931 G6NJR Pete also MSA registered "Quinton 11" A Linux Only area Happy bug hunting M$ clan, The time is here to FORGET that M$ Corp ever existed the world does not NEED M$ Corp the world has NO USE for M$ Corp it is time to END M$ Corp , Play time is over folks time for action approaches at an alarming pace the death knell for M$ Corp has been sounded . Termination time is around the corner ..
Just remember that before you run fsck (or e2fsck), the file system MUST be unmounted or, if you run it on root, root must be mounted read-only.
-- Jerry Feldman <gaf@blu.org> Boston Linux and Unix user group http://www.blu.org PGP key id:C5061EA9 PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9
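A minimal sketch of that precaution, assuming a Linux system where /proc/mounts lists mounted devices under the same name you pass in (the root device sometimes shows up under another name, so treat an "unmounted" answer for root with care). The function names and the MOUNTS_FILE variable are made up for illustration; the e2fsck flags are the ones from the original post, minus -cc.

```shell
#!/bin/sh
# MOUNTS_FILE is a hypothetical override so the lookup can be tried on a
# sample file; by default the kernel's own list in /proc/mounts is used.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Print "unmounted", "ro" or "rw" for the given device.
mount_state() {
    # Fields in /proc/mounts: device mountpoint fstype options dump pass
    awk -v dev="$1" '
        $1 == dev {
            found = 1
            state = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "ro") state = "ro"
        }
        END { print (found ? state : "unmounted") }
    ' "$MOUNTS_FILE"
}

# Run a forced check only if the device is unmounted or mounted read-only.
safe_fsck() {
    case "$(mount_state "$1")" in
        rw)
            echo "refusing: $1 is mounted read-write" >&2
            return 1
            ;;
        *)
            e2fsck -f -p -v -C 0 "$1"
            ;;
    esac
}
```

Usage would be something like `safe_fsck /dev/hda2` as root, after `umount /dev/hda2`.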
The Monday 2005-01-10 at 07:49 +0200, Hylton Conacher (ZR1HPC) wrote:
Thankfully not, I have all my partitions ext3. Reading the man page I see that fsck is more for ext2 fs while e2fsck is for ext3 ones.
Not so; fsck is for _any_ format, whereas e2fsck checks "Linux second extended file system", i.e. ext2, and has been extended for ext3, as its own manual page says:
DESCRIPTION
e2fsck is used to check a Linux second extended file system (ext2fs). E2fsck also supports ext2 filesystems containing a journal, which are also sometimes known as ext3 filesystems, by first applying the journal to the filesystem before continuing with normal e2fsck processing. After the journal has been applied, a filesystem will normally be marked as clean. Hence, for ext3 filesystems, e2fsck will normally run the journal and exit, unless its superblock indicates that further checking is required.
If you type fsck[tab][tab] you will see there are more programs:
fsck fsck.ext2 fsck.jfs fsck.msdos fsck.vfat
fsck.cramfs fsck.ext3 fsck.minix fsck.reiserfs fsck.xfs
and fsck simply calls the one that is appropriate in each case, automatically.
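To illustrate the dispatch described above, here is a simplified sketch of how a front end could pick the helper name from the filesystem type. It is not the real fsck source: the function names are hypothetical, and it only looks the type up in an fstab-style file, whereas the real program has more ways to find it.

```shell
#!/bin/sh
# Map a filesystem type to the helper name listed above (fsck.ext3, ...).
fsck_helper_for() {
    printf 'fsck.%s\n' "$1"
}

# Look up the type recorded for a device in an fstab-style file
# (device mountpoint fstype options dump pass) and print the helper name.
# Returns 1 if the device is not listed.
helper_for_device() {
    fstab="${2:-/etc/fstab}"
    fstype=$(awk -v dev="$1" '$1 == dev { print $3; exit }' "$fstab")
    if [ -z "$fstype" ]; then
        return 1
    fi
    fsck_helper_for "$fstype"
}
```

For example, `helper_for_device /dev/hda2` prints the helper that matches whatever type /etc/fstab records for /dev/hda2.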
I have the ability for Reiser but happen to be comfortable with ext3, and happen to know it has a longer track record.
Er... not really; ext2 is certainly older than reiser, but ext3 is younger. -- Cheers, Carlos Robinson
The Saturday 2005-01-08 at 20:09 +0200, Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Use the SMART capabilities of your drive, for example using smartctl. It can be configured as a daemon and test your drives fully at specified times without closing or stopping the system. -- Cheers, Carlos Robinson
Carlos E. R. wrote:
The Saturday 2005-01-08 at 20:09 +0200, Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Use the SMART capabilities of your drive, for example using smartctl. It can be configured as a daemon and test your drives fully at specified times without closing or stopping the system.
chkconfig smartd on
/etc/init.d/smartd start
In 9.2 /etc/smartd.conf is already set up to scan all ATA and SCSI devices; you can set up the intervals there also.
Regards Sid.
On Sunday 09 January 2005 03:44, Sid Boyce wrote:
Carlos E. R. wrote:
The Saturday 2005-01-08 at 20:09 +0200, Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Use the SMART capabilities of your drive, for example using smartctl. It can be configured as a daemon and test your drives fully at specified times without closing or stopping the system.
chkconfig smartd on
/etc/init.d/smartd start
In 9.2 /etc/smartd.conf is already setup to scan all ATA and SCSI devices, you can set up the intervals there also. Regards Sid.
Sid, Running 9.2 but don't seem to have any of those files on my system. Is it installed by default ? B. Stia
On Monday 10 January 2005 05:00, B. Stia wrote:
Running 9.2 but don't seem to have any of those files on my system. Is it installed by default ?
B. Stia
Install the smartmontools package from Yast. Use the smartctl command to initially test the disk:
smartctl -t long /dev/hda
You can do this on a running system. The test is asynchronous. To see if the test has finished type:
smartctl -c /dev/hda | less
Look at some of the ondisk logs:
smartctl -l error | less
smartctl -l selftest | less
then set up your smartd.conf file. The default file automatically scans your system which is inefficient. Read it to get some examples of usage. My file is:
smartd.conf:
# Sample configuration file for smartd. See man smartd.conf.
# Home page is: http://smartmontools.sourceforge.net
# $Id: smartd.conf,v 1.33 2004/01/13 16:53:06 ballen4705 Exp $
# smartd will re-read the configuration file if it receives a HUP
# signal
# The file gives a list of devices to monitor using smartd, with one
# device per line. Text after a hash (#) is ignored, and you may use
# spaces and tabs for white space. You may use '\' to continue lines.
/dev/hda -a -m root -o on -S on -s (S/../.././02|L/../../6/03) -M test
/dev/hde -a -m root -o on -S on -s (S/../.././03|L/../../6/05) -M test
End of smartd.conf:
Then make the smartd service permanent
On Monday 10 January 2005 02:37, Paul Hewlett wrote:
On Monday 10 January 2005 05:00, B. Stia wrote:
Running 9.2 but don't seem to have any of those files on my system. Is it installed by default ?
B. Stia
Install the smartmontools package from Yast. Use the smartctl command to initially test the disk :
smartctl -t long /dev/hda
You can do this on a running system.
The test is asynchronous. To see if the test has finished type :
smartctl -c /dev/hda | less
Look at some of the ondisk logs :
smartctl -l error | less
smartctl -l selftest | less
OK, installed and ran it. It took about 25 minutes. Then ran smartctl -t long /dev/hda; it reported no errors, or never run before.
Then tried to look at the logs for error and selftest and got the following message (copied your example into the CLI):
ERROR: smartctl requires a device name as the final command-line argument.
Something wrong with what I am doing??
then set up your smartd.conf file. The default file automatically scans your system which is inefficient. Read it to get some examples of usage. My file is :
OK, that will be the next step.
smartd.conf:
# Sample configuration file for smartd. See man smartd.conf.
# Home page is: http://smartmontools.sourceforge.net
# $Id: smartd.conf,v 1.33 2004/01/13 16:53:06 ballen4705 Exp $
# smartd will re-read the configuration file if it receives a HUP
# signal
# The file gives a list of devices to monitor using smartd, with one
# device per line. Text after a hash (#) is ignored, and you may use
# spaces and tabs for white space. You may use '\' to continue lines.
/dev/hda -a -m root -o on -S on -s (S/../.././02|L/../../6/03) -M test
/dev/hde -a -m root -o on -S on -s (S/../.././03|L/../../6/05) -M test
End of smartd.conf:
Then make the smartd service permanent
And how/where do I do that? Then run it as a cron job?? Sorry about my ignorance. I think this is very important and would like to have it work.
Bob S.
The Friday 2005-01-21 at 06:14 -0500, B. Stia wrote:
smartctl -l selftest | less
OK, Installed and ran - Took about 25 minutes. Then ran smartctl -t long /dev/hda Reported no errors or never run before.
Then tried to look at the logs for error & selftest and got the following message.(Copied your example into the CLI). ERROR: smartctl requires a device name as the final command-line argument.
Something wrong with what I am doing ??
Yes, you have to type "a device name as the final command-line argument". That will be /dev/hda, probably, depends on your configuration.
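In other words, the two log commands need the device appended. A small sketch, assuming the disk is /dev/hda as in the earlier examples; the `run` and `smart_logs` helpers and the DRY_RUN variable are made up here so the commands can be previewed without touching the disk.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set (hypothetical).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the SMART error log and the self-test log for one device.
# smartctl wants the device name as the final argument.
smart_logs() {
    dev="${1:-}"
    if [ -z "$dev" ]; then
        echo "usage: smart_logs DEVICE" >&2
        return 2
    fi
    run smartctl -l error "$dev"
    run smartctl -l selftest "$dev"
}
```

Typed directly, that is `smartctl -l error /dev/hda | less` and `smartctl -l selftest /dev/hda | less`.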
Then make the smartd service permanent
And how/where do I do that? Then run it as a cron job?? Sorry about my ignorance. I think this is very important and would like to have it work.
As every other service: in yast, "System/Runlevel Editor". Or, on the CLI "chkconfig smartd on". -- Cheers, Carlos Robinson
B. Stia wrote:
On Sunday 09 January 2005 03:44, Sid Boyce wrote:
Carlos E. R. wrote:
The Saturday 2005-01-08 at 20:09 +0200, Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Use the SMART capabilities of your drive, for example using smartctl. It can be configured as a daemon and test your drives fully at specified times without closing or stopping the system.
chkconfig smartd on
/etc/init.d/smartd start
In 9.2 /etc/smartd.conf is already setup to scan all ATA and SCSI devices, you can set up the intervals there also. Regards Sid.
Sid,
Running 9.2 but don't seem to have any of those files on my system. Is it installed by default ?
B. Stia
smartmontools-5.32-2 needs to be installed.
Regards Sid.
Carlos E. R. wrote:
The Saturday 2005-01-08 at 20:09 +0200, Hylton Conacher (ZR1HPC) wrote:
Looking into running a periodic check on my partitions to check for bad sectors.
Use the SMART capabilities of your drive, for example using smartctl. It can be configured as a daemon and test your drives fully at specified times without closing or stopping the system.
OK SMART enabled but I am still wondering about e2fsck.
My understanding is that although SMART will check the physical disk, e2fsck will check that the data is able to be written to the physical hdd, and of course according to the fs. So even though I have SMART enabled, there might still be a case where data cannot be written to the hdd, resulting in a failed fsck on that partition on bootup. I would like to run the e2fsck command to prevent the failure of partition checking by fsck on bootup, as reading the man page on fsck it does not seem up to working on an ext3 fs.
-- The Little Helper
The Monday 2005-01-10 at 07:49 +0200, Hylton Conacher (ZR1HPC) wrote:
Use the SMART capabilities of your drive, for example using smartctl. It can be configured as a daemon and test your drives fully at specified times without closing or stopping the system.
OK SMART enabled but I am still wondering about e2fsck.
My understanding is that although SMART will check the physical disk, e2fsck will check that the data is able to be written to the physical hdd, and of course according to the fs. So even though I have SMART enabled, there might still be a case where data cannot be written to the hdd, resulting in a failed fsck on that partition on bootup.
fsck tests the partition logically, not physically. It can also run a badblock check (in some filetypes), but is certainly not as complete in that respect as smart.
I would like to run the e2fsck command to prevent the failure of partition checking by fsck on bootup, as reading the man page on fsck it does not seem up to working on an ext3 fs.
Then just force a check during boot, by creating the file "/forcefsck". An ext3 partition will be checked to the needed level, not more. Doing a badblock check every time is overkill, and will not really protect your data. For a somewhat more complete check, boot from the rescue CD and test from there. -- Cheers, Carlos Robinson
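Creating that flag file is a one-liner; a minimal sketch follows. The function name and its optional directory argument are hypothetical (the argument is only there so the sketch can be tried against a scratch directory); on the real system the file is /forcefsck and you need to be root.

```shell
#!/bin/sh
# Create the "forcefsck" flag file under the given root directory.
# With no argument this touches /forcefsck itself.
request_boot_fsck() {
    root="${1:-}"
    touch "$root/forcefsck" && echo "flag file created: $root/forcefsck"
}
```

After `request_boot_fsck`, the check happens on the next reboot.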
Carlos E. R. wrote:
The Monday 2005-01-10 at 07:49 +0200, Hylton Conacher (ZR1HPC) wrote:
My understanding is that although SMART will check the physical disk, e2fsck will check that the data is able to be written to the physical hdd, and of course according to the fs. So even though I have SMART enabled, there might still be a case where data cannot be written to the hdd, resulting in a failed fsck on that partition on bootup.
fsck tests the partition logically, not physically. It can also run a badblock check (in some filetypes), but is certainly not as complete in that respect as smart.
Just to clarify, would a bad block be a physical defect or a logic error, i.e. the fs thinks the physical media is bad but it isn't? Does SMART technology take care of looking after the physical state of the disk, bad blocks included? IF SMART doesn't check for bad blocks, then in theory fsck should check for bad blocks as logic would say writing data to those bad blocks will result in data loss? Bad block checking can be implemented on an ext3 fs with e2fsck but I wonder why the bootup fsck doesn't do a bad blocks check? mmmm, Running the following: man fsck.ext3 brings up the e2fsck man page
[snip]
I would like to run the e2fsck command to prevent the failure of partition checking by fsck on bootup, as reading the man page on fsck it does not seem up to working on an ext3 fs.
Then just force a check during boot, by creating the file "/forcefsck". An ext3 partition will be checked to the needed level, not more. Doing a badblock check every time is overkill, and will not really protect your data.
That '/forcefsck' option is a little strong but see the next paragraph for my suggestion. Why have the 'bad block' option and why will it not protect my data? Surely it will make sure that data is not lost because the block has been marked as bad and therefore the data will be written to a good block?
For a somewhat more complete check, boot from the rescue CD and test from there.
I was thinking more along the lines of possibly aliasing the boot fsck to e2fsck and having it run e2fsck each time the fsck is supposed to run on a partition set with the tune2fs cmd, i.e. every 3rd mount or 15 days etc.
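For reference, the "every 3rd mount or 15 days" schedule mentioned here is stored in the ext2/ext3 superblock and set with tune2fs: -c takes the maximum mount count and -i the interval between checks. A sketch with made-up helper names and a hypothetical DRY_RUN switch; the device name is a placeholder.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set (hypothetical).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ask for a boot-time check every MOUNTS mounts or DAYS days,
# whichever comes first.
set_check_schedule() {
    dev="$1"
    mounts="$2"
    days="$3"
    run tune2fs -c "$mounts" -i "${days}d" "$dev"
}

# Show the mount-count and check-interval lines of the superblock listing.
show_check_schedule() {
    tune2fs -l "$1" | grep -i -E 'mount count|check'
}
```

For example, `set_check_schedule /dev/hda2 3 15` runs `tune2fs -c 3 -i 15d /dev/hda2`. This only changes when the ordinary boot-time check is triggered; it does not add a bad block scan.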
-- The bad block Little Helper
The Thursday 2005-01-13 at 15:14 +0200, Hylton Conacher (ZR1HPC) wrote:
My understanding is that although SMART will check the physical disk, e2fsck will check that the data is able to be written to the physical hdd, and of course according to the fs. So even though I have SMART enabled, there might still be a case where data cannot be written to the hdd, resulting in a failed fsck on that partition on bootup.
fsck tests the partition logically, not physically. It can also run a badblock check (in some filetypes), but is certainly not as complete in that respect as smart.
Just to clarify, would a bad block be a physical defect or a logic error ie the fs thinks the physical media is bad but it isn't? Does SMART technology take care of looking after the physical state of the disk, bad blocks included?
Ok, it goes like this:
1) There is a physical error in the media, meaning that what the kernel tries to write is not the same as what it reads back.
2) "Something" marks the block(s) containing that error as bad.
Note: the reiserfsck program since SuSE 9.? can also mark badblocks. Version 8.2 could not.
By logical error I mean some kind of indexing error, data that is not where it should be, etc. A software error, like the kernel failing to write something to disk. These errors are detected and corrected only by fsck.
IF SMART doesn't check for bad blocks,
The program smartctl fires a program residing in the HD bios itself, and that does detect bad blocks (depending on the manufacturer). It does not repair them. However, a modern hard disk can remap bad blocks to somewhere else reserved by the manufacturer. This is transparent to the OS, but it is triggered only when writing to a sector that is detected at that moment to be unwritable reliably.
then in theory fsck should check for bad blocks
By default, it does not.
as logic would say writing data to those bad blocks will result in data loss? Bad block checking can be implemented on a ext3 fs with e2fsck but I wonder why the bootup fsck doesn't do a bad blocks check?
Because there is no need, and because it is terribly time consuming, a matter of several hours. You are too paranoid about them, I think :-)
mmmm, Running the following: man fsck.ext3 brings up the e2fsck man page
Yes.
I would like to run the e2fsck command to prevent the failure of partition checking by fsck on bootup, as reading the man page on fsck it does not seem up to working on an ext3 fs.
Then just force a check during boot, by creating the file "/forcefsck". An ext3 partition will be checked to the needed level, not more. Doing a badblock check every time is overkill, and will not really protect your data.
That '/forcefsck' option is a little strong
Why? It just tells the '/etc/init.d/boot.localfs' that you want to check the filesystems regardless of whether it is needed or not. It doesn't do anything "drastic".
but see the next paragraph for my suggestion. Why have the 'bad block' option and why will it not protect my data? Surely it will make sure that data is not lost because the block has been marked as bad and therefore the data will be written to a good block?
It only detects what new sectors are bad at that moment. What if the error develops later, while the system is running? In fact, while running the HD is more vulnerable, because the heads are not parked, but flying at a very small distance from the HD surface.
For a somewhat more complete check, boot from the rescue CD and test from there.
I was thinking more along the lines of possibly aliasing the boot fsck to e2fsck and having it run e2fsck each time the fsck is supposed to run on a partition set with the tune2fs cmd ie every 3rd mount or 15 days etc.
fsck calls e2fsck for you. Playing with that is dangerous, because at some time you may have a differently formatted partition and apply the wrong program.
Look: SuSE people are quite expert and wise, and they have designed those scripts with a lot of thought and care. You really do not have to modify them. If you are worried about bad blocks, do:
1) Configure smartd to run tests periodically in the background.
2) Keep your backups current.
3) If you really need it, use RAID setups.
-- Cheers, Carlos Robinson
Carlos E. R. wrote:
The Thursday 2005-01-13 at 15:14 +0200, Hylton Conacher (ZR1HPC) wrote:
Sorry. I've rearranged the order of the Q+A and also added substantially to it to ease my understanding. I am trying to find some kind of link/reason/relationship between using fsck, e2fsck and smart if fsck and smart are enabled.
My understanding is that although SMART will check the physical disk, e2fsck will check that the data is able to be written to the physical hdd, and of course according to the fs. So even though I have SMART enabled, there might still be a case where data cannot be written to the hdd, resulting in a failed fsck on that partition on bootup.
fsck tests the partition logically, not physically. It can also run a badblock check (in some filetypes), but is certainly not as complete in that respect as smart.
So the fsck cmd at boot checks the logical fs and assumes the physical structure to be OK.
If smartctl is enabled and run on the HDD, it checks the physical characteristics, i.e. for bad blocks, and keeps a list. It does not however keep a list of them that the fs can access. If smartctl has been run and a block has been marked as bad, the smartctl HDD utility can remap the bad block so that data written to the disk lands on a good block.
When the fs is checked at boot time, if it should find an error on the fs caused by the physical HDD (and as a result of it not being able to read the smartctl bad blocks list), it assumes that fsck should be run manually, i.e. e2fsck with -c /dev/hdx to check for bad blocks. This then creates a list of bad blocks the fs cannot use.
In effect though we have two lists of bad blocks, or does the fsck program via e2fsck -c take the smartctl list and add/append to it? What is the relationship between smartctl, fsck, and e2fsck (being a type of fsck for ext3 fs)?
[snip]
Ok, it goes like this:
1) There is a physical error in the media, meaning that what the kernel tries to write is not the same as what it reads back.
2) "Something" marks the block(s) containing that error as bad.
Could I assume the 'Something' to be either smartctl or e2fsck?
IF SMART doesn't check for bad blocks,
OK, sorry, it does check for them but doesn't repair them. What would repair them, as neither e2fsck nor mke2fs seems to offer a repair option?
then in theory fsck should check for bad blocks
By default, it does not.
Because it assumes that the physical disk is OK. Personally I think the fsck at boot should have the e2fsck -c option added. I'll have to investigate if there is a kernel wishlist.
Because there is no need, and because it is terribly time consuming, a matter of several hours. You are too paranoid about them, I think :-)
:) Paranoid I might be, but a stable and safe HDD space to put data on is something that computer folk (read: I) have dreamt of for years, i.e. a self-healing system. Heck, it would almost negate the purpose of backups, saving companies fortunes.
I would like to run the e2fsck command to prevent the failure of partition checking by fsck on bootup, as reading the man page on fsck it does not seem up to working on an ext3 fs.
Then just force a check during boot, by creating the file "/forcefsck". An ext3 partition will be checked to the needed level, not more. Doing a badblock check everytime is an overkill, and will not really protect your data. /forcefsck is a partial solution as fsck itself doesn't check bad blocks, only e2fsck does. The next thing is to find out where and what syntax to use to get e2fsck to run at boot, if the standard boot fsck fails.
That '/forcefsck' option is a little strong
Why? The standard fsck normally runs at boot anyway, it is the e2fsck -c option that has to be called on if the standard fsck fails. It would be a good idea to '/force2fsck -c' if and only if the normal fsck failed.
/forcefsck just tells the '/etc/init.d/boot.localfs' that you want to check the filesystems regardless of whether it is needed or not. It doesn't do anything "drastic". mmmm, had a look at the /etc...bootlocalfs file and see that it mentions that if the error code is 1 then all is OK, if 2 the fsck failed and the message of 'fsck has failed. Please run fsck manually' is displayed.
what I am saying is that on returning an error 2 the kernel should automatically run e2fsck -c.
It may be overkill but it provides folk with a little more data safety.
but see the next paragraph for my suggestion. Why have the 'bad block' option and why will it not protect my data? Surely it will make sure that data is not lost because the block has been marked as bad and therefore the data will be written to a good block?
It only detects what new sectors are bad at that moment. What if the error develops later, while the system is running? In fact, while running the HD is more vulnerable, because the heads are not parked, but flying at a very small distance from the HD surface. So running e2fsck -c regularly might prevent writing to a bad block that developed since your last e2fsck -c?
For a somewhat more complete check, boot from the rescue CD and test from there.
I was thinking more along the lines of possibly aliasing the boot fsck to e2fsck and having it run e2fsck each time the fsck is supposed to run on a partition set with the tune2fs cmd ie every 3rd mount or 15 days etc.
fsck calls e2fsck for you. How so? fsck doesn't check for bad blocks; you end up having to run e2fsck if the boot fsck fails.
Playing with that is dangerous, because at some time you may have a differently formatted partition and apply the wrong program. And not being a programmer, do not worry, I'll stay well away.
Look: SuSE people are quite expert and wise, and they have designed those scripts with a lot of thought and care. You really do not have to modify them.
If you are worried about bad blocks, do:
1) configure smartd (the smartmontools daemon) to run tests periodically in the background.
2) Keep your backups current.
3) If you really need it, use raid setups.
OK. To summarize:
A) HDD has not had smartctl enabled
1) When fsck is run it scans the fs on the disk and makes sure it is logically correct.
2) Data is written to the HDD
2.1) Part of the data written lands in a bad block not marked as bad
3) If fsck is run now it will determine that there is an error on the fs and instruct you to run it manually, presumably so that the bad block check can be run via e2fsck -c.
4) e2fsck is run, finds the bad block and marks it as such.
4.1) The data on that bad block is lost
in my case:
B) /home HDD and backup HDD have had smartctl enabled
1) When fsck is run it scans the fs on the disk and makes sure it is logically correct.
2) Data is written to the HDD
2.1) Part of the data written lands in a bad block not marked as bad
3) Not knowing that some of the data has landed on a bad sector I run a smartctl scan
3.1) It finds the error and marks the bad block
3) If fsck is run now it will determine that there is an error on the fs and instruct you to run it manually, presumably so that the bad block list of the fs can be updated via e2fsck -c.
4) e2fsck is run, finds the bad block and marks it as such.
4.1) The data on that bad block is lost
So how do fsck, e2fsck and smart integrate? Sorry if I've left out/assumed steps.
--
The Little Helper
========================================================================
Hylton Conacher - Linux user # 229959 at http://counter.li.org
Currently using SuSE 9.0 Professional with KDE 3.1
Licenced Windows user
========================================================================
The Saturday 2005-01-15 at 15:41 +0200, Hylton Conacher (ZR1HPC) wrote:
So the fsck cmd at boot checks the logical fs and assumes the physical structure to be OK.
Yes.
If smartctl is enabled and run on the HDD, it checks the physical characteristics ie for bad blocks and keeps a list.
It does far more than that, but it does not keep a list, not exactly: it keeps a log. This log is not a file, but resides somewhere on the HD (there is a method to retrieve it).
It does not however keep a list of them that the fs can access. If smartctl has been run and a block has been marked as bad,
It does not mark a block as bad, because it knows nothing about the structure of the filesystem. Remember, it is the HD hardware and bios, independent of the main CPU or operating system, that does the testing. The program smartctl just starts the process, it doesn't actually test anything. The HD routine just takes note that there are sectors with read errors, logs that fact, and later smartctl tells you.
the smartctl HDD utility can remap the bad block so that data written to the disk lands on a good block.
No, it doesn't: it is your job to take appropriate action. You can, for example, simply try to write to the bad sector. The disk notes that it can't, and at that moment remaps the sector. This is transparent to any filesystem or operating system you use. Or you can take down the computer, reboot from the rescue CD, and then use the tools there to repair the filesystem: not really repair, but use a filesystem check utility that can locate and remap bad blocks. For example.
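To make the "write to the bad sector" idea concrete, here is a minimal sketch that only prints a dd command and does not run it. The function name force_remap_cmd is made up, the 512-byte sector size is an assumption, and the LBA in the example is just a sample value. Whatever was stored in that sector is overwritten with zeros, so this is something to do deliberately, on the right device, with a backup at hand.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a dd command that zeroes one sector.
# Assumes 512-byte sectors and that the LBA counts sectors from the start
# of the whole disk (e.g. /dev/hdb), not from the start of a partition.
force_remap_cmd() {
    device=$1
    lba=$2
    printf 'dd if=/dev/zero of=%s bs=512 count=1 seek=%s\n' "$device" "$lba"
}

# Example (prints the command only):
#   force_remap_cmd /dev/hdb 24123036
```

Printing rather than executing keeps the decision with the operator, which is the point made above: the tools report, you decide.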
When the fs is checked at boot time, if it should find an error on the fs caused by the physical HDD (and as a result of it not being able to read the smartctl bad blocks list), it assumes that fsck should be run manually, i.e. e2fsck with -c /dev/hdx, to check for bad blocks. This then creates a list of bad blocks the fs cannot use.
In effect though we have two lists of bad blocks or does the fsck program via e2fsck -c take the smartctl list and add/append to it?
No, no. smartctl does not generate any list.
What is the relationship between smartctl, fsck, and e2fsck(being a type of fsck for ext3 fs)?
smartctl is independent. fsck is a wrapper that calls the actual program needed to check each partition type with its own different checker program. In effect, fsck calls fsck.ext3, which is the same as e2fsck, but with a different name. Look and believe:
-rwxr-xr-x 3 root root 131344 Apr 6 2004 /sbin/fsck.ext3*
-rwxr-xr-x 3 root root 131344 Apr 6 2004 /sbin/e2fsck*
nimrodel:~ # cmp /sbin/e2fsck /sbin/fsck.ext3
nimrodel:~ #
Convinced? Both are the _same_ program.
To see the list of errors SMART logs, do:
smartctl --log=error /dev/hdb |less
SMART Error Log Version: 1
ATA Error Count: 325 (device log contains only the most recent five errors)
...
Error 325 occurred at disk power-on lifetime: 4275 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 64 9c 16 70 51  Error: UNC at LBA = 0x0170169c = 24123036
There you see the LBA address of the error I had _years_ ago; this is not usable for fsck.
With this other command, I see the log of the last 20 tests:
smartctl --log=selftest /dev/hdb |less
smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%       8099        -
# 2  Short offline       Completed without error        00%       8092        -
# 3  Short offline       Completed without error        00%       8092        -
# 4  Short offline       Completed without error        00%       8085        -
# 5  Short offline       Completed without error        00%       8085        -
# 6  Short offline       Completed without error        00%       8072        -
# 7  Short offline       Completed without error        00%       8072        -
# 8  Short offline       Completed without error        00%       8058        -
# 9  Short offline       Completed without error        00%       8057        -
#10  Short offline       Completed without error        00%       8046        -
#11  Short offline       Completed without error        00%       8045        -
#12  Short offline       Completed without error        00%       8043        -
#13  Short offline       Completed without error        00%       8038        -
#14  Extended offline    Completed without error        00%       8038        -
If there were errors, the sector (LBA sector, not ext3 sector) would be logged there. But only the first one.
And... it is not the first time I have told you: I did so last October:
Date: Mon, 18 Oct 2004 13:07:25 +0200 (CEST)
From: Carlos E. R.
To: SLE <suse-linux-e
Subject: Re: [SLE] Re: {SLE} SMART HDD technology was Re: [SLE] e2fsck command
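On why that LBA is "not usable for fsck" as it stands: the SMART log counts 512-byte sectors from the start of the whole disk, while e2fsck and badblocks count filesystem blocks from the start of the partition. A minimal sketch of the conversion follows, assuming 512-byte sectors. The partition start sector (63) and block size (4096) used in the example are made-up values; on a real system they would have to be read from the partition table (fdisk -lu) and from tune2fs -l.

```shell
#!/bin/sh
# Convert a whole-disk LBA to a block number inside a filesystem.
# Arguments: LBA, first sector of the partition, filesystem block size.
# Assumes 512-byte sectors; integer division rounds down to the block
# that contains the sector.
lba_to_fs_block() {
    lba=$1
    part_start=$2
    block_size=$3
    echo $(( (lba - part_start) * 512 / block_size ))
}

# Example with the LBA from the log above and hypothetical partition values:
#   lba_to_fs_block 24123036 63 4096
```

If the result is negative or beyond the end of the filesystem, the sector simply belongs to another partition.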
[snip]
Ok, it goes like this: 1) There is a physical error in the media, meaning that what the kernel tries to write is not the same as what it reads back. 2) "Something" marks the block(s) containing that error as bad. Could I assume the 'Something' to be either smartctl or e2fsck?
smartctl no. e2fsck, yes. Also, the kernel "could" do it on the fly.
IF SMART doesn't check for bad blocks, OK, sorry, it does check for them but doesn't repair them. What would repair them, as neither e2fsck nor mke2fs seems to offer a repair option?
Bad blocks can not be "repaired": they are bad, and remain so for ever. They can be "remapped", depending on the filesystem: FAT, ext2, reiserfs partially (since SuSE 9.0 or 9.1). And e2fsck can do that (with -c).
then in theory fsck should check for bad blocks
By default, it does not. Because it assumes that the physical disk is OK. Personally I think the fsck should have the e2fsck -c option added into boot. I'll have to investigate if there is a kernel wishlist.
No, I don't. Nobody will add it, and certainly not to the kernel, as it is the boot.localfs script that does the testing, not the kernel. Nobody will want that "feature", because then boot time will be measured in HOURS!
Because there is no need, and because it is terribly time consuming, a matter of several hours. You are too paranoid about them, I think :-) :) Paranoid I might be, but a stable and safe HDD space to put data on is something that computer folk (read: I) have dreamt of for years, i.e. a self-healing system. Heck, it would almost negate the purpose of backups, saving companies fortunes.
You can never ensure that a system will never fail. You can just make it less probable, and limit the damage. My home disks are now about 8000 hours old (of real usage). One of them developed a few bad blocks a long time ago (more than a year or two). I solved the incident, manually, and that is the last I heard of that for a very long time. And that doesn't mean that I'm going to worry to the extreme of having to wait several _hours_ after power on before the system comes on line, because it is testing itself for I/O errors on all the HD surfaces! Do you go every day to your doctor to have a full checkout before getting up? Because that is what you are asking.
That '/forcefsck' option is a little strong
Why? The standard fsck normally runs at boot anyway, it is the e2fsck -c option that has to be called on if the standard fsck fails. It would be a good idea to '/force2fsck -c' if and only if the normal fsck failed.
/forcefsck just tells the '/etc/init.d/boot.localfs' that you want to check the filesystems regardless of whether it is needed or not. It doesn't do anything "drastic". mmmm, had a look at the /etc...bootlocalfs file and see that it mentions that if the error code is 1 then all is OK, if 2 the fsck failed and the message of 'fsck has failed. Please run fsck manually' is displayed.
Right.
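For completeness, requesting the forced check is just a matter of creating an empty file. A minimal sketch, assuming the boot script looks for /forcefsck in the root directory as described above; the function name and the optional root argument (there only so the sketch can be tried on a scratch directory) are illustrative.

```shell
#!/bin/sh
# Create the flag file that asks the boot scripts for a full fsck on the
# next boot. The optional argument is a hypothetical alternative root
# directory, handy for trying this out without touching the real "/".
request_forced_fsck() {
    root=${1:-/}
    touch "${root%/}/forcefsck"
}

# On the real system, as root:
#   request_forced_fsck
```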
what I am saying is that on returning an error 2 the kernel should automatically run e2fsck -c.
No. The error can be anything! It is you who must decide what to do. It can be a false alarm, a misconfiguration... more automated action and you may ruin your HD completely :-|
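Part of the reason one "failed" result cannot drive an automatic repair is that the exit status of fsck is a sum of flags rather than a single error number. The sketch below decodes it using the values listed in the fsck(8) man page as far as I know it; the function name is made up, and the man page on the installed system is the authority.

```shell
#!/bin/sh
# Decode an fsck exit status into its component flags, one per line.
# Flag values as documented in fsck(8); check the local man page.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $(( status & 1 ))   -ne 0 ] && echo "filesystem errors corrected"
    [ $(( status & 2 ))   -ne 0 ] && echo "system should be rebooted"
    [ $(( status & 4 ))   -ne 0 ] && echo "filesystem errors left uncorrected"
    [ $(( status & 8 ))   -ne 0 ] && echo "operational error"
    [ $(( status & 16 ))  -ne 0 ] && echo "usage or syntax error"
    [ $(( status & 32 ))  -ne 0 ] && echo "checking cancelled by user request"
    [ $(( status & 128 )) -ne 0 ] && echo "shared library error"
    return 0
}

# Usage after a manual check:
#   fsck /dev/hdXX; describe_fsck_status $?
```

A status of 8, for instance, can come from a wrong device name in /etc/fstab, which no bad-block scan would fix.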
but see the next paragraph for my suggestion. Why have the 'bad block' option and why will it not protect my data? Surely it will make sure that data is not lost because the block has been marked as bad and therefore the data will be written to a good block?
It only detects what new sectors are bad at that moment. What if the error develops later, while the system is running? In fact, while running the HD is more vulnerable, because the heads are not parked, but flying at a very small distance from the HD surface. So running e2fsck -c regularly might prevent writing to a bad block that developed since your last e2fsck -c?
Yes. So would smartctl, and that doesn't hold up the system. No! If the HD detects an error when it is going to _write_ to a sector, that sector is remapped then and there, without telling anybody. Certainly the kernel knows nothing (only a time delay). An fsck run later would know nothing about it; the sector would show as correct (because it has been remapped somewhere else transparently).
For a somewhat more complete check, boot from the rescue CD and test from there.
I was thinking more along the lines of possibly aliasing the boot fsck to e2fsck and having it run e2fsck each time the fsck is supposed to run on a partition set with the tune2fs cmd ie every 3rd mount or 15 days etc.
fsck calls e2fsck for you. How so? fsck doesn't check for bad blocks; you end up having to run e2fsck if the boot fsck fails.
Read the man page again. fsck passes options it doesn't know how to handle to e2fsck.
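As for the "every 3rd mount or 15 days" schedule: that is set with tune2fs, and the ordinary boot-time fsck then honours it, so no aliasing is needed. A minimal sketch that only prints the command; -c (maximum mount count) and -i (interval between checks) are standard tune2fs options, while the function name and the /dev/hdXX placeholder are illustrative.

```shell
#!/bin/sh
# Print (not run) the tune2fs command that makes an ext2/ext3 filesystem
# due for a check after a number of mounts or a number of days,
# whichever comes first.
periodic_check_cmd() {
    device=$1
    max_mounts=$2
    interval_days=$3
    printf 'tune2fs -c %s -i %sd %s\n' "$max_mounts" "$interval_days" "$device"
}

# Example (prints the command only):
#   periodic_check_cmd /dev/hdXX 3 15
```

Note that this schedules the normal structural check, not the -c surface scan.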
To summarize: A) HDD has not had smartctl enabled 1) When fsck is run it scans the fs on the disk and makes sure it is logically correct. 2) Data is written to the HDD 2.1) Part of the data written lands in a bad block not marked as bad 3) If fsck is run now it will determine that there is an error on the fs and instruct you to run it manually, presumably so that the bad block check can be run via e2fsck -c.
Not by default. There is no logical or structure error.
4) e2fsck is run, finds the bad block and marks it as such. 4.1) The data on that bad block is lost
Correction. If the error is noticed at write time, the HD itself takes action (if it is reasonably modern). If the kernel gets to know, it "could" take action as well (even old MS-DOS was designed to handle this case). If the error develops after writing, it is detected when someone tries to read it again. Remember that fsck and derivatives do not check for read errors by default.
in my case: B) /home HDD and backup HDD have had smartctl enabled 1) When fsck is run it scans the fs on the disk and makes sure it is logically correct. 2) Data is written to the HDD 2.1) Part of the data written lands in a bad block not marked as bad
As I said above, no.
3) Not knowing that some of the data has landed on a bad sector I run a smartctl scan 3.1) It finds the error and marks the bad block
No, it is just reported. You could configure the daemon to email you, or sound your beeper. Whatever. :-)
3) If fsck is run now it will determine that there is an error on the fs and instruct you to run it manually, presumably so that the bad block list of the fs can be updated via e2fsck -c.
No, it doesn't. It will be detected only if the error falls in a sector belonging to the structure, like a directory, for the simple reason that it doesn't read everything, unless you give it the -c option.
4) e2fsck is run, finds the bad block and marks it as such.
No. Only if you run it manually with the right option.
4.1) The data on that bad block is lost
Maybe yes, maybe not. It may just mark a sector as bad, or it may try to read it first and remap, leaving the file usable. Old chkdsk for MS-DOS did, so why not fsck? Of course, as the sector has a read error, the contents of the sector could be wrong. That's one more reason to run it manually, so that you can check the file against a backup if you have one.
So how do fsck, e2fsck and smart integrate? Sorry if I've left out/assumed steps.
Ough... see somewhere above. fsck just calls e2fsck as soon as it notices it is an ext2/3 partition. fsck itself does nothing (have a look at its size), except detect the type of partition and pass control to whatever program is appropriate.
--
Cheers, Carlos Robinson
Carlos E. R. wrote:
The Saturday 2005-01-15 at 15:41 +0200, Hylton Conacher (ZR1HPC) wrote:
So the fsck cmd at boot checks the logical fs and assumes the physical structure to be OK.
Yes. OK
If smartctl is enabled and run on the HDD, it checks the physical characteristics ie for bad blocks and keeps a list.
It does far more than that, but it does not keep a list, not exactly: it keeps a log. This log is not a file, but resides somewhere on the HD (there is a method to retrieve it). In this case a log of all the bad blocks, amongst other things.
It does not however keep a list of them that the fs can access. If smartctl has been run and a block has been marked as bad,
It does not mark a block as bad, because it knows nothing about the structure of the filesystem. Remember, it is the HD hardware and bios, independent of the main CPU or operating system, that does the testing. The program smartctl just starts the process, it doesn't actually test anything. The HD routine just takes note that there are sectors with read errors, logs that fact, and later smartctl tells you. Ahh.
the smartctl HDD utility can remap the bad block so that data written to the disk lands on a good block.
No, it doesn't: it is your job to take appropriate action.
You can, for example, simply try to write to the bad sector. The disk notes that it can't, and at that moment remaps the sector. This is transparent to any filesystem or operating system you use. And smartctl does it.
Or you can take down the computer, reboot from the rescue CD, and then use the tools there to repair the filesystem: not really repair, but use a filesystem check utility that can locate and remap bad blocks. Or could you schedule frequent e2fsck -c scans of all the partitions other than /, which would have to be checked via a boot/rescue disk?
When the fs is checked at boot time, if it should find an error on the fs caused by the physical HDD (and as a result of it not being able to read the smartctl log), it assumes that fsck should be run manually, i.e. e2fsck with -c /dev/hdx, to check for bad blocks. This then creates a list of bad blocks the fs can use in the future.
In effect though we have two lists of bad blocks or does the fsck program via e2fsck -c take the smartctl list and add/append to it?
No, no. smartctl does not generate any list. OK, Sorry. I understand there is not a smartctl list generated. There is a smartctl log and a e2fsck -c list. Would the first paragraph be right?
What is the relationship between smartctl, fsck, and e2fsck(being a type of fsck for ext3 fs)?
smartctl is independent.
fsck is a wrapper that calls the actual program needed to check each partition type with its own different checker program. In effect, fsck calls fsck.ext3, which is the same as e2fsck, but with a different name. Look and believe:
-rwxr-xr-x 3 root root 131344 Apr 6 2004 /sbin/fsck.ext3* -rwxr-xr-x 3 root root 131344 Apr 6 2004 /sbin/e2fsck*
nimrodel:~ # cmp /sbin/e2fsck /sbin/fsck.ext3 nimrodel:~ #
Convinced? Both are the _same_ program. OK, I believe, I believe! :) When fsck calls fsck.ext3, it does not run the program, ie e2fsck -c, but it allows the fs to be checked according to the ext3 rules. If it finds an error it says check manually?
To see the list of errors SMART logs, do:
smartctl --log=error /dev/hdb |less Great, no errors here. But then you already knew that. :)
SMART Error Log Version: 1 ATA Error Count: 325 (device log contains only the most recent five errors) ... Error 325 occurred at disk power-on lifetime: 4275 hours When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 64 9c 16 70 51 Error: UNC at LBA = 0x0170169c = 24123036
There you see the LBA address of the error I had _years_ ago; this is not usable for fsck. With this other command, I see the log of the last 20 tests:
smartctl --log=selftest /dev/hdb |less smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%       8099        -
# 2  Short offline       Completed without error        00%       8092        -
# 3  Short offline       Completed without error        00%       8092        -
# 4  Short offline       Completed without error        00%       8085        -
# 5  Short offline       Completed without error        00%       8085        -
# 6  Short offline       Completed without error        00%       8072        -
# 7  Short offline       Completed without error        00%       8072        -
# 8  Short offline       Completed without error        00%       8058        -
# 9  Short offline       Completed without error        00%       8057        -
#10  Short offline       Completed without error        00%       8046        -
#11  Short offline       Completed without error        00%       8045        -
#12  Short offline       Completed without error        00%       8043        -
#13  Short offline       Completed without error        00%       8038        -
#14  Extended offline    Completed without error        00%       8038        -
If there were errors, the sector (LBA sector, not ext3 sector) would be logged there. But only the first one.
And... it is not the first time I tell you: I did last October: Alright, alright, so I asked it before. Unfortunately I couldn't reference my message store as the dual boot HDD my Linux was on died just after our little episode. :( I also only just found out how to query the archives via Google.
It was a pity but this explanation was a little better ;) I'll ask it again in 6 months time for all the green newbies too. :) Just kidding :)
To FINALLY summarize:
smartctl looks after the physical HDD and keeps its own log of errors it has found. It does not interface with any other program.
fsck looks after the logical fs structure and uses the fs specific fsck option to check that the structure of the fs is per standard. If it finds a (physical or logical) error it requires manual intervention, i.e. running the e2fsck -c option to confirm the fs is logically correct and also to scan for errors on the fs caused by the hardware, i.e. bad blocks.
Thanks a TON Carlos.
--
The Little Helper
========================================================================
Hylton Conacher - Linux user # 229959 at http://counter.li.org
Currently using SuSE 9.0 Professional with KDE 3.1
Licenced Windows user
========================================================================
The Thursday 2005-01-20 at 18:42 +0200, Hylton Conacher (ZR1HPC) wrote:
If smartctl is enabled and run on the HDD, it checks the physical characteristics ie for bad blocks and keeps a list.
It does far more than that, but it does not keep a list, not exactly: it keeps a log. This log is not a file, but resides somewhere on the HD (there is a method to retrieve it). In this case a log of all the bad blocks, amongst other things.
No, it is a log of each error together with the internal command stack leading to the error. Here, look at a sample (one of my disks had errors two years ago):
Error 325 occurred at disk power-on lifetime: 4275 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 64 9c 16 70 51  Error: UNC at LBA = 0x0170169c = 24123036
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
-- -- -- -- -- -- -- --   ---------  --------------------
40 d0 00 00 15 70 51 00       3.514  READ VERIFY SECTOR(S)
40 d0 00 00 14 70 51 00       3.476  READ VERIFY SECTOR(S)
40 d0 00 00 13 70 51 00       7.384  READ VERIFY SECTOR(S)
40 d0 00 00 12 70 51 00       3.537  READ VERIFY SECTOR(S)
40 d0 00 00 11 70 51 00       3.499  READ VERIFY SECTOR(S)
Error 324 occurred at disk power-on lifetime: 4274 hours
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 64 9c 16 70 51  Error: UNC at LBA = 0x0170169c = 24123036
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
-- -- -- -- -- -- -- --   ---------  --------------------
40 d0 00 00 15 70 51 00       3.473  READ VERIFY SECTOR(S)
40 d0 00 00 14 70 51 00       3.441  READ VERIFY SECTOR(S)
40 d0 00 00 13 70 51 00       7.341  READ VERIFY SECTOR(S)
40 d0 00 00 12 70 51 00       3.476  READ VERIFY SECTOR(S)
40 d0 00 00 11 70 51 00       3.474  READ VERIFY SECTOR(S)
See? It is _not_ a list of bad sectors.
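If you do want the sector addresses out of that log, they have to be picked out of the "Error: ... at LBA" entries. A minimal sketch, assuming the output format shown above with the "Error:" text at the end of a line; the function name is made up. Since the device keeps only the most recent five errors, the result is at best a partial picture and still not a bad-block list.

```shell
#!/bin/sh
# Read `smartctl --log=error` output on stdin and print each distinct
# decimal LBA found in an "Error: ... at LBA = 0x... = N" entry.
error_lbas() {
    awk '/Error: .* at LBA = / { print $NF }' | sort -u
}

# Usage:
#   smartctl --log=error /dev/hdb | error_lbas
```

In the sample above both logged errors point at the same sector, so the output would be a single number.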
You can, for example, simply try to write to the bad sector. The disk notes that it can't, and at that moment remaps the sector. This is transparent to any filesystem or operating system you use. And smartctl does it.
No, not smartctl, but the HD bios, independently. smartctl is just an interface to it - after it happened.
Or you can take down the computer, reboot from the rescue CD, and then use the tools there to repair the filesystem: not really repair, but use a filesystem check utility that can locate and remap bad blocks. Or could you schedule frequent e2fsck -c scans of all the partitions other than /, which would have to be checked via a boot/rescue disk?
Er... yes, I think fsck can do it, but... don't do it frequently. Even more, you cannot schedule it, because it has to be a manual operation: the partition has to be unmounted first, and that will fail if it is in use. As I said, for scheduled maintenance, use the smartmontools daemon (smartd), and run the badblocks check manually, _after_ you know there is an error.
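The "must be unmounted first" condition is easy to check before attempting a scan. A minimal sketch, assuming a Linux /proc/mounts table with the device name in the first column; the function names are made up, and the second argument exists only so the check can be tried against a sample file. It prints a suggestion and runs nothing.

```shell
#!/bin/sh
# Succeed if the given device appears as a mounted source in a mounts
# table. The table defaults to /proc/mounts.
is_mounted() {
    device=$1
    table=${2:-/proc/mounts}
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Only suggest the bad-block scan when the partition is not in use.
suggest_scan() {
    if is_mounted "$1" "$2"; then
        echo "$1 is mounted; unmount it or use the rescue CD first"
    else
        echo "e2fsck -c $1"
    fi
}

# Usage:
#   suggest_scan /dev/hdXX
```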
Convinced? Both are the _same_ program. OK, I believe, I believe! :)
:-)
When fsck calls fsck.ext3, it does not run the program, ie e2fsck -c, but it allows the fs to be checked according to the ext3 rules. If it finds an error it says check manually?
Actually, fsck directly calls fsck.ext3 for ext3 partitions, fsck.reiserfs for reiser partitions, etc. Notice that fsck is a small program, 18484 bytes. Here, have a look at the sizes:
-rwxr-xr-x 1 root root  18484 Apr  6  2004 /sbin/fsck*
-rwxr-xr-x 1 root root  10580 May 27  2004 /sbin/fsck.cramfs*
-rwxr-xr-x 3 root root 131344 Apr  6  2004 /sbin/fsck.ext2*
-rwxr-xr-x 3 root root 131344 Apr  6  2004 /sbin/fsck.ext3*
-rwxr-xr-x 2 root root 419926 Apr  6  2004 /sbin/fsck.jfs*
-rwxr-xr-x 1 root root  22356 May 27  2004 /sbin/fsck.minix*
lrwxrwxrwx 1 root root      7 Aug 15 14:01 /sbin/fsck.msdos -> dosfsck*
-rwxr-xr-x 2 root root 275884 Apr  6  2004 /sbin/fsck.reiserfs*
lrwxrwxrwx 1 root root      7 Aug 15 14:01 /sbin/fsck.vfat -> dosfsck*
-rwxr-xr-x 1 root root   4475 Apr  6  2004 /sbin/fsck.xfs*
Most of the fsck.whatever programs are much bigger than fsck; they are the real workhorses. fsck on its own does nearly nothing, except call the others. Notice also that fsck.ext2 is also the same as fsck.ext3.
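The dispatch and the "same program" claim can both be sketched in a few lines. The link count of 3 in the listing suggests that e2fsck, fsck.ext2 and fsck.ext3 are hard links to one file, though that is my reading of the listing and not something stated above. The function names are made up, and -ef (same device and inode) is a test operator offered by bash and the common /bin/sh implementations.

```shell
#!/bin/sh
# Name of the checker that fsck would hand a filesystem type to,
# following the fsck.<type> naming seen in the listing.
checker_for() {
    echo "/sbin/fsck.$1"
}

# Succeed when two paths refer to the same file (same device and inode),
# which is what hard links such as e2fsck and fsck.ext3 would show.
same_file() {
    [ "$1" -ef "$2" ]
}

# Usage:
#   same_file /sbin/e2fsck "$(checker_for ext3)" && echo "same program"
```

Unlike cmp, which compares contents byte by byte, this answers whether the two names are literally one file on disk.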
To see the list of errors SMART logs, do:
smartctl --log=error /dev/hdb |less
Great, no errors here. But then you already knew that. :)
No errors? What is this, then:
SMART Error Log Version: 1 ATA Error Count: 325 (device log contains only the most recent five errors)
........................^^^ So, this disk did have serious errors, once upon a time. I'm still using it. It cured itself, with some help from me. The manufacturer had taken measures in advance to handle some I/O errors; that is what it is designed for.
If there were errors, the sector (LBA sector, not ext3 sector) would be logged there. But only the first one.
And... it is not the first time I tell you: I did last October: Alright, alright, so I asked it before. Unfortunately I couldn't reference my message store as the dual boot HDD my Linux was on died just after our little episode. :(
Ah... so you are worried about HD dying on you. I see.
I also only just found out how to query the archives via Google.
It was a pity but this explanation was a little better ;) I'll ask it again in 6 months time for all the green newbies too.
:) Just kidding :)
X-)
To FINALLY summarize: smartctl looks after the physical HDD and keeps its own log of errors it has found. It does not interface with any other program.
Right. But it does way more than that, it does some manufacturer tests. Look at the output of "smartctl -a /dev/hdb|less" on your drive, or more to the point, "--attributes". It is a table of some parameters that try to determine the health status of the drive. For example, if an attribute such as Raw_Read_Error_Rate crosses its failure threshold, the output of the command:
nimrodel:~ # smartctl --health /dev/hdb
smartctl version 5.30 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
would directly say that my drive was about to fail, soon. That's why I say that daemon is _important_.
You see, there are more things that can go bad than badblocks. For example, the spindle could have excessive wear, and wobble. The motor may overheat. The controller could have bad chips. The head could have a very bad crash landing... tons of things.
Bad blocks on their own are not critical. The first drive I had (a 32 MB unit on a full-length ISA card) came with a paper label with a list of six bad sectors, handwritten by the manufacturer at their final QA test. You were supposed to low-level format the disk yourself, and enter the list. Nowadays that is no longer necessary: HDs come with spare space reserved by the manufacturer, to which the bad sectors that develop during the drive's life are automatically remapped.
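For anyone who wants to act on that health line from a script of their own rather than through the daemon, a minimal sketch follows. It assumes the "overall-health self-assessment test result" wording shown above; the function name is made up, and other smartctl versions or non-ATA drives may word the line differently.

```shell
#!/bin/sh
# Read `smartctl --health` output on stdin; succeed only if the overall
# self-assessment line reports PASSED.
smart_health_passed() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Usage:
#   smartctl --health /dev/hdb | smart_health_passed || echo "check /dev/hdb now"
```

Treating anything other than an explicit PASSED as a warning errs on the side of looking at the drive too often rather than too late.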
fsck looks after the logical fs structure and uses the fs specific fsck option to check that the structure of the fs is per standard. If it finds a ,physical or logical, error it requires manual intervention ie running the e2fsck -c option to confirm the fs is logically correctly and also to scan for errors on the fs caused by the hardware ie bad blocks.
Yes! :-) Well, almost O:-) The "-c" option is not usually needed. A logical error can appear without having any badblock at all, and may need manual intervention. The thing is, fsck is run automatically at boot time, with the options selected so that it is safe to leave it in automatic mode, or with little intervention. It is the script written by SuSE that decides not to continue booting the system and requests that you run fsck manually, because the cause can be, for example, a mistaken /etc/fstab file (it happened to me once).
Thanks a TON Carlos.
Welcome :-) -- Cheers, Carlos Robinson
participants (7)
- B. Stia
- Carlos E. R.
- Hylton Conacher (ZR1HPC)
- Jerry Feldman
- Paul Hewlett
- peter Nikolic
- Sid Boyce