Carlos E. R. wrote:
The Thursday 2005-01-13 at 15:14 +0200, Hylton Conacher (ZR1HPC) wrote:
Sorry. I've rearranged the order of the Q+A and also added substantially to it to ease my understanding. I am trying to find some kind of link/reason/relationship between using fsck, e2fsck and smart if fsck and smart are enabled.
My understanding is that although SMART will check the physical disk, e2fsck will check that the data can be written to the physical hdd, and of course according to the fs. So even though I have SMART enabled, there might still be a case where data cannot be written to the hdd, resulting in a failed fsck on that partition on bootup.
fsck tests the partition logically, not physically. It can also run a badblock check (on some filesystem types), but is certainly not as complete in that respect as smart.
So the fsck cmd at boot checks the logical fs and assumes the physical structure to be OK.
If smartctl is enabled and run on the HDD, it checks the physical characteristics, i.e. for bad blocks, and keeps a list. It does not, however, keep a list of them that the fs can access. If smartctl has been run and a block has been marked as bad, the smartctl HDD utility can remap the bad block so that data written to the disk lands on a good block. When the fs is checked at boot time, if it should find an error on the fs caused by the physical HDD (and as a result of it not being able to read the smartctl bad blocks list), it assumes that fsck should be run manually, i.e. e2fsck with -c /dev/hdx to check for bad blocks. This then creates a list of bad blocks the fs cannot use. In effect, though, do we have two lists of bad blocks, or does the fsck program via e2fsck -c take the smartctl list and add/append to it? What is the relationship between smartctl, fsck, and e2fsck (being a type of fsck for ext3 fs)?
[snip]
Ok, it goes like this:
1) There is a physical error in the media, meaning that what the kernel tries to write is not the same as what it reads back.
2) "Something" marks the block(s) containing that error as bad.
Could I assume the 'Something' to be either smartctl or e2fsck?
If SMART doesn't check for bad blocks,
OK, sorry, it does check for them but doesn't repair them. What would repair them, as neither e2fsck nor mke2fs seems to offer a repair option?
then in theory fsck should check for bad blocks
By default, it does not.
Because it assumes that the physical disk is OK. Personally I think the fsck should have the e2fsck -c option added into boot. I'll have to investigate if there is a kernel wishlist.
Because there is no need, and because it is terribly time consuming, a matter of several hours. You are too paranoid about them, I think :-)
:) Paranoid I might be, but a stable and safe HDD space to put data on is something that computer folk (read: I) have dreamt of for years, i.e. a self-healing system. Heck, it would almost negate the purpose of backups, saving companies fortunes.
I would like to run the e2fsck command to prevent the failure of partition checking by fsck on bootup as, reading the man page on fsck, it does not seem up to working on an ext3 fs.
Then just force a check during boot, by creating the file "/forcefsck". An ext3 partition will be checked to the needed level, not more. Doing a badblock check every time is overkill, and will not really protect your data.
/forcefsck is a partial solution as fsck itself doesn't check bad blocks, only e2fsck does. The next thing is to find out where and what syntax to use to get e2fsck to run at boot, if the standard boot fsck fails.
It may be overkill but it provides folk with a little more data safety.
That '/forcefsck' option is a little strong
Why?
The standard fsck normally runs at boot anyway; it is the e2fsck -c option that has to be called on if the standard fsck fails. It would be a good idea to '/force2fsck -c' if and only if the normal fsck failed.
/forcefsck just tells the '/etc/init.d/boot.localfs' that you want to check the filesystems regardless of whether it is needed or not. It doesn't do anything "drastic".
Mmmm, had a look at the /etc...bootlocalfs file and see that it mentions that if the error code is 1 then all is OK, and if 2 the fsck failed and the message 'fsck has failed. Please run fsck manually' is displayed.
What I am saying is that on returning an error 2 the kernel should automatically run e2fsck -c.
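For reference on the error codes discussed above: the fsck(8) man page documents the exit status as a bit mask, so a single value can carry several conditions at once. The following is only a sketch of how a script could decode it; `describe_fsck_status` is a hypothetical helper name and is not taken from boot.localfs.

```shell
# Decode an fsck exit status into the meanings listed in fsck(8).
# describe_fsck_status is a hypothetical helper, not part of any boot script.
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # The status is a bit mask: test each documented bit in turn.
    for entry in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit="${entry%%:*}"    # number before the colon
        text="${entry#*:}"    # description after the colon
        if [ $((status & bit)) -ne 0 ]; then
            echo "$text"
        fi
    done
}
```

One would call it right after a check, for example `describe_fsck_status "$?"`. A status of 5 would print both "filesystem errors corrected" and "filesystem errors left uncorrected".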
but see the next paragraph for my suggestion.
Why have the 'bad block' option, and why will it not protect my data? Surely it will make sure that data is not lost, because the block has been marked as bad and therefore the data will be written to a good block?
It only detects which new sectors are bad at that moment. What if the error develops later, while the system is running? In fact, while running the HD is more vulnerable, because the heads are not parked, but flying at a very small distance from the HD surface.
So running e2fsck -c regularly might prevent writing to a bad block that developed since your last e2fsck -c?
For a somewhat more complete check, boot from the rescue CD and test from there.
I was thinking more along the lines of possibly aliasing the boot fsck to e2fsck and having it run e2fsck each time the fsck is supposed to run on a partition, as set with the tune2fs cmd, i.e. every 3rd mount or 15 days etc.
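To make the "every 3rd mount or 15 days" schedule concrete: tune2fs sets these limits with -c (maximum mount count) and -i (interval between checks). The sketch below is a simplified model of when a check becomes due. `check_is_due` is a hypothetical helper, the device name in the comments is a placeholder, and the real decision is made inside e2fsck from the superblock.

```shell
# The limits themselves would be set (as root) with something like:
#   tune2fs -c 3 -i 15d /dev/hdXN      # /dev/hdXN is a placeholder device
# and inspected with:
#   tune2fs -l /dev/hdXN

# check_is_due MOUNT_COUNT MAX_MOUNTS DAYS_SINCE_CHECK MAX_DAYS
# Simplified model: a limit of 0 means that criterion is disabled.
check_is_due() {
    mount_count="$1"
    max_mounts="$2"
    days_since="$3"
    max_days="$4"
    if [ "$max_mounts" -gt 0 ] && [ "$mount_count" -ge "$max_mounts" ]; then
        echo "due: mount count reached"
    elif [ "$max_days" -gt 0 ] && [ "$days_since" -ge "$max_days" ]; then
        echo "due: check interval elapsed"
    else
        echo "not due"
    fi
}
```

For example, `check_is_due 3 3 0 15` reports that the mount count has been reached, while `check_is_due 1 3 2 15` reports that no check is due yet. Note this only schedules the ordinary logical check; it does not add a bad block scan.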
fsck calls e2fsck for you.
How so? fsck doesn't check for bad blocks, so you end up having to run e2fsck if the boot fsck fails?
Playing with that is dangerous, because at some time you may have a differently formatted partition and apply the wrong program.
And not being a programmer, do not worry, I'll stay well away.
Look: SuSE people are quite expert and wise, and they have designed those scripts with a lot of thought and care. You really do not have to modify them.
If you are worried about bad blocks, do:
1) Configure smartd to run tests periodically in the background.
2) Keep your backups current.
3) If you really need it, use raid setups.
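As an illustration of point 1, the smartmontools daemon (smartd) reads its schedule from smartd.conf, where the -s directive takes a pattern of the form T/MM/DD/d/HH. The sketch below only prints such a line; `smartd_conf_line` is a hypothetical helper, and the device name and the chosen times are assumptions to adjust for your own system.

```shell
# Print a smartd.conf entry for one device (hypothetical helper).
#   -a  monitor all SMART attributes
#   -s  self-test schedule: short test daily between 02:00 and 03:00,
#       long test on Saturdays between 03:00 and 04:00
smartd_conf_line() {
    device="$1"
    printf '%s -a -s (S/../.././02|L/../../6/03)\n' "$device"
}

# Manual equivalents, to be run as root on a real disk:
#   smartctl -t short /dev/hda       # start a short self-test now
#   smartctl -l selftest /dev/hda    # read the self-test log afterwards
#   smartctl -H /dev/hda             # overall health assessment
```

Running `smartd_conf_line /dev/hda` prints a line that could be placed in smartd.conf, after which smartd needs to be restarted to pick it up.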
OK. To summarize:
A) HDD has not had smartctl enabled
  1) When fsck is run it scans the fs on the disk and makes sure it is logically correct.
  2) Data is written to the HDD
    2.1) Part of the data written lands in a bad block not marked as bad
  3) If fsck is run now it will determine that there is an error on the fs and instruct you to run it manually, presumably so that the bad block check can be run via e2fsck -c.
  4) e2fsck is run, finds the bad block and marks it as such.
    4.1) The data on that bad block is lost
In my case:
B) /home HDD and backup HDD have had smartctl enabled
  1) When fsck is run it scans the fs on the disk and makes sure it is logically correct.
  2) Data is written to the HDD
    2.1) Part of the data written lands in a bad block not marked as bad
  3) Not knowing that some of the data has landed on a bad sector I run a smartctl scan
    3.1) It finds the error and marks the bad block
  4) If fsck is run now it will determine that there is an error on the fs and instruct you to run it manually, presumably so that the bad block list of the fs can be updated via e2fsck -c.
  5) e2fsck is run, finds the bad block and marks it as such.
    5.1) The data on that bad block is lost
So how do fsck, e2fsck and smart integrate? Sorry if I've left out/assumed steps.
-- 
The Little Helper
========================================================================
Hylton Conacher - Linux user # 229959 at http://counter.li.org
Currently using SuSE 9.0 Professional with KDE 3.1
Licenced Windows user
========================================================================