[opensuse-factory] smartmontools: Proposal to enable regular HDD Self Tests
Hello.

Many years ago we decided that smartd would not enable self tests by default.

In those days there were discs with firmware problems, and a self test sometimes caused delays or even freezes. (Back then, smartd itself was disabled by default for the same reason.) A possible shortening of disc lifetime by self tests was also discussed.

Over the years, hardware has changed a lot. Only two HDD manufacturers remain, and no firmware crashes, freezes or delays caused by S.M.A.R.T. have been reported for nearly 10 years.

With the rising density of discs, the importance of regular checks rises as well. Detecting weak data can prevent data loss before it occurs.

That is why I think it is time to re-evaluate the old decision and think about enabling regular Self Tests by default.

Nowadays, without running any self tests:
If we don't run any self tests, there is no guarantee (i.e. the specification does not require it) that S.M.A.R.T. is monitored at all. However, it seems that most drives perform an Offline Self Test every 4 hours (see the item "Automatic Offline Testing") and update core parameters.

smartd is able to predict some types of failures that are related to inferior operation, and maybe failing mechanical parts (permanent seek errors, crash-landing of a head, degradation of a head) or failing electronics.

But it is not able to predict a failing disc surface.

Monitoring of vital data has been enabled in openSUSE since 2010. It is done twice an hour. No crash triggered by smartd on faulty firmware has been reported since 2005.

I propose to keep the check frequency for vital S.M.A.R.T. data at this default. I also propose that smartd should not run Offline tests, as most (all?) discs do them every 4 hours, and should run Short Self Tests instead.

Short Self Test:
The Short Self Test verifies the status of the hardware functions. It takes several minutes.

I propose to run the Short Self Test once a day.

Benefits: There is no guarantee that the Offline Self Test covers all tests of the Short Self Test, or that the Offline Self Test is even called regularly by the firmware. Depending on the firmware implementation, the Short Self Test may be required to predict some failures of core functions of the HDD (i.e. total HDD failures).

If the firmware does the same during the Offline Self Test, and the Short Self Test is enabled once per day, the number of tests rises from 4 daily to 5. If it does something more, it could give a better prediction.

Long Self Test:
The Long Self Test (almost certainly) performs a full surface scan. It typically takes several to many hours.

If it finds weak but still readable data, it silently relocates them (you only see a change in the S.M.A.R.T. statistics). If it finds an unreadable sector, it retries reading it for some time, and if that fails, S.M.A.R.T. changes the overall status to FAILED and the error is reported to the user.

Benefits: Running the Long Self Test can prevent data loss in files that are not accessed, and if data loss does happen, it is detected early and reported.

I propose to run the Long Self Test once a month.

Risks: There is a risk that the Long Self Test slows down I/O operations due to inferior firmware. But if the firmware is written in a smart way, any read or write request should pause the Self Test, and it should be resumed after the disc has been idle for some time. If we assume well written firmware, it should cause minimal delays.

Self Tests resume after reboot.

Note that a disc in status FAILED due to an unreadable sector can still be "healed" by writing data to the failed place. Writing data to the failed place stops the read retries (the data are then no longer lost, but overwritten). Immediate relocation is performed, the pending unreadable sector count is decreased, and if it reaches zero, the next self test returns PASS.

--
Best Regards / S pozdravem,
Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.     e-mail: sbrabec@suse.cz
Lihovarská 1060/12       tel: +49 911 7405384547
190 00 Praha 9           fax: +420 284 084 001
Czech Republic           http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76
--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
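For readers who want to see what the proposed schedule could look like: smartd takes a `-s` directive in /etc/smartd.conf whose argument is a regular expression matched against strings of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The Python sketch below is not part of the proposal and is not smartd's code. It only imitates that matching, so that a candidate pattern can be tried out before it goes into the config file. The pattern itself (Short Self Test every day at 02:00, Long Self Test on the 1st of each month at 03:00) is an assumed example with arbitrarily chosen hours, and the function name `due_tests` is made up for this illustration.

```python
import re
from datetime import datetime

# Assumed example line for /etc/smartd.conf (hours chosen arbitrarily):
#   DEVICESCAN -a -s (S/../.././02|L/../01/./03)
SCHEDULE = r"(S/../.././02|L/../01/./03)"


def due_tests(now, schedule=SCHEDULE):
    """Return the test types (L = long, S = short) the pattern selects for this hour."""
    due = []
    for test_type in ("L", "S"):
        # Field layout documented for -s: T/MM/DD/d/HH, weekday 1 = Monday.
        stamp = f"{test_type}/{now:%m}/{now:%d}/{now.isoweekday()}/{now:%H}"
        # Assumption: the pattern has to match the whole string, not a part of it.
        if re.fullmatch(schedule, stamp):
            due.append(test_type)
    return due


if __name__ == "__main__":
    print(due_tests(datetime(2015, 3, 5, 2)))  # ordinary day, 02:00 -> ['S']
    print(due_tests(datetime(2015, 4, 1, 3)))  # first of the month, 03:00 -> ['L']
```

Keeping the two tests in different hours avoids asking the drive for both at once on the first day of the month.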
On Wed, 4 Mar 2015 22:41, Stanislav Brabec wrote:
Hallo.
Many years ago we decided that smartd will not enable self tests by default.
[snip reasons]
Short Self Test: Short Self Test verifies status of the hardware function. It takes several minutes.
I propose to run Short Self Test once a day.
+1, maybe as the last job in 'cron.daily', or whatever systemd calls that now.
...
Long Self Test: Long Self Test (nearly for sure) performs full surface scan. It typically takes several to many hours. ... I propose to run Long Self Test once a month.
+1, also as the last job in 'cron.monthly'.
Thank you for bringing this to general notice.
- Yamaban
On Wed 04 Mar 2015 11:30:09 PM CST, Yamaban wrote:
[snip]
Hi
How will this affect SSDs and the use of bcache, fstrim etc.? I'm assuming the systemd smartd service will still be disabled?
--
Cheers Malcolm °¿° LFCS, SUSE Knowledge Partner (Linux Counter #276890)
SUSE Linux Enterprise Desktop 12 GNOME 3.10.1 Kernel 3.12.36-38-default
up 6 days 4:48, 4 users, load average: 0.42, 0.38, 0.27
CPU AMD A4-5150M APU @ 3.3GHz | GPU Richland Radeon HD 8350G
Malcolm wrote:
Hi How will this affect SSD's and using bcache, fstrim etc?
Exactly as other S.M.A.R.T. capable devices: Run regular device checks.

I have no idea what exactly an SSD does on these commands. But I can imagine that a lot of checks can be done by the hardware:
- checking buffers for possibly broken cells
- checking the consistency of the block mapping
- checking the checksums of all blocks
- checking the charge levels of all cells for uncertain values

I'm assuming the systemd smartd service will still be disabled?
No. The systemd smartd service has been enabled since its introduction.
--
Stanislav Brabec
Stanislav Brabec wrote:
[snip]
I'm assuming the systemd smartd service will still be disabled?
No. systemd smartd service is enabled since its introduction.
Not in 13.2 -

# zypper in smartmontools
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following NEW package is going to be installed:
  smartmontools

1 new package to install.
Overall download size: 399.6 KiB. After the operation, additional 1.4 MiB will be used.
Continue? [y/n/? shows all options] (y):
[snip]
office33:~ # systemctl status smartd
smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; disabled)
   Active: inactive (dead)

--
Per Jessen, Zürich (5.7°C)
http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
Per Jessen wrote:
No. systemd smartd service is enabled since its introduction.
Not in 13.2 -
Something went wrong in the systemd/sysv handling. In the past, smartd was started by default for many years.
smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; disabled)
   Active: inactive (dead)
Thanks for pointing it out.
Probably the best way would be to add it to /usr/lib/systemd/system-preset/90-default-openSUSE.preset
--
Stanislav Brabec
On Thu 05 Mar 2015 04:28:34 PM CST, Stanislav Brabec wrote:
Per Jessen wrote:
No. systemd smartd service is enabled since its introduction.
Not in 13.2 -
[snip]
Probably the best way would be adding it to /usr/lib/systemd/system-preset/90-default-openSUSE.preset
Hi
My 13.1 system doesn't have it enabled or running by default. The disks in this system are over 10 years old with > 30K hours and still running fine (1 x 36GB Raptor (2003) and 2 x 500GB WD REs).

I don't have any issue with changing the default setup etc. as seen fit. Just add a caution to the release notes that it may be active, that it can be disabled, and that further configuration may be required for the end user's setup.
--
Cheers Malcolm
Stanislav Brabec wrote:
[snip]
Probably the best way would be adding it to /usr/lib/systemd/system-preset/90-default-openSUSE.preset
https://bugzilla.opensuse.org/show_bug.cgi?id=921075
--
Stanislav Brabec
On Thursday, 5 March 2015, 16:16:41, Per Jessen wrote:
[snip]
Not in 13.2 -
# zypper in smartmontools
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following NEW package is going to be installed:
  smartmontools

1 new package to install.
Overall download size: 399.6 KiB. After the operation, additional 1.4 MiB will be used.
Continue? [y/n/? shows all options] (y):
[snip]
office33:~ # systemctl status smartd
smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
   Loaded: loaded (/usr/lib/systemd/system/smartd.service; disabled)
   Active: inactive (dead)

Hello Per,

then enable it with:
systemctl enable smartd
and, if it is not started automatically, start it with:
systemctl start smartd
or
systemctl restart smartd

Bye,
Emil
--
Registered Linux User since 19940320
----------------------------------------------------------
Emil Stephan, Albersloher Weg 571A, 48167 Münster, Germany
Accelerate Windows: 9.80665 m/sec^2 would be adequate
On Wednesday 04 March 2015 23.30:09 Yamaban wrote:
On Wed, 4 Mar 2015 22:41, Stanislav Brabec wrote:
Hallo.
Many years ago we decided that smartd will not enable self tests by default.
[snip reasons]
Short Self Test: Short Self Test verifies status of the hardware function. It takes several minutes.
I propose to run Short Self Test once a day.
+1, maybe as the last of 'cron.daily' or what-ever systemd calls that now.
...
Long Self Test: Long Self Test (nearly for sure) performs full surface scan. It typically takes several to many hours. ... I propose to run Long Self Test once a month.
+1, also as last of 'cron.monthly'
Thank you for bringing this up to general notice.
- Yamaban
Is it really the best way? If you have some storage computer with a lot of disks and a backup is running at that time, it will kill all the I/O bandwidth. And that "behaviour for some", "bug for others" will be hard to detect, no?

I guess that most real admins in charge of servers/desktops know the perfect time to run a full, deep long test. We will impose two actions on them: remove the default scripts and install their own.

And what about smartd and udisks2 doing the same work? On most desktops udisksd is running and is already checking the SMART status?
--
Bruno Friedmann
Ioda-Net Sàrl www.ioda-net.ch
openSUSE Member & Board, fsfe fellowship
GPG KEY : D5C9B751C4653227
irc: tigerfoot
Bruno Friedmann wrote:
On Wednesday 04 March 2015 23.30:09 Yamaban wrote:
On Wed, 4 Mar 2015 22:41, Stanislav Brabec wrote:
+1, maybe as the last of 'cron.daily' or what-ever systemd calls that now.
+1, also as last of 'cron.monthly'
Not from cron. smartd itself has this capability. It is configured in /etc/smartd.conf. smartd starts the tests with delays and checks whether the previous test has already finished.
Is it really the best way? If you have a whatever storage computer with lot of disk and there's a running backup at that time, it will kill all the io bandwidth.
This should not kill I/O bandwidth. Firmware should postpone the check until the device is idle.
The mention of that "behaviour for some" "bug for some other" will be hard to detect no?
It would not be our bug, but the HDD manufacturers' bug. The last report about such bug in our Bugzilla is dated

People will still have a chance to:
- Upgrade the HDD firmware.
- Notify the HDD manufacturer about the problem.
- Turn S.M.A.R.T. checks off.
- Run checks manually when the system is idle.

Well, I can imagine a test that can detect broken firmware:
1. Start some I/O.
2. Start a self test.
3. Start the same I/O as in step 1.
If there is a big difference in speed, report bad firmware.
I guess that most real admin in charge of server/dekstop knows the perfect time to run full deep long test. We will impose 2 actions, remove default scripts and install theirs.
S.M.A.R.T. checks are fully configurable in /etc/smartd.conf.

Are people aware that S.M.A.R.T. is configured in an inferior mode and cannot predict many failures?

What is better? To risk a slowdown on some devices (with people not being aware of its source), or to risk a disk death that could have been predicted but was not (because self tests were not started)?
And what about smartd and udisk2 doing the same work? On most desktop udiskd2 is running and already doing smart check status ?
smartd is a hardware monitor that communicates with the firmware using special commands. Disk health monitoring is the only purpose of smartd. It never reads any data from the disk.

udisks2 is a standard block device manager. It does not do any health checks.
--
Stanislav Brabec
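The three-step firmware check imagined in the message above (measure some I/O, start a self test, measure the same I/O again) could be sketched roughly as follows. This is an illustration only and not an existing smartmontools feature: the function names, the 64 MiB sample size and the 0.5 threshold are all made up for the example, and the measurement is naive. `smartctl -t short` and `smartctl -X` are the regular smartctl options for starting and aborting a self test; running `check_device` needs root and a real disk.

```python
import subprocess
import time

MIB = 1024 * 1024


def read_throughput(path, offset=0, nbytes=64 * MIB, chunk=MIB):
    """Read up to nbytes from path, starting at offset; return bytes per second."""
    done = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        while done < nbytes:
            data = f.read(min(chunk, nbytes - done))
            if not data:  # end of file or device
                break
            done += len(data)
    elapsed = time.monotonic() - start
    return done / elapsed if elapsed > 0 else float("inf")


def looks_like_bad_firmware(baseline, during_test, threshold=0.5):
    """True if the rate during the self test fell below threshold * baseline."""
    return during_test < baseline * threshold


def check_device(device):
    """Hypothetical driver for the three steps; needs root and a real disk."""
    baseline = read_throughput(device)                                # 1. some I/O
    subprocess.run(["smartctl", "-t", "short", device], check=True)   # 2. self test
    try:
        # 3. the same kind of I/O again, taken from the next region so that
        #    the page cache does not simply replay step 1
        during = read_throughput(device, offset=64 * MIB)
    finally:
        subprocess.run(["smartctl", "-X", device])                    # abort the test
    return looks_like_bad_firmware(baseline, during)
```

A single pair of measurements is noisy, so a real check would probably repeat both reads several times before calling any firmware bad.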
Stanislav Brabec wrote:
I guess that most real admin in charge of server/dekstop knows the perfect time to run full deep long test. We will impose 2 actions, remove default scripts and install theirs.
S.M.A.R.T. checks are fully configurable by /etc/smartd.conf.
Are people aware that S.M.A.R.T. is configured to an inferior mode and cannot predict many failures?
Unless people take explicit action, they won't know about SMART and/or the monitoring. That's why I'm not convinced enabling the automatic selftest is worth the effort. But it won't hurt.

Anyway, for many controllers I think people have to edit the config file anyway.
--
Per Jessen
Per Jessen wrote:
Anyway, for many controllers I think people have to edit the config file anyway.
For standard small systems, DEVICESCAN works well, especially if the drive is in the database.

(By the way, I have just rewritten the packaging checks to support, and not overwrite, database updates done by update-smart-drivedb. https://build.opensuse.org/request/show/289331 )

If people have a bridge, a RAID controller or the like, manual action is needed. USB disks often need manual action as well. There are many types of bridges, and there is no easier way to find a working setting than testing different combinations.
--
Stanislav Brabec
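"Testing different combinations" for a bridge usually means trying several `-d TYPE` values with smartctl by hand. A rough Python sketch of that loop is below. The list of types and their order are a guess made for this example (all four are device types smartctl accepts), the function names are hypothetical, and a passing result only says that smartctl could talk to something through the bridge, not that every S.M.A.R.T. feature works.

```python
import subprocess

# Device types accepted by "smartctl -d"; which ones to try, and in which
# order, is an assumption made for this illustration.
CANDIDATE_TYPES = ("sat", "usbjmicron", "usbcypress", "usbsunplus")


def candidate_commands(device, types=CANDIDATE_TYPES):
    """Build one 'smartctl -i -d TYPE DEVICE' command line per candidate type."""
    return [["smartctl", "-i", "-d", t, device] for t in types]


def first_working_type(device, run=subprocess.run):
    """Return the first -d type smartctl could use to reach the drive, or None."""
    for cmd in candidate_commands(device):
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        # Bits 0-2 of smartctl's exit status report a usage error, a failed
        # device open, or a failed command. Higher bits concern drive health
        # and are ignored here.
        if (result.returncode & 0b111) == 0:
            return cmd[3]
    return None
```

The type found this way would then go into an explicit device line in /etc/smartd.conf instead of relying on DEVICESCAN.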
On Thursday 05 March 2015 15.50:12 Stanislav Brabec wrote:
Bruno Friedmann wrote:
On Wednesday 04 March 2015 23.30:09 Yamaban wrote:
On Wed, 4 Mar 2015 22:41, Stanislav Brabec wrote:
+1, maybe as the last of 'cron.daily' or what-ever systemd calls that now.
+1, also as last of 'cron.monthly'
Not from cron. smartd itself has this capability. It is configured in /etc/smartd.conf. It starts it with delays, and check, whether the last check already finished.
In that case no problem. I thought the idea was an /etc/cron.*/ thing.
Is it really the best way? If you have a whatever storage computer with lot of disk and there's a running backup at that time, it will kill all the io bandwidth.
This should not kill I/O bandwidth. Firmware should postpone the check until the device is idle.
The mention of that "behaviour for some" "bug for some other" will be hard to detect no?
It would not be our bug, but HDD manufacturers' bug. The last report about such bug in our Bugzilla is dated
People will still have a chance to: - Upgrade HDD firmware - Notify HDD manufacturer about this problem. - Turn S.M.A.R.T. checks off. - Run checks manually when the system is idle.
Well, I can imagine a test, that can detect broken firmware: 1. Start some I/O. 2. Start self test. 3. Start the same I/O as 1. did.
If there is a big difference in speed, report bad firmware.
Oh, that is not always useful. I know of numerous WD hard drives with bad firmware for which WD does not want to offer any way to upgrade. (Not high-end SAS HDDs.)
I guess that most real admin in charge of server/dekstop knows the perfect time to run full deep long test. We will impose 2 actions, remove default scripts and install theirs.
S.M.A.R.T. checks are fully configurable by /etc/smartd.conf.
Are people aware that S.M.A.R.T. is configured to an inferior mode and cannot predict many failures?
What is better? Risk slow down on some devices (and people no being aware of it source) or risk disk death that can be predicted but it is not (because self tests are not started)?
Nope, having smartd running is perfect (I'm a fan of it and haven't had any trouble with it for 10 years). Still, I'm not sure that the casual Joe will see the log in time. Or will the smartd package include some kind of script that can wall everybody who is connected, or send a desktop notification? (Mail warnings are mostly used by experienced admins.)
And what about smartd and udisk2 doing the same work? On most desktop udiskd2 is running and already doing smart check status ?
smartd is a hardware monitor that communicates with firmware using special commands. Disk health monitoring is the only purpose of smartd. It never reads any data from the disk.
udisks2 is a standard block device manager. It does not any health checks.
Then why is it shouting when you remove a removable device from the system, for example?

déc 28 11:04:40 yoda udisksd[5446]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/SAMSUNG_HD501LJ_S0MUJ1PP431740: Error updating SMART data: sk_disk_open: No such file or directory

(This seems to be related to https://bugs.launchpad.net/ubuntu/+source/udisks2/+bug/1281588/comments/29)

And since the documentation also refers to it ( http://udisks.freedesktop.org/docs/1.91.0/gdbus-org.freedesktop.UDisks2.Driv... ), I'm still asking myself whether this doesn't result in a double check when smartd is activated.
--
Bruno Friedmann
Bruno Friedmann wrote:
On Thursday 05 March 2015 15.50:12 Stanislav Brabec wrote:
Not from cron. smartd itself has this capability. It is configured in /etc/smartd.conf. It starts it with delays, and check, whether the last check already finished.
In that case no problems, I was thinking the idea was a /etc/cron.*/thing
No, it should be a task for smartd itself. smartd has its own self test scheduler that is aware of running tests. And smartd has to be running to poll for self test results. (A self test could take days to finish.) Doing it in cron would require code that duplicates smartd.
--
Stanislav Brabec
On 03/04/2015 04:41 PM, Stanislav Brabec wrote:
[snip full quote of the proposal]
It would be nice if YaST had a similar module.
--
Cheers!
Roman
Roman Bysh wrote:
It would nice if Yast had a similar module.
I had this idea as well. But I would have to learn Ruby first.
- Tune smartd parameters.
- Run a firmware quality test to detect a possible slowdown.
- Perform such tests for all system disks during installation.

The YaST team pointed me to: https://news.opensuse.org/2015/02/25/openness-brings-fresh-air-to-yast/
--
Stanislav Brabec
On Wednesday 04 of March 2015 22:41:58 Stanislav Brabec wrote:
Hallo.
Many years ago we decided that smartd will not enable self tests by default.
In these old years, there were discs with firmware problems, and self test sometimes caused delays or even freezes. (In these old days, smartd was even disabled by default for the samer reason.) There was discussed possible shortening of life time by self tests.
Over the years, hardware changed a lot. Only two HDD manufacturers remain, and no firmware crashes/freezes/delays caused by S.M.A.R.T. were reported for nearly 10 years.
With raising density of discs, importance of regular checks raises as well. Detection of weak data can even prevent data losses before they even occur.
That is why I think there is a time to re-evaluate the old decision, and think about enabling regular Self Tests by default.
Nowadays, without running any self tests: If we don't run any self tests, then there is no guarantee (i. e. specification does not require it) that S.M.A.R.T. is monitored at all. However it seems, that most drives perform Offline Self Test every 4 hours (see item "Automatic Offline Testing") and update core parameters.
smartd is able to predict some types of failures that are related to inferior operation, and maybe failing mechanical parts (permanent seek errors, crash-landing of head, degradation of head) or failing electronics.
But it is not capable to predict failing disc surface.
Monitoring vital data is enabled in openSUSE since 2010. It is done twice a hour. No crash triggered by smartd on faulty firmware was not reported since 2005.
I propose to keep vital S.M.A.R.T. data frequency check on this default. I also propose to not perform Offline tests by smartd, as most (all?) discs do it every 4 hours, and do Short Self Tests instead.
Short Self Test: Short Self Test verifies status of the hardware function. It takes several minutes.
I propose to run Short Self Test once a day.
Benefits: There is no guarantee that the Offline Self Test covers all tests of Short Self Test, and that Offline Self Test is even regularly called by the firmware. Depending onf firmware implementation, Short Self Test may be required to predict some failures of core functions of HDD (i. e. total HDD failures).
If firmware does the same during the Offline Self Test and Short Self Test is enabled once per day, number of tests raises from 4 daily to 5. If it does something more, it could give a better prediction.
Long Self Test: The Long Self Test (almost certainly) performs a full surface scan. It typically takes several to many hours.
If it finds weak but still readable data, it silently relocates it (you only see a change in the S.M.A.R.T. statistics). If it finds an unreadable sector, it retries reading it for some time, and if that fails, S.M.A.R.T. changes the overall status to FAILED and the error is reported to the user.
Benefits: Running the Long Self Test can prevent data loss in files that are not accessed, and if data loss does happen, it is detected early and reported.
I propose to run Long Self Test once a month.
Risks: There is a risk that the Long Self Test slows down I/O operations because of inferior firmware. But if the firmware is written in a smart way, any read or write request should pause the Self Test, and it should resume after the drive has been idle for some time. If we assume well-written firmware, it should cause minimal delays.
Self Tests resume after reboot.
Note that a disc in status FAILED due to an unreadable sector can still be "healed" by writing data to the failed place. Writing data to the failed place stops the read retries (the data are then no longer lost, but overwritten). The sector is relocated immediately, the pending unreadable sector count is decreased, and if it reaches zero, the next self test returns PASS.
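The proposed schedule (Short Self Test daily, Long Self Test monthly) could be expressed with the -s directive of smartd.conf, whose regular expression is matched against T/MM/DD/d/HH (test type, month, day of month, day of week, hour). The following is only a sketch under stated assumptions, not a tested default: the helper name smartd_schedule_line and the chosen times (02:00 for the short test, 03:00 on the 1st of the month for the long test) are made up for illustration, and "DEVICESCAN -d removable" is the stock line from /etc/smartd.conf quoted elsewhere in this thread.

```shell
#!/bin/sh
# Print a smartd.conf line that schedules a daily Short Self Test and a
# monthly Long Self Test. The -s regex is matched against T/MM/DD/d/HH.
# Arguments (hypothetical defaults, chosen only for illustration):
#   $1 = hour of the daily short test (00-23)
#   $2 = day of month of the long test (01-31)
#   $3 = hour of the monthly long test (00-23)
smartd_schedule_line() {
    short_hour=${1:-02}
    long_day=${2:-01}
    long_hour=${3:-03}
    printf 'DEVICESCAN -d removable -a -s (S/../.././%s|L/../%s/./%s)\n' \
        "$short_hour" "$long_day" "$long_hour"
}

# Print the line for review instead of editing /etc/smartd.conf directly.
smartd_schedule_line
```

Printing the line rather than writing it keeps the sketch harmless; an administrator would still have to review it and put it into /etc/smartd.conf by hand.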
+1 Martin Pluskal
Stanislav Brabec wrote:
Hallo.
Many years ago we decided that smartd will not enable self tests by default.
In these old years, there were discs with firmware problems, and self test sometimes caused delays or even freezes. (In these old days, smartd was even disabled by default for the samer reason.) There was discussed possible shortening of life time by self tests.
Over the years, hardware changed a lot. Only two HDD manufacturers remain,
Three - Seagate (+Samsung), Western Digital (+HGST), Toshiba (+Fujitsu).
That is why I think there is a time to re-evaluate the old decision, and think about enabling regular Self Tests by default.
I've been running short selftests every day and a long test on weekends for years on all (S)ATA drives.
Monitoring vital data is enabled in openSUSE since 2010. It is done twice a hour.
I'm curious, how is this done? I don't see anything in smartd.conf nor any cronjob. I don't recall receiving any reports either.
I propose to keep vital S.M.A.R.T. data frequency check on this default. I also propose to not perform Offline tests by smartd, as most (all?) discs do it every 4 hours, and do Short Self Tests instead.
Short Self Test: Short Self Test verifies status of the hardware function. It takes several minutes.
I propose to run Short Self Test once a day.
+1
I propose to run Long Self Test once a month.
+1 However, automatic testing is all very well, but unless someone reads the reports sent by email, they're worthless. I know it is up to the user, but unless he/she is aware, automatic testing has zero value. I suspect the vast majority of emails sent to root@localhost are never read. -- Per Jessen, Zürich (2.1°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland. -- To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org To contact the owner, e-mail: opensuse-factory+owner@opensuse.org
On Thu, 5 Mar 2015 09:07, Per Jessen wrote:
Stanislav Brabec wrote: [snip]
Monitoring vital data is enabled in openSUSE since 2010. It is done twice a hour.
I'm curious, how is this done? I don't see anything in smartd.conf nor any cronjob. I don't recall receiving any reports either.
Look into the contents of /var/lib/smartmontools/. For the config, see /etc/smartd.conf and look up the default values in "man 5 smartd.conf". The logs are there, they are just not pushed at the users.
However, automatic testing is all very well, but unless someone reads the reports sent by email, they're worthless. I know it is up to the user, but unless he/she is aware, automatic testing has zero value. I suspect the vast majority of emails sent to root@localhost are never read.
I'm using wall for the failure warnings, and logging for anything beyond info. But that is my private addendum since my last HDD crash in 2013. - Yamaban.
Yamaban wrote:
On Thu, 5 Mar 2015 09:07, Per Jessen wrote:
Stanislav Brabec wrote: [snip]
Monitoring vital data is enabled in openSUSE since 2010. It is done twice a hour.
I'm curious, how is this done? I don't see anything in smartd.conf nor any cronjob. I don't recall receiving any reports either.
look into the contents of /var/lib/smartmontools/
only one file:
cat /var/lib/smartmontools/smartd_opts
# Generated by /usr/lib/smartmontools/generate_smartd_opts
smartd_opts=""
for config: /etc/smartd.conf and look for the default values in "man 5 smartd.conf"
/etc/smartd.conf only has one line: DEVICESCAN -d removable
The logs are there, just not pressed upon the users.
Where do I look for the logs? -- Per Jessen, Zürich (3.1°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
Per Jessen wrote:
Yamaban wrote:
On Thu, 5 Mar 2015 09:07, Per Jessen wrote:
Stanislav Brabec wrote: [snip]
Monitoring vital data is enabled in openSUSE since 2010. It is done twice a hour.
I'm curious, how is this done? I don't see anything in smartd.conf nor any cronjob. I don't recall receiving any reports either.
This is not in smartd.conf. It is defined by the smartd command line. See smartd --help or the smartd(8) man page: -A and -s. If not specified, both are on by default.
look into the contents of /var/lib/smartmontools/
only one file:
cat /var/lib/smartmontools/smartd_opts
# Generated by /usr/lib/smartmontools/generate_smartd_opts
smartd_opts=""
This file is used by the SUSE-specific connector to the YaST sysconfig editor: see "yast2 sysconfig", Hardware->S.M.A.R.T. There are two configuration options there, SMARTD_SAVESTATES and SMARTD_ATTRLOG, and both should be "yes" by default. The activation script just creates the command line for smartd. An empty smartd_opts variable means that both features are on.
for config: /etc/smartd.conf and look for the default values in "man 5 smartd.conf"
/etc/smartd.conf only has one line: DEVICESCAN -d removable
The logs are there, just not pressed upon the users.
Where do I look for the logs?
If smartd is running, it creates:
/var/lib/smartmontools/attrlog.MODEL-SERIAL.ata.csv
and
/var/lib/smartmontools/smartd.MODEL-SERIAL.TYPE.state
--
Best Regards / S pozdravem,
Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.  e-mail: sbrabec@suse.cz
Lihovarská 1060/12  tel: +49 911 7405384547
190 00 Praha 9  fax: +420 284 084 001
Czech Republic  http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76
Stanislav Brabec wrote:
Per Jessen wrote:
Yamaban wrote:
On Thu, 5 Mar 2015 09:07, Per Jessen wrote:
Stanislav Brabec wrote: [snip]
Monitoring vital data is enabled in openSUSE since 2010. It is done twice a hour.
I'm curious, how is this done? I don't see anything in smartd.conf nor any cronjob. I don't recall receiving any reports either.
This is not in smartd.conf. It is defined by command line of smartd. See smartd --help or smartd(8) man page: -A and -s. If not specified, both are on by default.
Ah, thanks. I see the data now. -- Per Jessen, Zürich (-0.2°C) http://www.dns24.ch/ - your free DNS host, made in Switzerland.
Per Jessen:
Stanislav Brabec wrote:
Monitoring vital data is enabled in openSUSE since 2010. It is done twice a hour.
I'm curious, how is this done? I don't see anything in smartd.conf nor any cronjob. I don't recall receiving any reports either.
My mistake: I overlooked that the smartd service was not enabled in the default openSUSE preset file after the migration to systemd. I am going to fix it and perform a one-time enabling. If smartd runs, it polls the hardware twice an hour by default.
However, automatic testing is all very well, but unless someone reads the reports sent by email, they're worthless. I know it is up to the user, but unless he/she is aware, automatic testing has zero value. I suspect the vast majority of emails sent to root@localhost are never read.
This is a problem. If the user has a terminal open, critical issues are displayed there. We need a connection to the desktop environment. In the past, the reporting was done by powersave-notify. Probably a small application connected to the notification service (one that is compatible across desktops) needs to be written. Or, if somebody has already written it, it should be packaged. -- Stanislav Brabec
* Stanislav Brabec <sbrabec@suse.cz> [2015-03-05 17:00]:
However, automatic testing is all very well, but unless someone reads the reports sent by email, they're worthless. I know it is up to the user, but unless he/she is aware, automatic testing has zero value. I suspect the vast majority of emails sent to root@localhost are never read.
This is a problem. If user has a terminal opened, critical issues are displayed there.
We need a connection to desktop environment. In past, the reporting was done by powersave-notify.
Probably a small application connected to the notification service (that is compatible over desktops), needs to be written. Or, if somebody already wrote it, it should be packages.
Use wall(1), i.e. as in one of the included examples. KDE turns that into a desktop notification by default; for other desktops there is xwrited, which I could submit to Factory if desired. -- Guido Berhoerster
Guido Berhoerster wrote:
* Stanislav Brabec <sbrabec@suse.cz> [2015-03-05 17:00]:
Probably a small application connected to the notification service (that is compatible over desktops), needs to be written. Or, if somebody already wrote it, it should be packages.
Use wall(1), ie. in one of the included examples, KDE turns that into a desktop notification by default, for other desktops there is xwrited which I could submit to Factory if desired.
I support this. It would be nice and useful not only for smartd. And then add it to the GNOME and XFCE patterns. But it is a bit sub-optimal, as it does not allow internationalization. -- Stanislav Brabec
If the target PC was shut down at the time a job (for example, the monthly check) was scheduled, does smartd run the missed task by itself as soon as the PC starts again?
05.03.2015 18:39, Stanislav Brabec wrote:
Guido Berhoerster wrote:
* Stanislav Brabec <sbrabec@suse.cz> [2015-03-05 17:00]:
Probably a small application connected to the notification service (that is compatible over desktops), needs to be written. Or, if somebody already wrote it, it should be packages.
Use wall(1), ie. in one of the included examples, KDE turns that into a desktop notification by default, for other desktops there is xwrited which I could submit to Factory if desired.
I support this. It would be nice and useful not only for smartd. And then add it to GNOME and XFCE patters.
But it is a bit sub-optimal, as it does not allow internationalization.
Opensuse user wrote:
If the target PC was in state "shutdown", when a scheduled job (for example, monthly check) was scheduled, does the smartd run a missed scheduled task by itself as soon as a PC will start again ?
smartd uses the following logic:
- If the system was shut down, it starts the longest test that was scheduled during the time it was shut down.
- If the system was up but the drive was down, the check is skipped, and you can configure how many times it may be skipped.
For more, see the smartd.conf(5) man page. -- Stanislav Brabec
* Stanislav Brabec <sbrabec@suse.cz> [2015-03-05 17:39]:
Guido Berhoerster wrote:
* Stanislav Brabec <sbrabec@suse.cz> [2015-03-05 17:00]:
Probably a small application connected to the notification service (that is compatible over desktops), needs to be written. Or, if somebody already wrote it, it should be packages.
Use wall(1), ie. in one of the included examples, KDE turns that into a desktop notification by default, for other desktops there is xwrited which I could submit to Factory if desired.
I support this. It would be nice and useful not only for smartd. And then add it to GNOME and XFCE patters.
How about the following default configuration:

/etc/smartd.conf:
----8<----
# send mail to root and execute scripts in /etc/smartd_warning.d
DEVICESCAN -d removable -m root,@ALL -M exec /etc/smartd_warning.sh
---->8----

/etc/smartd_warning.d/wall.sh:
----8<----
#! /bin/sh

# Warn all users of a problem
wall <<EOF
Problem detected with disk: $SMARTD_DEVICESTRING
$SMARTD_MESSAGE
EOF
---->8----
But it is a bit sub-optimal, as it does not allow internationalization.
True, but I guess it is not much worse than sending untranslated mails to root. For translated desktop notifications it would have to be implemented the other way around, i.e. have a per-user daemon which uses some form of IPC like DBus to listen to standardized smartd messages, which can then be presented in a translated form. That also won't reach users logged in on a console or via ssh. -- Guido Berhoerster
Guido Berhoerster wrote:
How about the following default configuration:
/etc/smartd.conf:
----8<----
# send mail to root and execute scripts in /etc/smartd_warning.d
DEVICESCAN -d removable -m root,@ALL -M exec /etc/smartd_warning.sh
---->8----

/etc/smartd_warning.d/wall.sh:
----8<----
#! /bin/sh

# Warn all users of a problem
wall <<EOF
Problem detected with disk: $SMARTD_DEVICESTRING
$SMARTD_MESSAGE
EOF
---->8----
Yes, something like that looks good. I wonder whether anybody still reads mails to root...
But it is a bit sub-optimal, as it does not allow internationalization.
True, but I guess it is not much worse than sending untranslated mails to root. For translated desktop notifications it'd have to be implemented the other way around, i.e. have a per-user daemon which uses some form of IPC like DBus to listen to standardized smartd messages which can the be presented in a translated form. That also won't reach users logged in on a console or via ssh.
There are desktops with several users logged in, each with a different locale. That complicates such things a bit. The desktop application has to translate the messages, but that is hard for already composed messages with variable parts. Note that a similar problem was successfully solved in the sane project in the past. Well, this could be a longer-term task. As a first step, we need to deliver messages at least in English. -- Stanislav Brabec
Hello, On Thursday, 5 March 2015, Stanislav Brabec wrote:
Guido Berhoerster wrote:
How about the following default configuration:
...
# Warn all users of a problem
wall <<EOF
Problem detected with disk: $SMARTD_DEVICESTRING
$SMARTD_MESSAGE
EOF
---->8----
Yes, something like that looks good. Thinking if somebody still reads mails to root...
Yes, for example on webservers where typically nobody is logged in (which means that wall warnings won't be noticed on them), the admin will (hopefully) read the root mails. I'd also guess several people on this mailinglist read their root@localhost mails, but I'd also guess the average user differs ;-)
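For machines where nobody is logged in, a warning plugin could write to syslog in addition to calling wall, so that the message persists even if no terminal receives it. The following is a minimal sketch, assuming it would be installed as one more script in /etc/smartd_warning.d; the function names, the syslog tag "smartd-warning" and the fallback texts are hypothetical. SMARTD_DEVICESTRING and SMARTD_MESSAGE are the variables already used in the wall.sh example.

```shell
#!/bin/sh
# Hypothetical warning plugin: notify logged-in users and also log to syslog.

# Compose the warning text from the variables smartd exports to its scripts.
# Fallback texts are used when a variable is unset or empty.
format_warning() {
    printf 'Problem detected with disk: %s\n%s\n' \
        "${SMARTD_DEVICESTRING:-unknown device}" \
        "${SMARTD_MESSAGE:-no message provided}"
}

# Deliver the warning; a real plugin would call this on its last line.
deliver_warning() {
    format_warning | logger -t smartd-warning -p daemon.crit
    format_warning | wall
}
```

Keeping the message formatting separate from the delivery makes it possible to check the text without broadcasting anything to users.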
But it is a bit sub-optimal, as it does not allow internationalization.
Well, this could be a longer time task. In the first step, we need to deliver messages at least in English.
Right. Better to implement a warning _now_ (even if it's English only) - that's much better than no warning at all ;-) Regards, Christian Boltz -- That is roughly like recommending ed(1) instead of $EDITOR to beginners, because you can do everything with it, because you know it well, and because after $NASTY_CONTORTION it would be almost as easy to use for one function or another as other editors. [Thorsten Haude in suse-linux]
Christian Boltz wrote:
Hello,
On Thursday, 5 March 2015, Stanislav Brabec wrote:
Guido Berhoerster wrote:
How about the following default configuration:
...
# Warn all users of a problem
wall <<EOF
Problem detected with disk: $SMARTD_DEVICESTRING
$SMARTD_MESSAGE
EOF
---->8----
Yes, something like that looks good. Thinking if somebody still reads mails to root...
Yes, for example on webservers where typically nobody is logged in (which means that wall warnings won't be noticed on them), the admin will (hopefully) read the root mails.
'server' is the keyword here - the admin will make sure the mails are read.
Well, this could be a longer time task. In the first step, we need to deliver messages at least in English.
Right. Better implement a warning _now_ (even if it's english only) - that's much better than no warning at all ;-)
Yup. -- Per Jessen, Zürich (-0.2°C) http://www.hostsuisse.com/ - virtual servers, made in Switzerland.
Stanislav Brabec wrote:
But it is a bit sub-optimal, as it does not allow internationalization.
True, but I guess it is not much worse than sending untranslated mails to root. For translated desktop notifications it'd have to be implemented the other way around, i.e. have a per-user daemon which uses some form of IPC like DBus to listen to standardized smartd messages which can the be presented in a translated form. That also won't reach users logged in on a console or via ssh.
There are desktops with more users logged, each with a different locale. It complicates such things a bit.
A desktop with multiple seats will (presumably) have an experienced admin, who will make sure mails from smartd are read. -- Per Jessen, Zürich (-0.2°C) http://www.dns24.ch/ - free dynamic DNS, made in Switzerland.
On Thursday, March 05, 2015 21:46:13 Stanislav Brabec wrote:
Guido Berhoerster wrote: [...]
True, but I guess it is not much worse than sending untranslated mails to root. For translated desktop notifications it'd have to be implemented the other way around, i.e. have a per-user daemon which uses some form of IPC like DBus to listen to standardized smartd messages which can the be presented in a translated form. That also won't reach users logged in on a console or via ssh.
There are desktops with more users logged, each with a different locale. It complicates such things a bit.
Desktop application has to translate them. But it is hard for already composed messages with variable parts.
Note that similar problem was successfully solved in sane project in past.
Well, this could be a longer time task. In the first step, we need to deliver messages at least in English.
There is already a daemon running that provides SMART status to desktop applications/notifiers: udisks.
There is a D-Bus object for every drive, several SMART attributes are exported as properties, and a PropertiesChanged signal is generated when the disk starts to fail. See http://udisks.freedesktop.org/docs/latest/gdbus-org.freedesktop.UDisks2.Driv... for the SMART properties. You can use "udisksctl dump" to show the current state on your system.
udisks also works on a multiseat system: plug in an external hard drive, and any resulting notification will be shown to the user at the corresponding seat.
AFAIK only GNOME currently provides a desktop notification, but providing a desktop-agnostic bridge between udisks and the notification spec for everything else (KDE, Xfce, ...) should be easy.
Kind regards, Stefan -- Stefan Brüns / Bergstraße 21 / 52062 Aachen home: +49 241 53809034 mobile: +49 151 50412019 work: +49 2405 49936-424
On Wednesday 04 March 2015, Stanislav Brabec wrote:
That is why I think there is a time to re-evaluate the old decision, and think about enabling regular Self Tests by default.
Generally I find this a good idea but I have a few objections.
Short Self Test: Short Self Test verifies status of the hardware function. It takes several minutes.
I propose to run Short Self Test once a day.
Maybe better once per (HD)-uptime-day. Machines with only 10 minutes uptime per day shouldn't do it every day. Also sleeping discs have to be ignored. The user should be able to set fixed times for the checks.
I propose to run Long Self Test once a month.
Risks: There is a risk, that Long Self Test slows down I/O operation due to inferior firmware. But if the firmware is written in a smart way, any read or write request should pause the Self Test, and it should be resumed after some time of being idle. If we suppose well written firmware, it should cause minimal delays.
Has anybody checked if firmware is written in that smart way usually?
Self Tests resume after reboot.
I don't like seeing my machine busy right after boot-up; it reminds me of the rug or beagle mess. cu, Rudi
Ruediger Meier wrote:
On Wednesday 04 March 2015, Stanislav Brabec wrote:
I propose to run Short Self Test once a day.
Maybe better once per (HD)-uptime-day. Machines with only 10 minutes uptime per day shouldn't do it every day. Also sleeping discs have to be ignored. The user should be able to set fixed times for the checks.
I just looked into the documentation of smartd.conf. It seems that this feature is not yet implemented. Ignoring sleeping discs is the default. See smartd.conf(5), -n POWERMODE. Some discs behaved badly in the past: any S.M.A.R.T. command caused them to spin up.
I propose to run Long Self Test once a month.
Risks: There is a risk, that Long Self Test slows down I/O operation due to inferior firmware. But if the firmware is written in a smart way, any read or write request should pause the Self Test, and it should be resumed after some time of being idle. If we suppose well written firmware, it should cause minimal delays.
Has anybody checked if firmware is written in that smart way usually?
I have no positive verification. But the last report complaining about it was created in 2006: https://bugzilla.novell.com/show_bug.cgi?id=192591 It should be easy to write a test that checks for the slowdown.
Self Tests resume after reboot.
I don't like seeing my machine busy right after boot-up, reminds me about the rug or beagle mess.
Self tests are hardware driven. If you do not power cycle the drive, the self test may not even be interrupted. If you do, then it depends on the firmware. ATA supports the Selective Self Test. It would make it possible to implement, for example, checking 20 GB every day, but that would require additional coding (get the disk capacity, compute the parts per day, store the results, etc.). SCSI does not have this feature. -- Stanislav Brabec
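The arithmetic behind "20 GB every day" can be sketched as follows. This is only an illustration under stated assumptions: the span size is taken as 20 GiB, LBAs are assumed to be 512-byte logical sectors, the function names are hypothetical, and storing which span was tested last is left out. The resulting range has the form accepted by "smartctl -t select,FIRST-LAST".

```shell
#!/bin/sh
# Split a disk into fixed-size LBA spans so that one span can be tested per day.
# All sizes are in 512-byte sectors; 20 GiB = 20 * 1024 * 1024 * 2 sectors.
SPAN_SECTORS=$((20 * 1024 * 1024 * 2))

# Number of daily spans needed to cover a disk of $1 sectors (rounded up).
span_count() {
    echo $(( ($1 + SPAN_SECTORS - 1) / SPAN_SECTORS ))
}

# Print "FIRST-LAST" (inclusive LBAs) for span number $2 (0-based) of a
# disk with $1 sectors; the last span is clipped to the end of the disk.
span_range() {
    total=$1
    first=$(( $2 * SPAN_SECTORS ))
    last=$(( first + SPAN_SECTORS - 1 ))
    [ "$last" -ge "$total" ] && last=$(( total - 1 ))
    echo "${first}-${last}"
}

# Example with a hypothetical 1 TB disk (1953525168 sectors):
#   span_count 1953525168     number of days needed for one full pass
#   span_range 1953525168 0   range for: smartctl -t select,RANGE /dev/sdX
```

With these numbers a 1 TB disk would need 47 daily spans for one full pass, which is in the same range as the monthly Long Self Test proposed above.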
I just wrote a small script that checks for possible slowdowns caused by an inferior firmware implementation of the Long Self Test. It is recommended to run the test on an idle system. The script is trivial, and you can edit block sizes or other parameters (e.g. to test reading with delays). ftp://ftp.suse.com/pub/people/sbrabec/smartmontools/ -- Stanislav Brabec
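For readers who cannot fetch the script, the idea of such a check can be sketched as below. This is not the script from the FTP directory, only an assumed approach: time an uncached sequential read once while the drive is idle and once while a Long Self Test is running, then compare. The function names, the 4096 MiB read size and the 60 second delay are hypothetical; iflag=direct requires GNU dd, and run_check needs root and a real device.

```shell
#!/bin/sh
# Sketch: compare sequential read time with and without a running Long Self Test.

# Time (in whole seconds) a direct, uncached read of $2 MiB from device $1.
timed_read() {
    start=$(date +%s)
    dd if="$1" of=/dev/null bs=1M count="$2" iflag=direct 2>/dev/null
    end=$(date +%s)
    echo $(( end - start ))
}

# Slowdown in percent of $2 (seconds during the test) relative to $1 (baseline).
slowdown_percent() {
    [ "$1" -gt 0 ] || { echo "baseline too short" >&2; return 1; }
    echo $(( ($2 - $1) * 100 / $1 ))
}

# Hypothetical driver; nothing runs unless it is called: run_check /dev/sdX
run_check() {
    dev=$1
    base=$(timed_read "$dev" 4096)
    smartctl -t long "$dev" >/dev/null   # start the Long Self Test
    sleep 60                             # give the firmware time to begin
    busy=$(timed_read "$dev" 4096)
    smartctl -X "$dev" >/dev/null        # abort the self test again
    echo "baseline ${base}s, during test ${busy}s, slowdown $(slowdown_percent "$base" "$busy")%"
}
```

A slowdown near zero would suggest that the firmware pauses the self test for host I/O; a large value would point to the kind of inferior firmware discussed above. Whole-second timing is coarse, so the read size should be large enough to take at least several seconds.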
participants (13)
- Bruno Friedmann
- Brüns, Stefan
- Christian Boltz
- Emil Stephan
- Guido Berhoerster
- Malcolm
- Martin Pluskal
- Opensuse user
- Per Jessen
- Roman Bysh
- Ruediger Meier
- Stanislav Brabec
- Yamaban