[opensuse] damaged HDD?
Following on from my post last night in another thread of mine: https://lists.opensuse.org/opensuse/2017-07/msg00853.html

This is a new problem, so I don't want that post going off-topic. I just wonder if anyone else has experience with this drive model, or more general thoughts.

I have a Transcend Storejet 25M3 2TB external SATA USB drive, which contains a Seagate Samsung SpinPoint M9T, model ST2000LM003 HN-M201RAD, inside. Here are some details on it: https://us.transcend-info.com/Products/No-284

I only bought it in December 2016, so it's still under warranty. I originally split it into 2 partitions: the first, 700GB or so, ext4, for backing up someone else's machine; the second, taking the rest of the disk, xfs, for my music collection (mostly FLAC). Until the other day, I'd plugged it directly into a few Linux machines, reading my music files directly off the xfs partition. I always made sure to mount/unmount properly.

As seen in the above specifications, the drive has 'Military-grade shock resistance'. That said, it's never been bashed about or mishandled in any way, and certainly not subjected to any shock or other misuse whilst plugged in.

Since I recently exchanged my router for an up-to-date version from my French ISP, I decided to plug the drive into the router so as to access the music collection from a central location regardless of which device I'm using (all over-detailed in the above thread). The French ISP, "Free", is very geek-friendly, and the router supports a few filesystems, with ext4 being the native format. XFS is also available, but it prompts warnings of slow access or other difficulties, recommending ext4 instead.

Using the router's built-in (non-destructive) partition verification tool, the check of the first (ext4) partition passed in under a minute, but the check of the second (xfs) partition just hangs and grinds away at the drive for hours, eventually forcing me to disconnect the drive using the router's front-panel display and reboot.
I could nonetheless read/write the xfs partition and access my music files. But when I tried to update my music collection across the LAN in Clementine, it went up to around 95% complete, then dropped back down a few per cent, endlessly, for hours. So I decided to back up the music data and reformat the 2nd partition as ext4. I used the openSUSE partitioner and copied the music back over (about 150GB).

I plugged the drive back into the router, but now the drive access light flashes permanently, every couple of seconds, which it never did before. If I touch the drive I can feel some clunking every few seconds, though it's not audible. It seems to be in a seek loop. I plugged it directly into my PCs and the same thing happens when the second partition is mounted. Mounting the first partition on its own is fine. I tried deleting and reformatting the second partition (again as ext4) with GParted on the other PC, and even with no data copied over, as soon as I mount it I get the same seeking.

Suspecting drive failure, I've tried running S.M.A.R.T. tests using GSmartControl. I attempted the extended test, which takes about 6 hours, and three times it has stopped with 'User aborted', although I never touched it and I made sure there was no power-saving that would have interfered. Twice it only made it to 10~20%; the other time it got 80% through overnight but then bottled out again. Here's the log file: http://paste.opensuse.org/4cc41f6b

I know these SMART tests have a tendency to show things like 'old-age' and 'pre-fail' on almost any drive, so I don't know if that's really relevant, but does anything spring out as worrisome here? I don't really know what to look for before I contact Transcend support. Of course, I immediately removed the Transcend Win/Mac software and tools that were on the drive so that I could reformat. I don't want to try a low-level format or suchlike before contacting support.
Does anybody know whether these companies are ever likely to be awkward if you use Linux / 'non-standard' filesystems?

gumb
--
To unsubscribe, e-mail: opensuse+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse+owner@opensuse.org
On Saturday, 22 July 2017, 11:02:29 CEST, gumb wrote:
> [...] Suspecting drive failure, I've tried running S.M.A.R.T. tests using GSmartControl. Attempting the extended test which requires about 6 hours, three times it has stopped saying 'User aborted', but I never touched it, and I ensured there was no power-saving that would have messed it up. Twice it only made it to 10~20%, the other time it got 80% through overnight but then bottled out again. Here's the log file: http://paste.opensuse.org/4cc41f6b
Honestly, I don't know how to reliably disable all available power-saving options, especially if the drive is connected via USB. A quick Google search confirms the problem you're seeing, e.g.: https://ddumont.wordpress.com/2010/03/15/workaround-for-aborted-smart-test-f...
> I know these SMART tests have a tendency to show things like 'old-age' and 'pre-fail' on almost any drive, so I don't know if that's really relevant, but does anything spring out as worrisome here? [...]
Looks fine. The "Multi_Zone_Error_Rate" raw value is not 0 but still very low. However, some attributes are only updated "offline", and your automatic offline data collection is disabled and you have never completed an offline test. I would try to enable the former with:

  smartctl --smart=on --offlineauto=on --saveauto=on /dev/xxx

Regards,
Jan
--
Law, without force is impotent.
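A minimal sketch of Jan's suggestion, assuming smartmontools' smartctl and a placeholder device: enable automatic offline data collection, then start one offline data collection right away with 'smartctl -t offline' so the offline-only attributes get refreshed, and finally show the capabilities section ('smartctl -c'), which reports the offline collection status. The helper 'run' and the function name are hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: enable automatic offline data collection and start one offline run.
# Dry run by default (prints the commands); set RUN=1 and run as root to execute.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

smart_offline_setup() {
    dev="$1"
    # Enable SMART, automatic offline data collection and attribute autosave.
    run smartctl --smart=on --offlineauto=on --saveauto=on "$dev"
    # Start an immediate offline data collection so "offline" attributes get updated.
    run smartctl -t offline "$dev"
    # Show the capabilities section, including the offline data collection status.
    run smartctl -c "$dev"
}
```

Usage, as root and with the real device name: RUN=1 smart_offline_setup /dev/sdX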
Hello, On Sat, 22 Jul 2017, Jan Ritzerfeld wrote:
> On Saturday, 22 July 2017, 11:02:29 CEST, gumb wrote:
>> [...] Suspecting drive failure, I've tried running S.M.A.R.T. tests using GSmartControl. Attempting the extended test which requires about 6 hours, three times it has stopped saying 'User aborted', but I never touched it, and I ensured there was no power-saving that would have messed it up. Twice it only made it to 10~20%, the other time it got 80% through overnight but then bottled out again. Here's the log file: http://paste.opensuse.org/4cc41f6b
> Honestly, I don't know how to reliably disable all available power-saving options, especially if the drive is connected via USB. A quick Google search confirms the problem you're seeing, e.g.: https://ddumont.wordpress.com/2010/03/15/workaround-for-aborted-smart-test-f...
Use 'hdparm -B 254 /dev/sdX' for these drives (after connecting/booting). I had this in my boot.local:

  hdparm -B 254 /dev/disk/by-id/ata-ST3000DM001-{SERIAL_NO}

You could also use the wwn-id, i.e. in your case:

  hdparm -B 254 /dev/disk/by-id/wwn-0x50004cf2119c074b
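The same idea as a small guarded function, for an external disk that may or may not be attached when the script runs. Per the hdparm manual, APM levels 128 to 254 do not permit spin-down and 254 is the highest-performance level (255 disables APM altogether). Whether the command actually reaches the drive through a USB-SATA bridge depends on the bridge, so treat this as a sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: set APM level 254 on a disk, but only if it is currently attached.
set_apm_max_perf() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        # External USB disks are often not attached at boot time; skip quietly.
        printf 'skip: %s not present\n' "$dev"
        return 0
    fi
    # -B 254: highest-performance APM level; levels 128..254 do not permit spin-down.
    hdparm -B 254 "$dev"
}
```

For example, as root: set_apm_max_perf /dev/disk/by-id/wwn-0x50004cf2119c074b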
>> I know these SMART tests have a tendency to show things like 'old-age' and 'pre-fail' on almost any drive, so I don't know if that's really relevant, but does anything spring out as worrisome here? [...]
> Looks fine. The "Multi_Zone_Error_Rate" raw value is not 0 but still very low.
Yep, I think that's perfectly normal for those Samsung drives. I have 5 Samsung HD204[WU]I, at least one already labeled as Seagate but still identifying as Samsung, with Power_On_Hours between 32885 and 37979 and Multi_Zone_Error_Rate between 24 and 578. That "worst" one went from 532 at 30082 hrs (2016-09-20) to 578 at 34673 hrs when I last booted (2017-07-21 23:31), and to 579 at 34689 hrs right now (2017-07-22 15:19). Oh, and it's probably the most heavily used disk in this box, too.

HTH,
-dnh
--
Bugzilla doesn't bite and is much, much nicer than me. -- Lars Müller
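To put David's numbers in perspective, a tiny helper that turns two readings of a raw SMART counter into a rate per 1000 power-on hours. The function name is hypothetical; the values used below are the ones quoted in the mail.

```shell
#!/bin/sh
# Sketch: growth of a raw SMART counter per 1000 power-on hours between two readings.
# Arguments: old_value old_hours new_value new_hours
attr_rate_per_khours() {
    awk -v v0="$1" -v h0="$2" -v v1="$3" -v h1="$4" 'BEGIN {
        if (h1 + 0 <= h0 + 0) { print "error: hours must increase" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", (v1 - v0) * 1000 / (h1 - h0)
    }'
}
```

'attr_rate_per_khours 532 30082 578 34673' prints 10.02, i.e. roughly one new Multi_Zone_Error_Rate count per 100 power-on hours on that disk.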
Well, seems like the "lazy_itable_init" feature of mkfs.ext4 that Jan mentioned on the other thread (https://lists.opensuse.org/opensuse/2017-07/msg00867.html) was all that was causing the issue in the end. The drive started behaving shortly after I propagated this panic post this morning. It just needed more time. At least I hope that's all it was. Cheers, gumb
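Some background, hedged: with lazy_itable_init enabled (mke2fs's default on kernels that support it), mkfs.ext4 leaves most inode tables uninitialised and the kernel zeroes them in the background after the first mount, using a kernel thread named ext4lazyinit; on a large fresh partition that can look like periodic seeking for quite a while. A minimal sketch to see whether that thread currently exists, reading /proc directly. On newer kernels the same thread may also prefetch block bitmaps, so treat 'running' as a hint rather than proof. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether the kernel thread "ext4lazyinit" currently exists.
# Reads /proc/<pid>/comm directly, so no procps tools are needed (Linux only).
lazyinit_status() {
    for comm in /proc/[0-9]*/comm; do
        # Processes can exit while we loop, so skip unreadable entries.
        name="$(cat "$comm" 2>/dev/null)" || continue
        if [ "$name" = ext4lazyinit ]; then
            echo running
            return 0
        fi
    done
    echo idle
}
```

Running it now and then after mounting the new partition should show 'running' while the background initialisation is in progress and 'idle' once the thread has exited.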
On 23/07/17 00:11, gumb wrote:
> Well, seems like the "lazy_itable_init" feature of mkfs.ext4 that Jan mentioned on the other thread (https://lists.opensuse.org/opensuse/2017-07/msg00867.html) was all that was causing the issue in the end.
> The drive started behaving shortly after I propagated this panic post this morning. It just needed more time. At least I hope that's all it was.
Hrmph. Just out of curiosity I thought I'd verify the second partition using the router's tool. I don't know what purpose that serves exactly, but this time it comes up with an unspecified error. I don't see anywhere to find out further info or view a log. The first partition still verifies okay. Tried unmounting and mounting again but nope, it's not happy. Inclined to leave it alone as it appears to be doing its job in every other respect. Just some obscure, redundant brain cell that has just been reactivated in order to maintain that minor niggle in the back of my mind now.
On 2017-07-22 11:02, gumb wrote:
> Following on from my post last night in another thread of mine: https://lists.opensuse.org/opensuse/2017-07/msg00853.html
> This is a new problem so I don't want that post going off-topic. I just wonder if anyone else has experience with this drive model, or more general thoughts.
> I have a Transcend Storejet 25M3 2TB external SATA USB drive, which contains a Seagate Samsung SpinPoint M9T, model ST2000LM003 HN-M201RAD inside. Here's some details on it: https://us.transcend-info.com/Products/No-284
I'm partial to buying the HD and box separately, then I assemble them together.
> [...] ext4 instead. Using the router's built-in partition verification tool (non-destructive), the first ext4 partition check passed in under a minute, but the second xfs partition just hangs and grinds away at the drive for hours, eventually forcing me to disconnect using the router's front panel display and reboot.
xfs check is very slow, when forced to do it. It checks everything. The ext4 test is fast because it only does a superficial test, and because the partition is smaller.
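To run a comparable full check on a PC instead, the key is forcing it: e2fsck skips a filesystem marked clean unless given -f (or a mount-count/interval limit is reached), whereas xfs_repair does its full scan every time. A sketch that picks a forced, read-only command per filesystem type ('e2fsck -f -n' or 'xfs_repair -n'); the partition must be unmounted, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: pick a forced, read-only check command for an unmounted partition.
fs_check_cmd() {
    fstype="$1"
    dev="$2"
    case "$fstype" in
        ext2|ext3|ext4) echo "e2fsck -f -n $dev" ;;   # -f: check even if clean; -n: read-only, answer "no"
        xfs) echo "xfs_repair -n $dev" ;;             # -n: no-modify mode, only report problems
        *) echo "unsupported filesystem: $fstype" >&2; return 1 ;;
    esac
}

# Prints the command by default; set RUN=1 (as root) to execute it.
check_partition() {
    cmd="$(fs_check_cmd "$1" "$2")" || return 1
    if [ "${RUN:-0}" = 1 ]; then
        $cmd  # unquoted on purpose: split into command and arguments
    else
        echo "$cmd"
    fi
}
```

The filesystem type can come from 'lsblk -no FSTYPE /dev/sdX2', e.g. 'check_partition xfs /dev/sdX2'.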
> Suspecting drive failure, I've tried running S.M.A.R.T. tests using GSmartControl. Attempting the extended test which requires about 6 hours, three times it has stopped saying 'User aborted', but I never touched it, and I ensured there was no power-saving that would have messed it up. Twice it only made it to 10~20%, the other time it got 80% through overnight but then bottled out again. Here's the log file: http://paste.opensuse.org/4cc41f6b
Running the surface test part of the SMART test makes the applications' access to the disk stall. It is possible that the OS decides to abort the test, but I have never heard of such a "feature". You could just connect the box to your computer and run the test from there.

I don't see anything important in the output, except that the test did not finish. I would use the computer to do the test, with the partitions unmounted.

As to the noise, there have been some suggestions on the other thread. I would reformat as ext4 with those lazy features disabled.

--
Cheers / Saludos,
Carlos E. R.
(from 42.2 x86_64 "Malachite" at Telcontar)
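Carlos's suggestion as a sketch: the mke2fs extended options lazy_itable_init=0 and lazy_journal_init=0 make mkfs.ext4 initialise the inode tables and the journal at format time, so formatting takes longer but nothing is left for the kernel to zero after the first mount. This is destructive, so the function only prints the command unless RUN=1 is set; the device and the optional label (e.g. 'music') are placeholders, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: mkfs.ext4 with inode tables and journal initialised at format time.
# DESTRUCTIVE when executed: it wipes the partition. Prints the command unless RUN=1.
format_ext4_eager() {
    dev="$1"
    label="${2:-}"
    set -- mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0
    if [ -n "$label" ]; then
        set -- "$@" -L "$label"
    fi
    set -- "$@" "$dev"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}
```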
Hello, On Sat, 22 Jul 2017, Carlos E. R. wrote:
> On 2017-07-22 11:02, gumb wrote:
>> Following on from my post last night in another thread of mine: https://lists.opensuse.org/opensuse/2017-07/msg00853.html
>> This is a new problem so I don't want that post going off-topic. I just wonder if anyone else has experience with this drive model, or more general thoughts.
>> I have a Transcend Storejet 25M3 2TB external SATA USB drive, which contains a Seagate Samsung SpinPoint M9T, model ST2000LM003 HN-M201RAD inside. Here's some details on it: https://us.transcend-info.com/Products/No-284
> I'm partial to buying the HD and box separately, then I assemble them together.
Or buy a disk boxed (if it's specified exactly which drive is inside), unbox it and use it internally ;) BTDT.

$ df -hT | sort | awk \
  'BEGIN{print "Filesystem Type Size";}\
   $2 !~ /tmp|rootfs|^Type/ && ! s[$1] { \
     printf("%s %10s %s\n", $1, $2, $3); s[$1]++; }'
### the s[$1] stuff is because of bind-mounts
Filesystem Type Size
/dev/loop0   reiserfs 8.0G  - looped newsspool
/dev/loop1   reiserfs 4.0G  - looped for a gentoo portage-tree
/dev/sda1       ext4 52G   - SSD suse /, ~46G used, stuff linked out
/dev/sda2       ext4 52G   - SSD gentoo /, ~40G used, stuff linked out
/dev/sdb1       ext3 30G   \
/dev/sdb3       ext3 30G   |
/dev/sdb5       ext3 1.8T  |
/dev/sdc1       ext3 1.9T  |
/dev/sdd1       ext3 1.9T  |
/dev/sde1       ext3 3.6T  |
/dev/sdf1       ext3 3.6T  | DATA
/dev/sdg2       ext3 9.9G  |
/dev/sdg3       ext3 1.9T  |
/dev/sdh1       ext3 40G   |
/dev/sdh2       ext3 40G   |
/dev/sdh3       ext3 15G   |
/dev/sdh5       ext3 5.0G  |
/dev/sdh6       ext3 1.8T  /

$ df -T | awk '$1 ~ /^\/dev\/sd/ && ! s[$1] { sum+=$3; s[$1]++; } \
    END{printf("%.2f TiB\n", sum/1024/1024/1024); }'
16.35 TiB

(Again, the s[$1] is for pruning bind-mounts.) Anyone see a pattern? :)

Got some more external drives too, from 750G to 8T, all ext3. But, yes, I've been using ext4 for some years now, and I do trust it by now. Though it hasn't been tortured like some of my ext3 FSen were when I had HW problems (hard shutdowns by holding the power button because even SysRq didn't work anymore, or even the box just rebooting "spontaneously").
>> [...] ext4 instead. Using the router's built-in partition verification tool (non-destructive), the first ext4 partition check passed in under a minute, but the second xfs partition just hangs and grinds away at the drive for hours, eventually forcing me to disconnect using the router's front panel display and reboot.
> xfs check is very slow, when forced to do it. It checks everything. The ext4 test is fast because it only does a superficial test, and because the partition is smaller.
Ah, yes, I forgot in my other mail: XFS also needs _oodles_ of RAM for the check. ISTR at least 1GiB RAM per TiB on the FS. Anyway, I tried XFS once: formatted a partition, tried xfs_repair, and it barfed out with "not enough RAM". I have neither enough RAM nor enough free disk space for swapping.

Ahh, yes, not exactly that, but something like it:
http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage...
http://oss.sgi.com/archives/xfs/2010-05/msg00221.html
From the latter (yes, it's old, but AFAIK xfs_repair still needs tons of RAM):

"I just ran xfs_check on an empty 51TB filesystem w/ 821 AGs to get an idea of how much RAM an older xfs_repair will use (as it have 3.1.2 installed on my test machines). It is allocating about 115GB of virtual memory space before consuming all the RAM+swap in the machine before being OOM-killed."

Thus XFS went down the drain for my box and purposes ;) Anyone got recent memory-usage data on xfs_repair on a fullish, say, 4T volume?

Lots of disk but little (non-ECC) RAM is not a use case XFS (nor ZFS) was intended for...

-dnh
--
If ignorance is bliss, why aren't there more happy people?
On Saturday, 22 July 2017, 15:18:50 CEST, Carlos E. R. wrote:
> [...] Running the surface test of the SMART test makes access to the disk by the applications to stall. It could be possible that the OS decides to abort the test, but I have never heard of such a "feature". [...]
Some power-saving option of the OS might put the drive or the USB interface to sleep if there is no activity for some period of time, and a running SMART test is not visible to the OS.

The real fun begins if you want to securely erase a USB disk via ATA Secure Erase:

| [...] some "intelligent" interfaces such as USB or firewire to PATA/SATA
| bridges, SAS controllers or hardware RAID controllers may try to reset
| devices which they have decided are no longer responding. [...]

https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase

Regards,
Jan
--
Lead, follow, or get out of the way.
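One possible workaround, sketched under assumptions: start the long self-test, then query the drive at a fixed interval until it no longer reports a test in progress, so the USB side sees regular traffic. It assumes smartmontools, a bridge that needs '-d sat' (drop it if smartctl works without), and that 'smartctl -c' prints "Self-test routine in progress" while a test runs. It may still not help if the drive's own standby timer is what interrupts the test. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: start a long SMART self-test and poll it so the USB bridge sees regular traffic.

# Succeeds if 'smartctl -c' output (on stdin) says a self-test is still running.
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Start the long test, then query the drive every $2 seconds (default 60) until done.
long_test_with_keepalive() {
    dev="$1"
    interval="${2:-60}"
    smartctl -d sat -t long "$dev" || return 1
    sleep 5  # give the drive a moment to report the new status
    while smartctl -d sat -c "$dev" | selftest_in_progress; do
        sleep "$interval"
    done
    # Show the self-test log so the result (completed, aborted, ...) is visible.
    smartctl -d sat -l selftest "$dev"
}
```

Usage, as root: long_test_with_keepalive /dev/sdX 60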
On 2017-07-22 17:35, Jan Ritzerfeld wrote:
> On Saturday, 22 July 2017, 15:18:50 CEST, Carlos E. R. wrote:
>> [...] Running the surface test of the SMART test makes access to the disk by the applications to stall. It could be possible that the OS decides to abort the test, but I have never heard of such a "feature". [...]
> Some power-saving option of the OS might put the drive or the USB interface to sleep if there is no activity for some period of time. And a running smart test is not visible to the OS.
Ahhhh...!
> The real fun begins if you want to securely erase a USB disk via ATA Secure Erase:
> | [...] some "intelligent" interfaces such as USB or firewire to PATA/SATA
> | bridges, SAS controllers or hardware RAID controllers may try to reset
> | devices which they have decided are no longer responding. [...]
> https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
:-) -- Cheers / Saludos, Carlos E. R. (from 42.2 x86_64 "Malachite" at Telcontar)
On 22/07/17 15:18, Carlos E. R. wrote:
> On 2017-07-22 11:02, gumb wrote:
>> I have a Transcend Storejet 25M3 2TB external SATA USB drive, which contains a Seagate Samsung SpinPoint M9T, model ST2000LM003 HN-M201RAD inside. Here's some details on it: https://us.transcend-info.com/Products/No-284
> I'm partial to buying the HD and box separately, then I assemble them together.
I rather liked the casing for this one. It has a green openSusesque band around the top. :)
participants (4)
- Carlos E. R.
- David Haller
- gumb
- Jan Ritzerfeld