On 30/07/2020 21.06, David Haller wrote:
Hello,
On Thu, 30 Jul 2020, Yamaban wrote:
On Thu, 30 Jul 2020 15:30, Carlos E. R.
wrote:
[..]
<snip> the next lines follow directly, but the identifier is different:
before: ST4000DM004-2CV104
after:  ST4000DM004-2CV1

The first is via ata, the second via scsi, compare:
# lsscsi -g
[9:0:0:0]    disk    ATA      WDC WD40EFRX-68N 82.0  /dev/sdh   /dev/sg7
# hdparm -i /dev/sdh | grep Model
 Model=WDC WD40EFRX-68N32N0 [..]
# sginfo -i /dev/sg7 |grep Prod
Product:                    WDC WD40EFRX-68N
As you see, sginfo (the SCSI INQUIRY) truncates the model to 16 chars, while ata allows at least 22 chars (that's the longest I have).
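To make the truncation concrete, here is a minimal Python sketch. It assumes the SCSI side shows simply the first 16 characters of the ATA model string, which is what the two outputs above suggest; the 16-byte product field comes from the SCSI INQUIRY layout and the 40-character model field from ATA IDENTIFY DEVICE. The function names are made up for illustration.

```python
# Field widths from the SCSI and ATA standards; the helpers are illustrative only.
SCSI_PRODUCT_LEN = 16   # INQUIRY "product identification" field, in bytes
ATA_MODEL_LEN = 40      # IDENTIFY DEVICE model number, in characters


def scsi_product(ata_model: str) -> str:
    """Return the model as it would fit into the 16-byte SCSI product field."""
    return ata_model[:SCSI_PRODUCT_LEN].rstrip()


def same_drive(ata_model: str, scsi_model: str) -> bool:
    """True if scsi_model looks like the truncated form of ata_model."""
    return scsi_product(ata_model) == scsi_model.strip()


if __name__ == "__main__":
    print(scsi_product("ST4000DM004-2CV104"))
    print(same_drive("WDC WD40EFRX-68N32N0", "WDC WD40EFRX-68N"))
```

So when matching a kernel log line against /dev/disk/by-id, comparing only the first 16 characters of the model avoids the false mismatch.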
[..]
First, look at the links in the directory /dev/disk/by-path, especially the lines that contain ata3:
:> ls -l /dev/disk/by-path |grep ata3

That gives you an idea which drive causes the problem. For more info on which drive that is, take the "sd?" drive id and look at the content of /dev/disk/by-id for that id, e.g. for "sdd":
:> ls -l /dev/disk/by-id |grep sdd

That should give something similar to "ata-Seagate..........." with the product name ST4000DM004-2CV1 in it, to identify exactly which drive it is.
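The same by-id lookup can be sketched in Python, in case grep output gets noisy. This is only an illustration: the function name is made up, and unlike `grep sdd` it matches the device name exactly, so partitions such as sdd1 are not listed.

```python
import os


def ids_for_device(dev_name, by_id_dir="/dev/disk/by-id"):
    """Return the names in by_id_dir whose symlink resolves to dev_name (e.g. "sdd")."""
    matches = []
    for entry in sorted(os.listdir(by_id_dir)):
        # Each entry is a symlink such as ../../sdd; resolve it to the real node.
        target = os.path.realpath(os.path.join(by_id_dir, entry))
        if os.path.basename(target) == dev_name:
            matches.append(entry)
    return matches


if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-id"):
        for name in ids_for_device("sdd"):
            print(name)
```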
It's easier to use my script ataid_to_drive.sh[1] to translate the 'ataX[.YY]' ids to /dev/sd* or SCSI HOST:CTRL:ID:LUN ids, and I've not yet found any other tool that does that (not even 'lsblk -O -a' ;)
Telcontar:~ # ataid_to_drive.sh ata3.00
ata3.00 is:
Telcontar:~ #

It has no instructions, so I don't know what to feed it :-?
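As a fallback that does not depend on the script, the port-to-device mapping can be sketched via sysfs. This is not how ataid_to_drive.sh works (its source is not shown here); it assumes a kernel where the resolved /sys/block/sdX path contains an "ataN" component, and the function names are made up for illustration.

```python
import glob
import os
import re

# Matches the "ataN" path component, e.g. ".../ata3/host2/target2:0:0/2:0:0:0/block/sdc".
_ATA_PORT = re.compile(r"/ata(\d+)/")


def ata_port_of(sysfs_path):
    """Return the ATA port number found in a resolved sysfs path, or None."""
    m = _ATA_PORT.search(sysfs_path)
    return int(m.group(1)) if m else None


def drives_on_port(port, sys_block="/sys/block"):
    """List block devices (e.g. ["sdc"]) whose sysfs path goes through ata<port>."""
    found = []
    for link in sorted(glob.glob(os.path.join(sys_block, "sd*"))):
        if ata_port_of(os.path.realpath(link)) == port:
            found.append(os.path.basename(link))
    return found


if __name__ == "__main__":
    # "ata3.00" in the kernel log is port 3, device 00.
    print(drives_on_port(3))
```

Drives behind USB or a SAS HBA have no "ataN" component in their path, so they are simply skipped.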
If not, stop snapper (-timer via systemd), and do so NOW. Power down. Disconnect the drive, maybe also remove it at the same time.
I wonder if snapper somehow does an explicit 'ATA_CMD_FLUSH_EXT' (that's the easiest to grep for in the kernel-sources) and the drive barfs on that command and only understands 'ATA_CMD_FLUSH'... I've not looked at the specs or differences...
Done (with 'mc'):

/usr/src/linux/include/trace/events/libata.h   2813/11773   23%
    ata_opcode_name(ATA_CMD_FLUSH_EXT), \
/usr/src/linux/include/linux/ata.h   8458/33975   24%
    ATA_CMD_FLUSH_EXT = 0xEA,
/usr/src/linux/drivers/scsi/hisi_sas/hisi_sas_main.c   3469/86785   3%
    case ATA_CMD_FLUSH_EXT:
/usr/src/linux/drivers/ide/ide-pm.c   5098/7455   68%
    cmd.tf.command = ATA_CMD_FLUSH_EXT;
/usr/src/linux/drivers/ide/ide-disk.c   12562/19789   63%
    cmd->tf.command = ATA_CMD_FLUSH_EXT;
/usr/src/linux/drivers/ide/ide-disk.c   15685/19789   79%
    cmd.tf.command = ATA_CMD_FLUSH_EXT;
/usr/src/linux/drivers/ata/libata-scsi.c   42816/126K   33%
    tf->command = ATA_CMD_FLUSH_EXT;
/usr/src/linux/drivers/ata/libata-eh.c   6561/109K   5%
    { .commands = CMDS(ATA_CMD_FLUSH, ATA_CMD_FLUSH_EXT), .timeouts = ata_eh_flush_timeouts },
/usr/src/linux/drivers/ata/libata-eh.c   62785/109K   56%
    { ATA_CMD_FLUSH_EXT, "FLUSH CACHE EXT" },
/usr/src/linux/drivers/ata/libata-eh.c   92983/109K   83%
    if (qc->dev != dev || (qc->tf.command != ATA_CMD_FLUSH_EXT && qc->tf.command != ATA_CMD_FLUSH)) return 0;

Anything interesting there?
Replace the drive with a new one.
Sorry, no better answer. Either the drive has a firmware bug, or a real hw failure the s.m.a.r.t system can not identify.
Apropos: how about the 'smartctl -A' output?
Sure. I intend to trigger a long test before going to sleep tonight, though. Or perhaps, download the seagate test tool and boot it, running the test from CPU instead of disk firmware because I think it tests the cable.

Telcontar:~ # smartctl -A /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.12.14-lp151.28.52-default] (SUSE RPM)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   062   006    Pre-fail  Always       -       139054323
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       908
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   045    Pre-fail  Always       -       701010635
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10395 (128 204 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       906
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   067   061   040    Old_age   Always       -       33 (Min/Max 25/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       314
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1138
194 Temperature_Celsius     0x0022   033   040   000    Old_age   Always       -       33 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   081   064   000    Old_age   Always       -       139054323
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10360h+29m+01.373s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       16641668645
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       29138379095

Telcontar:~ #

-- 
Cheers / Saludos,

		Carlos E. R.
		(from 15.1 x86_64 at Telcontar)