On 10/07/2014 08:34 AM, Ludwig Nussel wrote:
Hi,
Running current Factory kernel (3.16.3-1.gd2bbe7f-desktop) I have the following messages in dmesg:
[20727.025399] sas: Enter sas_scsi_recover_host busy: 6 failed: 6
[20727.025407] sas: trying to find task 0xffff8800375776c0
[20727.025410] sas: sas_scsi_find_task: aborting task 0xffff8800375776c0
[20727.025418] isci 0000:05:00.0: isci_task_abort_task: dev = (null) (STP/SATA <NULL>), task = ffff8800375776c0, old_request == (null)
[20727.025421] isci 0000:05:00.0: isci_task_abort_task: abort task not needed for ffff8800375776c0
[20727.025425] isci 0000:05:00.0: isci_task_abort_task: Done; dev = (null), task = ffff8800375776c0 , old_request == (null)
[20727.025428] sas: sas_scsi_find_task: task 0xffff8800375776c0 is done
[20727.025430] sas: sas_eh_handle_sas_errors: task 0xffff8800375776c0 is done
[20727.025433] sas: trying to find task 0xffff880037577440
[20727.025435] sas: sas_scsi_find_task: aborting task 0xffff880037577440
[20727.025439] isci 0000:05:00.0: isci_task_abort_task: dev = (null) (STP/SATA <NULL>), task = ffff880037577440, old_request == (null)
[20727.025442] isci 0000:05:00.0: isci_task_abort_task: abort task not needed for ffff880037577440
[20727.025446] isci 0000:05:00.0: isci_task_abort_task: Done; dev = (null), task = ffff880037577440 , old_request == (null)
[20727.025448] sas: sas_scsi_find_task: task 0xffff880037577440 is done
[20727.025450] sas: sas_eh_handle_sas_errors: task 0xffff880037577440 is done
[20727.025452] sas: trying to find task 0xffff880037577940
[20727.025454] sas: sas_scsi_find_task: aborting task 0xffff880037577940
...
[20727.025528] sas: ata7: end_device-6:0: cmd error handler
[20727.025602] sas: ata7: end_device-6:0: dev error handler
[20727.025615] ata7.00: exception Emask 0x0 SAct 0x7e0 SErr 0x0 action 0x6 frozen
[20727.025620] ata7.00: failed command: WRITE FPDMA QUEUED
[20727.025628] ata7.00: cmd 61/40:00:d8:03:1b/00:00:45:00:00/40 tag 5 ncq 32768 out
         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[20727.025631] ata7.00: status: { DRDY }

That's an NCQ failure, most likely a TLER (time-limited error recovery)
issue, i.e. the device encountered an error that its internal error
recovery couldn't fix up.
And yes, the ATA stack doesn't handle that one well ...
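
As a side note on reading the exception line: to my understanding, the
SAct value that libata prints is the SActive bitmask, with one bit per
outstanding NCQ tag. A minimal Python sketch for decoding it (the helper
name active_ncq_tags is made up for illustration, not a kernel or
smartmontools API):

```python
from typing import List


def active_ncq_tags(sact: int) -> List[int]:
    """Return the NCQ tag numbers whose bits are set in an SActive bitmask.

    Hypothetical helper for illustration only. NCQ allows at most 32
    outstanding commands, so only bits 0..31 are inspected.
    """
    return [tag for tag in range(32) if sact & (1 << tag)]


# Value taken from "ata7.00: exception Emask 0x0 SAct 0x7e0 ..." in the log.
tags = active_ncq_tags(0x7E0)
print(tags)       # [5, 6, 7, 8, 9, 10]
print(len(tags))  # 6
```

Six set bits would be consistent with the "busy: 6 failed: 6" line at the
start of the recovery and with "tag 5" on the first failed command.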
...
[20727.025688] ata7.00: status: { DRDY }
[20727.025694] ata7: hard resetting link
[20727.219155] ata7.00: configured for UDMA/133
[20727.219164] ata7.00: device reported invalid CHS sector 0
[20727.219167] ata7.00: device reported invalid CHS sector 0
[20727.219170] ata7.00: device reported invalid CHS sector 0
[20727.219173] ata7.00: device reported invalid CHS sector 0
[20727.219176] ata7.00: device reported invalid CHS sector 0
[20727.219178] ata7.00: device reported invalid CHS sector 0
[20727.219212] ata7: EH complete
[20727.219262] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[40650.589614] perf interrupt took too long (2520 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
Is the disk dying (smartctl output attached) or is it a kernel bug?

It's on its way out:

  1 Raw_Read_Error_Rate 0x000f 112 099 006 Pre-fail Always - 46434016

A high raw read error rate _is_ worrying.

  7 Seek_Error_Rate 0x000f 073 060 030 Pre-fail Always - 17265656910

And a high seek error rate even more so.
Get a new disk.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)