[Bug 1185536] New: odd smartd message: "Device: /dev/nvme0, number of Error Log entries increased ..."
http://bugzilla.opensuse.org/show_bug.cgi?id=1185536

            Bug ID: 1185536
           Summary: odd smartd message: "Device: /dev/nvme0, number of
                    Error Log entries increased ..."
    Classification: openSUSE
           Product: openSUSE Distribution
           Version: Leap 15.2
          Hardware: Other
                OS: Other
            Status: NEW
          Severity: Minor
          Priority: P5 - None
         Component: Basesystem
          Assignee: screening-team-bugs@suse.de
          Reporter: Ulrich.Windl@rz.uni-regensburg.de
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

On each boot smartd reports a message like this:

May 02 14:35:46 localhost smartd[1754]: Device: /dev/nvme0, number of Error Log entries increased from 362 to 363
May 02 14:35:46 localhost smartd[1754]: Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZALQ512HALU_000L1-S4YCNF1NA09452.nvme.state

However when inspecting the error log, I find no error:

# nvme error-log /dev/nvme0n1
Error Log Entries for device:nvme0n1 entries:64
.................
 Entry[ 0]
.................
error_count     : 0
sqid            : 0
cmdid           : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
 Entry[ 1]
.................
error_count     : 0
sqid            : 0
cmdid           : 0
status_field    : 0(SUCCESS: The command completed successfully)
parm_err_loc    : 0
lba             : 0
nsid            : 0
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
 Entry[ 2]
...and so on...

# smartctl -l error /dev/nvme0
smartctl 7.0 2019-05-21 r4917 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

smartctl -x reports:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        27 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    456,995 [233 GB]
Data Units Written:                 953,428 [488 GB]
Host Read Commands:                 4,132,869
Host Write Commands:                4,692,798
Controller Busy Time:               17
Power Cycles:                       79
Power On Hours:                     10
Unsafe Shutdowns:                   9
Media and Data Integrity Errors:    0
Error Information Log Entries:      363
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               27 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged

-- 
You are receiving this mail because:
You are on the CC list for the bug.
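
The report above contrasts two values from different logs: the "Error Information Log Entries" counter in the SMART/Health log (NVMe Log 0x02) and the contents of the Error Information log (NVMe Log 0x01). As a rough sketch only, the following Python snippet extracts both from saved `smartctl -x` text so the mismatch can be checked mechanically. The function name `error_log_summary` and the `SAMPLE` excerpt are illustrative assumptions, and the parsing relies on the text layout quoted in this report, which may differ in other smartctl versions.

```python
import re

# Excerpt of the smartctl -x output quoted in the report (not a full dump).
SAMPLE = """\
Media and Data Integrity Errors:    0
Error Information Log Entries:      363
Warning  Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
"""


def error_log_summary(smartctl_text):
    """Return (counter, log_is_empty) parsed from smartctl -x text output.

    counter is the "Error Information Log Entries" value from the
    SMART/Health log, or None if the line is missing.
    log_is_empty is True when smartctl printed "No Errors Logged".
    """
    match = re.search(
        r"^Error Information Log Entries:\s*([\d,]+)",
        smartctl_text,
        re.MULTILINE,
    )
    # smartctl prints large numbers with thousands separators, so drop commas.
    counter = int(match.group(1).replace(",", "")) if match else None
    log_is_empty = "No Errors Logged" in smartctl_text
    return counter, log_is_empty


counter, log_is_empty = error_log_summary(SAMPLE)
if counter and log_is_empty:
    print(f"counter is {counter}, but the error log shows no entries")
```

Feeding it the output of `smartctl -x /dev/nvme0` saved to a file would give the same kind of summary for a live system.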

http://bugzilla.opensuse.org/show_bug.cgi?id=1185536
http://bugzilla.opensuse.org/show_bug.cgi?id=1185536#c2

--- Comment #2 from Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> ---
(In reply to Stanislav Brabec from comment #1)

Thanks for the explanation!

I applied this change, then restarted smartd.service:

diff -u -r1.1 ./smartd.conf
--- ./smartd.conf       2020/05/16 17:43:39     1.1
+++ ./smartd.conf       2021/05/12 06:26:22
@@ -21,7 +21,8 @@
 #                      (Takes several minutes.)
 # -s L/: Run Extended Self Test every first Sunday in the
 #                      month. (Start earlier, it could take tens of hours.)
-DEFAULT -d removable -s (S/../.././03|L/../(01|02|03|04|05|06|07)/7/01)
+#DEFAULT -d removable -s (S/../.././03|L/../(01|02|03|04|05|06|07)/7/01)
+DEFAULT -d nvme -s (S/../.././03|L/../(01|02|03|04|05|06|07)/7/01)
 # The word DEVICESCAN will cause any remaining lines in this
 # configuration file to be ignored: it tells smartd to scan for all

(I had also tried to remove the "DEVICESCAN" on line 33, thinking DEFAULT
would cover the devices to examine, but that seems to break startup, so I
reverted that change.)

I'll have a look at the next boot, because restarting smartd did not increase
the "error count".

One remark on the SSD statistics: I received the notebook with Windows 10
preconfigured, so I don't know what happened in the past. Maybe the battery
just ran out of power a few times...

http://bugzilla.opensuse.org/show_bug.cgi?id=1185536
http://bugzilla.opensuse.org/show_bug.cgi?id=1185536#c3

--- Comment #3 from Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> ---
(In reply to Ulrich Windl from comment #2)
> -DEFAULT -d removable -s (S/../.././03|L/../(01|02|03|04|05|06|07)/7/01)
> +#DEFAULT -d removable -s (S/../.././03|L/../(01|02|03|04|05|06|07)/7/01)
> +DEFAULT -d nvme -s (S/../.././03|L/../(01|02|03|04|05|06|07)/7/01)

The change above did not change things (one cold boot since then):

May 17 07:38:32 localhost smartd[1794]: Device: /dev/nvme0, number of Error Log entries increased from 373 to 374
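
To track how the counter moves from boot to boot, the smartd messages quoted in this thread can be parsed directly. The snippet below is only a sketch: the helper name `parse_increases` is hypothetical, the regular expression is written against the message wording shown in this bug and may not match other smartd versions, and the input lines are the ones quoted here rather than a live journal.

```python
import re

# Matches the smartd message wording quoted in this bug report.
PATTERN = re.compile(
    r"Device: (?P<dev>[^,\s]+), number of Error Log entries increased "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_increases(lines):
    """Return a list of (device, old, new, delta) for matching log lines."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            old, new = int(match["old"]), int(match["new"])
            results.append((match["dev"], old, new, new - old))
    return results


# Log lines copied from the comments in this bug.
LINES = [
    "May 02 14:35:46 localhost smartd[1754]: Device: /dev/nvme0, "
    "number of Error Log entries increased from 362 to 363",
    "May 02 14:35:46 localhost smartd[1754]: Device: /dev/nvme0, "
    "state written to /var/lib/smartmontools/"
    "smartd.SAMSUNG_MZALQ512HALU_000L1-S4YCNF1NA09452.nvme.state",
    "May 17 07:38:32 localhost smartd[1794]: Device: /dev/nvme0, "
    "number of Error Log entries increased from 373 to 374",
]

for dev, old, new, delta in parse_increases(LINES):
    print(f"{dev}: {old} -> {new} (+{delta})")
```

Each quoted message reports a step of one, while the counter moved from 363 to 373 between the two messages, so the other increments in that period are not visible in the lines quoted here.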

http://bugzilla.opensuse.org/show_bug.cgi?id=1185536
http://bugzilla.opensuse.org/show_bug.cgi?id=1185536#c4

Stanislav Brabec <sbrabec@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|needinfo?(Ulrich.Windl@rz.u |
                   |ni-regensburg.de)           |

--- Comment #4 from Stanislav Brabec <sbrabec@suse.com> ---
This does not look like a smartd issue. smartd correctly detects the NVMe
device, so configuration changes don't help. smartd just reports the ugly
fact that something attempts to access the NVMe disk using incompatible
commands. You can only suppress the message in smartd.

It is very probably not a hardware issue either. It is more likely a software
or software configuration issue: there is a piece of software that uses
incompatible commands. It could be anything. Here are some possibilities:
- A HDD monitoring utility (e.g. a widget) using the SCSI/SATA protocol.
- BIOS
- Some type of autodetection during the boot process.

I am able to reproduce this issue. In my case, every reboot consistently
increases the number by 3. So I have to check which part of the reboot causes
that behavior.