[opensuse-kernel] Any complaints about 4.4.57-18.3-default stability?
Hi!

On my OpenSUSE 42.2 I updated the kernel on Monday

Linux linux.suse 4.4.57-18.3-default #1 SMP Thu Mar 30 06:39:47 UTC 2017 (39c8557) x86_64 x86_64 x86_64 GNU/Linux

On Tuesday, after ~5 hours of light interactive usage, the machine hung with all LEDs blinking. Some machine check?

On Wednesday, ~10 usage hours and 1 suspend to disk later, I got an xfs internal error (during some heavy file operations).

Apr 04 11:17:32 linux kernel: Linux version 4.4.57-18.3-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Thu Mar 30 06:39:47 UTC 2017 (39c8557)
...
Apr 05 16:42:10 linux.suse nm-dispatcher[11697]: Dispatching action 'dhcp4-change' for eth0
Apr 05 16:42:30 linux.suse kernel: XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3156 of file ../fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x3ce/0x820 [xfs]
Apr 05 16:42:30 linux.suse kernel: CPU: 2 PID: 11736 Comm: git Tainted: G O 4.4.57-18.3-default #1
Apr 05 16:42:30 linux.suse kernel: Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A16 12/05/2013
Apr 05 16:42:30 linux.suse kernel: 0000000000000000 ffffffff81328787 ffff8800bb8ceb78 ffff8800c932bc60
Apr 05 16:42:30 linux.suse kernel: ffffffffa0ad4bcc ffff8800c930d000 00000000cb19a540 00000000ffffffff
Apr 05 16:42:30 linux.suse kernel: 0000000000000000 0400000000c32000 ffff880108319400 ffffffffa0abb3e6
Apr 05 16:42:30 linux.suse kernel: Call Trace:
Apr 05 16:42:30 linux.suse kernel: [<ffffffff81019ea9>] dump_trace+0x59/0x320
Apr 05 16:42:30 linux.suse kernel: [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Apr 05 16:42:30 linux.suse kernel: [<ffffffff8101b011>] show_stack+0x21/0x40
Apr 05 16:42:30 linux.suse kernel: [<ffffffff81328787>] dump_stack+0x5c/0x85
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0ad4bcc>] xfs_btree_insert+0x17c/0x190 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0ab984e>] xfs_free_ag_extent+0x3ce/0x820 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0abace1>] xfs_free_extent+0xd1/0x100 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0b26171>] xfs_trans_free_extent+0x21/0x50 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0af3668>] xfs_bmap_finish+0xf8/0x120 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0b0b0ba>] xfs_itruncate_extents+0x11a/0x260 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0b0b281>] xfs_inactive_truncate+0x81/0xe0 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffffa0b0c02f>] xfs_inactive+0x13f/0x160 [xfs]
Apr 05 16:42:30 linux.suse kernel: [<ffffffff81220191>] evict+0xc1/0x1a0
Apr 05 16:42:30 linux.suse kernel: [<ffffffff812154ef>] do_unlinkat+0x18f/0x2b0
Apr 05 16:42:30 linux.suse kernel: [<ffffffff8160af32>] entry_SYSCALL_64_fastpath+0x16/0x71
Apr 05 16:42:30 linux.suse kernel: DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x71
Apr 05 16:42:30 linux.suse kernel:
Apr 05 16:42:30 linux.suse kernel: Leftover inexact backtrace:
Apr 05 16:42:30 linux.suse kernel: XFS (dm-4): Internal error xfs_trans_cancel at line 990 of file ../fs/xfs/xfs_trans.c. Caller xfs_inactive_truncate+0xb1/0xe0 [xfs]
Apr 05 16:42:31 linux.suse kernel: CPU: 2 PID: 11736 Comm: git Tainted: G O 4.4.57-18.3-default #1
Apr 05 16:42:31 linux.suse kernel: Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A16 12/05/2013
Apr 05 16:42:31 linux.suse kernel: 0000000000000000 ffffffff81328787 ffff8800bf6ec880 0000000000000001
Apr 05 16:42:31 linux.suse kernel: ffffffffa0b15ced
Apr 05 16:42:31 linux.suse kernel: ffff880025ae3400 00000000ffffff8b ffffffffa0b36640
Apr 05 16:42:31 linux.suse kernel: ffffffffa0b0b2b1 ffff8800bf6ec880 ffff880025ae3400 0000000000000001
Apr 05 16:42:31 linux.suse kernel: Call Trace:
Apr 05 16:42:31 linux.suse kernel: [<ffffffff81019ea9>] dump_trace+0x59/0x320
Apr 05 16:42:31 linux.suse kernel: [<ffffffff8101a26a>] show_stack_log_lvl+0xfa/0x180
Apr 05 16:42:31 linux.suse kernel: [<ffffffff8101b011>] show_stack+0x21/0x40
Apr 05 16:42:31 linux.suse kernel: [<ffffffff81328787>] dump_stack+0x5c/0x85
Apr 05 16:42:31 linux.suse kernel: [<ffffffffa0b15ced>] xfs_trans_cancel+0xad/0xd0 [xfs]
Apr 05 16:42:31 linux.suse kernel: [<ffffffffa0b0b2b1>] xfs_inactive_truncate+0xb1/0xe0 [xfs]
Apr 05 16:42:31 linux.suse kernel: [<ffffffffa0b0c02f>] xfs_inactive+0x13f/0x160 [xfs]
Apr 05 16:42:31 linux.suse kernel: [<ffffffff81220191>] evict+0xc1/0x1a0
Apr 05 16:42:31 linux.suse kernel: [<ffffffff812154ef>] do_unlinkat+0x18f/0x2b0
Apr 05 16:42:31 linux.suse kernel: [<ffffffff8160af32>] entry_SYSCALL_64_fastpath+0x16/0x71
Apr 05 16:42:31 linux.suse kernel: DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x71
Apr 05 16:42:31 linux.suse kernel:
Apr 05 16:42:31 linux.suse kernel: Leftover inexact backtrace:
Apr 05 16:42:31 linux.suse kernel: XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 991 of file ../fs/xfs/xfs_trans.c. Return address = 0xffffffffa0b15d06
Apr 05 16:42:31 linux.suse kernel: XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
Apr 05 16:42:31 linux.suse kernel: XFS (dm-4): Please umount the filesystem and rectify the problem(s)
Apr 05 16:42:33 linux.suse kernel: XFS (dm-4): xfs_log_force: error -5 returned.
Apr 05 16:43:03 linux.suse kernel: XFS (dm-4): xfs_log_force: error -5 returned.
Apr 05 16:43:33 linux.suse kernel: XFS (dm-4): xfs_log_force: error -5 returned.
Apr 05 16:44:03 linux.suse kernel: XFS (dm-4): xfs_log_force: error -5 returned.
Apr 05 16:44:34 linux.suse kernel: XFS (dm-4): xfs_log_force: error -5 returned.

(xfs_repair with dropping logs was required to recover)

There is 1 xfs change in the changelog. I have no idea whether it could cause this problem, even theoretically. But I guess many kinds of memory corruption caused by any other change could also cause an XFS corruption.

Yes, the machine is old and it might break some day. The S.M.A.R.T. self-test passes; I haven't had time to run memtest yet.

However, the fact that the problems occurred just hours after a kernel update makes me a bit suspicious. Earlier I had ~1 crash per year on this machine. Any other observations of instability with the new kernel? (No answer is also an answer in this case.)

Regards,

Uwe Geuder
Nomovok Ltd.
Tampere, Finland
uwe.gXuder@nomovok.com (bot check: correct 1 obvious spelling error)

--
To unsubscribe, e-mail: opensuse-kernel+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-kernel+owner@opensuse.org
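For anyone triaging a similar journal, here is a minimal Python sketch (not part of the original report) that picks the XFS "Internal error" lines out of the log and counts the repeated xfs_log_force failures. The regular expressions are modelled only on the lines quoted above, so other kernel versions may word these messages differently; the function name summarize is made up for illustration. The kernel prints negated errno values, so "error -5" corresponds to EIO.

```python
import errno
import re

# Patterns modelled on the log lines quoted in this thread (an assumption:
# other kernel versions may format these messages differently).
INTERNAL_ERROR = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Internal error (?P<what>\S+) at line (?P<line>\d+) "
    r"of file (?P<file>\S+?)\.\s+Caller (?P<caller>\S+)"
)
LOG_FORCE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): xfs_log_force: error (?P<err>-?\d+) returned"
)


def summarize(lines):
    """Return (internal_errors, errno_counts) for an iterable of journal lines.

    internal_errors is a list of (device, what, file, line, caller) tuples;
    errno_counts maps a symbolic errno name to how often it was logged.
    """
    internal = []
    errno_counts = {}
    for text in lines:
        m = INTERNAL_ERROR.search(text)
        if m:
            internal.append(
                (m["dev"], m["what"], m["file"], int(m["line"]), m["caller"])
            )
            continue
        m = LOG_FORCE.search(text)
        if m:
            # The kernel reports negated errno values, so -5 maps to errno 5.
            name = errno.errorcode.get(abs(int(m["err"])), "unknown")
            errno_counts[name] = errno_counts.get(name, 0) + 1
    return internal, errno_counts


if __name__ == "__main__":
    # Three lines copied from the report above.
    sample = [
        "Apr 05 16:42:30 linux.suse kernel: XFS (dm-4): Internal error "
        "XFS_WANT_CORRUPTED_GOTO at line 3156 of file "
        "../fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x3ce/0x820 [xfs]",
        "Apr 05 16:42:33 linux.suse kernel: XFS (dm-4): xfs_log_force: error -5 returned.",
        "Apr 05 16:43:03 linux.suse kernel: XFS (dm-4): xfs_log_force: error -5 returned.",
    ]
    errors, counts = summarize(sample)
    for entry in errors:
        print(entry)
    print(counts)
```

A summary like this can be pasted into a bug report next to the full trace; it does not replace the trace itself, which is what the maintainers will need.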
Hi,

On Wed 05-04-17 23:58:19, Uwe Geuder wrote:
> On my OpenSUSE 42.2 I updated the kernel on Monday
>
> Linux linux.suse 4.4.57-18.3-default #1 SMP Thu Mar 30 06:39:47 UTC 2017 (39c8557) x86_64 x86_64 x86_64 GNU/Linux
>
> On Tuesday after ~5 hours light interactive usage the machine hung with all LEDs blinking. Some machine check?

Maybe just a hard kernel panic... But without more info it is impossible to debug.

> On Wednesday ~10 usage hours and 1 suspend to disk later I got an xfs internal error (during some heavy file operations).

Please file a bug for this. Thanks!

Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Someone else posted on the opensuse list a few days ago about problems with the new crc option on Leap 42.2, but they didn't give enough detail to tell whether it was the same bug.

-linda

-------- Original Message --------
Subject: XFS and CRC on Leap 42.2
Date: Sun, 2 Apr 2017 15:56:05 -0400
From: Ciro Iriarte
To: OpenSuse List

Hi! Can anyone comment on XFS stability on Leap 42.2 using crc=1? I've read that crc=1 should be the new default. I created an XFS filesystem with explicit crc=1.

Everything was working fine until I used that filesystem as an NFS repository for ESXi, tried to move some VMs to it and had a complete hang. Hours later the filesystem went into read-only mode.

Trying to repair the FS I get CRC errors on the pending changelog. I can purge the log, as the entries reference the directories of the VMs I tried to move, but I'm a little worried about keeping it like that. I've just moved from BTRFS trying to avoid these issues :/

Regards,

--
Ciro Iriarte
http://iriarte.it
On Fri, 07 Apr 2017 00:18:45 +0200, L A Walsh wrote:
> Someone else posted problems on the opensuse list about the new crc option, a few days ago on Leap 42.2, but they didn't give enough detail to tell if it was the same bug.

If anyone has a problem with the latest update kernel, please try the KOTD available in the OBS Kernel:openSUSE-42.2 repo, which is the build from the latest git snapshot.

http://download.opensuse.org/repositories/Kernel:/openSUSE-42.2/

And don't hesitate to report to openSUSE bugzilla. (But you will be asked to test with the KOTD there anyway :)

If the reported problem is serious, we can re-submit a newer update kernel quickly, just by submitting from the working KOTD.

thanks,

Takashi
participants (4)
- Jan Kara
- L A Walsh
- Takashi Iwai
- Uwe Geuder