[Bug 1011250] New: unable to handle kernel NULL pointer dereference
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250

Bug ID: 1011250
Summary: unable to handle kernel NULL pointer dereference
Classification: openSUSE
Product: openSUSE Distribution
Version: Leap 42.2
Hardware: x86-64
OS: Other
Status: NEW
Severity: Major
Priority: P5 - None
Component: Kernel
Assignee: kernel-maintainers@forge.provo.novell.com
Reporter: jweberhofer@weberhofer.at
QA Contact: qa-bugs@suse.de
Found By: ---
Blocker: ---

Since upgrading to Leap 42.2 I see these exceptions a few times every day. The bad part is that I lose connections to several NFS shares and have to reboot the workstations.

Nov 21 12:06:25 c-web3 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Nov 21 12:06:25 c-web3 kernel: IP: [<ffffffffa053a8b0>] rpc_pipe_read+0x110/0x170 [sunrpc]
Nov 21 12:06:25 c-web3 kernel: PGD 0
Nov 21 12:06:25 c-web3 kernel: Oops: 0002 [#1] SMP
Nov 21 12:06:25 c-web3 kernel: Modules linked in: cmac ecb rfcomm fuse rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache af_packet bnep iscsi_ibft
Nov 21 12:06:25 c-web3 kernel: fjes mei_me mei acpi_pad processor auth_rpcgss sunrpc btrfs xor raid6_pq sd_mod hid_generic usbhid crc32c_intel serio_raw i
Nov 21 12:06:25 c-web3 kernel: CPU: 0 PID: 955 Comm: rpc.gssd Not tainted 4.4.27-2-default #1
Nov 21 12:06:25 c-web3 kernel: Hardware name: HP HP EliteDesk 800 G2 DM 65W/8056, BIOS N21 Ver. 02.20 08/08/2016
Nov 21 12:06:25 c-web3 kernel: task: ffff88042aeb4240 ti: ffff880420288000 task.ti: ffff880420288000
Nov 21 12:06:25 c-web3 kernel: RIP: 0010:[<ffffffffa053a8b0>] [<ffffffffa053a8b0>] rpc_pipe_read+0x110/0x170 [sunrpc]
Nov 21 12:06:25 c-web3 kernel: RSP: 0018:ffff88042028be38 EFLAGS: 00010212
Nov 21 12:06:25 c-web3 kernel: RAX: ffff880422b3a1c8 RBX: ffff88017df12c08 RCX: 0000000000000000
Nov 21 12:06:25 c-web3 kernel: RDX: ffff88017df12c08 RSI: 0000000000000000 RDI: ffff880422b3a1c8
Nov 21 12:06:25 c-web3 kernel: RBP: ffff880422b3a100 R08: 0000000000000000 R09: 0000000000000000
Nov 21 12:06:25 c-web3 kernel: R10: 00007fac27c40670 R11: 0000000000000246 R12: ffff88042ed7ec00
Nov 21 12:06:25 c-web3 kernel: R13: ffff8800b69e3da8 R14: 000055981dee9d70 R15: 0000000000000800
Nov 21 12:06:25 c-web3 kernel: FS: 00007fac2878f840(0000) GS:ffff88043f400000(0000) knlGS:0000000000000000
Nov 21 12:06:25 c-web3 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 21 12:06:25 c-web3 kernel: CR2: 0000000000000008 CR3: 00000004207b3000 CR4: 00000000003406f0
Nov 21 12:06:25 c-web3 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 21 12:06:25 c-web3 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 21 12:06:25 c-web3 kernel: Stack:
Nov 21 12:06:25 c-web3 kernel: ffff880422b3a1c8 ffff88042ed7ec00 ffff88042028bf28 ffff88042028bf28
Nov 21 12:06:25 c-web3 kernel: 0000000000000800 000055981d6cf580 0000000000000000 ffffffff81204be3
Nov 21 12:06:25 c-web3 kernel: ffff880000000000 ffff88042ed7ec00 0000000000000000 0000000000000000
Nov 21 12:06:25 c-web3 kernel: Call Trace:
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81204be3>] __vfs_read+0x23/0xe0
Nov 21 12:06:25 c-web3 kernel: [<ffffffff8120520a>] vfs_read+0x7a/0x120
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81205f72>] SyS_read+0x42/0xa0
Nov 21 12:06:25 c-web3 kernel: [<ffffffff816093f2>] entry_SYSCALL_64_fastpath+0x16/0x71
Nov 21 12:06:25 c-web3 kernel: DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x71
Nov 21 12:06:25 c-web3 kernel: Leftover inexact backtrace:
Nov 21 12:06:25 c-web3 kernel: Code: 93 48 8d 85 c8 00 00 00 48 89 c7 48 89 04 24 e8 a7 e7 0c e1 48 8b 55 00 48 8b 04 24 48 39 d5 74 3f 48 8b 4a 08 48 8b 3
Nov 21 12:06:25 c-web3 kernel: RIP [<ffffffffa053a8b0>] rpc_pipe_read+0x110/0x170 [sunrpc]
Nov 21 12:06:25 c-web3 kernel: RSP <ffff88042028be38>
Nov 21 12:06:25 c-web3 kernel: CR2: 0000000000000008
Nov 21 12:06:25 c-web3 kernel: ---[ end trace 716ccaec05398c65 ]---
Nov 21 12:06:25 c-web3 kernel: BUG: sleeping function called from invalid context at ../include/linux/sched.h:2862
Nov 21 12:06:25 c-web3 kernel: in_atomic(): 1, irqs_disabled(): 1, pid: 955, name: rpc.gssd
Nov 21 12:06:25 c-web3 kernel: CPU: 0 PID: 955 Comm: rpc.gssd Tainted: G D 4.4.27-2-default #1
Nov 21 12:06:25 c-web3 kernel: Hardware name: HP HP EliteDesk 800 G2 DM 65W/8056, BIOS N21 Ver. 02.20 08/08/2016
Nov 21 12:06:25 c-web3 kernel: 0000000000000000 ffffffff81327657 ffff88042aeb4240 00000000000003bb
Nov 21 12:06:25 c-web3 kernel: ffffffff8108dcf1 0000000000000046 0000000000000009 00000000000003bb
Nov 21 12:06:25 c-web3 kernel: ffffffff810812b6 0000000000030001 ffffffff00000000 0000000000000010
Nov 21 12:06:25 c-web3 kernel: Call Trace:
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81019e69>] dump_trace+0x59/0x320
Nov 21 12:06:25 c-web3 kernel: [<ffffffff8101a22a>] show_stack_log_lvl+0xfa/0x180
Nov 21 12:06:25 c-web3 kernel: [<ffffffff8101afd1>] show_stack+0x21/0x40
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81327657>] dump_stack+0x5c/0x85
Nov 21 12:06:25 c-web3 kernel: [<ffffffff8108dcf1>] exit_signals+0x21/0x130
Nov 21 12:06:25 c-web3 kernel: [<ffffffff810812b6>] do_exit+0xb6/0xba0
Nov 21 12:06:25 c-web3 kernel: [<ffffffff8101a8ec>] oops_end+0x9c/0xd0
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81065b97>] no_context+0x107/0x370
Nov 21 12:06:25 c-web3 kernel: [<ffffffff810669ab>] do_page_fault+0x2b/0x70
Nov 21 12:06:25 c-web3 kernel: [<ffffffff8160b848>] page_fault+0x28/0x30
Nov 21 12:06:25 c-web3 kernel: DWARF2 unwinder stuck at page_fault+0x28/0x30
Nov 21 12:06:25 c-web3 kernel:
Nov 21 12:06:25 c-web3 kernel: Leftover inexact backtrace:
Nov 21 12:06:25 c-web3 kernel: [<ffffffffa053a8b0>] ? rpc_pipe_read+0x110/0x170 [sunrpc]
Nov 21 12:06:25 c-web3 kernel: [<ffffffffa053a899>] ? rpc_pipe_read+0xf9/0x170 [sunrpc]
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81204be3>] ? __vfs_read+0x23/0xe0
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81205119>] ? rw_verify_area+0x49/0xc0
Nov 21 12:06:25 c-web3 kernel: [<ffffffff8120520a>] ? vfs_read+0x7a/0x120
Nov 21 12:06:25 c-web3 kernel: [<ffffffff81205f72>] ? SyS_read+0x42/0xa0
Nov 21 12:06:25 c-web3 kernel: [<ffffffff816093f2>] ? entry_SYSCALL_64_fastpath+0x16/0x71

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c1

Takashi Iwai <tiwai@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jweberhofer@weberhofer.at,
                   |                            |nfbrown@suse.com,
                   |                            |tiwai@suse.com
              Flags|                            |needinfo?(jweberhofer@weberhofer.at)

--- Comment #1 from Takashi Iwai <tiwai@suse.com> ---
Could you give the full kernel messages?
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c2

--- Comment #2 from Neil Brown <nfbrown@suse.com> ---
Looks like this code:

    if (!list_empty(&pipe->pipe)) {
        msg = list_entry(pipe->pipe.next, struct rpc_pipe_msg, list);

in rpc_pipe_read(). pipe->pipe.next appears to be NULL.

Are you using kerberos for NFS authentication? Are you doing anything else that might be considered "non-standard"? Containers? Automounts? Anything.

A use-after-free seems most likely to me, but the code looks solid, and there are no upstream patches that might relate to this.
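The check quoted in comment #2 can be illustrated outside the kernel. The sketch below is a userspace re-creation, not the sunrpc code: `struct list_head` and `list_empty()` mirror the kernel definitions, while `enum head_state`, `classify_head()` and `prev_offset()` are hypothetical names introduced only for this illustration. It shows that `list_empty()` merely compares `head->next` with `head`, so a head whose `next` pointer has been zeroed is reported as non-empty, and that `prev` sits one pointer (8 bytes on x86-64) into the struct, so a store through the `prev` field of a NULL entry would land at address 0x8. That would be consistent with the CR2 value in the report, but it is an assumption, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copy of the kernel's doubly linked list node layout. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Same test as the kernel helper: empty means the head points to itself. */
int list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Hypothetical classification, used only in this illustration. */
enum head_state { HEAD_EMPTY, HEAD_HAS_ENTRY, HEAD_CORRUPT };

/* A valid circular list never contains NULL links, so a NULL link
 * indicates corruption (for example after a use-after-free). */
enum head_state classify_head(const struct list_head *head)
{
    assert(head != NULL);
    if (head->next == NULL || head->prev == NULL)
        return HEAD_CORRUPT;
    return list_empty(head) ? HEAD_EMPTY : HEAD_HAS_ENTRY;
}

/* Byte offset that a write to entry->prev hits relative to entry.
 * With entry == NULL this offset is the faulting address. */
size_t prev_offset(void)
{
    return offsetof(struct list_head, prev);
}
```

For a head initialised to point at itself, `classify_head()` returns `HEAD_EMPTY`. After setting `next` to NULL, `list_empty()` returns 0, so the caller would go on to use the NULL pointer as an entry, while `classify_head()` returns `HEAD_CORRUPT`.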
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c3

Johannes Weberhofer <jweberhofer@weberhofer.at> changed:

           What    |Removed                              |Added
----------------------------------------------------------------------------
              Flags|needinfo?(jweberhofer@weberhofer.at) |

--- Comment #3 from Johannes Weberhofer <jweberhofer@weberhofer.at> ---
(In reply to Takashi Iwai from comment #1)
> Could you give the full kernel messages?

How can I?
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c4

--- Comment #4 from Johannes Weberhofer <jweberhofer@weberhofer.at> ---
(In reply to Neil Brown from comment #2)
> Looks like this code:
>
>     if (!list_empty(&pipe->pipe)) {
>         msg = list_entry(pipe->pipe.next, struct rpc_pipe_msg, list);
>
> in rpc_pipe_read(). pipe->pipe.next appears to be NULL.
>
> Are you using kerberos for NFS authentication? Are you doing anything else
> that might be considered "non-standard"? Containers? Automounts? Anything.
>
> A use-after-free seems most likely to me, but the code looks solid, and
> there are no upstream patches that might relate to this.

Yes: LDAP, kerberos, NFS and automount. All together.
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c5

--- Comment #5 from Takashi Iwai <tiwai@suse.com> ---
(In reply to Johannes Weberhofer from comment #3)
> (In reply to Takashi Iwai from comment #1)
> > Could you give the full kernel messages?
>
> How can I?

Just give the output of dmesg or journalctl. (Please attach it to Bugzilla rather than pasting it.)
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c16

--- Comment #16 from Johannes Weberhofer <jweberhofer@weberhofer.at> ---
I haven't seen any of the problems in the last ten days, so I think this issue has been fixed. Thank you!
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c21

olga kornievskaia <kolga@netapp.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kolga@netapp.com

--- Comment #21 from olga kornievskaia <kolga@netapp.com> ---
(In reply to Neil Brown from comment #17)
> Thanks for your testing, and for reporting in the first place. I've
> submitted the patch to the SLE12-SP2 kernel, which feeds into Leap-42.2. So
> it should be in the next maintenance update. The patch has been sent
> upstream too and looks likely to be accepted.

Is there a plan to backport this to SLE11-SP4? I have a report of this on:

UCS-bld2-20-125:/var/crash/2018-02-22-11:33 # uname -a
Linux UCS-bld2-20-125 4.4.21-69-default #1 SMP Tue Oct 25 10:58:20 UTC 2016 (9464f67) x86_64 x86_64 x86_64 GNU/Linux

[ 2837.033389] CPU: 3 PID: 4618 Comm: rpc.gssd Tainted: G I 4.4.21-69-default #1
[ 2837.033390] Hardware name: Cisco Systems Inc N20-B6620-1/N20-B6620-1, BIOS S5500.1.3.1c.0.052020101544 05/20/2010
[ 2837.033391] task: ffff880a34a39880 ti: ffff880a460d4000 task.ti: ffff880a460d4000
[ 2837.033392] RIP: 0010:[<ffffffffa078febd>]
[ 2837.033428] [<ffffffffa078febd>] rpc_pipe_read+0x10d/0x160 [sunrpc]

I checked out rpm-4.4.21-89 (or the -90-dirty) and I don't see that the fix went in there.
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c22

--- Comment #22 from Paul Zirnik <paul.zirnik@suse.com> ---
Are you sure it is SLES11-SP4?

The latest SLES11-SP4 kernel is 3.0.101-108.35.1.

Version 4.4.21-69 looks like SLES12-SP2 GA. The fix went in with version >= 4.4.38-93.1; the latest version is 4.4.114-92.67.1. So just applying updates should already help, or did I miss something?
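The version reasoning in comment #22 can be sketched as a small check. This is only an illustration under stated assumptions: `parse_kver()`, `kver_has_fix()` and `FIXED_IN` are hypothetical names, the comparison looks only at the three upstream version numbers and the first release field, and it is not a full RPM version comparison. It is also only meaningful within the same product line (the 4.4 kernels of SLES12-SP2 and Leap 42.2), not across products such as SLES11-SP4.

```c
#include <assert.h>
#include <stdio.h>

/* First kernel carrying the fix according to the thread: 4.4.38-93.1.
 * Only the first release field (93) is kept for the comparison. */
static const int FIXED_IN[4] = {4, 4, 38, 93};

/* Hypothetical helper: parse "A.B.C-R..." (e.g. "4.4.21-69-default")
 * into four integers. Returns 0 on success, -1 on a malformed string. */
int parse_kver(const char *s, int out[4])
{
    assert(s != NULL);
    int n = sscanf(s, "%d.%d.%d-%d", &out[0], &out[1], &out[2], &out[3]);
    return n == 4 ? 0 : -1;
}

/* Returns 1 if `running` is at or after the first fixed version,
 * 0 if it is older, and -1 if it cannot be parsed. */
int kver_has_fix(const char *running)
{
    int v[4];

    if (parse_kver(running, v) != 0)
        return -1;
    for (int i = 0; i < 4; i++) {
        if (v[i] != FIXED_IN[i])
            return v[i] > FIXED_IN[i];
    }
    return 1; /* identical to the first fixed version */
}
```

Under these assumptions, `kver_has_fix("4.4.21-69-default")` returns 0 for the kernel in comment #21, and `kver_has_fix("4.4.114-92.67.1")` returns 1 for the latest version named above.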
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250
http://bugzilla.opensuse.org/show_bug.cgi?id=1011250#c23

--- Comment #23 from olga kornievskaia <kolga@netapp.com> ---
(In reply to Paul Zirnik from comment #22)
> Are you sure it is SLES11-SP4?
>
> The latest SLES11-SP4 kernel is 3.0.101-108.35.1.
>
> Version 4.4.21-69 looks like SLES12-SP2 GA. The fix went in with version
> >= 4.4.38-93.1; the latest version is 4.4.114-92.67.1. So just applying
> updates should already help, or did I miss something?

I apologize; I wrongly assumed 4.4.21-69 was SLE11-SP4. Thank you for clarifying all the current kernel versions.