On Monday 05 October 2015 14.59:38 Michal Kubecek wrote:
On Sun, Oct 04, 2015 at 07:20:19PM +0200, Bruno Friedmann wrote:
Hi all,
I'm trying to understand why, since the 4.2.x series hit the kernel, one of my machines keeps getting kernel crashes ...
Today I was lucky enough to grab one trace.
...
Oct 04 18:39:59 yoda kernel: CPU: 0 PID: 3998 Comm: named Tainted: G D 4.2.3-1.gef1562d-default #1
In general, it's preferable to show the first oops, as the others may be just follow-ups. However, in this case the log shows that the first oops looks the same.
Oct 04 18:39:59 yoda kernel: RIP: 0010:[<ffffffffa082b4b6>] [<ffffffffa082b4b6>] __nf_conntrack_alloc+0x76/0x320 [nf_conntrack]
Oct 04 18:39:59 yoda kernel: RSP: 0018:ffff8807dce3f908 EFLAGS: 00010282
Oct 04 18:39:59 yoda kernel: RAX: ffff88017c56e240 RBX: 0000000000000000 RCX: ffff8807dce3f9c8
Oct 04 18:39:59 yoda kernel: RDX: 0000000000000000 RSI: ffffe8ffffc13be8 RDI: 0000000000000202
Oct 04 18:39:59 yoda kernel: RBP: ffff8807dce3f948 R08: 0000000000000020 R09: 0000000099b8b092
Oct 04 18:39:59 yoda kernel: R10: 0000000000000024 R11: 000000001e5d2ac0 R12: ffffffff81ed3e40
Oct 04 18:39:59 yoda kernel: R13: ffff8807dce3f9a0 R14: ffff8807dce3f9c8 R15: ffff88017c56e240
Oct 04 18:39:59 yoda kernel: FS: 00007f8f203fd700(0000) GS:ffff88082ec00000(0000) knlGS:0000000000000000
Oct 04 18:39:59 yoda kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 04 18:39:59 yoda kernel: CR2: ffff88017c56e244 CR3: 00000007e626b000 CR4: 00000000000406f0
This is strange... it happened here:
        ct = kmem_cache_alloc(net->ct.nf_conntrack_cachep, gfp);
        if (ct == NULL) {
                atomic_dec(&net->ct.count);
                return ERR_PTR(-ENOMEM);
        }
        spin_lock_init(&ct->lock);
Apparently ct is not null but points to an unmapped page. This looks like some corruption of the slab cache. You might try mainline commit
9cf94eab8b30 netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9c...)
but I can't say if the problem addressed by it could cause this kind of outcome.
Michal Kubeček
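To make the "ct is not NULL but points to an unmapped page" reasoning concrete: a NULL check only catches real allocator exhaustion. If an allocator keeps its free list inside the free objects themselves, a stray write to an already-freed object can corrupt that list. The allocator may then later hand out an arbitrary non-NULL address. The C sketch below is a hypothetical user-space toy and not the kernel's SLUB allocator. The names cache_alloc, cache_free and simulate_corruption, and the 0xdead0000 address, are invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy object cache, NOT the kernel's SLUB allocator.
 * Free objects sit on an intrusive singly linked list: the first word
 * of each free object holds the address of the next free object. */
#define NOBJ 4

struct obj {
    struct obj *next;   /* meaningful only while the object is free */
    int payload;
};

static struct obj pool[NOBJ];
static struct obj *freelist;

/* Thread every object in the pool onto the free list: pool[0] first. */
void cache_init(void)
{
    freelist = NULL;
    for (int i = NOBJ - 1; i >= 0; i--) {
        pool[i].next = freelist;
        freelist = &pool[i];
    }
}

struct obj *cache_alloc(void)
{
    struct obj *o = freelist;
    if (o == NULL)
        return NULL;        /* genuine exhaustion: the caller sees NULL */
    freelist = o->next;     /* blindly trusts the link stored in the object */
    return o;
}

void cache_free(struct obj *o)
{
    assert(o != NULL);
    o->next = freelist;
    freelist = o;
}

/* Simulates a use-after-free that clobbers the free-list link, then
 * returns the pointer the NEXT cache_alloc() would hand out. The bogus
 * address itself is never dereferenced here. */
struct obj *simulate_corruption(uintptr_t bogus)
{
    cache_init();
    struct obj *a = cache_alloc();
    cache_free(a);
    a->next = (struct obj *)bogus;  /* stale user scribbles on a freed object */
    struct obj *b = cache_alloc();  /* gets 'a' back; free-list head is now bogus */
    assert(b == a);
    return freelist;                /* non-NULL, yet points at nothing valid */
}
```

Called from a small main(), simulate_corruption(0xdead0000) returns a non-NULL pointer. A caller doing `if (ct == NULL)` would accept it and then fault on the first write through it. That matches the shape of the oops above, where CR2 (ffff88017c56e244) is RAX (ffff88017c56e240) plus 4.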
Thanks for the pointer. The error "nf_conntrack: table full, dropping packet" showed up a lot with the 4.2.x kernels I tried when starting the server normally. Here the crash happened in named, which was the only network service running.

So the good news is that there's already a fix, and somebody else has seen that kind of problem. I'm not sure I'll be able to test it before next weekend (including rebuilding the kernel). If this commit makes its way into 4.2.4 in the meantime, I can wait until it hits build.o.o.

Do you think it's still worthwhile to report it upstream, or at least to open a bug on b.o.o (just to keep a pointer to it)?

--
Bruno Friedmann
Ioda-Net Sàrl www.ioda-net.ch
openSUSE Member & Board, fsfe fellowship
GPG KEY : D5C9B751C4653227
irc: tigerfoot