Re: [opensuse-kernel] Spurious crash since 4.2 kernel series

6 Oct 2015

On Monday 05 October 2015 14.59:38 Michal Kubecek wrote:
...
On Sun, Oct 04, 2015 at 07:20:19PM +0200, Bruno Friedmann wrote:
...
Hi all, I'm trying to understand why, since the 4.2.x series hit the kernel,
one of my machines has been getting kernel crashes...
Today I was lucky enough to grab a trace.
... 
Oct 04 18:39:59 yoda kernel: CPU: 0 PID: 3998 Comm: named Tainted: G      D         4.2.3-1.gef1562d-default #1
In general, it's preferable to show the first oops, as the others may
just be follow-ups. However, in this case the log shows that the first
oops looks the same.
...
Oct 04 18:39:59 yoda kernel: RIP: 0010:[<ffffffffa082b4b6>]  [<ffffffffa082b4b6>] __nf_conntrack_alloc+0x76/0x320 [nf_conntrack]
Oct 04 18:39:59 yoda kernel: RSP: 0018:ffff8807dce3f908  EFLAGS: 00010282
Oct 04 18:39:59 yoda kernel: RAX: ffff88017c56e240 RBX: 0000000000000000 RCX: ffff8807dce3f9c8
Oct 04 18:39:59 yoda kernel: RDX: 0000000000000000 RSI: ffffe8ffffc13be8 RDI: 0000000000000202
Oct 04 18:39:59 yoda kernel: RBP: ffff8807dce3f948 R08: 0000000000000020 R09: 0000000099b8b092
Oct 04 18:39:59 yoda kernel: R10: 0000000000000024 R11: 000000001e5d2ac0 R12: ffffffff81ed3e40
Oct 04 18:39:59 yoda kernel: R13: ffff8807dce3f9a0 R14: ffff8807dce3f9c8 R15: ffff88017c56e240
Oct 04 18:39:59 yoda kernel: FS:  00007f8f203fd700(0000) GS:ffff88082ec00000(0000) knlGS:0000000000000000
Oct 04 18:39:59 yoda kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 04 18:39:59 yoda kernel: CR2: ffff88017c56e244 CR3: 00000007e626b000 CR4: 00000000000406f0
This is strange... it happened here:
        ct = kmem_cache_alloc(net->ct.nf_conntrack_cachep, gfp);
        if (ct == NULL) {
                atomic_dec(&net->ct.count);
                return ERR_PTR(-ENOMEM);
        }
        spin_lock_init(&ct->lock);
Apparently ct is not null but points to an unmapped page. This looks
like some corruption of the slab cache. You might try mainline commit
9cf94eab8b30  netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths
  (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9c...)
but I can't say if the problem addressed by it could cause this kind of
outcome.
Michal Kubeček
Thanks for the pointer. The error "nf_conntrack: table full, dropping packet"
was seen a lot with the 4.2.x kernels I tried when starting the server normally.

Here the crash happened in named, which was the only network service running.

So the good news is that there's already a fix, and somebody else has seen
this kind of problem.

I'm not sure I'll be able to test it before next weekend (including rebuilding
the kernel). If this commit makes its way into 4.2.4 in the meantime,
I can wait until it hits build.o.o.

Do you think it is still worth reporting upstream,
or at least opening a bug on b.o.o (just to keep a pointer)?

-- 

Bruno Friedmann 
Ioda-Net Sàrl www.ioda-net.ch

 openSUSE Member & Board, fsfe fellowship
 GPG KEY : D5C9B751C4653227
 irc: tigerfoot
