[Bug 1170822] New: xfs related crash while building llvm
http://bugzilla.suse.com/show_bug.cgi?id=1170822

            Bug ID: 1170822
           Summary: xfs related crash while building llvm
    Classification: openSUSE
           Product: openSUSE Tumbleweed
           Version: Current
          Hardware: x86-64
                OS: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: Kernel
          Assignee: kernel-maintainers@forge.provo.novell.com
          Reporter: idonmez@suse.com
        QA Contact: qa-bugs@suse.de
          Found By: ---
           Blocker: ---

While building (and sometimes while installing the build) I can reliably crash the kernel. Here is the info and backtrace from crash(1):
crash /usr/lib/debug/boot/vmlinux-5.6.6-1-default.debug /boot/vmlinux-5.6.6-1-default.xz /var/crash/2020-04-29-14\:30/vmcore
crash 7.2.8
Copyright (C) 2002-2020 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [646MB]: patching 113108 gdb minimal_symbol values

      KERNEL: /boot/vmlinux-5.6.6-1-default.xz
   DEBUGINFO: /usr/lib/debug/boot/vmlinux-5.6.6-1-default.debug
    DUMPFILE: /var/crash/2020-04-29-14:30/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Wed Apr 29 14:30:45 2020
      UPTIME: 01:23:51
LOAD AVERAGE: 10.83, 10.31, 10.11
       TASKS: 273
    NODENAME: havana
     RELEASE: 5.6.6-1-default
     VERSION: #1 SMP Wed Apr 22 04:15:55 UTC 2020 (c11f000)
     MACHINE: x86_64  (3491 Mhz)
      MEMORY: 15.9 GB
       PANIC: "kernel BUG at mm/filemap.c:1318!"
         PID: 170
     COMMAND: "kworker/4:1"
        TASK: ffff97e5c73b9ec0  [THREAD_INFO: ffff97e5c73b9ec0]
         CPU: 4
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 170  TASK: ffff97e5c73b9ec0  CPU: 4  COMMAND: "kworker/4:1"
 #0 [ffffb12f80507af0] machine_kexec at ffffffffa966dce1
 #1 [ffffb12f80507b48] __crash_kexec at ffffffffa974f9fd
 #2 [ffffb12f80507c10] crash_kexec at ffffffffa9750795
 #3 [ffffb12f80507c20] oops_end at ffffffffa9634ed2
 #4 [ffffb12f80507c40] do_trap at ffffffffa963149b
 #5 [ffffb12f80507c90] do_invalid_op at ffffffffa9631ee7
 #6 [ffffb12f80507cb0] invalid_op at ffffffffaa000d68
    [exception RIP: end_page_writeback+95]
    RIP: ffffffffa9819e4f  RSP: ffffb12f80507d68  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff97e5c42e0d48  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff97e8effe7000  RDI: 0000000000000000
    RBP: ffffd4df47f80a80   R8: 0000000000000001   R9: ffff97e8effe7000
    R10: 00000000000353c0  R11: ffffffffffffffc0  R12: 0000000000000000
    R13: 0000000000000000  R14: ffff97e892522180  R15: ffffd4df47f80a80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #7 [ffffb12f80507d70] iomap_finish_ioend at ffffffffa996125e
 #8 [ffffb12f80507dd8] iomap_finish_ioends at ffffffffa9961393
 #9 [ffffb12f80507e08] xfs_end_ioend at ffffffffc08a646d [xfs]
#10 [ffffb12f80507e40] xfs_end_io at ffffffffc08a6fdc [xfs]
#11 [ffffb12f80507e78] process_one_work at ffffffffa96b9433
#12 [ffffb12f80507eb8] worker_thread at ffffffffa96b964d
#13 [ffffb12f80507f10] kthread at ffffffffa96bfd09
#14 [ffffb12f80507f50] ret_from_fork at ffffffffaa0001ff

Let me know if I can provide more information.

--
You are receiving this mail because:
You are on the CC list for the bug.
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c1

--- Comment #1 from Ismail Dönmez <idonmez@suse.com> ---
FWIW I can crash it with kernel-vanilla too, but the dump is from kernel-default.
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c2

Anthony Iliopoulos <ailiopoulos@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ailiopoulos@suse.com
           Assignee|kernel-maintainers@forge.pr |ailiopoulos@suse.com
                   |ovo.novell.com              |

--- Comment #2 from Anthony Iliopoulos <ailiopoulos@suse.com> ---
Thanks for the report, Ismail. If you could place the kdump (either from kernel-default or vanilla) somewhere, and point me to a path to the debug rpms, I can have a look.
http://bugzilla.suse.com/show_bug.cgi?id=1170822

Marcus Rückert <mrueckert@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mrueckert@suse.com
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c3

--- Comment #3 from Ismail Dönmez <idonmez@suse.com> ---
I have put everything under /mounts/users-space/idoenmez/bsc-1170822
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c4

--- Comment #4 from Anthony Iliopoulos <ailiopoulos@suse.com> ---
(In reply to Ismail Dönmez from comment #3)
> I have put everything under /mounts/users-space/idoenmez/bsc-1170822

Thanks! I just need a

  chmod g+r /mounts/users-space/idoenmez/bsc-1170822/vmcore

as it's currently owner-readable only.
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c5

--- Comment #5 from Ismail Dönmez <idonmez@suse.com> ---
(In reply to Anthony Iliopoulos from comment #4)
> (In reply to Ismail Dönmez from comment #3)
> > I have put everything under /mounts/users-space/idoenmez/bsc-1170822
>
> thanks! I just need a chmod g+r /mounts/users-space/idoenmez/bsc-1170822/vmcore as it's currently owner-readable only.

Oops, fixed!
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c8

--- Comment #8 from Anthony Iliopoulos <ailiopoulos@suse.com> ---
Thanks Ismail, unfortunately I still wasn't able to reproduce this locally.

Given that you can trigger this reliably on your machine, is there any chance you recall roughly the last kernel on which this wasn't a problem? We could try to bisect and pinpoint the culprit, restricting the bisection to fs/iomap and fs/xfs (I still suspect the iomap writeback refactoring in v5.5-rc1, but I cannot verify that since it doesn't reproduce on my hardware).

The bisection between v5.4..v5.6.6 for fs/{iomap,xfs} seems to be just 7 steps.
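As a rough illustration of the path-restricted bisection suggested in comment #8, here is a minimal Python sketch. The helper names (estimated_bisect_steps, bisect_commands) are hypothetical and only for illustration. The step estimate assumes a plain binary search over the candidate commits, so git's own "roughly N steps" figure may differ slightly. Under that assumption, 7 steps would correspond to somewhere between 65 and 128 commits touching the listed paths; the actual commit count is not stated in this thread. Note also that v5.6.6 is a stable tag, so this assumes a tree that contains the stable tags.

```python
def estimated_bisect_steps(num_commits: int) -> int:
    """Approximate number of kernels to build and test for num_commits candidates.

    Assumes an ideal binary search, i.e. ceil(log2(num_commits)).
    """
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (num_commits - 1).bit_length()


def bisect_commands(good: str, bad: str, paths: list[str]) -> list[list[str]]:
    """Build the git command lines for a bisection limited to some paths.

    Nothing is executed here; the caller decides whether to run them.
    """
    pathspec = ["--", *paths]
    return [
        # Count the commits in good..bad that touch the given paths.
        ["git", "rev-list", "--count", f"{good}..{bad}", *pathspec],
        # Start the bisection: bad revision first, then the known-good one.
        ["git", "bisect", "start", bad, good, *pathspec],
    ]


for argv in bisect_commands("v5.4", "v5.6.6", ["fs/iomap", "fs/xfs"]):
    print(" ".join(argv))
```

Widening the path list to fs and mm, as discussed later in the thread, only changes the last argument; more commits qualify, so the step count goes up accordingly.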
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c10

--- Comment #10 from Ismail Dönmez <idonmez@suse.com> ---
I (un)fortunately can still reproduce this :-) I guess you want me to check out kernel-source for our packages from GitHub and do a git bisect, starting from v5.4?

FWIW I reproduced the bug, but kdump somehow didn't kick in and the machine is now unreachable. It'll be offline until I can get it rebooted. Meanwhile I'll wait for your answer to the above.
http://bugzilla.suse.com/show_bug.cgi?id=1170822#c11

--- Comment #11 from Anthony Iliopoulos <ailiopoulos@suse.com> ---
(In reply to Ismail Dönmez from comment #9)
> I (un)fortunately can still reproduce this :-) I guess you want me to check out kernel-source for our packages from GitHub and do a git bisect, starting from v5.4?

You mentioned this being reproducible on vanilla too, correct? In that case, once your machine is back online, I'd first check whether this is still reproducible on our latest vanilla build. If you can still trigger it there, it would certainly indicate an upstream bug, and it would probably be easier to bisect directly on the mainline upstream tree, as our vanilla tree isn't updated on every upstream commit (so a single kernel-source/vanilla commit will contain multiple upstream commits).

I suppose we can easily build a vanilla 5.4 or so in ibs and check whether that works, to have a starting point for a bisection (unless you remember a specific older version on which this wasn't reproducible). I'd probably restrict the bisection to fs and mm (hopefully the bug is contained there) to minimize the steps.

I wonder if this would be reproducible in a large VM on top of the same machine, so that you could automate the bisect and save some time.

Ping me before you start the bisection if you want to discuss any details.
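One way the automated bisect mentioned above could be wired up is through git bisect run, which calls a script at every step and reads its exit status: 0 marks the revision good, 125 tells git to skip it, and any other code from 1 to 127 marks it bad. The following is a minimal Python sketch of such a step script. The function names and the two command arguments are hypothetical; in particular, it assumes a reproducer command (for example, something that boots the freshly built kernel in a VM and runs the llvm build) that exits non-zero when the crash is hit. Nothing in this thread specifies such a command.

```python
import subprocess

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(build_ok: bool, reproduced: bool) -> int:
    """Map the outcome of one bisection step to a `git bisect run` exit code."""
    if not build_ok:
        # A revision that does not build can be neither good nor bad: skip it.
        return SKIP
    return BAD if reproduced else GOOD


def run_step(build_cmd: list[str], repro_cmd: list[str]) -> int:
    """Build the current revision, then try to reproduce the crash.

    Both commands are placeholders supplied by the caller.
    Assumption: repro_cmd exits non-zero when the crash was reproduced.
    """
    if subprocess.run(build_cmd).returncode != 0:
        return classify(build_ok=False, reproduced=False)
    repro = subprocess.run(repro_cmd)
    return classify(build_ok=True, reproduced=repro.returncode != 0)
```

A wrapper script would end with sys.exit(run_step(...)) and be started with something like "git bisect run python3 bisect_step.py", where bisect_step.py is a made-up file name. Since the crash takes the whole machine down, the reproducer would have to run in a guest, as suggested above, for the script to survive long enough to report a result.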
https://bugzilla.suse.com/show_bug.cgi?id=1170822#c12

Anthony Iliopoulos <ailiopoulos@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |idonmez@suse.com
              Flags|                            |needinfo?(idonmez@suse.com)

--- Comment #12 from Anthony Iliopoulos <ailiopoulos@suse.com> ---
Shall we close this one, or is this still reproducible on current kernels?
https://bugzilla.suse.com/show_bug.cgi?id=1170822#c13

Anthony Iliopoulos <ailiopoulos@suse.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |NORESPONSE

--- Comment #13 from Anthony Iliopoulos <ailiopoulos@suse.com> ---
Closing due to inactivity; please reopen if this is reproducible on current kernels.